Re: [HACKERS] logical decoding of two-phase transactions
Hi,
PFA, latest patchset which incorporates the additional feedback.
There's an additional test case in
0005-Additional-test-case-to-demonstrate-decoding-rollbac.patch which
uses a sleep in the "change" plugin API to allow a concurrent rollback
on the 2PC being currently decoded. Andres generally doesn't like this
approach :-), but there are no timing/interlocking issues now, and the
sleep just helps us do a concurrent rollback, so it might be ok now,
all things considered. Anyways, it's an additional patch for now.Yea, I still don't think it's ok. The tests won't be reliable. There's
ways to make this reliable, e.g. by forcing a lock to be acquired that's
externally held or such. Might even be doable just with a weird custom
datatype.Ok, I will look at ways to do away with the sleep.
The attached patchset implements a non-sleep based approached by
sending the 2PC XID to the pg_logical_slot_get_changes() function as
an option for the test_decoding plugin. So, an example invocation
will now look like:
SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
NULL, 'skip-empty-xacts', '1', 'check-xid', '$xid2pc');
The test_decoding pg_decode_change() API if it sees a valid xid
argument will wait for it to be aborted. Another backend can then come
in and merrily abort this ongoing 2PC in the background. Once it's
aborted, the pg_decode_change API will go ahead and will hit an ERROR
in the systable scan APIs. That should take care of Andres' concern
about using sleep in the tests. The relevant tap test has been added
to this patchset.
@@ -423,6 +423,16 @@ systable_getnext(SysScanDesc sysscan)
else
htup = heap_getnext(sysscan->scan, ForwardScanDirection);+ /* + * If CheckXidAlive is valid, then we check if it aborted. If it did, we + * error out + */ + if (TransactionIdIsValid(CheckXidAlive) && + TransactionIdDidAbort(CheckXidAlive)) + ereport(ERROR, + (errcode(ERRCODE_TRANSACTION_ROLLBACK), + errmsg("transaction aborted during system catalog scan"))); + return htup; }Don't we have to check TransactionIdIsInProgress() first? C.f. header
comments in tqual.c. Note this is also not guaranteed to be correct
after a crash (where no clog entry will exist for an aborted xact), but
we probably shouldn't get here in that case - but better be safe.I suspect it'd be better reformulated as
TransactionIdIsValid(CheckXidAlive) &&
!TransactionIdIsInProgress(CheckXidAlive) &&
!TransactionIdDidCommit(CheckXidAlive)What do you think?
Modified the checks are per the above suggestion.
I was wondering if anything else would be needed for user-defined
catalog tables..
We don't need to do anything else for user-defined catalog tables
since they will also get accessed via the systable_* scan APIs.
Hmm, lemme see if we can do it outside of the plugin. But note that a
plugin might want to decode some 2PC at prepare time and another are
"commit prepared" time.
The test_decoding pg_decode_filter_prepare() API implements a simple
filter strategy now. If the GID contains a substring "nodecode", then
it filters out decoding of such a 2PC at prepare time. Have added
steps to test this in the relevant test case in this patch.
I believe this patchset handles all pending issues along with relevant
test cases. Comments, further feedback appreciated.
Regards,
Nikhils
--
Nikhil Sontakke http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
0001-Cleaning-up-of-flags-in-ReorderBufferTXN-structure.patchapplication/octet-stream; name=0001-Cleaning-up-of-flags-in-ReorderBufferTXN-structure.patchDownload
From 8e32c723490df640ffd071716930715ba087814c Mon Sep 17 00:00:00 2001
From: Nikhil Sontakke <nikhils@2ndQuadrant.com>
Date: Wed, 13 Jun 2018 16:15:24 +0530
Subject: [PATCH 1/4] Cleaning up of flags in ReorderBufferTXN structure
---
src/backend/replication/logical/reorderbuffer.c | 34 ++++++++++++-------------
src/include/replication/reorderbuffer.h | 33 ++++++++++++++----------
2 files changed, 37 insertions(+), 30 deletions(-)
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 9b55b94227..fb71631434 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -643,7 +643,7 @@ AssertTXNLsnOrder(ReorderBuffer *rb)
Assert(prev_first_lsn < cur_txn->first_lsn);
/* known-as-subtxn txns must not be listed */
- Assert(!cur_txn->is_known_as_subxact);
+ Assert(!rbtxn_is_known_subxact(cur_txn));
prev_first_lsn = cur_txn->first_lsn;
}
@@ -663,7 +663,7 @@ AssertTXNLsnOrder(ReorderBuffer *rb)
Assert(prev_base_snap_lsn < cur_txn->base_snapshot_lsn);
/* known-as-subtxn txns must not be listed */
- Assert(!cur_txn->is_known_as_subxact);
+ Assert(!rbtxn_is_known_subxact(cur_txn));
prev_base_snap_lsn = cur_txn->base_snapshot_lsn;
}
@@ -686,7 +686,7 @@ ReorderBufferGetOldestTXN(ReorderBuffer *rb)
txn = dlist_head_element(ReorderBufferTXN, node, &rb->toplevel_by_lsn);
- Assert(!txn->is_known_as_subxact);
+ Assert(!rbtxn_is_known_subxact(txn));
Assert(txn->first_lsn != InvalidXLogRecPtr);
return txn;
}
@@ -746,7 +746,7 @@ ReorderBufferAssignChild(ReorderBuffer *rb, TransactionId xid,
if (!new_sub)
{
- if (subtxn->is_known_as_subxact)
+ if (rbtxn_is_known_subxact(subtxn))
{
/* already associated, nothing to do */
return;
@@ -762,7 +762,7 @@ ReorderBufferAssignChild(ReorderBuffer *rb, TransactionId xid,
}
}
- subtxn->is_known_as_subxact = true;
+ subtxn->txn_flags |= RBTXN_IS_SUBXACT;
subtxn->toplevel_xid = xid;
Assert(subtxn->nsubtxns == 0);
@@ -972,7 +972,7 @@ ReorderBufferIterTXNInit(ReorderBuffer *rb, ReorderBufferTXN *txn)
{
ReorderBufferChange *cur_change;
- if (txn->serialized)
+ if (rbtxn_is_serialized(txn))
{
/* serialize remaining changes */
ReorderBufferSerializeTXN(rb, txn);
@@ -1001,7 +1001,7 @@ ReorderBufferIterTXNInit(ReorderBuffer *rb, ReorderBufferTXN *txn)
{
ReorderBufferChange *cur_change;
- if (cur_txn->serialized)
+ if (rbtxn_is_serialized(cur_txn))
{
/* serialize remaining changes */
ReorderBufferSerializeTXN(rb, cur_txn);
@@ -1167,7 +1167,7 @@ ReorderBufferCleanupTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
* they originally were happening inside another subtxn, so we won't
* ever recurse more than one level deep here.
*/
- Assert(subtxn->is_known_as_subxact);
+ Assert(rbtxn_is_known_subxact(subtxn));
Assert(subtxn->nsubtxns == 0);
ReorderBufferCleanupTXN(rb, subtxn);
@@ -1208,7 +1208,7 @@ ReorderBufferCleanupTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
/*
* Remove TXN from its containing list.
*
- * Note: if txn->is_known_as_subxact, we are deleting the TXN from its
+ * Note: if txn is known as subxact, we are deleting the TXN from its
* parent's list of known subxacts; this leaves the parent's nsubxacts
* count too high, but we don't care. Otherwise, we are deleting the TXN
* from the LSN-ordered list of toplevel TXNs.
@@ -1223,7 +1223,7 @@ ReorderBufferCleanupTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
Assert(found);
/* remove entries spilled to disk */
- if (txn->serialized)
+ if (rbtxn_is_serialized(txn))
ReorderBufferRestoreCleanup(rb, txn);
/* deallocate */
@@ -1240,7 +1240,7 @@ ReorderBufferBuildTupleCidHash(ReorderBuffer *rb, ReorderBufferTXN *txn)
dlist_iter iter;
HASHCTL hash_ctl;
- if (!txn->has_catalog_changes || dlist_is_empty(&txn->tuplecids))
+ if (!rbtxn_has_catalog_changes(txn) || dlist_is_empty(&txn->tuplecids))
return;
memset(&hash_ctl, 0, sizeof(hash_ctl));
@@ -1854,7 +1854,7 @@ ReorderBufferAbortOld(ReorderBuffer *rb, TransactionId oldestRunningXid)
* final_lsn to that of their last change; this causes
* ReorderBufferRestoreCleanup to do the right thing.
*/
- if (txn->serialized && txn->final_lsn == 0)
+ if (rbtxn_is_serialized(txn) && txn->final_lsn == 0)
{
ReorderBufferChange *last =
dlist_tail_element(ReorderBufferChange, node, &txn->changes);
@@ -2002,7 +2002,7 @@ ReorderBufferSetBaseSnapshot(ReorderBuffer *rb, TransactionId xid,
* operate on its top-level transaction instead.
*/
txn = ReorderBufferTXNByXid(rb, xid, true, &is_new, lsn, true);
- if (txn->is_known_as_subxact)
+ if (rbtxn_is_known_subxact(txn))
txn = ReorderBufferTXNByXid(rb, txn->toplevel_xid, false,
NULL, InvalidXLogRecPtr, false);
Assert(txn->base_snapshot == NULL);
@@ -2109,7 +2109,7 @@ ReorderBufferXidSetCatalogChanges(ReorderBuffer *rb, TransactionId xid,
txn = ReorderBufferTXNByXid(rb, xid, true, NULL, lsn, true);
- txn->has_catalog_changes = true;
+ txn->txn_flags |= RBTXN_HAS_CATALOG_CHANGES;
}
/*
@@ -2126,7 +2126,7 @@ ReorderBufferXidHasCatalogChanges(ReorderBuffer *rb, TransactionId xid)
if (txn == NULL)
return false;
- return txn->has_catalog_changes;
+ return rbtxn_has_catalog_changes(txn);
}
/*
@@ -2146,7 +2146,7 @@ ReorderBufferXidHasBaseSnapshot(ReorderBuffer *rb, TransactionId xid)
return false;
/* a known subtxn? operate on top-level txn instead */
- if (txn->is_known_as_subxact)
+ if (rbtxn_is_known_subxact(txn))
txn = ReorderBufferTXNByXid(rb, txn->toplevel_xid, false,
NULL, InvalidXLogRecPtr, false);
@@ -2267,7 +2267,7 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
Assert(spilled == txn->nentries_mem);
Assert(dlist_is_empty(&txn->changes));
txn->nentries_mem = 0;
- txn->serialized = true;
+ txn->txn_flags |= RBTXN_IS_SERIALIZED;
if (fd != -1)
CloseTransientFile(fd);
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 1f52f6bde7..ec9515d156 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -150,18 +150,34 @@ typedef struct ReorderBufferChange
dlist_node node;
} ReorderBufferChange;
+/* ReorderBufferTXN flags */
+#define RBTXN_HAS_CATALOG_CHANGES 0x0001
+#define RBTXN_IS_SUBXACT 0x0002
+#define RBTXN_IS_SERIALIZED 0x0004
+
+/* does the txn have catalog changes */
+#define rbtxn_has_catalog_changes(txn) (txn->txn_flags & RBTXN_HAS_CATALOG_CHANGES)
+/* is the txn known as a subxact? */
+#define rbtxn_is_known_subxact(txn) (txn->txn_flags & RBTXN_IS_SUBXACT)
+/*
+ * Has this transaction been spilled to disk? It's not always possible to
+ * deduce that fact by comparing nentries with nentries_mem, because e.g.
+ * subtransactions of a large transaction might get serialized together
+ * with the parent - if they're restored to memory they'd have
+ * nentries_mem == nentries.
+ */
+#define rbtxn_is_serialized(txn) (txn->txn_flags & RBTXN_IS_SERIALIZED)
+
typedef struct ReorderBufferTXN
{
+ int txn_flags;
+
/*
* The transactions transaction id, can be a toplevel or sub xid.
*/
TransactionId xid;
- /* did the TX have catalog changes */
- bool has_catalog_changes;
-
/* Do we know this is a subxact? Xid of top-level txn if so */
- bool is_known_as_subxact;
TransactionId toplevel_xid;
/*
@@ -229,15 +245,6 @@ typedef struct ReorderBufferTXN
*/
uint64 nentries_mem;
- /*
- * Has this transaction been spilled to disk? It's not always possible to
- * deduce that fact by comparing nentries with nentries_mem, because e.g.
- * subtransactions of a large transaction might get serialized together
- * with the parent - if they're restored to memory they'd have
- * nentries_mem == nentries.
- */
- bool serialized;
-
/*
* List of ReorderBufferChange structs, including new Snapshots and new
* CommandIds
--
2.15.2 (Apple Git-101.1)
0002-Support-decoding-of-two-phase-transactions-at-PREPAR.patchapplication/octet-stream; name=0002-Support-decoding-of-two-phase-transactions-at-PREPAR.patchDownload
From 83b83aed7769a4971c83823f50a3799cf74bd8d5 Mon Sep 17 00:00:00 2001
From: Nikhil Sontakke <nikhils@2ndQuadrant.com>
Date: Wed, 13 Jun 2018 16:30:30 +0530
Subject: [PATCH 2/4] Support decoding of two-phase transactions at PREPARE
Until now two-phase transactions were decoded at COMMIT, just like
regular transaction. During replay, two-phase transactions were
translated into regular transactions on the subscriber, and the GID
was not forwarded to it.
This patch allows PREPARE-time decoding two-phase transactions (if
the output plugin supports this capability), in which case the
transactions are replayed at PREPARE and then committed later when
COMMIT PREPARED arrives.
On the subscriber, the transactions will be executed as two-phase
transactions, with the same GID. This is important for various
external transaction managers, that often encode information into
the GID itself.
Includes documentation changes.
---
doc/src/sgml/logicaldecoding.sgml | 127 ++++++++++++++-
src/backend/replication/logical/decode.c | 147 +++++++++++++++--
src/backend/replication/logical/logical.c | 203 ++++++++++++++++++++++++
src/backend/replication/logical/reorderbuffer.c | 185 +++++++++++++++++++--
src/include/replication/logical.h | 2 +-
src/include/replication/output_plugin.h | 46 ++++++
src/include/replication/reorderbuffer.h | 68 ++++++++
7 files changed, 746 insertions(+), 32 deletions(-)
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index 8db968641e..a89e4d5184 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -385,7 +385,12 @@ typedef struct OutputPluginCallbacks
LogicalDecodeChangeCB change_cb;
LogicalDecodeTruncateCB truncate_cb;
LogicalDecodeCommitCB commit_cb;
+ LogicalDecodeAbortCB abort_cb;
LogicalDecodeMessageCB message_cb;
+ LogicalDecodeFilterPrepareCB filter_prepare_cb;
+ LogicalDecodePrepareCB prepare_cb;
+ LogicalDecodeCommitPreparedCB commit_prepared_cb;
+ LogicalDecodeAbortPreparedCB abort_prepared_cb;
LogicalDecodeFilterByOriginCB filter_by_origin_cb;
LogicalDecodeShutdownCB shutdown_cb;
} OutputPluginCallbacks;
@@ -457,7 +462,13 @@ CREATE TABLE another_catalog_table(data text) WITH (user_catalog_table = true);
never get
decoded. Successful savepoints are
folded into the transaction containing them in the order they were
- executed within that transaction.
+ executed within that transaction. A transaction that is prepared for
+ a two-phase commit using <command>PREPARE TRANSACTION</command> will
+ also be decoded if the output plugin callbacks needed for decoding
+ them are provided. It is possible that the current transaction which
+ is being decoded is aborted concurrently via a <command>ROLLBACK PREPARED</command>
+ command. In that case, the logical decoding of this transaction will
+ be aborted too.
</para>
<note>
@@ -558,6 +569,71 @@ typedef void (*LogicalDecodeCommitCB) (struct LogicalDecodingContext *ctx,
</para>
</sect3>
+ <sect3 id="logicaldecoding-output-plugin-prepare">
+ <title>Transaction Prepare Callback</title>
+
+ <para>
+ The optional <function>prepare_cb</function> callback is called whenever
+ a transaction which is prepared for two-phase commit has been
+ decoded. The <function>change_cb</function> callbacks for all modified
+ rows will have been called before this, if there have been any modified
+ rows.
+<programlisting>
+typedef void (*LogicalDecodePrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
+ <sect3 id="logicaldecoding-output-plugin-commit-prepared">
+ <title>Commit Prepared Transaction Callback</title>
+
+ <para>
+ The optional <function>commit_prepared_cb</function> callback is called whenever
+ a commit prepared transaction has been decoded. The <parameter>gid</parameter> field,
+ which is part of the <parameter>txn</parameter> parameter can be used in this
+ callback.
+<programlisting>
+typedef void (*LogicalDecodeCommitPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
+ <sect3 id="logicaldecoding-output-plugin-abort-prepared">
+ <title>Rollback Prepared Transaction Callback</title>
+
+ <para>
+ The optional <function>abort_prepared_cb</function> callback is called whenever
+ a rollback prepared transaction has been decoded. The <parameter>gid</parameter> field,
+ which is part of the <parameter>txn</parameter> parameter can be used in this
+ callback.
+<programlisting>
+typedef void (*LogicalDecodeAbortPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
+ <sect3 id="logicaldecoding-output-plugin-abort">
+ <title>Transaction Abort Callback</title>
+
+ <para>
+ The required <function>abort_cb</function> callback is called whenever
+ a transaction abort has to be initiated. This can happen if we are
+ decoding a transaction that has been prepared for two-phase commit and
+ a concurrent rollback happens while we are decoding it.
+<programlisting>
+typedef void (*LogicalDecodeAbortCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
<sect3 id="logicaldecoding-output-plugin-change">
<title>Change Callback</title>
@@ -567,7 +643,13 @@ typedef void (*LogicalDecodeCommitCB) (struct LogicalDecodingContext *ctx,
an <command>INSERT</command>, <command>UPDATE</command>,
or <command>DELETE</command>. Even if the original command modified
several rows at once the callback will be called individually for each
- row.
+ row. The <function>change_cb</function> callback may access system or
+ user catalog tables to aid in the process of outputting the row
+ modification details. In case of decoding a prepared (but yet
+ uncommitted) transaction or decoding of an uncommitted transaction, this
+ change callback might also error out due to simultaneous rollback of
+ this very same transaction. In that case, the logical decoding of this
+ aborted transaction is stopped gracefully.
<programlisting>
typedef void (*LogicalDecodeChangeCB) (struct LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
@@ -644,6 +726,39 @@ typedef bool (*LogicalDecodeFilterByOriginCB) (struct LogicalDecodingContext *ct
</para>
</sect3>
+ <sect3 id="logicaldecoding-output-plugin-filter-prepare">
+ <title>Prepare Filter Callback</title>
+
+ <para>
+ The optional <function>filter_prepare_cb</function> callback
+ is called to determine whether data that is part of the current
+ two-phase commit transaction should be considered for decode
+ at this prepare stage or as a regular one-phase transaction at
+ <command>COMMIT PREPARED</command> time later. To signal that
+ decoding should be skipped, return <literal>true</literal>;
+ <literal>false</literal> otherwise. When the callback is not
+ defined, <literal>false</literal> is assumed (i.e. nothing is
+ filtered).
+<programlisting>
+typedef bool (*LogicalDecodeFilterPrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ TransactionId xid,
+ const char *gid);
+</programlisting>
+ The <parameter>ctx</parameter> parameter has the same contents
+ as for the other callbacks. The <parameter>txn</parameter> parameter
+ contains meta information about the transaction. The <parameter>xid</parameter>
+ contains the XID because <parameter>txn</parameter> can be NULL in some cases.
+ The <parameter>gid</parameter> is the identifier that later identifies this
+ transaction for <command>COMMIT PREPARED</command> or <command>ROLLBACK PREPARED</command>.
+ </para>
+ <para>
+ The callback has to provide the same static answer for a given combination of
+ <parameter>xid</parameter> and <parameter>gid</parameter> every time it is
+ called.
+ </para>
+ </sect3>
+
<sect3 id="logicaldecoding-output-plugin-message">
<title>Generic Message Callback</title>
@@ -665,7 +780,13 @@ typedef void (*LogicalDecodeMessageCB) (struct LogicalDecodingContext *ctx,
non-transactional and the XID was not assigned yet in the transaction
which logged the message. The <parameter>lsn</parameter> has WAL
location of the message. The <parameter>transactional</parameter> says
- if the message was sent as transactional or not.
+ if the message was sent as transactional or not. Similar to the change
+ callback, in case of decoding a prepared (but yet uncommitted)
+ transaction or decoding of an uncommitted transaction, this message
+ callback might also error out due to simultaneous rollback of
+ this very same transaction. In that case, the logical decoding of this
+ aborted transaction is stopped gracefully.
+
The <parameter>prefix</parameter> is arbitrary null-terminated prefix
which can be used for identifying interesting messages for the current
plugin. And finally the <parameter>message</parameter> parameter holds
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 59c003de9c..99d801eb94 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -34,6 +34,7 @@
#include "access/xlogutils.h"
#include "access/xlogreader.h"
#include "access/xlogrecord.h"
+#include "access/twophase.h"
#include "catalog/pg_control.h"
@@ -73,6 +74,8 @@ static void DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
xl_xact_parsed_commit *parsed, TransactionId xid);
static void DecodeAbort(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
xl_xact_parsed_abort *parsed, TransactionId xid);
+static void DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_prepare * parsed);
/* common function to decode tuples */
static void DecodeXLogTuple(char *data, Size len, ReorderBufferTupleBuf *tup);
@@ -281,16 +284,33 @@ DecodeXactOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
break;
}
case XLOG_XACT_PREPARE:
+ {
+ xl_xact_parsed_prepare parsed;
- /*
- * Currently decoding ignores PREPARE TRANSACTION and will just
- * decode the transaction when the COMMIT PREPARED is sent or
- * throw away the transaction's contents when a ROLLBACK PREPARED
- * is received. In the future we could add code to expose prepared
- * transactions in the changestream allowing for a kind of
- * distributed 2PC.
- */
- ReorderBufferProcessXid(reorder, XLogRecGetXid(r), buf->origptr);
+ /* check that output plugin is capable of twophase decoding */
+ if (!ctx->options.enable_twophase)
+ {
+ ReorderBufferProcessXid(reorder, XLogRecGetXid(r), buf->origptr);
+ break;
+ }
+
+ /* ok, parse it */
+ ParsePrepareRecord(XLogRecGetInfo(buf->record),
+ XLogRecGetData(buf->record), &parsed);
+
+ /* does output plugin want this particular transaction? */
+ if (ctx->callbacks.filter_prepare_cb &&
+ ReorderBufferPrepareNeedSkip(reorder, parsed.twophase_xid,
+ parsed.twophase_gid))
+ {
+ ReorderBufferProcessXid(reorder, parsed.twophase_xid,
+ buf->origptr);
+ break;
+ }
+
+ DecodePrepare(ctx, buf, &parsed);
+ break;
+ }
break;
default:
elog(ERROR, "unexpected RM_XACT_ID record type: %u", info);
@@ -633,9 +653,90 @@ DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
buf->origptr, buf->endptr);
}
+ /*
+ * Decide if we're processing COMMIT PREPARED, or a regular COMMIT.
+ * Regular commit simply triggers a replay of transaction changes from the
+ * reorder buffer. For COMMIT PREPARED that however already happened at
+ * PREPARE time, and so we only need to notify the subscriber that the GID
+ * finally committed.
+ *
+ * For output plugins that do not support PREPARE-time decoding of
+ * two-phase transactions, we never even see the PREPARE and all two-phase
+ * transactions simply fall through to the second branch.
+ */
+ if (TransactionIdIsValid(parsed->twophase_xid) &&
+ ReorderBufferTxnIsPrepared(ctx->reorder,
+ parsed->twophase_xid, parsed->twophase_gid))
+ {
+ Assert(xid == parsed->twophase_xid);
+ /* we are processing COMMIT PREPARED */
+ ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn,
+ parsed->twophase_gid, true);
+ }
+ else
+ {
+ /* replay actions of all transaction + subtransactions in order */
+ ReorderBufferCommit(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn);
+ }
+}
+
+/*
+ * Decode PREPARE record. Similar logic as in COMMIT
+ */
+static void
+DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_prepare * parsed)
+{
+ XLogRecPtr origin_lsn = parsed->origin_lsn;
+ TimestampTz commit_time = parsed->origin_timestamp;
+ XLogRecPtr origin_id = XLogRecGetOrigin(buf->record);
+ int i;
+ TransactionId xid = parsed->twophase_xid;
+
+ /*
+ * Process invalidation messages, even if we're not interested in the
+ * transaction's contents, since the various caches need to always be
+ * consistent.
+ */
+ if (parsed->nmsgs > 0)
+ {
+ if (!ctx->fast_forward)
+ ReorderBufferAddInvalidations(ctx->reorder, xid, buf->origptr,
+ parsed->nmsgs, parsed->msgs);
+ ReorderBufferXidSetCatalogChanges(ctx->reorder, xid, buf->origptr);
+ }
+
+ /*
+ * Tell the reorderbuffer about the surviving subtransactions. We need to
+ * do this because the main transaction itself has not committed since we
+ * are in the prepare phase right now. So we need to be sure the snapshot
+ * is setup correctly for the main transaction in case all changes
+ * happened in subtransanctions
+ */
+ for (i = 0; i < parsed->nsubxacts; i++)
+ {
+ ReorderBufferCommitChild(ctx->reorder, xid, parsed->subxacts[i],
+ buf->origptr, buf->endptr);
+ }
+
+ if (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
+ (parsed->dbId != InvalidOid && parsed->dbId != ctx->slot->data.database) ||
+ ctx->fast_forward || FilterByOrigin(ctx, origin_id))
+ {
+ for (i = 0; i < parsed->nsubxacts; i++)
+ {
+ ReorderBufferForget(ctx->reorder, parsed->subxacts[i], buf->origptr);
+ }
+ ReorderBufferForget(ctx->reorder, xid, buf->origptr);
+
+ return;
+ }
+
/* replay actions of all transaction + subtransactions in order */
- ReorderBufferCommit(ctx->reorder, xid, buf->origptr, buf->endptr,
- commit_time, origin_id, origin_lsn);
+ ReorderBufferPrepare(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn, parsed->twophase_gid);
}
/*
@@ -647,6 +748,30 @@ DecodeAbort(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
xl_xact_parsed_abort *parsed, TransactionId xid)
{
int i;
+ XLogRecPtr origin_lsn = InvalidXLogRecPtr;
+ TimestampTz commit_time = 0;
+ XLogRecPtr origin_id = XLogRecGetOrigin(buf->record);
+
+ if (parsed->xinfo & XACT_XINFO_HAS_ORIGIN)
+ {
+ origin_lsn = parsed->origin_lsn;
+ commit_time = parsed->origin_timestamp;
+ }
+
+ /*
+ * If it's ROLLBACK PREPARED then handle it via callbacks.
+ */
+ if (TransactionIdIsValid(xid) &&
+ !SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) &&
+ parsed->dbId == ctx->slot->data.database &&
+ !FilterByOrigin(ctx, origin_id) &&
+ ReorderBufferTxnIsPrepared(ctx->reorder, xid, parsed->twophase_gid))
+ {
+ ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn,
+ parsed->twophase_gid, false);
+ return;
+ }
for (i = 0; i < parsed->nsubxacts; i++)
{
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 3cd4eefb9b..f3b28c3dab 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -60,6 +60,16 @@ static void shutdown_cb_wrapper(LogicalDecodingContext *ctx);
static void begin_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn);
static void commit_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
+static void abort_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+static bool filter_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid);
+static void prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+static void commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+static void abort_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
static void change_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
Relation relation, ReorderBufferChange *change);
static void truncate_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
@@ -187,6 +197,11 @@ StartupDecodingContext(List *output_plugin_options,
ctx->reorder->apply_change = change_cb_wrapper;
ctx->reorder->apply_truncate = truncate_cb_wrapper;
ctx->reorder->commit = commit_cb_wrapper;
+ ctx->reorder->abort = abort_cb_wrapper;
+ ctx->reorder->filter_prepare = filter_prepare_cb_wrapper;
+ ctx->reorder->prepare = prepare_cb_wrapper;
+ ctx->reorder->commit_prepared = commit_prepared_cb_wrapper;
+ ctx->reorder->abort_prepared = abort_prepared_cb_wrapper;
ctx->reorder->message = message_cb_wrapper;
ctx->out = makeStringInfo();
@@ -611,6 +626,33 @@ startup_cb_wrapper(LogicalDecodingContext *ctx, OutputPluginOptions *opt, bool i
/* do the actual work: call callback */
ctx->callbacks.startup_cb(ctx, opt, is_init);
+ /*
+ * If the plugin claims to support two-phase transactions, then
+ * check that the plugin implements all callbacks necessary to decode
+ * two-phase transactions - we either have to have all of them or none.
+ * The filter_prepare callback is optional, but can only be defined when
+ * two-phase decoding is enabled (i.e. the three other callbacks are
+ * defined).
+ */
+ if (opt->enable_twophase)
+ {
+ int twophase_callbacks = (ctx->callbacks.prepare_cb != NULL) +
+ (ctx->callbacks.commit_prepared_cb != NULL) +
+ (ctx->callbacks.abort_prepared_cb != NULL);
+
+ /* Plugins with incorrect number of two-phase callbacks are broken. */
+ if ((twophase_callbacks != 3) && (twophase_callbacks != 0))
+ ereport(ERROR,
+ (errmsg("Output plugin registered only %d twophase callbacks. ",
+ twophase_callbacks)));
+ }
+
+ /* filter_prepare is optional, but requires two-phase decoding */
+ if ((ctx->callbacks.filter_prepare_cb != NULL) && (!opt->enable_twophase))
+ ereport(ERROR,
+ (errmsg("Output plugin does not support two-phase decoding, but "
+ "registered filter_prepared callback.")));
+
/* Pop the error context stack */
error_context_stack = errcallback.previous;
}
@@ -708,6 +750,122 @@ commit_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
error_context_stack = errcallback.previous;
}
+static void
+abort_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "abort";
+ state.report_location = txn->final_lsn; /* beginning of abort record */
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn; /* points to the end of the record */
+
+ /* do the actual work: call callback */
+ ctx->callbacks.abort_cb(ctx, txn, abort_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
+prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "prepare";
+ state.report_location = txn->final_lsn; /* beginning of commit record */
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn; /* points to the end of the record */
+
+ /* do the actual work: call callback */
+ ctx->callbacks.prepare_cb(ctx, txn, prepare_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
+commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "commit_prepared";
+ state.report_location = txn->final_lsn; /* beginning of commit record */
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn; /* points to the end of the record */
+
+ /* do the actual work: call callback */
+ ctx->callbacks.commit_prepared_cb(ctx, txn, commit_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
+abort_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "abort_prepared";
+ state.report_location = txn->final_lsn; /* beginning of commit record */
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn; /* points to the end of the record */
+
+ /* do the actual work: call callback */
+ ctx->callbacks.abort_prepared_cb(ctx, txn, abort_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
static void
change_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
Relation relation, ReorderBufferChange *change)
@@ -785,6 +943,51 @@ truncate_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
error_context_stack = errcallback.previous;
}
+static bool
+filter_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+ bool ret;
+
+ /*
+ * Skip if decoding of twophase at PREPARE time is not enabled. In that
+ * case all twophase transactions are considered filtered out and will be
+ * applied as regular transactions at COMMIT PREPARED.
+ */
+ if (!ctx->options.enable_twophase)
+ return true;
+
+ /*
+ * The filter_prepare callback is optional. When not supplied, all
+ * prepared transactions should go through.
+ */
+ if (!ctx->callbacks.filter_prepare_cb)
+ return false;
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "filter_prepare";
+ state.report_location = InvalidXLogRecPtr;
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = false;
+
+ /* do the actual work: call callback */
+ ret = ctx->callbacks.filter_prepare_cb(ctx, txn, xid, gid);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+
+ return ret;
+}
+
bool
filter_by_origin_cb_wrapper(LogicalDecodingContext *ctx, RepOriginId origin_id)
{
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index fb71631434..2fffc90606 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -337,6 +337,11 @@ ReorderBufferReturnTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
}
/* free data that's contained */
+ if (txn->gid != NULL)
+ {
+ pfree(txn->gid);
+ txn->gid = NULL;
+ }
if (txn->tuplecid_hash != NULL)
{
@@ -1389,25 +1394,18 @@ ReorderBufferFreeSnap(ReorderBuffer *rb, Snapshot snap)
* and subtransactions (using a k-way merge) and replay the changes in lsn
* order.
*/
-void
-ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
- XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
- TimestampTz commit_time,
- RepOriginId origin_id, XLogRecPtr origin_lsn)
+static void
+ReorderBufferCommitInternal(ReorderBufferTXN *txn,
+ ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn)
{
- ReorderBufferTXN *txn;
volatile Snapshot snapshot_now;
volatile CommandId command_id = FirstCommandId;
bool using_subtxn;
ReorderBufferIterTXNState *volatile iterstate = NULL;
- txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
- false);
-
- /* unknown transaction, nothing to replay */
- if (txn == NULL)
- return;
-
txn->final_lsn = commit_lsn;
txn->end_lsn = end_lsn;
txn->commit_time = commit_time;
@@ -1711,7 +1709,6 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
break;
}
}
-
/*
* There's a speculative insertion remaining, just clean in up, it
* can't have been successful, otherwise we'd gotten a confirmation
@@ -1727,8 +1724,22 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
ReorderBufferIterTXNFinish(rb, iterstate);
iterstate = NULL;
- /* call commit callback */
- rb->commit(rb, txn, commit_lsn);
+ /*
+ * Call abort/commit/prepare callback, depending on the transaction
+ * state.
+ *
+ * If the transaction aborted during apply (which currently can happen
+ * only for prepared transactions), simply call the abort callback.
+ *
+ * Otherwise call either PREPARE (for twophase transactions) or COMMIT
+ * (for regular ones).
+ */
+ if (rbtxn_rollback(txn))
+ rb->abort(rb, txn, commit_lsn);
+ else if (rbtxn_prepared(txn))
+ rb->prepare(rb, txn, commit_lsn);
+ else
+ rb->commit(rb, txn, commit_lsn);
/* this is just a sanity check against bad output plugin behaviour */
if (GetCurrentTransactionIdIfAny() != InvalidTransactionId)
@@ -1755,7 +1766,12 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
if (snapshot_now->copied)
ReorderBufferFreeSnap(rb, snapshot_now);
- /* remove potential on-disk data, and deallocate */
+ /*
+ * remove potential on-disk data, and deallocate.
+ *
+ * We remove it even for prepared transactions (GID is enough to
+ * commit/abort those later).
+ */
ReorderBufferCleanupTXN(rb, txn);
}
PG_CATCH();
@@ -1789,6 +1805,141 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
PG_END_TRY();
}
+
+/*
+ * Ask output plugin whether we want to skip this PREPARE and send
+ * this transaction as a regular commit later.
+ */
+bool
+ReorderBufferPrepareNeedSkip(ReorderBuffer *rb, TransactionId xid, const char *gid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr, false);
+
+ return rb->filter_prepare(rb, txn, xid, gid);
+}
+
+
+/*
+ * Commit a transaction.
+ *
+ * See comments for ReorderBufferCommitInternal()
+ */
+void
+ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+
+ /* unknown transaction, nothing to replay */
+ if (txn == NULL)
+ return;
+
+ ReorderBufferCommitInternal(txn, rb, xid, commit_lsn, end_lsn,
+ commit_time, origin_id, origin_lsn);
+}
+
+/*
+ * Prepare a twophase transaction. It calls ReorderBufferCommitInternal()
+ * since all prepared transactions need to be decoded at PREPARE time.
+ */
+void
+ReorderBufferPrepare(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+
+ /* unknown transaction, nothing to replay */
+ if (txn == NULL)
+ return;
+
+ txn->txn_flags |= RBTXN_PREPARE;
+ txn->gid = palloc(strlen(gid) + 1); /* trailing '\0' */
+ strcpy(txn->gid, gid);
+
+ ReorderBufferCommitInternal(txn, rb, xid, commit_lsn, end_lsn,
+ commit_time, origin_id, origin_lsn);
+}
+
+/*
+ * Check whether this transaction was sent as prepared to subscribers.
+ * Called while handling commit|abort prepared.
+ */
+bool
+ReorderBufferTxnIsPrepared(ReorderBuffer *rb, TransactionId xid,
+ const char *gid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+
+ /*
+ * Always call the prepare filter. It's the job of the prepare filter to
+ * give us the *same* response for a given xid across multiple calls
+ * (including ones on restart)
+ */
+ return !(rb->filter_prepare(rb, txn, xid, gid));
+}
+
+/*
+ * Send standalone xact event. This is used to handle COMMIT/ABORT PREPARED.
+ */
+void
+ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid, bool is_commit)
+{
+ ReorderBufferTXN *txn;
+
+ /*
+ * The transaction may or may not exist (during restarts for example).
+ * Anyways, 2PC transactions do not contain any reorderbuffers. So allow
+ * it to be created below.
+ */
+ txn = ReorderBufferTXNByXid(rb, xid, true, NULL, commit_lsn,
+ true);
+
+ txn->final_lsn = commit_lsn;
+ txn->end_lsn = end_lsn;
+ txn->commit_time = commit_time;
+ txn->origin_id = origin_id;
+ txn->origin_lsn = origin_lsn;
+ /* this txn is obviously prepared */
+ txn->txn_flags |= RBTXN_PREPARE;
+ txn->gid = palloc(strlen(gid) + 1); /* trailing '\0' */
+ strcpy(txn->gid, gid);
+
+ if (is_commit)
+ {
+ txn->txn_flags |= RBTXN_COMMIT_PREPARED;
+ rb->commit_prepared(rb, txn, commit_lsn);
+ }
+ else
+ {
+ txn->txn_flags |= RBTXN_ROLLBACK_PREPARED;
+ rb->abort_prepared(rb, txn, commit_lsn);
+ }
+
+ /* cleanup: make sure there's no cache pollution */
+ ReorderBufferExecuteInvalidations(rb, txn);
+ ReorderBufferCleanupTXN(rb, txn);
+}
+
/*
* Abort a transaction that possibly has previous changes. Needs to be first
* called for subtransactions and then for the toplevel xid.
diff --git a/src/include/replication/logical.h b/src/include/replication/logical.h
index c25ac1fa85..5fdda65031 100644
--- a/src/include/replication/logical.h
+++ b/src/include/replication/logical.h
@@ -47,7 +47,7 @@ typedef struct LogicalDecodingContext
/*
* Marks the logical decoding context as fast forward decoding one. Such a
- * context does not have plugin loaded so most of the the following
+ * context does not have plugin loaded so most of the following
* properties are unused.
*/
bool fast_forward;
diff --git a/src/include/replication/output_plugin.h b/src/include/replication/output_plugin.h
index 1ee0a56f03..c9140e7001 100644
--- a/src/include/replication/output_plugin.h
+++ b/src/include/replication/output_plugin.h
@@ -27,6 +27,7 @@ typedef struct OutputPluginOptions
{
OutputPluginOutputType output_type;
bool receive_rewrites;
+ bool enable_twophase;
} OutputPluginOptions;
/*
@@ -77,6 +78,46 @@ typedef void (*LogicalDecodeCommitCB) (struct LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
+/*
+ * Called for an implicit ABORT of a transaction.
+ */
+typedef void (*LogicalDecodeAbortCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+
+ /*
+ * Called before decoding of PREPARE record to decide whether this
+ * transaction should be decoded with separate calls to prepare and
+ * commit_prepared/abort_prepared callbacks or wait till COMMIT PREPARED and
+ * sent as usual transaction.
+ */
+typedef bool (*LogicalDecodeFilterPrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ TransactionId xid,
+ const char *gid);
+
+/*
+ * Called for PREPARE record unless it was filtered by filter_prepare()
+ * callback.
+ */
+typedef void (*LogicalDecodePrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+
+/*
+ * Called for COMMIT PREPARED.
+ */
+typedef void (*LogicalDecodeCommitPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+/*
+ * Called for ROLLBACK PREPARED.
+ */
+typedef void (*LogicalDecodeAbortPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+
/*
* Called for the generic logical decoding messages.
*/
@@ -109,7 +150,12 @@ typedef struct OutputPluginCallbacks
LogicalDecodeChangeCB change_cb;
LogicalDecodeTruncateCB truncate_cb;
LogicalDecodeCommitCB commit_cb;
+ LogicalDecodeAbortCB abort_cb;
LogicalDecodeMessageCB message_cb;
+ LogicalDecodeFilterPrepareCB filter_prepare_cb;
+ LogicalDecodePrepareCB prepare_cb;
+ LogicalDecodeCommitPreparedCB commit_prepared_cb;
+ LogicalDecodeAbortPreparedCB abort_prepared_cb;
LogicalDecodeFilterByOriginCB filter_by_origin_cb;
LogicalDecodeShutdownCB shutdown_cb;
} OutputPluginCallbacks;
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index ec9515d156..285c9b53da 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -10,6 +10,7 @@
#define REORDERBUFFER_H
#include "access/htup_details.h"
+#include "access/twophase.h"
#include "lib/ilist.h"
#include "storage/sinval.h"
#include "utils/hsearch.h"
@@ -154,6 +155,11 @@ typedef struct ReorderBufferChange
#define RBTXN_HAS_CATALOG_CHANGES 0x0001
#define RBTXN_IS_SUBXACT 0x0002
#define RBTXN_IS_SERIALIZED 0x0004
+#define RBTXN_PREPARE 0x0008
+#define RBTXN_COMMIT_PREPARED 0x0010
+#define RBTXN_ROLLBACK_PREPARED 0x0020
+#define RBTXN_COMMIT 0x0040
+#define RBTXN_ROLLBACK 0x0080
/* does the txn have catalog changes */
#define rbtxn_has_catalog_changes(txn) (txn->txn_flags & RBTXN_HAS_CATALOG_CHANGES)
@@ -167,6 +173,16 @@ typedef struct ReorderBufferChange
* nentries_mem == nentries.
*/
#define rbtxn_is_serialized(txn) (txn->txn_flags & RBTXN_IS_SERIALIZED)
+/* is this txn prepared? */
+#define rbtxn_prepared(txn) (txn->txn_flags & RBTXN_PREPARE)
+/* was this prepared txn committed in the meanwhile? */
+#define rbtxn_commit_prepared(txn) (txn->txn_flags & RBTXN_COMMIT_PREPARED)
+/* was this prepared txn aborted in the meanwhile? */
+#define rbtxn_rollback_prepared(txn) (txn->txn_flags & RBTXN_ROLLBACK_PREPARED)
+/* was this txn committed in the meanwhile? */
+#define rbtxn_commit(txn) (txn->txn_flags & RBTXN_COMMIT)
+/* was this prepared txn aborted in the meanwhile? */
+#define rbtxn_rollback(txn) (txn->txn_flags & RBTXN_ROLLBACK)
typedef struct ReorderBufferTXN
{
@@ -179,6 +195,8 @@ typedef struct ReorderBufferTXN
/* Do we know this is a subxact? Xid of top-level txn if so */
TransactionId toplevel_xid;
+ /* In case of 2PC we need to pass GID to output plugin */
+ char *gid;
/*
* LSN of the first data carrying, WAL record with knowledge about this
@@ -324,6 +342,37 @@ typedef void (*ReorderBufferCommitCB) (
ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
+/* abort callback signature */
+typedef void (*ReorderBufferAbortCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+
+typedef bool (*ReorderBufferFilterPrepareCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ TransactionId xid,
+ const char *gid);
+
+/* prepare callback signature */
+typedef void (*ReorderBufferPrepareCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+
+/* commit prepared callback signature */
+typedef void (*ReorderBufferCommitPreparedCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+/* abort prepared callback signature */
+typedef void (*ReorderBufferAbortPreparedCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+
+
/* message callback signature */
typedef void (*ReorderBufferMessageCB) (
ReorderBuffer *rb,
@@ -369,6 +418,11 @@ struct ReorderBuffer
ReorderBufferApplyChangeCB apply_change;
ReorderBufferApplyTruncateCB apply_truncate;
ReorderBufferCommitCB commit;
+ ReorderBufferAbortCB abort;
+ ReorderBufferFilterPrepareCB filter_prepare;
+ ReorderBufferPrepareCB prepare;
+ ReorderBufferCommitPreparedCB commit_prepared;
+ ReorderBufferAbortPreparedCB abort_prepared;
ReorderBufferMessageCB message;
/*
@@ -416,6 +470,11 @@ void ReorderBufferQueueMessage(ReorderBuffer *, TransactionId, Snapshot snapshot
void ReorderBufferCommit(ReorderBuffer *, TransactionId,
XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
TimestampTz commit_time, RepOriginId origin_id, XLogRecPtr origin_lsn);
+void ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid, bool is_commit);
void ReorderBufferAssignChild(ReorderBuffer *, TransactionId, TransactionId, XLogRecPtr commit_lsn);
void ReorderBufferCommitChild(ReorderBuffer *, TransactionId, TransactionId,
XLogRecPtr commit_lsn, XLogRecPtr end_lsn);
@@ -439,6 +498,15 @@ void ReorderBufferXidSetCatalogChanges(ReorderBuffer *, TransactionId xid, XLog
bool ReorderBufferXidHasCatalogChanges(ReorderBuffer *, TransactionId xid);
bool ReorderBufferXidHasBaseSnapshot(ReorderBuffer *, TransactionId xid);
+bool ReorderBufferPrepareNeedSkip(ReorderBuffer *rb, TransactionId xid,
+ const char *gid);
+bool ReorderBufferTxnIsPrepared(ReorderBuffer *rb, TransactionId xid,
+ const char *gid);
+void ReorderBufferPrepare(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid);
ReorderBufferTXN *ReorderBufferGetOldestTXN(ReorderBuffer *);
TransactionId ReorderBufferGetOldestXmin(ReorderBuffer *rb);
--
2.15.2 (Apple Git-101.1)
0003-Gracefully-handle-concurrent-aborts-of-uncommitted-t.patchapplication/octet-stream; name=0003-Gracefully-handle-concurrent-aborts-of-uncommitted-t.patchDownload
From e08e16fb46d5be7927c01de05daa1fd1ef9b19d7 Mon Sep 17 00:00:00 2001
From: Nikhil Sontakke <nikhils@2ndQuadrant.com>
Date: Thu, 26 Jul 2018 18:45:26 +0530
Subject: [PATCH 3/4] Gracefully handle concurrent aborts of uncommitted
transactions that are being decoded alongside.
When a transaction aborts, it's changes are considered unnecessary for
other transactions. That means the changes may be either cleaned up by
vacuum or removed from HOT chains (thus made inaccessible through
indexes), and there may be other such consequences.
When decoding committed transactions this is not an issue, and we
never decode transactions that abort before the decoding starts.
But for in-progress transactions - for example when decoding prepared
transactions on PREPARE (and not COMMIT PREPARED as before), this
may cause failures when the output plugin consults catalogs (both
system and user-defined).
We handle such failures by returning ERRCODE_TRANSACTION_ROLLBACK
sqlerrcode from system table scan APIs to the backend decoding a
specific uncommitted transaction. The decoding logic on the receipt
of such an sqlerrcode aborts the ongoing decoding and returns
gracefully.
---
src/backend/access/index/genam.c | 35 +++++++++++++++++++++++++
src/backend/replication/logical/reorderbuffer.c | 32 ++++++++++++++++++----
src/backend/utils/time/snapmgr.c | 25 ++++++++++++++++--
src/include/utils/snapmgr.h | 4 ++-
4 files changed, 88 insertions(+), 8 deletions(-)
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 9d08775687..9220dcce83 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -25,6 +25,7 @@
#include "lib/stringinfo.h"
#include "miscadmin.h"
#include "storage/bufmgr.h"
+#include "storage/procarray.h"
#include "utils/acl.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
@@ -423,6 +424,17 @@ systable_getnext(SysScanDesc sysscan)
else
htup = heap_getnext(sysscan->scan, ForwardScanDirection);
+ /*
+ * If CheckXidAlive is valid, then we check if it aborted. If it did, we
+ * error out
+ */
+ if (TransactionIdIsValid(CheckXidAlive) &&
+ !TransactionIdIsInProgress(CheckXidAlive) &&
+ !TransactionIdDidCommit(CheckXidAlive))
+ ereport(ERROR,
+ (errcode(ERRCODE_TRANSACTION_ROLLBACK),
+ errmsg("transaction aborted during system catalog scan")));
+
return htup;
}
@@ -476,6 +488,18 @@ systable_recheck_tuple(SysScanDesc sysscan, HeapTuple tup)
result = HeapTupleSatisfiesVisibility(tup, freshsnap, scan->rs_cbuf);
LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
}
+
+ /*
+ * If CheckXidAlive is valid, then we check if it aborted. If it did, we
+ * error out
+ */
+ if (TransactionIdIsValid(CheckXidAlive) &&
+ !TransactionIdIsInProgress(CheckXidAlive) &&
+ !TransactionIdDidCommit(CheckXidAlive))
+ ereport(ERROR,
+ (errcode(ERRCODE_TRANSACTION_ROLLBACK),
+ errmsg("transaction aborted during system catalog scan")));
+
return result;
}
@@ -593,6 +617,17 @@ systable_getnext_ordered(SysScanDesc sysscan, ScanDirection direction)
if (htup && sysscan->iscan->xs_recheck)
elog(ERROR, "system catalog scans with lossy index conditions are not implemented");
+ /*
+ * If CheckXidAlive is valid, then we check if it aborted. If it did, we
+ * error out
+ */
+ if (TransactionIdIsValid(CheckXidAlive) &&
+ !TransactionIdIsInProgress(CheckXidAlive) &&
+ !TransactionIdDidCommit(CheckXidAlive))
+ ereport(ERROR,
+ (errcode(ERRCODE_TRANSACTION_ROLLBACK),
+ errmsg("transaction aborted during system catalog scan")));
+
return htup;
}
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 2fffc90606..96d52d32c1 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -599,7 +599,7 @@ ReorderBufferQueueMessage(ReorderBuffer *rb, TransactionId xid,
txn = ReorderBufferTXNByXid(rb, xid, true, NULL, lsn, true);
/* setup snapshot to allow catalog access */
- SetupHistoricSnapshot(snapshot_now, NULL);
+ SetupHistoricSnapshot(snapshot_now, NULL, xid);
PG_TRY();
{
rb->message(rb, txn, lsn, false, prefix, message_size, message);
@@ -1405,6 +1405,7 @@ ReorderBufferCommitInternal(ReorderBufferTXN *txn,
volatile CommandId command_id = FirstCommandId;
bool using_subtxn;
ReorderBufferIterTXNState *volatile iterstate = NULL;
+ MemoryContext ccxt = CurrentMemoryContext;
txn->final_lsn = commit_lsn;
txn->end_lsn = end_lsn;
@@ -1431,7 +1432,7 @@ ReorderBufferCommitInternal(ReorderBufferTXN *txn,
ReorderBufferBuildTupleCidHash(rb, txn);
/* setup the initial snapshot */
- SetupHistoricSnapshot(snapshot_now, txn->tuplecid_hash);
+ SetupHistoricSnapshot(snapshot_now, txn->tuplecid_hash, xid);
/*
* Decoding needs access to syscaches et al., which in turn use
@@ -1672,7 +1673,7 @@ ReorderBufferCommitInternal(ReorderBufferTXN *txn,
/* and continue with the new one */
- SetupHistoricSnapshot(snapshot_now, txn->tuplecid_hash);
+ SetupHistoricSnapshot(snapshot_now, txn->tuplecid_hash, xid);
break;
case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
@@ -1692,7 +1693,7 @@ ReorderBufferCommitInternal(ReorderBufferTXN *txn,
snapshot_now->curcid = command_id;
TeardownHistoricSnapshot(false);
- SetupHistoricSnapshot(snapshot_now, txn->tuplecid_hash);
+ SetupHistoricSnapshot(snapshot_now, txn->tuplecid_hash, xid);
/*
* Every time the CommandId is incremented, we could
@@ -1777,6 +1778,20 @@ ReorderBufferCommitInternal(ReorderBufferTXN *txn,
PG_CATCH();
{
/* TODO: Encapsulate cleanup from the PG_TRY and PG_CATCH blocks */
+ MemoryContext ecxt = MemoryContextSwitchTo(ccxt);
+ ErrorData *errdata = CopyErrorData();
+
+ /*
+ * if the catalog scan access returned an error of
+ * rollback, then abort on the other side as well
+ */
+ if (errdata->sqlerrcode == ERRCODE_TRANSACTION_ROLLBACK)
+ {
+ elog(LOG, "stopping decoding of %s (%u)",
+ txn->gid[0] != '\0'? txn->gid:"", txn->xid);
+ rb->abort(rb, txn, commit_lsn);
+ }
+
if (iterstate)
ReorderBufferIterTXNFinish(rb, iterstate);
@@ -1800,7 +1815,14 @@ ReorderBufferCommitInternal(ReorderBufferTXN *txn,
/* remove potential on-disk data, and deallocate */
ReorderBufferCleanupTXN(rb, txn);
- PG_RE_THROW();
+ /* re-throw only if it's not an abort */
+ if (errdata->sqlerrcode != ERRCODE_TRANSACTION_ROLLBACK)
+ {
+ MemoryContextSwitchTo(ecxt);
+ PG_RE_THROW();
+ }
+ else
+ FlushErrorState();
}
PG_END_TRY();
}
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index edf59efc29..0354fc9da9 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -151,6 +151,13 @@ static Snapshot SecondarySnapshot = NULL;
static Snapshot CatalogSnapshot = NULL;
static Snapshot HistoricSnapshot = NULL;
+/*
+ * An xid value pointing to a possibly ongoing or a prepared transaction.
+ * Currently used in logical decoding. It's possible that such transactions
+ * can get aborted while the decoding is ongoing.
+ */
+TransactionId CheckXidAlive = InvalidTransactionId;
+
/*
* These are updated by GetSnapshotData. We initialize them this way
* for the convenience of TransactionIdIsInProgress: even in bootstrap
@@ -1995,10 +2002,14 @@ MaintainOldSnapshotTimeMapping(TimestampTz whenTaken, TransactionId xmin)
* Setup a snapshot that replaces normal catalog snapshots that allows catalog
* access to behave just like it did at a certain point in the past.
*
+ * If a valid xid is passed in, we check if it is uncommitted and track it in
+ * CheckXidAlive. This is to re-check XID status while accessing catalog.
+ *
* Needed for logical decoding.
*/
void
-SetupHistoricSnapshot(Snapshot historic_snapshot, HTAB *tuplecids)
+SetupHistoricSnapshot(Snapshot historic_snapshot, HTAB *tuplecids,
+ TransactionId snapshot_xid)
{
Assert(historic_snapshot != NULL);
@@ -2007,8 +2018,17 @@ SetupHistoricSnapshot(Snapshot historic_snapshot, HTAB *tuplecids)
/* setup (cmin, cmax) lookup hash */
tuplecid_data = tuplecids;
-}
+ /*
+ * setup CheckXidAlive if it's not committed yet. We don't check
+ * if the xid aborted. That will happen during catalog access.
+ */
+ if (TransactionIdIsValid(snapshot_xid) &&
+ !TransactionIdDidCommit(snapshot_xid))
+ CheckXidAlive = snapshot_xid;
+ else
+ CheckXidAlive = InvalidTransactionId;
+}
/*
* Make catalog snapshots behave normally again.
@@ -2018,6 +2038,7 @@ TeardownHistoricSnapshot(bool is_error)
{
HistoricSnapshot = NULL;
tuplecid_data = NULL;
+ CheckXidAlive = InvalidTransactionId;
}
bool
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 83806f3040..bad2053477 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -100,8 +100,10 @@ extern char *ExportSnapshot(Snapshot snapshot);
/* Support for catalog timetravel for logical decoding */
struct HTAB;
+extern TransactionId CheckXidAlive;
extern struct HTAB *HistoricSnapshotGetTupleCids(void);
-extern void SetupHistoricSnapshot(Snapshot snapshot_now, struct HTAB *tuplecids);
+extern void SetupHistoricSnapshot(Snapshot snapshot_now, struct HTAB *tuplecids,
+ TransactionId snapshot_xid);
extern void TeardownHistoricSnapshot(bool is_error);
extern bool HistoricSnapshotActive(void);
--
2.15.2 (Apple Git-101.1)
0004-Teach-test_decoding-plugin-to-work-with-2PC.patchapplication/octet-stream; name=0004-Teach-test_decoding-plugin-to-work-with-2PC.patchDownload
From 16cddbd9ca4b7a1615014257df4cb02b0a005416 Mon Sep 17 00:00:00 2001
From: Nikhil Sontakke <nikhils@2ndQuadrant.com>
Date: Wed, 13 Jun 2018 16:31:15 +0530
Subject: [PATCH 4/4] Teach test_decoding plugin to work with 2PC
Implement all callbacks required for decoding 2PC in this test_decoding
plugin. Includes relevant test cases as well.
Additionally, includes a new option "check-xid". If this option points
to a valid xid, then the pg_decode_change() API will wait for it to
be aborted externally. This allows us to test concurrent rollback of
a prepared transaction while it's being actually decoded simultaneously.
---
contrib/test_decoding/Makefile | 5 +-
contrib/test_decoding/expected/prepared.out | 185 ++++++++++++++++++++++++----
contrib/test_decoding/sql/prepared.sql | 77 ++++++++++--
contrib/test_decoding/t/001_twophase.pl | 119 ++++++++++++++++++
contrib/test_decoding/test_decoding.c | 179 +++++++++++++++++++++++++++
5 files changed, 532 insertions(+), 33 deletions(-)
create mode 100644 contrib/test_decoding/t/001_twophase.pl
diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index afcab930f7..3f0b1c6ebd 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -26,7 +26,7 @@ installcheck:;
# installation, allow to do so, but only if requested explicitly.
installcheck-force: regresscheck-install-force isolationcheck-install-force
-check: regresscheck isolationcheck
+check: regresscheck isolationcheck 2pc-check
submake-regress:
$(MAKE) -C $(top_builddir)/src/test/regress all
@@ -67,3 +67,6 @@ isolationcheck-install-force: all | submake-isolation submake-test_decoding temp
isolationcheck isolationcheck-install-force
temp-install: EXTRA_INSTALL=contrib/test_decoding
+
+2pc-check: temp-install
+ $(prove_check)
diff --git a/contrib/test_decoding/expected/prepared.out b/contrib/test_decoding/expected/prepared.out
index 46e915d4ff..934c8f1509 100644
--- a/contrib/test_decoding/expected/prepared.out
+++ b/contrib/test_decoding/expected/prepared.out
@@ -6,19 +6,50 @@ SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_d
init
(1 row)
-CREATE TABLE test_prepared1(id int);
-CREATE TABLE test_prepared2(id int);
+CREATE TABLE test_prepared1(id integer primary key);
+CREATE TABLE test_prepared2(id integer primary key);
-- test simple successful use of a prepared xact
BEGIN;
INSERT INTO test_prepared1 VALUES (1);
PREPARE TRANSACTION 'test_prepared#1';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+----------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:1
+ PREPARE TRANSACTION 'test_prepared#1'
+(3 rows)
+
COMMIT PREPARED 'test_prepared#1';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-----------------------------------
+ COMMIT PREPARED 'test_prepared#1'
+(1 row)
+
INSERT INTO test_prepared1 VALUES (2);
-- test abort of a prepared xact
BEGIN;
INSERT INTO test_prepared1 VALUES (3);
PREPARE TRANSACTION 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+----------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:2
+ COMMIT
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:3
+ PREPARE TRANSACTION 'test_prepared#2'
+(6 rows)
+
ROLLBACK PREPARED 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-------------------------------------
+ ROLLBACK PREPARED 'test_prepared#2'
+(1 row)
+
INSERT INTO test_prepared1 VALUES (4);
-- test prepared xact containing ddl
BEGIN;
@@ -26,45 +57,149 @@ INSERT INTO test_prepared1 VALUES (5);
ALTER TABLE test_prepared1 ADD COLUMN data text;
INSERT INTO test_prepared1 VALUES (6, 'frakbar');
PREPARE TRANSACTION 'test_prepared#3';
--- test that we decode correctly while an uncommitted prepared xact
--- with ddl exists.
--- separate table because of the lock from the ALTER
--- this will come before the '5' row above, as this commits before it.
-INSERT INTO test_prepared2 VALUES (7);
-COMMIT PREPARED 'test_prepared#3';
--- make sure stuff still works
-INSERT INTO test_prepared1 VALUES (8);
-INSERT INTO test_prepared2 VALUES (9);
--- cleanup
-DROP TABLE test_prepared1;
-DROP TABLE test_prepared2;
--- show results
+SELECT 'test_prepared_1' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'test_prepared1'::regclass;
+ relation | locktype | mode
+-----------------+----------+---------------------
+ test_prepared_1 | relation | RowExclusiveLock
+ test_prepared_1 | relation | AccessExclusiveLock
+(2 rows)
+
SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
data
-------------------------------------------------------------------------
- BEGIN
- table public.test_prepared1: INSERT: id[integer]:1
- COMMIT
- BEGIN
- table public.test_prepared1: INSERT: id[integer]:2
- COMMIT
BEGIN
table public.test_prepared1: INSERT: id[integer]:4
COMMIT
BEGIN
- table public.test_prepared2: INSERT: id[integer]:7
- COMMIT
- BEGIN
table public.test_prepared1: INSERT: id[integer]:5
table public.test_prepared1: INSERT: id[integer]:6 data[text]:'frakbar'
+ PREPARE TRANSACTION 'test_prepared#3'
+(7 rows)
+
+-- Test that we decode correctly while an uncommitted prepared xact
+-- with ddl exists.
+--
+-- Use a separate table for the concurrent transaction because the lock from
+-- the ALTER will stop us inserting into the other one.
+--
+-- We should see '7' before '5' in our results since it commits first.
+--
+INSERT INTO test_prepared2 VALUES (7);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+----------------------------------------------------
+ BEGIN
+ table public.test_prepared2: INSERT: id[integer]:7
COMMIT
+(3 rows)
+
+COMMIT PREPARED 'test_prepared#3';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-----------------------------------
+ COMMIT PREPARED 'test_prepared#3'
+(1 row)
+
+-- make sure stuff still works
+INSERT INTO test_prepared1 VALUES (8);
+INSERT INTO test_prepared2 VALUES (9);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+--------------------------------------------------------------------
BEGIN
table public.test_prepared1: INSERT: id[integer]:8 data[text]:null
COMMIT
BEGIN
table public.test_prepared2: INSERT: id[integer]:9
COMMIT
-(22 rows)
+(6 rows)
+
+-- Check `CLUSTER` (as operation that hold exclusive lock) doesn't block
+-- logical decoding.
+BEGIN;
+INSERT INTO test_prepared1 VALUES (10, 'othercol');
+CLUSTER test_prepared1 USING test_prepared1_pkey;
+INSERT INTO test_prepared1 VALUES (11, 'othercol2');
+PREPARE TRANSACTION 'test_prepared_lock';
+BEGIN;
+insert into test_prepared2 values (12);
+PREPARE TRANSACTION 'test_prepared_lock2';
+COMMIT PREPARED 'test_prepared_lock2';
+SELECT 'pg_class' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'pg_class'::regclass;
+ relation | locktype | mode
+----------+----------+------
+(0 rows)
+
+-- Shouldn't timeout on 2pc decoding.
+SET statement_timeout = '1s';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+----------------------------------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:10 data[text]:'othercol'
+ table public.test_prepared1: INSERT: id[integer]:11 data[text]:'othercol2'
+ PREPARE TRANSACTION 'test_prepared_lock'
+ BEGIN
+ table public.test_prepared2: INSERT: id[integer]:12
+ PREPARE TRANSACTION 'test_prepared_lock2'
+ COMMIT PREPARED 'test_prepared_lock2'
+(8 rows)
+
+RESET statement_timeout;
+COMMIT PREPARED 'test_prepared_lock';
+-- will work normally after we commit
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+--------------------------------------
+ COMMIT PREPARED 'test_prepared_lock'
+(1 row)
+
+-- test savepoints
+BEGIN;
+SAVEPOINT test_savepoint;
+CREATE TABLE test_prepared_savepoint (a int);
+PREPARE TRANSACTION 'test_prepared_savepoint';
+COMMIT PREPARED 'test_prepared_savepoint';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-------------------------------------------
+ COMMIT PREPARED 'test_prepared_savepoint'
+(1 row)
+
+-- test that a GID containing "nodecode" gets decoded at commit prepared time
+BEGIN;
+INSERT INTO test_prepared1 VALUES (20);
+PREPARE TRANSACTION 'test_prepared_nodecode';
+-- should show nothing
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+------
+(0 rows)
+
+COMMIT PREPARED 'test_prepared_nodecode';
+-- should be decoded now
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+---------------------------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:20 data[text]:null
+ COMMIT
+(3 rows)
+
+-- cleanup
+DROP TABLE test_prepared1;
+DROP TABLE test_prepared2;
+-- show results. There should be nothing to show
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+------
+(0 rows)
SELECT pg_drop_replication_slot('regression_slot');
pg_drop_replication_slot
diff --git a/contrib/test_decoding/sql/prepared.sql b/contrib/test_decoding/sql/prepared.sql
index e72639767e..60725419fe 100644
--- a/contrib/test_decoding/sql/prepared.sql
+++ b/contrib/test_decoding/sql/prepared.sql
@@ -2,21 +2,25 @@
SET synchronous_commit = on;
SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
-CREATE TABLE test_prepared1(id int);
-CREATE TABLE test_prepared2(id int);
+CREATE TABLE test_prepared1(id integer primary key);
+CREATE TABLE test_prepared2(id integer primary key);
-- test simple successful use of a prepared xact
BEGIN;
INSERT INTO test_prepared1 VALUES (1);
PREPARE TRANSACTION 'test_prepared#1';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
COMMIT PREPARED 'test_prepared#1';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
INSERT INTO test_prepared1 VALUES (2);
-- test abort of a prepared xact
BEGIN;
INSERT INTO test_prepared1 VALUES (3);
PREPARE TRANSACTION 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
ROLLBACK PREPARED 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
INSERT INTO test_prepared1 VALUES (4);
@@ -27,24 +31,83 @@ ALTER TABLE test_prepared1 ADD COLUMN data text;
INSERT INTO test_prepared1 VALUES (6, 'frakbar');
PREPARE TRANSACTION 'test_prepared#3';
--- test that we decode correctly while an uncommitted prepared xact
--- with ddl exists.
+SELECT 'test_prepared_1' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'test_prepared1'::regclass;
+
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
--- separate table because of the lock from the ALTER
--- this will come before the '5' row above, as this commits before it.
+-- Test that we decode correctly while an uncommitted prepared xact
+-- with ddl exists.
+--
+-- Use a separate table for the concurrent transaction because the lock from
+-- the ALTER will stop us inserting into the other one.
+--
+-- We should see '7' before '5' in our results since it commits first.
+--
INSERT INTO test_prepared2 VALUES (7);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
COMMIT PREPARED 'test_prepared#3';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
-- make sure stuff still works
INSERT INTO test_prepared1 VALUES (8);
INSERT INTO test_prepared2 VALUES (9);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- Check `CLUSTER` (as operation that hold exclusive lock) doesn't block
+-- logical decoding.
+BEGIN;
+INSERT INTO test_prepared1 VALUES (10, 'othercol');
+CLUSTER test_prepared1 USING test_prepared1_pkey;
+INSERT INTO test_prepared1 VALUES (11, 'othercol2');
+PREPARE TRANSACTION 'test_prepared_lock';
+
+BEGIN;
+insert into test_prepared2 values (12);
+PREPARE TRANSACTION 'test_prepared_lock2';
+COMMIT PREPARED 'test_prepared_lock2';
+
+SELECT 'pg_class' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'pg_class'::regclass;
+
+-- Shouldn't timeout on 2pc decoding.
+SET statement_timeout = '1s';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+RESET statement_timeout;
+
+COMMIT PREPARED 'test_prepared_lock';
+
+-- will work normally after we commit
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- test savepoints
+BEGIN;
+SAVEPOINT test_savepoint;
+CREATE TABLE test_prepared_savepoint (a int);
+PREPARE TRANSACTION 'test_prepared_savepoint';
+COMMIT PREPARED 'test_prepared_savepoint';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- test that a GID containing "nodecode" gets decoded at commit prepared time
+BEGIN;
+INSERT INTO test_prepared1 VALUES (20);
+PREPARE TRANSACTION 'test_prepared_nodecode';
+-- should show nothing
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+COMMIT PREPARED 'test_prepared_nodecode';
+-- should be decoded now
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
-- cleanup
DROP TABLE test_prepared1;
DROP TABLE test_prepared2;
--- show results
+-- show results. There should be nothing to show
SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
SELECT pg_drop_replication_slot('regression_slot');
diff --git a/contrib/test_decoding/t/001_twophase.pl b/contrib/test_decoding/t/001_twophase.pl
new file mode 100644
index 0000000000..99a9249689
--- /dev/null
+++ b/contrib/test_decoding/t/001_twophase.pl
@@ -0,0 +1,119 @@
+# logical replication of 2PC test
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 2;
+use Time::HiRes qw(usleep);
+use Scalar::Util qw(looks_like_number);
+
+# Initialize node
+my $node_logical = get_new_node('logical');
+$node_logical->init(allows_streaming => 'logical');
+$node_logical->append_conf(
+ 'postgresql.conf', qq(
+ max_prepared_transactions = 10
+));
+$node_logical->start;
+
+# Create some pre-existing content on logical
+$node_logical->safe_psql('postgres', "CREATE TABLE tab (a int PRIMARY KEY)");
+$node_logical->safe_psql('postgres',
+ "INSERT INTO tab SELECT generate_series(1,10)");
+$node_logical->safe_psql('postgres',
+ "SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');");
+
+# This test is specifically for testing concurrent abort while logical decode
+# is ongoing. We will pass in the xid of the 2PC to the plugin as an option.
+# On the receipt of a valid "check-xid", the change API in the test decoding
+# plugin will wait for it to be aborted.
+#
+# We will fire off a ROLLBACK from another session when this decode
+# is waiting.
+#
+# The status of "check-xid" will change from in-progress to not-committed
+# (hence aborted) and we will stop decoding because the subsequent
+# system catalog scan will error out.
+
+$node_logical->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab VALUES (11);
+ INSERT INTO tab VALUES (12);
+ ALTER TABLE tab ADD COLUMN b INT;
+ INSERT INTO tab VALUES (13,14);
+ PREPARE TRANSACTION 'test_prepared_tab';");
+# get XID of the above two-phase transaction
+my $xid2pc = $node_logical->safe_psql('postgres', "SELECT transaction FROM pg_prepared_xacts WHERE gid = 'test_prepared_tab'");
+is(looks_like_number($xid2pc), qq(1), 'Got a valid two-phase XID');
+
+# start decoding the above by passing the "check-xid"
+my $logical_connstr = $node_logical->connstr . ' dbname=postgres';
+
+# decode now, it should include an ABORT entry because of the ROLLBACK below
+system_log("psql -d \"$logical_connstr\" -c \"SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'check-xid', '$xid2pc');\" \&");
+
+# check that decode starts waiting for this $xid2pc
+poll_output_until("waiting for $xid2pc to abort")
+ or die "no wait happened for the abort";
+
+# rollback the prepared transaction
+$node_logical->safe_psql('postgres', "ROLLBACK PREPARED 'test_prepared_tab';");
+
+# check for occurrence of the log about stopping this decoding
+poll_output_until("stopping decoding of test_prepared_tab")
+ or die "no decoding stop for the rollback";
+
+# consume any remaining changes
+$node_logical->safe_psql('postgres', "SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');");
+
+# Check that commit prepared is decoded properly on immediate restart
+$node_logical->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab VALUES (11);
+ INSERT INTO tab VALUES (12);
+ ALTER TABLE tab ADD COLUMN b INT;
+ INSERT INTO tab VALUES (13, 11);
+ PREPARE TRANSACTION 'test_prepared_tab';");
+# consume changes
+$node_logical->safe_psql('postgres', "SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');");
+$node_logical->stop('immediate');
+$node_logical->start;
+
+# commit post the restart
+$node_logical->safe_psql('postgres', "COMMIT PREPARED 'test_prepared_tab';");
+$node_logical->safe_psql('postgres', "SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');");
+
+# check inserts are visible
+my $result = $node_logical->safe_psql('postgres', "SELECT count(*) FROM tab where a IN (11,12) OR b IN (11);");
+is($result, qq(3), 'Rows inserted via 2PC are visible on restart');
+
+$node_logical->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot');");
+$node_logical->stop('fast');
+
+sub poll_output_until
+{
+ my ($expected) = @_;
+
+ $expected = 'xxxxxx' unless defined($expected); # default junk value
+
+ my $max_attempts = 180 * 10;
+ my $attempts = 0;
+
+ my $output_file = '';
+ while ($attempts < $max_attempts)
+ {
+ $output_file = slurp_file($node_logical->logfile());
+
+ if ($output_file =~ $expected)
+ {
+ return 1;
+ }
+
+ # Wait 0.1 second before retrying.
+ usleep(100_000);
+ $attempts++;
+ }
+
+ # The output result didn't change in 180 seconds. Give up
+ return 0;
+}
diff --git a/contrib/test_decoding/test_decoding.c b/contrib/test_decoding/test_decoding.c
index 1c439b57b0..b981d1693c 100644
--- a/contrib/test_decoding/test_decoding.c
+++ b/contrib/test_decoding/test_decoding.c
@@ -11,12 +11,16 @@
*-------------------------------------------------------------------------
*/
#include "postgres.h"
+#include "miscadmin.h"
+#include "access/transam.h"
#include "catalog/pg_type.h"
#include "replication/logical.h"
#include "replication/origin.h"
+#include "storage/procarray.h"
+
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -36,6 +40,7 @@ typedef struct
bool skip_empty_xacts;
bool xact_wrote_changes;
bool only_local;
+ TransactionId check_xid; /* track abort of this txid */
} TestDecodingData;
static void pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
@@ -49,6 +54,8 @@ static void pg_output_begin(LogicalDecodingContext *ctx,
bool last_write);
static void pg_decode_commit_txn(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn, XLogRecPtr commit_lsn);
+static void pg_decode_abort_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, XLogRecPtr abort_lsn);
static void pg_decode_change(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn, Relation rel,
ReorderBufferChange *change);
@@ -62,6 +69,18 @@ static void pg_decode_message(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn, XLogRecPtr message_lsn,
bool transactional, const char *prefix,
Size sz, const char *message);
+static bool pg_decode_filter_prepare(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid);
+static void pg_decode_prepare_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+static void pg_decode_commit_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+static void pg_decode_abort_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
void
_PG_init(void)
@@ -80,9 +99,14 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb)
cb->change_cb = pg_decode_change;
cb->truncate_cb = pg_decode_truncate;
cb->commit_cb = pg_decode_commit_txn;
+ cb->abort_cb = pg_decode_abort_txn;
cb->filter_by_origin_cb = pg_decode_filter;
cb->shutdown_cb = pg_decode_shutdown;
cb->message_cb = pg_decode_message;
+ cb->filter_prepare_cb = pg_decode_filter_prepare;
+ cb->prepare_cb = pg_decode_prepare_txn;
+ cb->commit_prepared_cb = pg_decode_commit_prepared_txn;
+ cb->abort_prepared_cb = pg_decode_abort_prepared_txn;
}
@@ -102,11 +126,14 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
data->include_timestamp = false;
data->skip_empty_xacts = false;
data->only_local = false;
+ data->check_xid = InvalidTransactionId;
ctx->output_plugin_private = data;
opt->output_type = OUTPUT_PLUGIN_TEXTUAL_OUTPUT;
opt->receive_rewrites = false;
+ /* this plugin supports decoding of 2pc */
+ opt->enable_twophase = true;
foreach(option, ctx->output_plugin_options)
{
@@ -183,6 +210,32 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
errmsg("could not parse value \"%s\" for parameter \"%s\"",
strVal(elem->arg), elem->defname)));
}
+ else if (strcmp(elem->defname, "check-xid") == 0)
+ {
+ if (elem->arg)
+ {
+ errno = 0;
+ data->check_xid = (TransactionId)
+ strtoul(strVal(elem->arg), NULL, 0);
+
+ if (errno == EINVAL || errno == ERANGE)
+ ereport(FATAL,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("check-xid is not a valid number: \"%s\"",
+ strVal(elem->arg))));
+ }
+ else
+ ereport(FATAL,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("check-xid needs an input value")));
+
+ if (data->check_xid <= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("Specify positive value for parameter \"%s\","
+ " you specified \"%s\"",
+ elem->defname, strVal(elem->arg))));
+ }
else
{
ereport(ERROR,
@@ -251,6 +304,116 @@ pg_decode_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
OutputPluginWrite(ctx, true);
}
+/* ABORT callback */
+static void
+pg_decode_abort_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ if (data->skip_empty_xacts && !data->xact_wrote_changes)
+ return;
+
+ OutputPluginPrepareWrite(ctx, true);
+ if (data->include_xids)
+ appendStringInfo(ctx->out, "ABORT %u", txn->xid);
+ else
+ appendStringInfoString(ctx->out, "ABORT");
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
+/*
+ * Filter out two-phase transactions.
+ *
+ * Each plugin can implement its own filtering logic. Here
+ * we demonstrate a simple logic by checking the GID. If the
+ * GID contains the "_nodecode" substring, then we filter
+ * it out.
+ */
+static bool
+pg_decode_filter_prepare(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid)
+{
+ if (strstr(gid, "_nodecode") != NULL)
+ return true;
+
+ return false;
+}
+
+/* PREPARE callback */
+static void
+pg_decode_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ if (data->skip_empty_xacts && !data->xact_wrote_changes)
+ return;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ appendStringInfo(ctx->out, "PREPARE TRANSACTION %s",
+ quote_literal_cstr(txn->gid));
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, " %u", txn->xid);
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
+/* COMMIT PREPARED callback */
+static void
+pg_decode_commit_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ appendStringInfo(ctx->out, "COMMIT PREPARED %s",
+ quote_literal_cstr(txn->gid));
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, " %u", txn->xid);
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
+/* ABORT PREPARED callback */
+static void
+pg_decode_abort_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ appendStringInfo(ctx->out, "ROLLBACK PREPARED %s",
+ quote_literal_cstr(txn->gid));
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, " %u", txn->xid);
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
static bool
pg_decode_filter(LogicalDecodingContext *ctx,
RepOriginId origin_id)
@@ -416,6 +579,22 @@ pg_decode_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
}
data->xact_wrote_changes = true;
+ /* if check_xid is specified */
+ if (TransactionIdIsValid(data->check_xid))
+ {
+ elog(LOG, "waiting for %u to abort", data->check_xid);
+ while (TransactionIdIsInProgress(data->check_xid))
+ {
+ CHECK_FOR_INTERRUPTS();
+ pg_usleep(10000L);
+ }
+ if (!TransactionIdIsInProgress(data->check_xid) &&
+ !TransactionIdDidCommit(data->check_xid))
+ elog(LOG, "%u aborted", data->check_xid);
+
+ Assert(TransactionIdDidAbort(data->check_xid));
+ }
+
class_form = RelationGetForm(relation);
tupdesc = RelationGetDescr(relation);
--
2.15.2 (Apple Git-101.1)
On 01/08/18 16:00, Nikhil Sontakke wrote:
I was wondering if anything else would be needed for user-defined
catalog tables..We don't need to do anything else for user-defined catalog tables
since they will also get accessed via the systable_* scan APIs.
They can be, but currently they might not be. So this requires at least
big fat warning in docs and description on how to access user catalogs
from plugins correctly (ie to always use systable_* API on them). It
would be nice if we could check for it in Assert builds at least.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2018-08-01 21:55:18 +0200, Petr Jelinek wrote:
On 01/08/18 16:00, Nikhil Sontakke wrote:
I was wondering if anything else would be needed for user-defined
catalog tables..We don't need to do anything else for user-defined catalog tables
since they will also get accessed via the systable_* scan APIs.They can be, but currently they might not be. So this requires at least
big fat warning in docs and description on how to access user catalogs
from plugins correctly (ie to always use systable_* API on them). It
would be nice if we could check for it in Assert builds at least.
Yea, I agree. I think we should just consider putting similar checks in
the general scan APIs. With an unlikely() and the easy predictability of
these checks, I think we should be fine, overhead-wise.
Greetings,
Andres Freund
They can be, but currently they might not be. So this requires at least
big fat warning in docs and description on how to access user catalogs
from plugins correctly (ie to always use systable_* API on them). It
would be nice if we could check for it in Assert builds at least.
Ok, modified the sgml documentation for the above.
Yea, I agree. I think we should just consider putting similar checks in
the general scan APIs. With an unlikely() and the easy predictability of
these checks, I think we should be fine, overhead-wise.
Ok, added unlikely() checks in the heap_* scan APIs.
Revised patchset attached.
Regards,
Nikhils
Greetings,
Andres Freund
--
Nikhil Sontakke http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
0003-Gracefully-handle-concurrent-aborts-of-uncommitted-t.patchapplication/octet-stream; name=0003-Gracefully-handle-concurrent-aborts-of-uncommitted-t.patchDownload
From 7640800a2a1342a4ddeed4ffa049bf80aa99d4e1 Mon Sep 17 00:00:00 2001
From: Nikhil Sontakke <nikhils@2ndQuadrant.com>
Date: Thu, 26 Jul 2018 18:45:26 +0530
Subject: [PATCH 3/4] Gracefully handle concurrent aborts of uncommitted
transactions that are being decoded alongside.
When a transaction aborts, it's changes are considered unnecessary for
other transactions. That means the changes may be either cleaned up by
vacuum or removed from HOT chains (thus made inaccessible through
indexes), and there may be other such consequences.
When decoding committed transactions this is not an issue, and we
never decode transactions that abort before the decoding starts.
But for in-progress transactions - for example when decoding prepared
transactions on PREPARE (and not COMMIT PREPARED as before), this
may cause failures when the output plugin consults catalogs (both
system and user-defined).
We handle such failures by returning ERRCODE_TRANSACTION_ROLLBACK
sqlerrcode from system table scan APIs to the backend decoding a
specific uncommitted transaction. The decoding logic on the receipt
of such an sqlerrcode aborts the ongoing decoding and returns
gracefully.
---
doc/src/sgml/logicaldecoding.sgml | 5 ++-
src/backend/access/heap/heapam.c | 51 +++++++++++++++++++++++++
src/backend/access/index/genam.c | 35 +++++++++++++++++
src/backend/replication/logical/reorderbuffer.c | 32 +++++++++++++---
src/backend/utils/time/snapmgr.c | 25 +++++++++++-
src/include/utils/snapmgr.h | 4 +-
6 files changed, 143 insertions(+), 9 deletions(-)
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index a89e4d5184..d76afbda05 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -421,7 +421,10 @@ typedef void (*LogicalOutputPluginInit) (struct OutputPluginCallbacks *cb);
ALTER TABLE user_catalog_table SET (user_catalog_table = true);
CREATE TABLE another_catalog_table(data text) WITH (user_catalog_table = true);
</programlisting>
- Any actions leading to transaction ID assignment are prohibited. That, among others,
+ Note that access to user catalog tables or regular system catalog tables
+ in the output plugins has to be done via the <literal>systable_*</literal> scan APIs only.
+ Access via the <literal>heap_*</literal> scan APIs will error out.
+ Additionally, any actions leading to transaction ID assignment are prohibited. That, among others,
includes writing to tables, performing DDL changes, and
calling <literal>txid_current()</literal>.
</para>
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 72395a50b8..ae9d24c164 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -1834,6 +1834,17 @@ heap_update_snapshot(HeapScanDesc scan, Snapshot snapshot)
HeapTuple
heap_getnext(HeapScanDesc scan, ScanDirection direction)
{
+ /*
+ * We don't expect direct calls to heap_getnext with valid
+ * CheckXidAlive for regular tables. Track that below.
+ */
+ if (unlikely(TransactionIdIsValid(CheckXidAlive) &&
+ !(IsCatalogRelation(scan->rs_rd) ||
+ RelationIsUsedAsCatalogTable(scan->rs_rd))))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
+ errmsg("improper heap_getnext call")));
+
/* Note: no locking manipulations needed */
HEAPDEBUG_1; /* heap_getnext( info ) */
@@ -1914,6 +1925,16 @@ heap_fetch(Relation relation,
OffsetNumber offnum;
bool valid;
+ /*
+ * We don't expect direct calls to heap_fetch with valid
+ * CheckXidAlive for regular tables. Track that below.
+ */
+ if (unlikely(TransactionIdIsValid(CheckXidAlive) &&
+ !(IsCatalogRelation(relation) || RelationIsUsedAsCatalogTable(relation))))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
+ errmsg("improper heap_fetch call")));
+
/*
* Fetch and pin the appropriate page of the relation.
*/
@@ -2046,6 +2067,16 @@ heap_hot_search_buffer(ItemPointer tid, Relation relation, Buffer buffer,
bool valid;
bool skip;
+ /*
+ * We don't expect direct calls to heap_hot_search_buffer with
+ * valid CheckXidAlive for regular tables. Track that below.
+ */
+ if (unlikely(TransactionIdIsValid(CheckXidAlive) &&
+ !(IsCatalogRelation(relation) || RelationIsUsedAsCatalogTable(relation))))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
+ errmsg("improper heap_hot_search_buffer call")));
+
/* If this is not the first call, previous call returned a (live!) tuple */
if (all_dead)
*all_dead = first_call;
@@ -2187,6 +2218,16 @@ heap_hot_search(ItemPointer tid, Relation relation, Snapshot snapshot,
Buffer buffer;
HeapTupleData heapTuple;
+ /*
+ * We don't expect direct calls to heap_hot_search with
+ * valid CheckXidAlive for regular tables. Track that below.
+ */
+ if (unlikely(TransactionIdIsValid(CheckXidAlive) &&
+ !(IsCatalogRelation(relation) || RelationIsUsedAsCatalogTable(relation))))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
+ errmsg("improper heap_hot_search call")));
+
buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(tid));
LockBuffer(buffer, BUFFER_LOCK_SHARE);
result = heap_hot_search_buffer(tid, relation, buffer, snapshot,
@@ -2216,6 +2257,16 @@ heap_get_latest_tid(Relation relation,
ItemPointerData ctid;
TransactionId priorXmax;
+ /*
+ * We don't expect direct calls to heap_get_latest_tid with valid
+ * CheckXidAlive for regular tables. Track that below.
+ */
+ if (unlikely(TransactionIdIsValid(CheckXidAlive) &&
+ !(IsCatalogRelation(relation) || RelationIsUsedAsCatalogTable(relation))))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
+ errmsg("improper heap_get_latest_tid call")));
+
/* this is to avoid Assert failures on bad input */
if (!ItemPointerIsValid(tid))
return;
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 9d08775687..9220dcce83 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -25,6 +25,7 @@
#include "lib/stringinfo.h"
#include "miscadmin.h"
#include "storage/bufmgr.h"
+#include "storage/procarray.h"
#include "utils/acl.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
@@ -423,6 +424,17 @@ systable_getnext(SysScanDesc sysscan)
else
htup = heap_getnext(sysscan->scan, ForwardScanDirection);
+ /*
+ * If CheckXidAlive is valid, then we check if it aborted. If it did, we
+ * error out
+ */
+ if (TransactionIdIsValid(CheckXidAlive) &&
+ !TransactionIdIsInProgress(CheckXidAlive) &&
+ !TransactionIdDidCommit(CheckXidAlive))
+ ereport(ERROR,
+ (errcode(ERRCODE_TRANSACTION_ROLLBACK),
+ errmsg("transaction aborted during system catalog scan")));
+
return htup;
}
@@ -476,6 +488,18 @@ systable_recheck_tuple(SysScanDesc sysscan, HeapTuple tup)
result = HeapTupleSatisfiesVisibility(tup, freshsnap, scan->rs_cbuf);
LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
}
+
+ /*
+ * If CheckXidAlive is valid, then we check if it aborted. If it did, we
+ * error out
+ */
+ if (TransactionIdIsValid(CheckXidAlive) &&
+ !TransactionIdIsInProgress(CheckXidAlive) &&
+ !TransactionIdDidCommit(CheckXidAlive))
+ ereport(ERROR,
+ (errcode(ERRCODE_TRANSACTION_ROLLBACK),
+ errmsg("transaction aborted during system catalog scan")));
+
return result;
}
@@ -593,6 +617,17 @@ systable_getnext_ordered(SysScanDesc sysscan, ScanDirection direction)
if (htup && sysscan->iscan->xs_recheck)
elog(ERROR, "system catalog scans with lossy index conditions are not implemented");
+ /*
+ * If CheckXidAlive is valid, then we check if it aborted. If it did, we
+ * error out
+ */
+ if (TransactionIdIsValid(CheckXidAlive) &&
+ !TransactionIdIsInProgress(CheckXidAlive) &&
+ !TransactionIdDidCommit(CheckXidAlive))
+ ereport(ERROR,
+ (errcode(ERRCODE_TRANSACTION_ROLLBACK),
+ errmsg("transaction aborted during system catalog scan")));
+
return htup;
}
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 2fffc90606..96d52d32c1 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -599,7 +599,7 @@ ReorderBufferQueueMessage(ReorderBuffer *rb, TransactionId xid,
txn = ReorderBufferTXNByXid(rb, xid, true, NULL, lsn, true);
/* setup snapshot to allow catalog access */
- SetupHistoricSnapshot(snapshot_now, NULL);
+ SetupHistoricSnapshot(snapshot_now, NULL, xid);
PG_TRY();
{
rb->message(rb, txn, lsn, false, prefix, message_size, message);
@@ -1405,6 +1405,7 @@ ReorderBufferCommitInternal(ReorderBufferTXN *txn,
volatile CommandId command_id = FirstCommandId;
bool using_subtxn;
ReorderBufferIterTXNState *volatile iterstate = NULL;
+ MemoryContext ccxt = CurrentMemoryContext;
txn->final_lsn = commit_lsn;
txn->end_lsn = end_lsn;
@@ -1431,7 +1432,7 @@ ReorderBufferCommitInternal(ReorderBufferTXN *txn,
ReorderBufferBuildTupleCidHash(rb, txn);
/* setup the initial snapshot */
- SetupHistoricSnapshot(snapshot_now, txn->tuplecid_hash);
+ SetupHistoricSnapshot(snapshot_now, txn->tuplecid_hash, xid);
/*
* Decoding needs access to syscaches et al., which in turn use
@@ -1672,7 +1673,7 @@ ReorderBufferCommitInternal(ReorderBufferTXN *txn,
/* and continue with the new one */
- SetupHistoricSnapshot(snapshot_now, txn->tuplecid_hash);
+ SetupHistoricSnapshot(snapshot_now, txn->tuplecid_hash, xid);
break;
case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
@@ -1692,7 +1693,7 @@ ReorderBufferCommitInternal(ReorderBufferTXN *txn,
snapshot_now->curcid = command_id;
TeardownHistoricSnapshot(false);
- SetupHistoricSnapshot(snapshot_now, txn->tuplecid_hash);
+ SetupHistoricSnapshot(snapshot_now, txn->tuplecid_hash, xid);
/*
* Every time the CommandId is incremented, we could
@@ -1777,6 +1778,20 @@ ReorderBufferCommitInternal(ReorderBufferTXN *txn,
PG_CATCH();
{
/* TODO: Encapsulate cleanup from the PG_TRY and PG_CATCH blocks */
+ MemoryContext ecxt = MemoryContextSwitchTo(ccxt);
+ ErrorData *errdata = CopyErrorData();
+
+ /*
+ * if the catalog scan access returned an error of
+ * rollback, then abort on the other side as well
+ */
+ if (errdata->sqlerrcode == ERRCODE_TRANSACTION_ROLLBACK)
+ {
+ elog(LOG, "stopping decoding of %s (%u)",
+ txn->gid[0] != '\0'? txn->gid:"", txn->xid);
+ rb->abort(rb, txn, commit_lsn);
+ }
+
if (iterstate)
ReorderBufferIterTXNFinish(rb, iterstate);
@@ -1800,7 +1815,14 @@ ReorderBufferCommitInternal(ReorderBufferTXN *txn,
/* remove potential on-disk data, and deallocate */
ReorderBufferCleanupTXN(rb, txn);
- PG_RE_THROW();
+ /* re-throw only if it's not an abort */
+ if (errdata->sqlerrcode != ERRCODE_TRANSACTION_ROLLBACK)
+ {
+ MemoryContextSwitchTo(ecxt);
+ PG_RE_THROW();
+ }
+ else
+ FlushErrorState();
}
PG_END_TRY();
}
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index edf59efc29..0354fc9da9 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -151,6 +151,13 @@ static Snapshot SecondarySnapshot = NULL;
static Snapshot CatalogSnapshot = NULL;
static Snapshot HistoricSnapshot = NULL;
+/*
+ * An xid value pointing to a possibly ongoing or a prepared transaction.
+ * Currently used in logical decoding. It's possible that such transactions
+ * can get aborted while the decoding is ongoing.
+ */
+TransactionId CheckXidAlive = InvalidTransactionId;
+
/*
* These are updated by GetSnapshotData. We initialize them this way
* for the convenience of TransactionIdIsInProgress: even in bootstrap
@@ -1995,10 +2002,14 @@ MaintainOldSnapshotTimeMapping(TimestampTz whenTaken, TransactionId xmin)
* Setup a snapshot that replaces normal catalog snapshots that allows catalog
* access to behave just like it did at a certain point in the past.
*
+ * If a valid xid is passed in, we check if it is uncommitted and track it in
+ * CheckXidAlive. This is to re-check XID status while accessing catalog.
+ *
* Needed for logical decoding.
*/
void
-SetupHistoricSnapshot(Snapshot historic_snapshot, HTAB *tuplecids)
+SetupHistoricSnapshot(Snapshot historic_snapshot, HTAB *tuplecids,
+ TransactionId snapshot_xid)
{
Assert(historic_snapshot != NULL);
@@ -2007,8 +2018,17 @@ SetupHistoricSnapshot(Snapshot historic_snapshot, HTAB *tuplecids)
/* setup (cmin, cmax) lookup hash */
tuplecid_data = tuplecids;
-}
+ /*
+ * setup CheckXidAlive if it's not committed yet. We don't check
+ * if the xid aborted. That will happen during catalog access.
+ */
+ if (TransactionIdIsValid(snapshot_xid) &&
+ !TransactionIdDidCommit(snapshot_xid))
+ CheckXidAlive = snapshot_xid;
+ else
+ CheckXidAlive = InvalidTransactionId;
+}
/*
* Make catalog snapshots behave normally again.
@@ -2018,6 +2038,7 @@ TeardownHistoricSnapshot(bool is_error)
{
HistoricSnapshot = NULL;
tuplecid_data = NULL;
+ CheckXidAlive = InvalidTransactionId;
}
bool
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 83806f3040..bad2053477 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -100,8 +100,10 @@ extern char *ExportSnapshot(Snapshot snapshot);
/* Support for catalog timetravel for logical decoding */
struct HTAB;
+extern TransactionId CheckXidAlive;
extern struct HTAB *HistoricSnapshotGetTupleCids(void);
-extern void SetupHistoricSnapshot(Snapshot snapshot_now, struct HTAB *tuplecids);
+extern void SetupHistoricSnapshot(Snapshot snapshot_now, struct HTAB *tuplecids,
+ TransactionId snapshot_xid);
extern void TeardownHistoricSnapshot(bool is_error);
extern bool HistoricSnapshotActive(void);
--
2.15.2 (Apple Git-101.1)
0004-Teach-test_decoding-plugin-to-work-with-2PC.patchapplication/octet-stream; name=0004-Teach-test_decoding-plugin-to-work-with-2PC.patchDownload
From 6940c92d8c1b0f517c0e785fb62253fc0390044d Mon Sep 17 00:00:00 2001
From: Nikhil Sontakke <nikhils@2ndQuadrant.com>
Date: Wed, 13 Jun 2018 16:31:15 +0530
Subject: [PATCH 4/4] Teach test_decoding plugin to work with 2PC
Implement all callbacks required for decoding 2PC in this test_decoding
plugin. Includes relevant test cases as well.
Additionally, includes a new option "check-xid". If this option points
to a valid xid, then the pg_decode_change() API will wait for it to
be aborted externally. This allows us to test concurrent rollback of
a prepared transaction while it's being actually decoded simultaneously.
---
contrib/test_decoding/Makefile | 5 +-
contrib/test_decoding/expected/prepared.out | 185 ++++++++++++++++++++++++----
contrib/test_decoding/sql/prepared.sql | 77 ++++++++++--
contrib/test_decoding/t/001_twophase.pl | 119 ++++++++++++++++++
contrib/test_decoding/test_decoding.c | 179 +++++++++++++++++++++++++++
5 files changed, 532 insertions(+), 33 deletions(-)
create mode 100644 contrib/test_decoding/t/001_twophase.pl
diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index afcab930f7..3f0b1c6ebd 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -26,7 +26,7 @@ installcheck:;
# installation, allow to do so, but only if requested explicitly.
installcheck-force: regresscheck-install-force isolationcheck-install-force
-check: regresscheck isolationcheck
+check: regresscheck isolationcheck 2pc-check
submake-regress:
$(MAKE) -C $(top_builddir)/src/test/regress all
@@ -67,3 +67,6 @@ isolationcheck-install-force: all | submake-isolation submake-test_decoding temp
isolationcheck isolationcheck-install-force
temp-install: EXTRA_INSTALL=contrib/test_decoding
+
+2pc-check: temp-install
+ $(prove_check)
diff --git a/contrib/test_decoding/expected/prepared.out b/contrib/test_decoding/expected/prepared.out
index 46e915d4ff..934c8f1509 100644
--- a/contrib/test_decoding/expected/prepared.out
+++ b/contrib/test_decoding/expected/prepared.out
@@ -6,19 +6,50 @@ SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_d
init
(1 row)
-CREATE TABLE test_prepared1(id int);
-CREATE TABLE test_prepared2(id int);
+CREATE TABLE test_prepared1(id integer primary key);
+CREATE TABLE test_prepared2(id integer primary key);
-- test simple successful use of a prepared xact
BEGIN;
INSERT INTO test_prepared1 VALUES (1);
PREPARE TRANSACTION 'test_prepared#1';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+----------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:1
+ PREPARE TRANSACTION 'test_prepared#1'
+(3 rows)
+
COMMIT PREPARED 'test_prepared#1';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-----------------------------------
+ COMMIT PREPARED 'test_prepared#1'
+(1 row)
+
INSERT INTO test_prepared1 VALUES (2);
-- test abort of a prepared xact
BEGIN;
INSERT INTO test_prepared1 VALUES (3);
PREPARE TRANSACTION 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+----------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:2
+ COMMIT
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:3
+ PREPARE TRANSACTION 'test_prepared#2'
+(6 rows)
+
ROLLBACK PREPARED 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-------------------------------------
+ ROLLBACK PREPARED 'test_prepared#2'
+(1 row)
+
INSERT INTO test_prepared1 VALUES (4);
-- test prepared xact containing ddl
BEGIN;
@@ -26,45 +57,149 @@ INSERT INTO test_prepared1 VALUES (5);
ALTER TABLE test_prepared1 ADD COLUMN data text;
INSERT INTO test_prepared1 VALUES (6, 'frakbar');
PREPARE TRANSACTION 'test_prepared#3';
--- test that we decode correctly while an uncommitted prepared xact
--- with ddl exists.
--- separate table because of the lock from the ALTER
--- this will come before the '5' row above, as this commits before it.
-INSERT INTO test_prepared2 VALUES (7);
-COMMIT PREPARED 'test_prepared#3';
--- make sure stuff still works
-INSERT INTO test_prepared1 VALUES (8);
-INSERT INTO test_prepared2 VALUES (9);
--- cleanup
-DROP TABLE test_prepared1;
-DROP TABLE test_prepared2;
--- show results
+SELECT 'test_prepared_1' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'test_prepared1'::regclass;
+ relation | locktype | mode
+-----------------+----------+---------------------
+ test_prepared_1 | relation | RowExclusiveLock
+ test_prepared_1 | relation | AccessExclusiveLock
+(2 rows)
+
SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
data
-------------------------------------------------------------------------
- BEGIN
- table public.test_prepared1: INSERT: id[integer]:1
- COMMIT
- BEGIN
- table public.test_prepared1: INSERT: id[integer]:2
- COMMIT
BEGIN
table public.test_prepared1: INSERT: id[integer]:4
COMMIT
BEGIN
- table public.test_prepared2: INSERT: id[integer]:7
- COMMIT
- BEGIN
table public.test_prepared1: INSERT: id[integer]:5
table public.test_prepared1: INSERT: id[integer]:6 data[text]:'frakbar'
+ PREPARE TRANSACTION 'test_prepared#3'
+(7 rows)
+
+-- Test that we decode correctly while an uncommitted prepared xact
+-- with ddl exists.
+--
+-- Use a separate table for the concurrent transaction because the lock from
+-- the ALTER will stop us inserting into the other one.
+--
+-- We should see '7' before '5' in our results since it commits first.
+--
+INSERT INTO test_prepared2 VALUES (7);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+----------------------------------------------------
+ BEGIN
+ table public.test_prepared2: INSERT: id[integer]:7
COMMIT
+(3 rows)
+
+COMMIT PREPARED 'test_prepared#3';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-----------------------------------
+ COMMIT PREPARED 'test_prepared#3'
+(1 row)
+
+-- make sure stuff still works
+INSERT INTO test_prepared1 VALUES (8);
+INSERT INTO test_prepared2 VALUES (9);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+--------------------------------------------------------------------
BEGIN
table public.test_prepared1: INSERT: id[integer]:8 data[text]:null
COMMIT
BEGIN
table public.test_prepared2: INSERT: id[integer]:9
COMMIT
-(22 rows)
+(6 rows)
+
+-- Check `CLUSTER` (as operation that hold exclusive lock) doesn't block
+-- logical decoding.
+BEGIN;
+INSERT INTO test_prepared1 VALUES (10, 'othercol');
+CLUSTER test_prepared1 USING test_prepared1_pkey;
+INSERT INTO test_prepared1 VALUES (11, 'othercol2');
+PREPARE TRANSACTION 'test_prepared_lock';
+BEGIN;
+insert into test_prepared2 values (12);
+PREPARE TRANSACTION 'test_prepared_lock2';
+COMMIT PREPARED 'test_prepared_lock2';
+SELECT 'pg_class' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'pg_class'::regclass;
+ relation | locktype | mode
+----------+----------+------
+(0 rows)
+
+-- Shouldn't timeout on 2pc decoding.
+SET statement_timeout = '1s';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+----------------------------------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:10 data[text]:'othercol'
+ table public.test_prepared1: INSERT: id[integer]:11 data[text]:'othercol2'
+ PREPARE TRANSACTION 'test_prepared_lock'
+ BEGIN
+ table public.test_prepared2: INSERT: id[integer]:12
+ PREPARE TRANSACTION 'test_prepared_lock2'
+ COMMIT PREPARED 'test_prepared_lock2'
+(8 rows)
+
+RESET statement_timeout;
+COMMIT PREPARED 'test_prepared_lock';
+-- will work normally after we commit
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+--------------------------------------
+ COMMIT PREPARED 'test_prepared_lock'
+(1 row)
+
+-- test savepoints
+BEGIN;
+SAVEPOINT test_savepoint;
+CREATE TABLE test_prepared_savepoint (a int);
+PREPARE TRANSACTION 'test_prepared_savepoint';
+COMMIT PREPARED 'test_prepared_savepoint';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-------------------------------------------
+ COMMIT PREPARED 'test_prepared_savepoint'
+(1 row)
+
+-- test that a GID containing "nodecode" gets decoded at commit prepared time
+BEGIN;
+INSERT INTO test_prepared1 VALUES (20);
+PREPARE TRANSACTION 'test_prepared_nodecode';
+-- should show nothing
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+------
+(0 rows)
+
+COMMIT PREPARED 'test_prepared_nodecode';
+-- should be decoded now
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+---------------------------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:20 data[text]:null
+ COMMIT
+(3 rows)
+
+-- cleanup
+DROP TABLE test_prepared1;
+DROP TABLE test_prepared2;
+-- show results. There should be nothing to show
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+------
+(0 rows)
SELECT pg_drop_replication_slot('regression_slot');
pg_drop_replication_slot
diff --git a/contrib/test_decoding/sql/prepared.sql b/contrib/test_decoding/sql/prepared.sql
index e72639767e..60725419fe 100644
--- a/contrib/test_decoding/sql/prepared.sql
+++ b/contrib/test_decoding/sql/prepared.sql
@@ -2,21 +2,25 @@
SET synchronous_commit = on;
SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
-CREATE TABLE test_prepared1(id int);
-CREATE TABLE test_prepared2(id int);
+CREATE TABLE test_prepared1(id integer primary key);
+CREATE TABLE test_prepared2(id integer primary key);
-- test simple successful use of a prepared xact
BEGIN;
INSERT INTO test_prepared1 VALUES (1);
PREPARE TRANSACTION 'test_prepared#1';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
COMMIT PREPARED 'test_prepared#1';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
INSERT INTO test_prepared1 VALUES (2);
-- test abort of a prepared xact
BEGIN;
INSERT INTO test_prepared1 VALUES (3);
PREPARE TRANSACTION 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
ROLLBACK PREPARED 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
INSERT INTO test_prepared1 VALUES (4);
@@ -27,24 +31,83 @@ ALTER TABLE test_prepared1 ADD COLUMN data text;
INSERT INTO test_prepared1 VALUES (6, 'frakbar');
PREPARE TRANSACTION 'test_prepared#3';
--- test that we decode correctly while an uncommitted prepared xact
--- with ddl exists.
+SELECT 'test_prepared_1' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'test_prepared1'::regclass;
+
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
--- separate table because of the lock from the ALTER
--- this will come before the '5' row above, as this commits before it.
+-- Test that we decode correctly while an uncommitted prepared xact
+-- with ddl exists.
+--
+-- Use a separate table for the concurrent transaction because the lock from
+-- the ALTER will stop us inserting into the other one.
+--
+-- We should see '7' before '5' in our results since it commits first.
+--
INSERT INTO test_prepared2 VALUES (7);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
COMMIT PREPARED 'test_prepared#3';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
-- make sure stuff still works
INSERT INTO test_prepared1 VALUES (8);
INSERT INTO test_prepared2 VALUES (9);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- Check `CLUSTER` (as operation that hold exclusive lock) doesn't block
+-- logical decoding.
+BEGIN;
+INSERT INTO test_prepared1 VALUES (10, 'othercol');
+CLUSTER test_prepared1 USING test_prepared1_pkey;
+INSERT INTO test_prepared1 VALUES (11, 'othercol2');
+PREPARE TRANSACTION 'test_prepared_lock';
+
+BEGIN;
+insert into test_prepared2 values (12);
+PREPARE TRANSACTION 'test_prepared_lock2';
+COMMIT PREPARED 'test_prepared_lock2';
+
+SELECT 'pg_class' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'pg_class'::regclass;
+
+-- Shouldn't timeout on 2pc decoding.
+SET statement_timeout = '1s';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+RESET statement_timeout;
+
+COMMIT PREPARED 'test_prepared_lock';
+
+-- will work normally after we commit
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- test savepoints
+BEGIN;
+SAVEPOINT test_savepoint;
+CREATE TABLE test_prepared_savepoint (a int);
+PREPARE TRANSACTION 'test_prepared_savepoint';
+COMMIT PREPARED 'test_prepared_savepoint';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- test that a GID containing "nodecode" gets decoded at commit prepared time
+BEGIN;
+INSERT INTO test_prepared1 VALUES (20);
+PREPARE TRANSACTION 'test_prepared_nodecode';
+-- should show nothing
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+COMMIT PREPARED 'test_prepared_nodecode';
+-- should be decoded now
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
-- cleanup
DROP TABLE test_prepared1;
DROP TABLE test_prepared2;
--- show results
+-- show results. There should be nothing to show
SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
SELECT pg_drop_replication_slot('regression_slot');
diff --git a/contrib/test_decoding/t/001_twophase.pl b/contrib/test_decoding/t/001_twophase.pl
new file mode 100644
index 0000000000..99a9249689
--- /dev/null
+++ b/contrib/test_decoding/t/001_twophase.pl
@@ -0,0 +1,119 @@
+# logical replication of 2PC test
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 2;
+use Time::HiRes qw(usleep);
+use Scalar::Util qw(looks_like_number);
+
+# Initialize node
+my $node_logical = get_new_node('logical');
+$node_logical->init(allows_streaming => 'logical');
+$node_logical->append_conf(
+ 'postgresql.conf', qq(
+ max_prepared_transactions = 10
+));
+$node_logical->start;
+
+# Create some pre-existing content on logical
+$node_logical->safe_psql('postgres', "CREATE TABLE tab (a int PRIMARY KEY)");
+$node_logical->safe_psql('postgres',
+ "INSERT INTO tab SELECT generate_series(1,10)");
+$node_logical->safe_psql('postgres',
+ "SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');");
+
+# This test is specifically for testing concurrent abort while logical decode
+# is ongoing. We will pass in the xid of the 2PC to the plugin as an option.
+# On the receipt of a valid "check-xid", the change API in the test decoding
+# plugin will wait for it to be aborted.
+#
+# We will fire off a ROLLBACK from another session when this decode
+# is waiting.
+#
+# The status of "check-xid" will change from in-progress to not-committed
+# (hence aborted) and we will stop decoding because the subsequent
+# system catalog scan will error out.
+
+$node_logical->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab VALUES (11);
+ INSERT INTO tab VALUES (12);
+ ALTER TABLE tab ADD COLUMN b INT;
+ INSERT INTO tab VALUES (13,14);
+ PREPARE TRANSACTION 'test_prepared_tab';");
+# get XID of the above two-phase transaction
+my $xid2pc = $node_logical->safe_psql('postgres', "SELECT transaction FROM pg_prepared_xacts WHERE gid = 'test_prepared_tab'");
+is(looks_like_number($xid2pc), qq(1), 'Got a valid two-phase XID');
+
+# start decoding the above by passing the "check-xid"
+my $logical_connstr = $node_logical->connstr . ' dbname=postgres';
+
+# decode now, it should include an ABORT entry because of the ROLLBACK below
+system_log("psql -d \"$logical_connstr\" -c \"SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'check-xid', '$xid2pc');\" \&");
+
+# check that decode starts waiting for this $xid2pc
+poll_output_until("waiting for $xid2pc to abort")
+ or die "no wait happened for the abort";
+
+# rollback the prepared transaction
+$node_logical->safe_psql('postgres', "ROLLBACK PREPARED 'test_prepared_tab';");
+
+# check for occurrence of the log about stopping this decoding
+poll_output_until("stopping decoding of test_prepared_tab")
+ or die "no decoding stop for the rollback";
+
+# consume any remaining changes
+$node_logical->safe_psql('postgres', "SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');");
+
+# Check that commit prepared is decoded properly on immediate restart
+$node_logical->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab VALUES (11);
+ INSERT INTO tab VALUES (12);
+ ALTER TABLE tab ADD COLUMN b INT;
+ INSERT INTO tab VALUES (13, 11);
+ PREPARE TRANSACTION 'test_prepared_tab';");
+# consume changes
+$node_logical->safe_psql('postgres', "SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');");
+$node_logical->stop('immediate');
+$node_logical->start;
+
+# commit post the restart
+$node_logical->safe_psql('postgres', "COMMIT PREPARED 'test_prepared_tab';");
+$node_logical->safe_psql('postgres', "SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');");
+
+# check inserts are visible
+my $result = $node_logical->safe_psql('postgres', "SELECT count(*) FROM tab where a IN (11,12) OR b IN (11);");
+is($result, qq(3), 'Rows inserted via 2PC are visible on restart');
+
+$node_logical->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot');");
+$node_logical->stop('fast');
+
+sub poll_output_until
+{
+ my ($expected) = @_;
+
+ $expected = 'xxxxxx' unless defined($expected); # default junk value
+
+ my $max_attempts = 180 * 10;
+ my $attempts = 0;
+
+ my $output_file = '';
+ while ($attempts < $max_attempts)
+ {
+ $output_file = slurp_file($node_logical->logfile());
+
+ if ($output_file =~ $expected)
+ {
+ return 1;
+ }
+
+ # Wait 0.1 second before retrying.
+ usleep(100_000);
+ $attempts++;
+ }
+
+ # The output result didn't change in 180 seconds. Give up
+ return 0;
+}
diff --git a/contrib/test_decoding/test_decoding.c b/contrib/test_decoding/test_decoding.c
index 1c439b57b0..b981d1693c 100644
--- a/contrib/test_decoding/test_decoding.c
+++ b/contrib/test_decoding/test_decoding.c
@@ -11,12 +11,16 @@
*-------------------------------------------------------------------------
*/
#include "postgres.h"
+#include "miscadmin.h"
+#include "access/transam.h"
#include "catalog/pg_type.h"
#include "replication/logical.h"
#include "replication/origin.h"
+#include "storage/procarray.h"
+
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -36,6 +40,7 @@ typedef struct
bool skip_empty_xacts;
bool xact_wrote_changes;
bool only_local;
+ TransactionId check_xid; /* track abort of this txid */
} TestDecodingData;
static void pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
@@ -49,6 +54,8 @@ static void pg_output_begin(LogicalDecodingContext *ctx,
bool last_write);
static void pg_decode_commit_txn(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn, XLogRecPtr commit_lsn);
+static void pg_decode_abort_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, XLogRecPtr abort_lsn);
static void pg_decode_change(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn, Relation rel,
ReorderBufferChange *change);
@@ -62,6 +69,18 @@ static void pg_decode_message(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn, XLogRecPtr message_lsn,
bool transactional, const char *prefix,
Size sz, const char *message);
+static bool pg_decode_filter_prepare(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid);
+static void pg_decode_prepare_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+static void pg_decode_commit_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+static void pg_decode_abort_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
void
_PG_init(void)
@@ -80,9 +99,14 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb)
cb->change_cb = pg_decode_change;
cb->truncate_cb = pg_decode_truncate;
cb->commit_cb = pg_decode_commit_txn;
+ cb->abort_cb = pg_decode_abort_txn;
cb->filter_by_origin_cb = pg_decode_filter;
cb->shutdown_cb = pg_decode_shutdown;
cb->message_cb = pg_decode_message;
+ cb->filter_prepare_cb = pg_decode_filter_prepare;
+ cb->prepare_cb = pg_decode_prepare_txn;
+ cb->commit_prepared_cb = pg_decode_commit_prepared_txn;
+ cb->abort_prepared_cb = pg_decode_abort_prepared_txn;
}
@@ -102,11 +126,14 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
data->include_timestamp = false;
data->skip_empty_xacts = false;
data->only_local = false;
+ data->check_xid = InvalidTransactionId;
ctx->output_plugin_private = data;
opt->output_type = OUTPUT_PLUGIN_TEXTUAL_OUTPUT;
opt->receive_rewrites = false;
+ /* this plugin supports decoding of 2pc */
+ opt->enable_twophase = true;
foreach(option, ctx->output_plugin_options)
{
@@ -183,6 +210,32 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
errmsg("could not parse value \"%s\" for parameter \"%s\"",
strVal(elem->arg), elem->defname)));
}
+ else if (strcmp(elem->defname, "check-xid") == 0)
+ {
+ if (elem->arg)
+ {
+ errno = 0;
+ data->check_xid = (TransactionId)
+ strtoul(strVal(elem->arg), NULL, 0);
+
+ if (errno == EINVAL || errno == ERANGE)
+ ereport(FATAL,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("check-xid is not a valid number: \"%s\"",
+ strVal(elem->arg))));
+ }
+ else
+ ereport(FATAL,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("check-xid needs an input value")));
+
+ if (data->check_xid <= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("Specify positive value for parameter \"%s\","
+ " you specified \"%s\"",
+ elem->defname, strVal(elem->arg))));
+ }
else
{
ereport(ERROR,
@@ -251,6 +304,116 @@ pg_decode_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
OutputPluginWrite(ctx, true);
}
+/* ABORT callback */
+static void
+pg_decode_abort_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ if (data->skip_empty_xacts && !data->xact_wrote_changes)
+ return;
+
+ OutputPluginPrepareWrite(ctx, true);
+ if (data->include_xids)
+ appendStringInfo(ctx->out, "ABORT %u", txn->xid);
+ else
+ appendStringInfoString(ctx->out, "ABORT");
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
+/*
+ * Filter out two-phase transactions.
+ *
+ * Each plugin can implement its own filtering logic. Here
+ * we demonstrate a simple logic by checking the GID. If the
+ * GID contains the "_nodecode" substring, then we filter
+ * it out.
+ */
+static bool
+pg_decode_filter_prepare(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid)
+{
+ if (strstr(gid, "_nodecode") != NULL)
+ return true;
+
+ return false;
+}
+
+/* PREPARE callback */
+static void
+pg_decode_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ if (data->skip_empty_xacts && !data->xact_wrote_changes)
+ return;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ appendStringInfo(ctx->out, "PREPARE TRANSACTION %s",
+ quote_literal_cstr(txn->gid));
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, " %u", txn->xid);
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
+/* COMMIT PREPARED callback */
+static void
+pg_decode_commit_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ appendStringInfo(ctx->out, "COMMIT PREPARED %s",
+ quote_literal_cstr(txn->gid));
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, " %u", txn->xid);
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
+/* ABORT PREPARED callback */
+static void
+pg_decode_abort_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ appendStringInfo(ctx->out, "ROLLBACK PREPARED %s",
+ quote_literal_cstr(txn->gid));
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, " %u", txn->xid);
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
static bool
pg_decode_filter(LogicalDecodingContext *ctx,
RepOriginId origin_id)
@@ -416,6 +579,22 @@ pg_decode_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
}
data->xact_wrote_changes = true;
+ /* if check_xid is specified */
+ if (TransactionIdIsValid(data->check_xid))
+ {
+ elog(LOG, "waiting for %u to abort", data->check_xid);
+ while (TransactionIdIsInProgress(data->check_xid))
+ {
+ CHECK_FOR_INTERRUPTS();
+ pg_usleep(10000L);
+ }
+ if (!TransactionIdIsInProgress(data->check_xid) &&
+ !TransactionIdDidCommit(data->check_xid))
+ elog(LOG, "%u aborted", data->check_xid);
+
+ Assert(TransactionIdDidAbort(data->check_xid));
+ }
+
class_form = RelationGetForm(relation);
tupdesc = RelationGetDescr(relation);
--
2.15.2 (Apple Git-101.1)
0001-Cleaning-up-of-flags-in-ReorderBufferTXN-structure.patchapplication/octet-stream; name=0001-Cleaning-up-of-flags-in-ReorderBufferTXN-structure.patchDownload
From 8e32c723490df640ffd071716930715ba087814c Mon Sep 17 00:00:00 2001
From: Nikhil Sontakke <nikhils@2ndQuadrant.com>
Date: Wed, 13 Jun 2018 16:15:24 +0530
Subject: [PATCH 1/4] Cleaning up of flags in ReorderBufferTXN structure
---
src/backend/replication/logical/reorderbuffer.c | 34 ++++++++++++-------------
src/include/replication/reorderbuffer.h | 33 ++++++++++++++----------
2 files changed, 37 insertions(+), 30 deletions(-)
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 9b55b94227..fb71631434 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -643,7 +643,7 @@ AssertTXNLsnOrder(ReorderBuffer *rb)
Assert(prev_first_lsn < cur_txn->first_lsn);
/* known-as-subtxn txns must not be listed */
- Assert(!cur_txn->is_known_as_subxact);
+ Assert(!rbtxn_is_known_subxact(cur_txn));
prev_first_lsn = cur_txn->first_lsn;
}
@@ -663,7 +663,7 @@ AssertTXNLsnOrder(ReorderBuffer *rb)
Assert(prev_base_snap_lsn < cur_txn->base_snapshot_lsn);
/* known-as-subtxn txns must not be listed */
- Assert(!cur_txn->is_known_as_subxact);
+ Assert(!rbtxn_is_known_subxact(cur_txn));
prev_base_snap_lsn = cur_txn->base_snapshot_lsn;
}
@@ -686,7 +686,7 @@ ReorderBufferGetOldestTXN(ReorderBuffer *rb)
txn = dlist_head_element(ReorderBufferTXN, node, &rb->toplevel_by_lsn);
- Assert(!txn->is_known_as_subxact);
+ Assert(!rbtxn_is_known_subxact(txn));
Assert(txn->first_lsn != InvalidXLogRecPtr);
return txn;
}
@@ -746,7 +746,7 @@ ReorderBufferAssignChild(ReorderBuffer *rb, TransactionId xid,
if (!new_sub)
{
- if (subtxn->is_known_as_subxact)
+ if (rbtxn_is_known_subxact(subtxn))
{
/* already associated, nothing to do */
return;
@@ -762,7 +762,7 @@ ReorderBufferAssignChild(ReorderBuffer *rb, TransactionId xid,
}
}
- subtxn->is_known_as_subxact = true;
+ subtxn->txn_flags |= RBTXN_IS_SUBXACT;
subtxn->toplevel_xid = xid;
Assert(subtxn->nsubtxns == 0);
@@ -972,7 +972,7 @@ ReorderBufferIterTXNInit(ReorderBuffer *rb, ReorderBufferTXN *txn)
{
ReorderBufferChange *cur_change;
- if (txn->serialized)
+ if (rbtxn_is_serialized(txn))
{
/* serialize remaining changes */
ReorderBufferSerializeTXN(rb, txn);
@@ -1001,7 +1001,7 @@ ReorderBufferIterTXNInit(ReorderBuffer *rb, ReorderBufferTXN *txn)
{
ReorderBufferChange *cur_change;
- if (cur_txn->serialized)
+ if (rbtxn_is_serialized(cur_txn))
{
/* serialize remaining changes */
ReorderBufferSerializeTXN(rb, cur_txn);
@@ -1167,7 +1167,7 @@ ReorderBufferCleanupTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
* they originally were happening inside another subtxn, so we won't
* ever recurse more than one level deep here.
*/
- Assert(subtxn->is_known_as_subxact);
+ Assert(rbtxn_is_known_subxact(subtxn));
Assert(subtxn->nsubtxns == 0);
ReorderBufferCleanupTXN(rb, subtxn);
@@ -1208,7 +1208,7 @@ ReorderBufferCleanupTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
/*
* Remove TXN from its containing list.
*
- * Note: if txn->is_known_as_subxact, we are deleting the TXN from its
+ * Note: if txn is known as subxact, we are deleting the TXN from its
* parent's list of known subxacts; this leaves the parent's nsubxacts
* count too high, but we don't care. Otherwise, we are deleting the TXN
* from the LSN-ordered list of toplevel TXNs.
@@ -1223,7 +1223,7 @@ ReorderBufferCleanupTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
Assert(found);
/* remove entries spilled to disk */
- if (txn->serialized)
+ if (rbtxn_is_serialized(txn))
ReorderBufferRestoreCleanup(rb, txn);
/* deallocate */
@@ -1240,7 +1240,7 @@ ReorderBufferBuildTupleCidHash(ReorderBuffer *rb, ReorderBufferTXN *txn)
dlist_iter iter;
HASHCTL hash_ctl;
- if (!txn->has_catalog_changes || dlist_is_empty(&txn->tuplecids))
+ if (!rbtxn_has_catalog_changes(txn) || dlist_is_empty(&txn->tuplecids))
return;
memset(&hash_ctl, 0, sizeof(hash_ctl));
@@ -1854,7 +1854,7 @@ ReorderBufferAbortOld(ReorderBuffer *rb, TransactionId oldestRunningXid)
* final_lsn to that of their last change; this causes
* ReorderBufferRestoreCleanup to do the right thing.
*/
- if (txn->serialized && txn->final_lsn == 0)
+ if (rbtxn_is_serialized(txn) && txn->final_lsn == 0)
{
ReorderBufferChange *last =
dlist_tail_element(ReorderBufferChange, node, &txn->changes);
@@ -2002,7 +2002,7 @@ ReorderBufferSetBaseSnapshot(ReorderBuffer *rb, TransactionId xid,
* operate on its top-level transaction instead.
*/
txn = ReorderBufferTXNByXid(rb, xid, true, &is_new, lsn, true);
- if (txn->is_known_as_subxact)
+ if (rbtxn_is_known_subxact(txn))
txn = ReorderBufferTXNByXid(rb, txn->toplevel_xid, false,
NULL, InvalidXLogRecPtr, false);
Assert(txn->base_snapshot == NULL);
@@ -2109,7 +2109,7 @@ ReorderBufferXidSetCatalogChanges(ReorderBuffer *rb, TransactionId xid,
txn = ReorderBufferTXNByXid(rb, xid, true, NULL, lsn, true);
- txn->has_catalog_changes = true;
+ txn->txn_flags |= RBTXN_HAS_CATALOG_CHANGES;
}
/*
@@ -2126,7 +2126,7 @@ ReorderBufferXidHasCatalogChanges(ReorderBuffer *rb, TransactionId xid)
if (txn == NULL)
return false;
- return txn->has_catalog_changes;
+ return rbtxn_has_catalog_changes(txn);
}
/*
@@ -2146,7 +2146,7 @@ ReorderBufferXidHasBaseSnapshot(ReorderBuffer *rb, TransactionId xid)
return false;
/* a known subtxn? operate on top-level txn instead */
- if (txn->is_known_as_subxact)
+ if (rbtxn_is_known_subxact(txn))
txn = ReorderBufferTXNByXid(rb, txn->toplevel_xid, false,
NULL, InvalidXLogRecPtr, false);
@@ -2267,7 +2267,7 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
Assert(spilled == txn->nentries_mem);
Assert(dlist_is_empty(&txn->changes));
txn->nentries_mem = 0;
- txn->serialized = true;
+ txn->txn_flags |= RBTXN_IS_SERIALIZED;
if (fd != -1)
CloseTransientFile(fd);
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 1f52f6bde7..ec9515d156 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -150,18 +150,34 @@ typedef struct ReorderBufferChange
dlist_node node;
} ReorderBufferChange;
+/* ReorderBufferTXN flags */
+#define RBTXN_HAS_CATALOG_CHANGES 0x0001
+#define RBTXN_IS_SUBXACT 0x0002
+#define RBTXN_IS_SERIALIZED 0x0004
+
+/* does the txn have catalog changes */
+#define rbtxn_has_catalog_changes(txn) (txn->txn_flags & RBTXN_HAS_CATALOG_CHANGES)
+/* is the txn known as a subxact? */
+#define rbtxn_is_known_subxact(txn) (txn->txn_flags & RBTXN_IS_SUBXACT)
+/*
+ * Has this transaction been spilled to disk? It's not always possible to
+ * deduce that fact by comparing nentries with nentries_mem, because e.g.
+ * subtransactions of a large transaction might get serialized together
+ * with the parent - if they're restored to memory they'd have
+ * nentries_mem == nentries.
+ */
+#define rbtxn_is_serialized(txn) (txn->txn_flags & RBTXN_IS_SERIALIZED)
+
typedef struct ReorderBufferTXN
{
+ int txn_flags;
+
/*
* The transactions transaction id, can be a toplevel or sub xid.
*/
TransactionId xid;
- /* did the TX have catalog changes */
- bool has_catalog_changes;
-
/* Do we know this is a subxact? Xid of top-level txn if so */
- bool is_known_as_subxact;
TransactionId toplevel_xid;
/*
@@ -229,15 +245,6 @@ typedef struct ReorderBufferTXN
*/
uint64 nentries_mem;
- /*
- * Has this transaction been spilled to disk? It's not always possible to
- * deduce that fact by comparing nentries with nentries_mem, because e.g.
- * subtransactions of a large transaction might get serialized together
- * with the parent - if they're restored to memory they'd have
- * nentries_mem == nentries.
- */
- bool serialized;
-
/*
* List of ReorderBufferChange structs, including new Snapshots and new
* CommandIds
--
2.15.2 (Apple Git-101.1)
0002-Support-decoding-of-two-phase-transactions-at-PREPAR.patchapplication/octet-stream; name=0002-Support-decoding-of-two-phase-transactions-at-PREPAR.patchDownload
From 83b83aed7769a4971c83823f50a3799cf74bd8d5 Mon Sep 17 00:00:00 2001
From: Nikhil Sontakke <nikhils@2ndQuadrant.com>
Date: Wed, 13 Jun 2018 16:30:30 +0530
Subject: [PATCH 2/4] Support decoding of two-phase transactions at PREPARE
Until now two-phase transactions were decoded at COMMIT, just like
regular transaction. During replay, two-phase transactions were
translated into regular transactions on the subscriber, and the GID
was not forwarded to it.
This patch allows PREPARE-time decoding two-phase transactions (if
the output plugin supports this capability), in which case the
transactions are replayed at PREPARE and then committed later when
COMMIT PREPARED arrives.
On the subscriber, the transactions will be executed as two-phase
transactions, with the same GID. This is important for various
external transaction managers, that often encode information into
the GID itself.
Includes documentation changes.
---
doc/src/sgml/logicaldecoding.sgml | 127 ++++++++++++++-
src/backend/replication/logical/decode.c | 147 +++++++++++++++--
src/backend/replication/logical/logical.c | 203 ++++++++++++++++++++++++
src/backend/replication/logical/reorderbuffer.c | 185 +++++++++++++++++++--
src/include/replication/logical.h | 2 +-
src/include/replication/output_plugin.h | 46 ++++++
src/include/replication/reorderbuffer.h | 68 ++++++++
7 files changed, 746 insertions(+), 32 deletions(-)
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index 8db968641e..a89e4d5184 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -385,7 +385,12 @@ typedef struct OutputPluginCallbacks
LogicalDecodeChangeCB change_cb;
LogicalDecodeTruncateCB truncate_cb;
LogicalDecodeCommitCB commit_cb;
+ LogicalDecodeAbortCB abort_cb;
LogicalDecodeMessageCB message_cb;
+ LogicalDecodeFilterPrepareCB filter_prepare_cb;
+ LogicalDecodePrepareCB prepare_cb;
+ LogicalDecodeCommitPreparedCB commit_prepared_cb;
+ LogicalDecodeAbortPreparedCB abort_prepared_cb;
LogicalDecodeFilterByOriginCB filter_by_origin_cb;
LogicalDecodeShutdownCB shutdown_cb;
} OutputPluginCallbacks;
@@ -457,7 +462,13 @@ CREATE TABLE another_catalog_table(data text) WITH (user_catalog_table = true);
never get
decoded. Successful savepoints are
folded into the transaction containing them in the order they were
- executed within that transaction.
+ executed within that transaction. A transaction that is prepared for
+ a two-phase commit using <command>PREPARE TRANSACTION</command> will
+ also be decoded if the output plugin callbacks needed for decoding
+ them are provided. It is possible that the current transaction which
+ is being decoded is aborted concurrently via a <command>ROLLBACK PREPARED</command>
+ command. In that case, the logical decoding of this transaction will
+ be aborted too.
</para>
<note>
@@ -558,6 +569,71 @@ typedef void (*LogicalDecodeCommitCB) (struct LogicalDecodingContext *ctx,
</para>
</sect3>
+ <sect3 id="logicaldecoding-output-plugin-prepare">
+ <title>Transaction Prepare Callback</title>
+
+ <para>
+ The optional <function>prepare_cb</function> callback is called whenever
+ a transaction which is prepared for two-phase commit has been
+ decoded. The <function>change_cb</function> callbacks for all modified
+ rows will have been called before this, if there have been any modified
+ rows.
+<programlisting>
+typedef void (*LogicalDecodePrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
+ <sect3 id="logicaldecoding-output-plugin-commit-prepared">
+ <title>Commit Prepared Transaction Callback</title>
+
+ <para>
+ The optional <function>commit_prepared_cb</function> callback is called whenever
+ a commit prepared transaction has been decoded. The <parameter>gid</parameter> field,
+ which is part of the <parameter>txn</parameter> parameter can be used in this
+ callback.
+<programlisting>
+typedef void (*LogicalDecodeCommitPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
+ <sect3 id="logicaldecoding-output-plugin-abort-prepared">
+ <title>Rollback Prepared Transaction Callback</title>
+
+ <para>
+ The optional <function>abort_prepared_cb</function> callback is called whenever
+ a rollback prepared transaction has been decoded. The <parameter>gid</parameter> field,
+ which is part of the <parameter>txn</parameter> parameter can be used in this
+ callback.
+<programlisting>
+typedef void (*LogicalDecodeAbortPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
+ <sect3 id="logicaldecoding-output-plugin-abort">
+ <title>Transaction Abort Callback</title>
+
+ <para>
+ The required <function>abort_cb</function> callback is called whenever
+ a transaction abort has to be initiated. This can happen if we are
+ decoding a transaction that has been prepared for two-phase commit and
+ a concurrent rollback happens while we are decoding it.
+<programlisting>
+typedef void (*LogicalDecodeAbortCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
<sect3 id="logicaldecoding-output-plugin-change">
<title>Change Callback</title>
@@ -567,7 +643,13 @@ typedef void (*LogicalDecodeCommitCB) (struct LogicalDecodingContext *ctx,
an <command>INSERT</command>, <command>UPDATE</command>,
or <command>DELETE</command>. Even if the original command modified
several rows at once the callback will be called individually for each
- row.
+ row. The <function>change_cb</function> callback may access system or
+ user catalog tables to aid in the process of outputting the row
+ modification details. In case of decoding a prepared (but yet
+ uncommitted) transaction or decoding of an uncommitted transaction, this
+ change callback might also error out due to simultaneous rollback of
+ this very same transaction. In that case, the logical decoding of this
+ aborted transaction is stopped gracefully.
<programlisting>
typedef void (*LogicalDecodeChangeCB) (struct LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
@@ -644,6 +726,39 @@ typedef bool (*LogicalDecodeFilterByOriginCB) (struct LogicalDecodingContext *ct
</para>
</sect3>
+ <sect3 id="logicaldecoding-output-plugin-filter-prepare">
+ <title>Prepare Filter Callback</title>
+
+ <para>
+ The optional <function>filter_prepare_cb</function> callback
+ is called to determine whether data that is part of the current
+ two-phase commit transaction should be considered for decode
+ at this prepare stage or as a regular one-phase transaction at
+ <command>COMMIT PREPARED</command> time later. To signal that
+ decoding should be skipped, return <literal>true</literal>;
+ <literal>false</literal> otherwise. When the callback is not
+ defined, <literal>false</literal> is assumed (i.e. nothing is
+ filtered).
+<programlisting>
+typedef bool (*LogicalDecodeFilterPrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ TransactionId xid,
+ const char *gid);
+</programlisting>
+ The <parameter>ctx</parameter> parameter has the same contents
+ as for the other callbacks. The <parameter>txn</parameter> parameter
+ contains meta information about the transaction. The <parameter>xid</parameter>
+ contains the XID because <parameter>txn</parameter> can be NULL in some cases.
+ The <parameter>gid</parameter> is the identifier that later identifies this
+ transaction for <command>COMMIT PREPARED</command> or <command>ROLLBACK PREPARED</command>.
+ </para>
+ <para>
+ The callback has to provide the same static answer for a given combination of
+ <parameter>xid</parameter> and <parameter>gid</parameter> every time it is
+ called.
+ </para>
+ </sect3>
+
<sect3 id="logicaldecoding-output-plugin-message">
<title>Generic Message Callback</title>
@@ -665,7 +780,13 @@ typedef void (*LogicalDecodeMessageCB) (struct LogicalDecodingContext *ctx,
non-transactional and the XID was not assigned yet in the transaction
which logged the message. The <parameter>lsn</parameter> has WAL
location of the message. The <parameter>transactional</parameter> says
- if the message was sent as transactional or not.
+ if the message was sent as transactional or not. Similar to the change
+ callback, in case of decoding a prepared (but yet uncommitted)
+ transaction or decoding of an uncommitted transaction, this message
+ callback might also error out due to simultaneous rollback of
+ this very same transaction. In that case, the logical decoding of this
+ aborted transaction is stopped gracefully.
+
The <parameter>prefix</parameter> is arbitrary null-terminated prefix
which can be used for identifying interesting messages for the current
plugin. And finally the <parameter>message</parameter> parameter holds
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 59c003de9c..99d801eb94 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -34,6 +34,7 @@
#include "access/xlogutils.h"
#include "access/xlogreader.h"
#include "access/xlogrecord.h"
+#include "access/twophase.h"
#include "catalog/pg_control.h"
@@ -73,6 +74,8 @@ static void DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
xl_xact_parsed_commit *parsed, TransactionId xid);
static void DecodeAbort(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
xl_xact_parsed_abort *parsed, TransactionId xid);
+static void DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_prepare * parsed);
/* common function to decode tuples */
static void DecodeXLogTuple(char *data, Size len, ReorderBufferTupleBuf *tup);
@@ -281,16 +284,33 @@ DecodeXactOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
break;
}
case XLOG_XACT_PREPARE:
+ {
+ xl_xact_parsed_prepare parsed;
- /*
- * Currently decoding ignores PREPARE TRANSACTION and will just
- * decode the transaction when the COMMIT PREPARED is sent or
- * throw away the transaction's contents when a ROLLBACK PREPARED
- * is received. In the future we could add code to expose prepared
- * transactions in the changestream allowing for a kind of
- * distributed 2PC.
- */
- ReorderBufferProcessXid(reorder, XLogRecGetXid(r), buf->origptr);
+ /* check that output plugin is capable of twophase decoding */
+ if (!ctx->options.enable_twophase)
+ {
+ ReorderBufferProcessXid(reorder, XLogRecGetXid(r), buf->origptr);
+ break;
+ }
+
+ /* ok, parse it */
+ ParsePrepareRecord(XLogRecGetInfo(buf->record),
+ XLogRecGetData(buf->record), &parsed);
+
+ /* does output plugin want this particular transaction? */
+ if (ctx->callbacks.filter_prepare_cb &&
+ ReorderBufferPrepareNeedSkip(reorder, parsed.twophase_xid,
+ parsed.twophase_gid))
+ {
+ ReorderBufferProcessXid(reorder, parsed.twophase_xid,
+ buf->origptr);
+ break;
+ }
+
+ DecodePrepare(ctx, buf, &parsed);
+ break;
+ }
break;
default:
elog(ERROR, "unexpected RM_XACT_ID record type: %u", info);
@@ -633,9 +653,90 @@ DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
buf->origptr, buf->endptr);
}
+ /*
+ * Decide if we're processing COMMIT PREPARED, or a regular COMMIT.
+ * Regular commit simply triggers a replay of transaction changes from the
+ * reorder buffer. For COMMIT PREPARED that however already happened at
+ * PREPARE time, and so we only need to notify the subscriber that the GID
+ * finally committed.
+ *
+ * For output plugins that do not support PREPARE-time decoding of
+ * two-phase transactions, we never even see the PREPARE and all two-phase
+ * transactions simply fall through to the second branch.
+ */
+ if (TransactionIdIsValid(parsed->twophase_xid) &&
+ ReorderBufferTxnIsPrepared(ctx->reorder,
+ parsed->twophase_xid, parsed->twophase_gid))
+ {
+ Assert(xid == parsed->twophase_xid);
+ /* we are processing COMMIT PREPARED */
+ ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn,
+ parsed->twophase_gid, true);
+ }
+ else
+ {
+ /* replay actions of all transaction + subtransactions in order */
+ ReorderBufferCommit(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn);
+ }
+}
+
+/*
+ * Decode PREPARE record. Similar logic as in COMMIT
+ */
+static void
+DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_prepare * parsed)
+{
+ XLogRecPtr origin_lsn = parsed->origin_lsn;
+ TimestampTz commit_time = parsed->origin_timestamp;
+ XLogRecPtr origin_id = XLogRecGetOrigin(buf->record);
+ int i;
+ TransactionId xid = parsed->twophase_xid;
+
+ /*
+ * Process invalidation messages, even if we're not interested in the
+ * transaction's contents, since the various caches need to always be
+ * consistent.
+ */
+ if (parsed->nmsgs > 0)
+ {
+ if (!ctx->fast_forward)
+ ReorderBufferAddInvalidations(ctx->reorder, xid, buf->origptr,
+ parsed->nmsgs, parsed->msgs);
+ ReorderBufferXidSetCatalogChanges(ctx->reorder, xid, buf->origptr);
+ }
+
+ /*
+ * Tell the reorderbuffer about the surviving subtransactions. We need to
+ * do this because the main transaction itself has not committed since we
+ * are in the prepare phase right now. So we need to be sure the snapshot
+ * is setup correctly for the main transaction in case all changes
+ * happened in subtransanctions
+ */
+ for (i = 0; i < parsed->nsubxacts; i++)
+ {
+ ReorderBufferCommitChild(ctx->reorder, xid, parsed->subxacts[i],
+ buf->origptr, buf->endptr);
+ }
+
+ if (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
+ (parsed->dbId != InvalidOid && parsed->dbId != ctx->slot->data.database) ||
+ ctx->fast_forward || FilterByOrigin(ctx, origin_id))
+ {
+ for (i = 0; i < parsed->nsubxacts; i++)
+ {
+ ReorderBufferForget(ctx->reorder, parsed->subxacts[i], buf->origptr);
+ }
+ ReorderBufferForget(ctx->reorder, xid, buf->origptr);
+
+ return;
+ }
+
/* replay actions of all transaction + subtransactions in order */
- ReorderBufferCommit(ctx->reorder, xid, buf->origptr, buf->endptr,
- commit_time, origin_id, origin_lsn);
+ ReorderBufferPrepare(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn, parsed->twophase_gid);
}
/*
@@ -647,6 +748,30 @@ DecodeAbort(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
xl_xact_parsed_abort *parsed, TransactionId xid)
{
int i;
+ XLogRecPtr origin_lsn = InvalidXLogRecPtr;
+ TimestampTz commit_time = 0;
+ XLogRecPtr origin_id = XLogRecGetOrigin(buf->record);
+
+ if (parsed->xinfo & XACT_XINFO_HAS_ORIGIN)
+ {
+ origin_lsn = parsed->origin_lsn;
+ commit_time = parsed->origin_timestamp;
+ }
+
+ /*
+ * If it's ROLLBACK PREPARED then handle it via callbacks.
+ */
+ if (TransactionIdIsValid(xid) &&
+ !SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) &&
+ parsed->dbId == ctx->slot->data.database &&
+ !FilterByOrigin(ctx, origin_id) &&
+ ReorderBufferTxnIsPrepared(ctx->reorder, xid, parsed->twophase_gid))
+ {
+ ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn,
+ parsed->twophase_gid, false);
+ return;
+ }
for (i = 0; i < parsed->nsubxacts; i++)
{
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 3cd4eefb9b..f3b28c3dab 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -60,6 +60,16 @@ static void shutdown_cb_wrapper(LogicalDecodingContext *ctx);
static void begin_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn);
static void commit_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
+static void abort_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+static bool filter_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid);
+static void prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+static void commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+static void abort_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
static void change_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
Relation relation, ReorderBufferChange *change);
static void truncate_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
@@ -187,6 +197,11 @@ StartupDecodingContext(List *output_plugin_options,
ctx->reorder->apply_change = change_cb_wrapper;
ctx->reorder->apply_truncate = truncate_cb_wrapper;
ctx->reorder->commit = commit_cb_wrapper;
+ ctx->reorder->abort = abort_cb_wrapper;
+ ctx->reorder->filter_prepare = filter_prepare_cb_wrapper;
+ ctx->reorder->prepare = prepare_cb_wrapper;
+ ctx->reorder->commit_prepared = commit_prepared_cb_wrapper;
+ ctx->reorder->abort_prepared = abort_prepared_cb_wrapper;
ctx->reorder->message = message_cb_wrapper;
ctx->out = makeStringInfo();
@@ -611,6 +626,33 @@ startup_cb_wrapper(LogicalDecodingContext *ctx, OutputPluginOptions *opt, bool i
/* do the actual work: call callback */
ctx->callbacks.startup_cb(ctx, opt, is_init);
+ /*
+ * If the plugin claims to support two-phase transactions, then
+ * check that the plugin implements all callbacks necessary to decode
+ * two-phase transactions - we either have to have all of them or none.
+ * The filter_prepare callback is optional, but can only be defined when
+ * two-phase decoding is enabled (i.e. the three other callbacks are
+ * defined).
+ */
+ if (opt->enable_twophase)
+ {
+ int twophase_callbacks = (ctx->callbacks.prepare_cb != NULL) +
+ (ctx->callbacks.commit_prepared_cb != NULL) +
+ (ctx->callbacks.abort_prepared_cb != NULL);
+
+ /* Plugins with incorrect number of two-phase callbacks are broken. */
+ if ((twophase_callbacks != 3) && (twophase_callbacks != 0))
+ ereport(ERROR,
+ (errmsg("Output plugin registered only %d twophase callbacks. ",
+ twophase_callbacks)));
+ }
+
+ /* filter_prepare is optional, but requires two-phase decoding */
+ if ((ctx->callbacks.filter_prepare_cb != NULL) && (!opt->enable_twophase))
+ ereport(ERROR,
+ (errmsg("Output plugin does not support two-phase decoding, but "
+ "registered filter_prepared callback.")));
+
/* Pop the error context stack */
error_context_stack = errcallback.previous;
}
@@ -708,6 +750,122 @@ commit_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
error_context_stack = errcallback.previous;
}
+static void
+abort_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "abort";
+ state.report_location = txn->final_lsn; /* beginning of abort record */
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn; /* points to the end of the record */
+
+ /* do the actual work: call callback */
+ ctx->callbacks.abort_cb(ctx, txn, abort_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
+prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "prepare";
+ state.report_location = txn->final_lsn; /* beginning of commit record */
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn; /* points to the end of the record */
+
+ /* do the actual work: call callback */
+ ctx->callbacks.prepare_cb(ctx, txn, prepare_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
+commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "commit_prepared";
+ state.report_location = txn->final_lsn; /* beginning of commit record */
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn; /* points to the end of the record */
+
+ /* do the actual work: call callback */
+ ctx->callbacks.commit_prepared_cb(ctx, txn, commit_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
+abort_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "abort_prepared";
+ state.report_location = txn->final_lsn; /* beginning of commit record */
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn; /* points to the end of the record */
+
+ /* do the actual work: call callback */
+ ctx->callbacks.abort_prepared_cb(ctx, txn, abort_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
static void
change_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
Relation relation, ReorderBufferChange *change)
@@ -785,6 +943,51 @@ truncate_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
error_context_stack = errcallback.previous;
}
+static bool
+filter_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+ bool ret;
+
+ /*
+ * Skip if decoding of twophase at PREPARE time is not enabled. In that
+ * case all twophase transactions are considered filtered out and will be
+ * applied as regular transactions at COMMIT PREPARED.
+ */
+ if (!ctx->options.enable_twophase)
+ return true;
+
+ /*
+ * The filter_prepare callback is optional. When not supplied, all
+ * prepared transactions should go through.
+ */
+ if (!ctx->callbacks.filter_prepare_cb)
+ return false;
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "filter_prepare";
+ state.report_location = InvalidXLogRecPtr;
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = false;
+
+ /* do the actual work: call callback */
+ ret = ctx->callbacks.filter_prepare_cb(ctx, txn, xid, gid);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+
+ return ret;
+}
+
bool
filter_by_origin_cb_wrapper(LogicalDecodingContext *ctx, RepOriginId origin_id)
{
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index fb71631434..2fffc90606 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -337,6 +337,11 @@ ReorderBufferReturnTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
}
/* free data that's contained */
+ if (txn->gid != NULL)
+ {
+ pfree(txn->gid);
+ txn->gid = NULL;
+ }
if (txn->tuplecid_hash != NULL)
{
@@ -1389,25 +1394,18 @@ ReorderBufferFreeSnap(ReorderBuffer *rb, Snapshot snap)
* and subtransactions (using a k-way merge) and replay the changes in lsn
* order.
*/
-void
-ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
- XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
- TimestampTz commit_time,
- RepOriginId origin_id, XLogRecPtr origin_lsn)
+static void
+ReorderBufferCommitInternal(ReorderBufferTXN *txn,
+ ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn)
{
- ReorderBufferTXN *txn;
volatile Snapshot snapshot_now;
volatile CommandId command_id = FirstCommandId;
bool using_subtxn;
ReorderBufferIterTXNState *volatile iterstate = NULL;
- txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
- false);
-
- /* unknown transaction, nothing to replay */
- if (txn == NULL)
- return;
-
txn->final_lsn = commit_lsn;
txn->end_lsn = end_lsn;
txn->commit_time = commit_time;
@@ -1711,7 +1709,6 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
break;
}
}
-
/*
* There's a speculative insertion remaining, just clean in up, it
* can't have been successful, otherwise we'd gotten a confirmation
@@ -1727,8 +1724,22 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
ReorderBufferIterTXNFinish(rb, iterstate);
iterstate = NULL;
- /* call commit callback */
- rb->commit(rb, txn, commit_lsn);
+ /*
+ * Call abort/commit/prepare callback, depending on the transaction
+ * state.
+ *
+ * If the transaction aborted during apply (which currently can happen
+ * only for prepared transactions), simply call the abort callback.
+ *
+ * Otherwise call either PREPARE (for twophase transactions) or COMMIT
+ * (for regular ones).
+ */
+ if (rbtxn_rollback(txn))
+ rb->abort(rb, txn, commit_lsn);
+ else if (rbtxn_prepared(txn))
+ rb->prepare(rb, txn, commit_lsn);
+ else
+ rb->commit(rb, txn, commit_lsn);
/* this is just a sanity check against bad output plugin behaviour */
if (GetCurrentTransactionIdIfAny() != InvalidTransactionId)
@@ -1755,7 +1766,12 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
if (snapshot_now->copied)
ReorderBufferFreeSnap(rb, snapshot_now);
- /* remove potential on-disk data, and deallocate */
+ /*
+ * remove potential on-disk data, and deallocate.
+ *
+ * We remove it even for prepared transactions (GID is enough to
+ * commit/abort those later).
+ */
ReorderBufferCleanupTXN(rb, txn);
}
PG_CATCH();
@@ -1789,6 +1805,141 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
PG_END_TRY();
}
+
+/*
+ * Ask output plugin whether we want to skip this PREPARE and send
+ * this transaction as a regular commit later.
+ */
+bool
+ReorderBufferPrepareNeedSkip(ReorderBuffer *rb, TransactionId xid, const char *gid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr, false);
+
+ return rb->filter_prepare(rb, txn, xid, gid);
+}
+
+
+/*
+ * Commit a transaction.
+ *
+ * See comments for ReorderBufferCommitInternal()
+ */
+void
+ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+
+ /* unknown transaction, nothing to replay */
+ if (txn == NULL)
+ return;
+
+ ReorderBufferCommitInternal(txn, rb, xid, commit_lsn, end_lsn,
+ commit_time, origin_id, origin_lsn);
+}
+
+/*
+ * Prepare a twophase transaction. It calls ReorderBufferCommitInternal()
+ * since all prepared transactions need to be decoded at PREPARE time.
+ */
+void
+ReorderBufferPrepare(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+
+ /* unknown transaction, nothing to replay */
+ if (txn == NULL)
+ return;
+
+ txn->txn_flags |= RBTXN_PREPARE;
+ txn->gid = palloc(strlen(gid) + 1); /* trailing '\0' */
+ strcpy(txn->gid, gid);
+
+ ReorderBufferCommitInternal(txn, rb, xid, commit_lsn, end_lsn,
+ commit_time, origin_id, origin_lsn);
+}
+
+/*
+ * Check whether this transaction was sent as prepared to subscribers.
+ * Called while handling commit|abort prepared.
+ */
+bool
+ReorderBufferTxnIsPrepared(ReorderBuffer *rb, TransactionId xid,
+ const char *gid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+
+ /*
+ * Always call the prepare filter. It's the job of the prepare filter to
+ * give us the *same* response for a given xid across multiple calls
+ * (including ones on restart)
+ */
+ return !(rb->filter_prepare(rb, txn, xid, gid));
+}
+
+/*
+ * Send standalone xact event. This is used to handle COMMIT/ABORT PREPARED.
+ */
+void
+ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid, bool is_commit)
+{
+ ReorderBufferTXN *txn;
+
+ /*
+ * The transaction may or may not exist (during restarts for example).
+ * Anyways, 2PC transactions do not contain any reorderbuffers. So allow
+ * it to be created below.
+ */
+ txn = ReorderBufferTXNByXid(rb, xid, true, NULL, commit_lsn,
+ true);
+
+ txn->final_lsn = commit_lsn;
+ txn->end_lsn = end_lsn;
+ txn->commit_time = commit_time;
+ txn->origin_id = origin_id;
+ txn->origin_lsn = origin_lsn;
+ /* this txn is obviously prepared */
+ txn->txn_flags |= RBTXN_PREPARE;
+ txn->gid = palloc(strlen(gid) + 1); /* trailing '\0' */
+ strcpy(txn->gid, gid);
+
+ if (is_commit)
+ {
+ txn->txn_flags |= RBTXN_COMMIT_PREPARED;
+ rb->commit_prepared(rb, txn, commit_lsn);
+ }
+ else
+ {
+ txn->txn_flags |= RBTXN_ROLLBACK_PREPARED;
+ rb->abort_prepared(rb, txn, commit_lsn);
+ }
+
+ /* cleanup: make sure there's no cache pollution */
+ ReorderBufferExecuteInvalidations(rb, txn);
+ ReorderBufferCleanupTXN(rb, txn);
+}
+
/*
* Abort a transaction that possibly has previous changes. Needs to be first
* called for subtransactions and then for the toplevel xid.
diff --git a/src/include/replication/logical.h b/src/include/replication/logical.h
index c25ac1fa85..5fdda65031 100644
--- a/src/include/replication/logical.h
+++ b/src/include/replication/logical.h
@@ -47,7 +47,7 @@ typedef struct LogicalDecodingContext
/*
* Marks the logical decoding context as fast forward decoding one. Such a
- * context does not have plugin loaded so most of the the following
+ * context does not have plugin loaded so most of the following
* properties are unused.
*/
bool fast_forward;
diff --git a/src/include/replication/output_plugin.h b/src/include/replication/output_plugin.h
index 1ee0a56f03..c9140e7001 100644
--- a/src/include/replication/output_plugin.h
+++ b/src/include/replication/output_plugin.h
@@ -27,6 +27,7 @@ typedef struct OutputPluginOptions
{
OutputPluginOutputType output_type;
bool receive_rewrites;
+ bool enable_twophase;
} OutputPluginOptions;
/*
@@ -77,6 +78,46 @@ typedef void (*LogicalDecodeCommitCB) (struct LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
+/*
+ * Called for an implicit ABORT of a transaction.
+ */
+typedef void (*LogicalDecodeAbortCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+
+ /*
+ * Called before decoding of PREPARE record to decide whether this
+ * transaction should be decoded with separate calls to prepare and
+ * commit_prepared/abort_prepared callbacks or wait till COMMIT PREPARED and
+ * sent as usual transaction.
+ */
+typedef bool (*LogicalDecodeFilterPrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ TransactionId xid,
+ const char *gid);
+
+/*
+ * Called for PREPARE record unless it was filtered by filter_prepare()
+ * callback.
+ */
+typedef void (*LogicalDecodePrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+
+/*
+ * Called for COMMIT PREPARED.
+ */
+typedef void (*LogicalDecodeCommitPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+/*
+ * Called for ROLLBACK PREPARED.
+ */
+typedef void (*LogicalDecodeAbortPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+
/*
* Called for the generic logical decoding messages.
*/
@@ -109,7 +150,12 @@ typedef struct OutputPluginCallbacks
LogicalDecodeChangeCB change_cb;
LogicalDecodeTruncateCB truncate_cb;
LogicalDecodeCommitCB commit_cb;
+ LogicalDecodeAbortCB abort_cb;
LogicalDecodeMessageCB message_cb;
+ LogicalDecodeFilterPrepareCB filter_prepare_cb;
+ LogicalDecodePrepareCB prepare_cb;
+ LogicalDecodeCommitPreparedCB commit_prepared_cb;
+ LogicalDecodeAbortPreparedCB abort_prepared_cb;
LogicalDecodeFilterByOriginCB filter_by_origin_cb;
LogicalDecodeShutdownCB shutdown_cb;
} OutputPluginCallbacks;
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index ec9515d156..285c9b53da 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -10,6 +10,7 @@
#define REORDERBUFFER_H
#include "access/htup_details.h"
+#include "access/twophase.h"
#include "lib/ilist.h"
#include "storage/sinval.h"
#include "utils/hsearch.h"
@@ -154,6 +155,11 @@ typedef struct ReorderBufferChange
#define RBTXN_HAS_CATALOG_CHANGES 0x0001
#define RBTXN_IS_SUBXACT 0x0002
#define RBTXN_IS_SERIALIZED 0x0004
+#define RBTXN_PREPARE 0x0008
+#define RBTXN_COMMIT_PREPARED 0x0010
+#define RBTXN_ROLLBACK_PREPARED 0x0020
+#define RBTXN_COMMIT 0x0040
+#define RBTXN_ROLLBACK 0x0080
/* does the txn have catalog changes */
#define rbtxn_has_catalog_changes(txn) (txn->txn_flags & RBTXN_HAS_CATALOG_CHANGES)
@@ -167,6 +173,16 @@ typedef struct ReorderBufferChange
* nentries_mem == nentries.
*/
#define rbtxn_is_serialized(txn) (txn->txn_flags & RBTXN_IS_SERIALIZED)
+/* is this txn prepared? */
+#define rbtxn_prepared(txn) (txn->txn_flags & RBTXN_PREPARE)
+/* was this prepared txn committed in the meanwhile? */
+#define rbtxn_commit_prepared(txn) (txn->txn_flags & RBTXN_COMMIT_PREPARED)
+/* was this prepared txn aborted in the meanwhile? */
+#define rbtxn_rollback_prepared(txn) (txn->txn_flags & RBTXN_ROLLBACK_PREPARED)
+/* was this txn committed in the meanwhile? */
+#define rbtxn_commit(txn) (txn->txn_flags & RBTXN_COMMIT)
+/* was this prepared txn aborted in the meanwhile? */
+#define rbtxn_rollback(txn) (txn->txn_flags & RBTXN_ROLLBACK)
typedef struct ReorderBufferTXN
{
@@ -179,6 +195,8 @@ typedef struct ReorderBufferTXN
/* Do we know this is a subxact? Xid of top-level txn if so */
TransactionId toplevel_xid;
+ /* In case of 2PC we need to pass GID to output plugin */
+ char *gid;
/*
* LSN of the first data carrying, WAL record with knowledge about this
@@ -324,6 +342,37 @@ typedef void (*ReorderBufferCommitCB) (
ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
+/* abort callback signature */
+typedef void (*ReorderBufferAbortCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+
+typedef bool (*ReorderBufferFilterPrepareCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ TransactionId xid,
+ const char *gid);
+
+/* prepare callback signature */
+typedef void (*ReorderBufferPrepareCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+
+/* commit prepared callback signature */
+typedef void (*ReorderBufferCommitPreparedCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+/* abort prepared callback signature */
+typedef void (*ReorderBufferAbortPreparedCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+
+
/* message callback signature */
typedef void (*ReorderBufferMessageCB) (
ReorderBuffer *rb,
@@ -369,6 +418,11 @@ struct ReorderBuffer
ReorderBufferApplyChangeCB apply_change;
ReorderBufferApplyTruncateCB apply_truncate;
ReorderBufferCommitCB commit;
+ ReorderBufferAbortCB abort;
+ ReorderBufferFilterPrepareCB filter_prepare;
+ ReorderBufferPrepareCB prepare;
+ ReorderBufferCommitPreparedCB commit_prepared;
+ ReorderBufferAbortPreparedCB abort_prepared;
ReorderBufferMessageCB message;
/*
@@ -416,6 +470,11 @@ void ReorderBufferQueueMessage(ReorderBuffer *, TransactionId, Snapshot snapshot
void ReorderBufferCommit(ReorderBuffer *, TransactionId,
XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
TimestampTz commit_time, RepOriginId origin_id, XLogRecPtr origin_lsn);
+void ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid, bool is_commit);
void ReorderBufferAssignChild(ReorderBuffer *, TransactionId, TransactionId, XLogRecPtr commit_lsn);
void ReorderBufferCommitChild(ReorderBuffer *, TransactionId, TransactionId,
XLogRecPtr commit_lsn, XLogRecPtr end_lsn);
@@ -439,6 +498,15 @@ void ReorderBufferXidSetCatalogChanges(ReorderBuffer *, TransactionId xid, XLog
bool ReorderBufferXidHasCatalogChanges(ReorderBuffer *, TransactionId xid);
bool ReorderBufferXidHasBaseSnapshot(ReorderBuffer *, TransactionId xid);
+bool ReorderBufferPrepareNeedSkip(ReorderBuffer *rb, TransactionId xid,
+ const char *gid);
+bool ReorderBufferTxnIsPrepared(ReorderBuffer *rb, TransactionId xid,
+ const char *gid);
+void ReorderBufferPrepare(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid);
ReorderBufferTXN *ReorderBufferGetOldestTXN(ReorderBuffer *);
TransactionId ReorderBufferGetOldestXmin(ReorderBuffer *rb);
--
2.15.2 (Apple Git-101.1)
Hello,
I have looked through the patches. I will first describe relativaly
serious issues I see and then proceed with small nitpicking.
- On decoding of aborted xacts. The idea to throw an error once we
detect the abort is appealing, however I think you will have problems
with subxacts in the current implementation. What if subxact issues
DDL and then aborted, but main transaction successfully committed?
- Decoding transactions at PREPARE record changes rules of the "we ship
all commits after lsn 'x'" game. Namely, it will break initial
tablesync: what if consistent snapshot was formed *after* PREPARE, but
before COMMIT PREPARED, and the plugin decides to employ 2pc? Instead
of getting inital contents + continious stream of changes the receiver
will miss the prepared xact contents and raise 'prepared xact doesn't
exist' error. I think the starting point to address this is to forbid
two-phase decoding of xacts with lsn of PREPARE less than
snapbuilder's start_decoding_at.
- Currently we will call abort_prepared cb even if we failed to actually
prepare xact due to concurrent abort. I think it is confusing for
users. We should either handle this by remembering not to invoke
abort_prepared in these cases or at least document this behaviour,
leaving this problem to the receiver side.
- I find it suspicious that DecodePrepare completely ignores actions of
SnapBuildCommitTxn. For example, to execute invalidations, the latter
sets base snapshot if our xact (or subxacts) did DDL and the snapshot
not set yet. My fantasy doesn't hint me the concrete example
where this would burn at the moment, but it should be considered.
Now, the bikeshedding.
First patch:
- I am one of those people upthread who don't think that converting
flags to bitmask is beneficial -- especially given that many of them
are mutually exclusive, e.g. xact can't be committed and aborted at
the same time. Apparently you have left this to the committer though.
Second patch:
- Applying gives me
Applying: Support decoding of two-phase transactions at PREPARE
.git/rebase-apply/patch:871: trailing whitespace.
+ row. The <function>change_cb</function> callback may access system or
+ user catalog tables to aid in the process of outputting the row
+ modification details. In case of decoding a prepared (but yet
+ uncommitted) transaction or decoding of an uncommitted transaction, this
+ change callback is ensured sane access to catalog tables regardless of
+ simultaneous rollback by another backend of this very same transaction.
I don't think we should explain this, at least in such words. As
mentioned upthread, we should warn about allowed systable_* accesses
instead. Same for message_cb.
+ /*
+ * Tell the reorderbuffer about the surviving subtransactions. We need to
+ * do this because the main transaction itself has not committed since we
+ * are in the prepare phase right now. So we need to be sure the snapshot
+ * is setup correctly for the main transaction in case all changes
+ * happened in subtransanctions
+ */
While we do certainly need to associate subxacts here, the explanation
looks weird to me. I would leave just the 'Tell the reorderbuffer about
the surviving subtransactions' as in DecodeCommit.
}
-
/*
* There's a speculative insertion remaining, just clean in up, it
* can't have been successful, otherwise we'd gotten a confirmation
Spurious newline deletion.
- I would rename ReorderBufferCommitInternal to ReorderBufferReplay:
we replay the xact there, not commit.
- If xact is empty, we will not prepare it (and call cb),
even if the output plugin asked us. However, we will call
commit_prepared cb.
- ReorderBufferTxnIsPrepared and ReorderBufferPrepareNeedSkip do the
same and should be merged with comments explaining that the answer
must be stable.
- filter_prepare_cb callback existence is checked in both decode.c and
in filter_prepare_cb_wrapper.
+ /*
+ * The transaction may or may not exist (during restarts for example).
+ * Anyways, 2PC transactions do not contain any reorderbuffers. So allow
+ * it to be created below.
+ */
Code around looks sane, but I think that ReorderBufferTXN for our xact
must *not* exist at this moment: if we are going to COMMIT/ABORT
PREPARED it, it must have been replayed and RBTXN purged immediately
after. Also, instead of misty '2PC transactions do not contain any
reorderbuffers' I would say something like 'create dummy
ReorderBufferTXN to pass it to the callback'.
- In DecodeAbort:
+ /*
+ * If it's ROLLBACK PREPARED then handle it via callbacks.
+ */
+ if (TransactionIdIsValid(xid) &&
+ !SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) &&
+
How xid can be invalid here?
- It might be worthwile to put the check
+ !SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) &&
+ parsed->dbId == ctx->slot->data.database &&
+ !FilterByOrigin(ctx, origin_id) &&
which appears 3 times now into separate function.
+ * two-phase transactions - we either have to have all of them or none.
+ * The filter_prepare callback is optional, but can only be defined when
Kind of controversial (all of them or none, but optional), might be
formulated more accurately.
+ /*
+ * Capabilities of the output plugin.
+ */
+ bool enable_twophase;
I would rename this to 'supports_twophase' since this is not an option
but a description of the plugin capabilities.
+ /* filter_prepare is optional, but requires two-phase decoding */
+ if ((ctx->callbacks.filter_prepare_cb != NULL) && (!ctx->enable_twophase))
+ ereport(ERROR,
+ (errmsg("Output plugin does not support two-phase decoding, but "
+ "registered
filter_prepared callback.")));
Don't think we need to check that...
+ * Otherwise call either PREPARE (for twophase transactions) or COMMIT
+ * (for regular ones).
+ */
+ if (rbtxn_rollback(txn))
+ rb->abort(rb, txn, commit_lsn);
This is the dead code since we don't have decoding of in-progress xacts
yet.
Third patch:
+/*
+ * An xid value pointing to a possibly ongoing or a prepared transaction.
+ * Currently used in logical decoding. It's possible that such transactions
+ * can get aborted while the decoding is ongoing.
+ */
I would explain here that this xid is checked for abort after each
catalog scan, and sent for the details to SetupHistoricSnapshot.
+ /*
+ * If CheckXidAlive is valid, then we check if it aborted. If it did, we
+ * error out
+ */
+ if (TransactionIdIsValid(CheckXidAlive) &&
+ !TransactionIdIsInProgress(CheckXidAlive) &&
+ !TransactionIdDidCommit(CheckXidAlive))
+ ereport(ERROR,
+ (errcode(ERRCODE_TRANSACTION_ROLLBACK),
+ errmsg("transaction aborted during system catalog scan")));
Probably centralize checks in one function? As well as 'We don't expect
direct calls to heap_fetch...' ones.
P.S. Looks like you have torn the thread chain: In-Reply-To header of
mail [1]/messages/by-id/CAMGcDxeqEpWj3fTXwqhSwBdXd2RS9jzwWscO-XbeCfso6ts3+Q@mail.gmail.com is missing. Please don't do that.
[1]: /messages/by-id/CAMGcDxeqEpWj3fTXwqhSwBdXd2RS9jzwWscO-XbeCfso6ts3+Q@mail.gmail.com
--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 2018-08-06 21:06:13 +0300, Arseny Sher wrote:
Hello,
I have looked through the patches. I will first describe relativaly
serious issues I see and then proceed with small nitpicking.- On decoding of aborted xacts. The idea to throw an error once we
detect the abort is appealing, however I think you will have problems
with subxacts in the current implementation. What if subxact issues
DDL and then aborted, but main transaction successfully committed?
I don't see a fundamental issue here. I've not reviewed the current
patchset meaningfully, however. Do you see a fundamental issue here?
- Decoding transactions at PREPARE record changes rules of the "we ship
all commits after lsn 'x'" game. Namely, it will break initial
tablesync: what if consistent snapshot was formed *after* PREPARE, but
before COMMIT PREPARED, and the plugin decides to employ 2pc? Instead
of getting inital contents + continious stream of changes the receiver
will miss the prepared xact contents and raise 'prepared xact doesn't
exist' error. I think the starting point to address this is to forbid
two-phase decoding of xacts with lsn of PREPARE less than
snapbuilder's start_decoding_at.
Yea, that sounds like it need to be addressed.
- Currently we will call abort_prepared cb even if we failed to actually
prepare xact due to concurrent abort. I think it is confusing for
users. We should either handle this by remembering not to invoke
abort_prepared in these cases or at least document this behaviour,
leaving this problem to the receiver side.
What precisely do you mean by "concurrent abort"?
- I find it suspicious that DecodePrepare completely ignores actions of
SnapBuildCommitTxn. For example, to execute invalidations, the latter
sets base snapshot if our xact (or subxacts) did DDL and the snapshot
not set yet. My fantasy doesn't hint me the concrete example
where this would burn at the moment, but it should be considered.
Yea, I think this need to mirror the actions (and thus generalize the
code to avoid duplication)
Now, the bikeshedding.
First patch:
- I am one of those people upthread who don't think that converting
flags to bitmask is beneficial -- especially given that many of them
are mutually exclusive, e.g. xact can't be committed and aborted at
the same time. Apparently you have left this to the committer though.
Similar.
- Andres
Andres Freund <andres@anarazel.de> writes:
- On decoding of aborted xacts. The idea to throw an error once we
detect the abort is appealing, however I think you will have problems
with subxacts in the current implementation. What if subxact issues
DDL and then aborted, but main transaction successfully committed?I don't see a fundamental issue here. I've not reviewed the current
patchset meaningfully, however. Do you see a fundamental issue here?
Hmm, yes, this is not an issue for this patch because after reading
PREPARE record we know all aborted subxacts and won't try to decode
their changes. However, this will be raised once we decide to decode
in-progress transactions. Checking for all subxids is expensive;
moreover, WAL doesn't provide all of them until commit... it might be
easier to prevent vacuuming of aborted stuff while decoding needs it.
Matter for another patch, anyway.
- Currently we will call abort_prepared cb even if we failed to actually
prepare xact due to concurrent abort. I think it is confusing for
users. We should either handle this by remembering not to invoke
abort_prepared in these cases or at least document this behaviour,
leaving this problem to the receiver side.What precisely do you mean by "concurrent abort"?
With current patch, the following is possible:
* We start decoding of some prepared xact;
* Xact aborts (ABORT PREPARED) for any reason;
* Decoding processs notices this on catalog scan and calls abort()
callback;
* Later decoding process reads abort record and calls abort_prepared
callback.
--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Hi Arseny,
- Decoding transactions at PREPARE record changes rules of the "we ship
all commits after lsn 'x'" game. Namely, it will break initial
tablesync: what if consistent snapshot was formed *after* PREPARE, but
before COMMIT PREPARED, and the plugin decides to employ 2pc? Instead
of getting inital contents + continious stream of changes the receiver
will miss the prepared xact contents and raise 'prepared xact doesn't
exist' error. I think the starting point to address this is to forbid
two-phase decoding of xacts with lsn of PREPARE less than
snapbuilder's start_decoding_at.
It will be the job of the plugin to return a consistent answer for
every GID that is encountered. In this case, the plugin will decode
the transaction at COMMIT PREPARED time and not at PREPARE time.
- Currently we will call abort_prepared cb even if we failed to actually
prepare xact due to concurrent abort. I think it is confusing for
users. We should either handle this by remembering not to invoke
abort_prepared in these cases or at least document this behaviour,
leaving this problem to the receiver side.
The point is, when we reach the "ROLLBACK PREPARED", we have no idea
if the "PREPARE" was aborted by this rollback happening concurrently.
So it's possible that the 2PC has been successfully decoded and we
would have to send the rollback to the other side which would need to
check if it needs to rollback locally.
- I find it suspicious that DecodePrepare completely ignores actions of
SnapBuildCommitTxn. For example, to execute invalidations, the latter
sets base snapshot if our xact (or subxacts) did DDL and the snapshot
not set yet. My fantasy doesn't hint me the concrete example
where this would burn at the moment, but it should be considered.
I had discussed this area with Petr and we didn't see any issues as well, then.
Now, the bikeshedding.
First patch:
- I am one of those people upthread who don't think that converting
flags to bitmask is beneficial -- especially given that many of them
are mutually exclusive, e.g. xact can't be committed and aborted at
the same time. Apparently you have left this to the committer though.
Hmm, there seems to be divided opinion on this. I am willing to go
back to using the booleans if there's opposition and if the committer
so wishes. Note that this patch will end up adding 4/5 more booleans
in that case (we add new ones for prepare, commit prepare, abort,
rollback prepare etc).
Second patch:
- Applying gives me
Applying: Support decoding of two-phase transactions at PREPARE
.git/rebase-apply/patch:871: trailing whitespace.+ row. The <function>change_cb</function> callback may access system or + user catalog tables to aid in the process of outputting the row + modification details. In case of decoding a prepared (but yet + uncommitted) transaction or decoding of an uncommitted transaction, this + change callback is ensured sane access to catalog tables regardless of + simultaneous rollback by another backend of this very same transaction.I don't think we should explain this, at least in such words. As
mentioned upthread, we should warn about allowed systable_* accesses
instead. Same for message_cb.
Looks like you are looking at an earlier patchset. The latest patchset
has removed the above.
+ /* + * Tell the reorderbuffer about the surviving subtransactions. We need to + * do this because the main transaction itself has not committed since we + * are in the prepare phase right now. So we need to be sure the snapshot + * is setup correctly for the main transaction in case all changes + * happened in subtransanctions + */While we do certainly need to associate subxacts here, the explanation
looks weird to me. I would leave just the 'Tell the reorderbuffer about
the surviving subtransactions' as in DecodeCommit.}
-
/*
* There's a speculative insertion remaining, just clean in up, it
* can't have been successful, otherwise we'd gotten a confirmationSpurious newline deletion.
- I would rename ReorderBufferCommitInternal to ReorderBufferReplay:
we replay the xact there, not commit.- If xact is empty, we will not prepare it (and call cb),
even if the output plugin asked us. However, we will call
commit_prepared cb.- ReorderBufferTxnIsPrepared and ReorderBufferPrepareNeedSkip do the
same and should be merged with comments explaining that the answer
must be stable.- filter_prepare_cb callback existence is checked in both decode.c and
in filter_prepare_cb_wrapper.+ /* + * The transaction may or may not exist (during restarts for example). + * Anyways, 2PC transactions do not contain any reorderbuffers. So allow + * it to be created below. + */Code around looks sane, but I think that ReorderBufferTXN for our xact
must *not* exist at this moment: if we are going to COMMIT/ABORT
PREPARED it, it must have been replayed and RBTXN purged immediately
after. Also, instead of misty '2PC transactions do not contain any
reorderbuffers' I would say something like 'create dummy
ReorderBufferTXN to pass it to the callback'.- In DecodeAbort: + /* + * If it's ROLLBACK PREPARED then handle it via callbacks. + */ + if (TransactionIdIsValid(xid) && + !SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) && +How xid can be invalid here?
- It might be worthwile to put the check + !SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) && + parsed->dbId == ctx->slot->data.database && + !FilterByOrigin(ctx, origin_id) &&which appears 3 times now into separate function.
+ * two-phase transactions - we either have to have all of them or none. + * The filter_prepare callback is optional, but can only be defined whenKind of controversial (all of them or none, but optional), might be
formulated more accurately.+ /* + * Capabilities of the output plugin. + */ + bool enable_twophase;I would rename this to 'supports_twophase' since this is not an option
but a description of the plugin capabilities.+ /* filter_prepare is optional, but requires two-phase decoding */ + if ((ctx->callbacks.filter_prepare_cb != NULL) && (!ctx->enable_twophase)) + ereport(ERROR, + (errmsg("Output plugin does not support two-phase decoding, but " + "registered filter_prepared callback.")));Don't think we need to check that...
+ * Otherwise call either PREPARE (for twophase transactions) or COMMIT + * (for regular ones). + */ + if (rbtxn_rollback(txn)) + rb->abort(rb, txn, commit_lsn);This is the dead code since we don't have decoding of in-progress xacts
yet.
Yes, the above check can be done away with it.
Third patch: +/* + * An xid value pointing to a possibly ongoing or a prepared transaction. + * Currently used in logical decoding. It's possible that such transactions + * can get aborted while the decoding is ongoing. + */I would explain here that this xid is checked for abort after each
catalog scan, and sent for the details to SetupHistoricSnapshot.+ /* + * If CheckXidAlive is valid, then we check if it aborted. If it did, we + * error out + */ + if (TransactionIdIsValid(CheckXidAlive) && + !TransactionIdIsInProgress(CheckXidAlive) && + !TransactionIdDidCommit(CheckXidAlive)) + ereport(ERROR, + (errcode(ERRCODE_TRANSACTION_ROLLBACK), + errmsg("transaction aborted during system catalog scan")));Probably centralize checks in one function? As well as 'We don't expect
direct calls to heap_fetch...' ones.P.S. Looks like you have torn the thread chain: In-Reply-To header of
mail [1] is missing. Please don't do that.
That wasn't me. I was also annoyed and surprised to see a new email
thread separate from the earlier containing 100 or so messages.
Regards,
Nikhils
--
Nikhil Sontakke http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Nikhil Sontakke <nikhils@2ndquadrant.com> writes:
- Decoding transactions at PREPARE record changes rules of the "we ship
all commits after lsn 'x'" game. Namely, it will break initial
tablesync: what if consistent snapshot was formed *after* PREPARE, but
before COMMIT PREPARED, and the plugin decides to employ 2pc? Instead
of getting inital contents + continious stream of changes the receiver
will miss the prepared xact contents and raise 'prepared xact doesn't
exist' error. I think the starting point to address this is to forbid
two-phase decoding of xacts with lsn of PREPARE less than
snapbuilder's start_decoding_at.It will be the job of the plugin to return a consistent answer for
every GID that is encountered. In this case, the plugin will decode
the transaction at COMMIT PREPARED time and not at PREPARE time.
I can't imagine a scenario in which plugin would want to send COMMIT
PREPARED instead of replaying xact fully on CP record given it had never
seen PREPARE record. On the other hand, tracking such situations on
plugins side would make plugins life unneccessary complicated: either it
has to dig into snapbuilder/slot internals to learn when the snapshot
became consistent (which currently is impossible as this lsn is not
saved anywhere btw), or it must fsync each its decision to do or not to
do 2PC.
Technically, my concern covers not only tablesync, but just plain
decoding start: we don't want to ship COMMIT PREPARED if the downstream
had never had chance to see PREPARE.
As for tablesync, looking at current implementation I contemplate that
we would need to do something along the following lines:
- Tablesync worker performs COPY.
- It then speaks with main apply worker to arrange (origin)
lsn of sync point, as it does now.
- Tablesync worker applies changes up to arranged lsn; it never uses
two-phase decoding, all xacts are replayed on COMMIT PREPARED.
Moreover, instead of going into SYNCDONE state immediately after
reaching needed lsn, it stops replaying usual commits but continues
to receive changes to finish all transactions which were prepared
before sync point (we would need some additional support from
reorderbuffer to learn when this happens). Only then it goes into
SYNCDONE.
- Behaviour of the main apply worker doesn't change: it
ignores changes of the table in question before sync point and
applies them after sync point. It also can use 2PC decoding of any
transaction or not, as it desires.
I believe this approach would implement tablesync correctly (all changes
are applied, but only once) with minimal fuss.
- Currently we will call abort_prepared cb even if we failed to actually
prepare xact due to concurrent abort. I think it is confusing for
users. We should either handle this by remembering not to invoke
abort_prepared in these cases or at least document this behaviour,
leaving this problem to the receiver side.The point is, when we reach the "ROLLBACK PREPARED", we have no idea
if the "PREPARE" was aborted by this rollback happening concurrently.
So it's possible that the 2PC has been successfully decoded and we
would have to send the rollback to the other side which would need to
check if it needs to rollback locally.
I understand this. But I find this confusing for the users, so I propose
to
- Either document that "you might get abort_prepared cb called even
after abort cb was invoked for the same transaction";
- Or consider adding some infrastructure to reorderbuffer to
remember not to call abort_prepared in these cases. Due to possible
reboots, I think this means that we need not to
ReorderBufferCleanupTXN immediately after failed attempt to replay
xact on PREPARE, but mark it as 'aborted' and keep it until we see
ABORT PREPARED record. If we see that xact is marked as aborted, we
don't call abort_prepared_cb. That way even if the decoder restarts
in between, we will see PREPARE in WAL, inquire xact status (even
if we skip it as already replayed) and mark it as aborted again.
- I find it suspicious that DecodePrepare completely ignores actions of
SnapBuildCommitTxn. For example, to execute invalidations, the latter
sets base snapshot if our xact (or subxacts) did DDL and the snapshot
not set yet. My fantasy doesn't hint me the concrete example
where this would burn at the moment, but it should be considered.I had discussed this area with Petr and we didn't see any issues as well, then.
Ok, simplifying, what SnapBuildCommitTxn practically does is
* Decide whether we are interested in tracking this xact effects, and
if we are, mark it as committed.
* Build and distribute snapshot to all RBTXNs, if it is important.
* Set base snap of our xact if it did DDL, to execute invalidations
during replay.
I see that we don't need to do first two bullets during DecodePrepare:
xact effects are still invisible for everyone but itself after
PREPARE. As for seeing xacts own own changes, it is implemented via
logging cmin/cmax and we don't need to mark xact as committed for that
(c.f ReorderBufferCopySnap).
Regarding the third point... I think in 2PC decoding we might need to
execute invalidations twice:
1) After replaying xact on PREPARE to forget about catalog changes
xact did -- it is not yet committed and must be invisible to
other xacts until CP. In the latest patchset invalidations are
executed only if there is at least one change in xact (it has base
snap). It looks fine: we can't spoil catalogs if there were nothing
to decode. Better to explain that somewhere.
2) After decoding COMMIT PREPARED to make changes visible. In current
patchset it is always done. Actually, *this* is the reason
RBTXN might already exist when we enter ReorderBufferFinishPrepared,
not "(during restarts for example)" as comment says there:
if there were inval messages, RBTXN will be created
in DecodeCommit during their addition.
BTW, "that we might need to execute invalidations, add snapshot" in
SnapBuildCommitTxn looks like a cludge to me: I suppose it is better to
do that at ReorderBufferXidSetCatalogChanges.
Now, another issue is registering xact as committed in
SnapBuildCommitTxn during COMMIT PREPARED processing. Since RBTXN is
always purged after xact replay on PREPARE, the only medium we have for
noticing catalog changes during COMMIT PREPARED is invalidation messages
attached to the CP record. This raises the following question.
* If there is a guarantee that whenever xact makes catalog changes it
generates invalidation messages, then this code is fine. However,
currently ReorderBufferXidSetCatalogChanges is also called on
XLOG_HEAP_INPLACE processing and in SnapBuildProcessNewCid, which
is useless if such guarantee exists.
* If, on the other hand, there is no such guarantee, this code is
broken.
- I am one of those people upthread who don't think that converting
flags to bitmask is beneficial -- especially given that many of them
are mutually exclusive, e.g. xact can't be committed and aborted at
the same time. Apparently you have left this to the committer though.Hmm, there seems to be divided opinion on this. I am willing to go
back to using the booleans if there's opposition and if the committer
so wishes. Note that this patch will end up adding 4/5 more booleans
in that case (we add new ones for prepare, commit prepare, abort,
rollback prepare etc).
Well, you can unite mutually exclusive fields into one enum or char with
macros defining possible values. Transaction can't be committed and
aborted at the same time, etc.
+ row. The <function>change_cb</function> callback may access system or + user catalog tables to aid in the process of outputting the row + modification details. In case of decoding a prepared (but yet + uncommitted) transaction or decoding of an uncommitted transaction, this + change callback is ensured sane access to catalog tables regardless of + simultaneous rollback by another backend of this very same transaction.I don't think we should explain this, at least in such words. As
mentioned upthread, we should warn about allowed systable_* accesses
instead. Same for message_cb.Looks like you are looking at an earlier patchset. The latest patchset
has removed the above.
I see, sorry.
--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Hello,
Trying to revive this patch which attempts to support logical decoding of
two phase transactions. I've rebased and polished Nikhil's patch on the
current HEAD. Some of the logic in the previous patchset has already been
committed as part of large-in-progress transaction commits, like the
handling of concurrent aborts, so all that logic has been left out. I think
some of the earlier comments have already been addressed or are no longer
relevant. Do have a look at the patch and let me know what you think.I will
try and address any pending issues going forward.
regards,
Ajin Cherian
Fujitsu Australia
Attachments:
0001-Support-decoding-of-two-phase-transactions-at-PREPAR.patchapplication/octet-stream; name=0001-Support-decoding-of-two-phase-transactions-at-PREPAR.patchDownload
From 07dafb931715df67914cd1b2e50e7d71d873fa9b Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Mon, 7 Sep 2020 00:41:31 -0400
Subject: [PATCH] Support decoding of two-phase transactions at PREPARE
Until now two-phase transactions were decoded at COMMIT, just like
regular transaction. During replay, two-phase transactions were
translated into regular transactions on the subscriber, and the GID
was not forwarded to it.
This patch allows PREPARE-time decoding two-phase transactions (if
the output plugin supports this capability), in which case the
transactions are replayed at PREPARE and then committed later when
COMMIT PREPARED arrives.
On the subscriber, the transactions will be executed as two-phase
transactions, with the same GID. This is important for various
external transaction managers, that often encode information into
the GID itself.
Includes documentation changes.
---
contrib/test_decoding/expected/prepared.out | 185 ++++++++++++++++++---
contrib/test_decoding/sql/prepared.sql | 77 ++++++++-
contrib/test_decoding/test_decoding.c | 181 +++++++++++++++++++++
doc/src/sgml/logicaldecoding.sgml | 127 ++++++++++++++-
src/backend/replication/logical/decode.c | 141 ++++++++++++++--
src/backend/replication/logical/logical.c | 203 ++++++++++++++++++++++++
src/backend/replication/logical/reorderbuffer.c | 188 ++++++++++++++++++++--
src/include/replication/output_plugin.h | 46 ++++++
src/include/replication/reorderbuffer.h | 78 ++++++++-
9 files changed, 1159 insertions(+), 67 deletions(-)
diff --git a/contrib/test_decoding/expected/prepared.out b/contrib/test_decoding/expected/prepared.out
index 46e915d..94fb0c9 100644
--- a/contrib/test_decoding/expected/prepared.out
+++ b/contrib/test_decoding/expected/prepared.out
@@ -6,19 +6,50 @@ SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_d
init
(1 row)
-CREATE TABLE test_prepared1(id int);
-CREATE TABLE test_prepared2(id int);
+CREATE TABLE test_prepared1(id integer primary key);
+CREATE TABLE test_prepared2(id integer primary key);
-- test simple successful use of a prepared xact
BEGIN;
INSERT INTO test_prepared1 VALUES (1);
PREPARE TRANSACTION 'test_prepared#1';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+----------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:1
+ PREPARE TRANSACTION 'test_prepared#1'
+(3 rows)
+
COMMIT PREPARED 'test_prepared#1';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-----------------------------------
+ COMMIT PREPARED 'test_prepared#1'
+(1 row)
+
INSERT INTO test_prepared1 VALUES (2);
-- test abort of a prepared xact
BEGIN;
INSERT INTO test_prepared1 VALUES (3);
PREPARE TRANSACTION 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+----------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:2
+ COMMIT
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:3
+ PREPARE TRANSACTION 'test_prepared#2'
+(6 rows)
+
ROLLBACK PREPARED 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-------------------------------------
+ ROLLBACK PREPARED 'test_prepared#2'
+(1 row)
+
INSERT INTO test_prepared1 VALUES (4);
-- test prepared xact containing ddl
BEGIN;
@@ -26,45 +57,149 @@ INSERT INTO test_prepared1 VALUES (5);
ALTER TABLE test_prepared1 ADD COLUMN data text;
INSERT INTO test_prepared1 VALUES (6, 'frakbar');
PREPARE TRANSACTION 'test_prepared#3';
--- test that we decode correctly while an uncommitted prepared xact
--- with ddl exists.
--- separate table because of the lock from the ALTER
--- this will come before the '5' row above, as this commits before it.
-INSERT INTO test_prepared2 VALUES (7);
-COMMIT PREPARED 'test_prepared#3';
--- make sure stuff still works
-INSERT INTO test_prepared1 VALUES (8);
-INSERT INTO test_prepared2 VALUES (9);
--- cleanup
-DROP TABLE test_prepared1;
-DROP TABLE test_prepared2;
--- show results
+SELECT 'test_prepared_1' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'test_prepared1'::regclass;
+ relation | locktype | mode
+-----------------+----------+---------------------
+ test_prepared_1 | relation | RowExclusiveLock
+ test_prepared_1 | relation | AccessExclusiveLock
+(2 rows)
+
SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
data
-------------------------------------------------------------------------
BEGIN
- table public.test_prepared1: INSERT: id[integer]:1
- COMMIT
- BEGIN
- table public.test_prepared1: INSERT: id[integer]:2
- COMMIT
- BEGIN
table public.test_prepared1: INSERT: id[integer]:4
COMMIT
BEGIN
- table public.test_prepared2: INSERT: id[integer]:7
- COMMIT
- BEGIN
table public.test_prepared1: INSERT: id[integer]:5
table public.test_prepared1: INSERT: id[integer]:6 data[text]:'frakbar'
+ PREPARE TRANSACTION 'test_prepared#3'
+(7 rows)
+
+-- Test that we decode correctly while an uncommitted prepared xact
+-- with ddl exists.
+--
+-- Use a separate table for the concurrent transaction because the lock from
+-- the ALTER will stop us inserting into the other one.
+--
+-- We should see '7' before '5' in our results since it commits first.
+--
+INSERT INTO test_prepared2 VALUES (7);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+----------------------------------------------------
+ BEGIN
+ table public.test_prepared2: INSERT: id[integer]:7
COMMIT
+(3 rows)
+
+COMMIT PREPARED 'test_prepared#3';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-----------------------------------
+ COMMIT PREPARED 'test_prepared#3'
+(1 row)
+
+-- make sure stuff still works
+INSERT INTO test_prepared1 VALUES (8);
+INSERT INTO test_prepared2 VALUES (9);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+--------------------------------------------------------------------
BEGIN
table public.test_prepared1: INSERT: id[integer]:8 data[text]:null
COMMIT
BEGIN
table public.test_prepared2: INSERT: id[integer]:9
COMMIT
-(22 rows)
+(6 rows)
+
+-- Check `CLUSTER` (as operation that hold exclusive lock) doesn't block
+-- logical decoding.
+BEGIN;
+INSERT INTO test_prepared1 VALUES (10, 'othercol');
+CLUSTER test_prepared1 USING test_prepared1_pkey;
+INSERT INTO test_prepared1 VALUES (11, 'othercol2');
+PREPARE TRANSACTION 'test_prepared_lock';
+BEGIN;
+insert into test_prepared2 values (12);
+PREPARE TRANSACTION 'test_prepared_lock2';
+COMMIT PREPARED 'test_prepared_lock2';
+SELECT 'pg_class' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'pg_class'::regclass;
+ relation | locktype | mode
+----------+----------+------
+(0 rows)
+
+-- Shouldn't timeout on 2pc decoding.
+SET statement_timeout = '1s';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+----------------------------------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:10 data[text]:'othercol'
+ table public.test_prepared1: INSERT: id[integer]:11 data[text]:'othercol2'
+ PREPARE TRANSACTION 'test_prepared_lock'
+ BEGIN
+ table public.test_prepared2: INSERT: id[integer]:12
+ PREPARE TRANSACTION 'test_prepared_lock2'
+ COMMIT PREPARED 'test_prepared_lock2'
+(8 rows)
+
+RESET statement_timeout;
+COMMIT PREPARED 'test_prepared_lock';
+-- will work normally after we commit
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+--------------------------------------
+ COMMIT PREPARED 'test_prepared_lock'
+(1 row)
+
+-- test savepoints
+BEGIN;
+SAVEPOINT test_savepoint;
+CREATE TABLE test_prepared_savepoint (a int);
+PREPARE TRANSACTION 'test_prepared_savepoint';
+COMMIT PREPARED 'test_prepared_savepoint';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-------------------------------------------
+ COMMIT PREPARED 'test_prepared_savepoint'
+(1 row)
+
+-- test that a GID containing "nodecode" gets decoded at commit prepared time
+BEGIN;
+INSERT INTO test_prepared1 VALUES (20);
+PREPARE TRANSACTION 'test_prepared_nodecode';
+-- should show nothing
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+------
+(0 rows)
+
+COMMIT PREPARED 'test_prepared_nodecode';
+-- should be decoded now
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+---------------------------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:20 data[text]:null
+ COMMIT
+(3 rows)
+
+-- cleanup
+DROP TABLE test_prepared1;
+DROP TABLE test_prepared2;
+-- show results. There should be nothing to show
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+------
+(0 rows)
SELECT pg_drop_replication_slot('regression_slot');
pg_drop_replication_slot
diff --git a/contrib/test_decoding/sql/prepared.sql b/contrib/test_decoding/sql/prepared.sql
index e726397..ca801e4 100644
--- a/contrib/test_decoding/sql/prepared.sql
+++ b/contrib/test_decoding/sql/prepared.sql
@@ -2,21 +2,25 @@
SET synchronous_commit = on;
SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
-CREATE TABLE test_prepared1(id int);
-CREATE TABLE test_prepared2(id int);
+CREATE TABLE test_prepared1(id integer primary key);
+CREATE TABLE test_prepared2(id integer primary key);
-- test simple successful use of a prepared xact
BEGIN;
INSERT INTO test_prepared1 VALUES (1);
PREPARE TRANSACTION 'test_prepared#1';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
COMMIT PREPARED 'test_prepared#1';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
INSERT INTO test_prepared1 VALUES (2);
-- test abort of a prepared xact
BEGIN;
INSERT INTO test_prepared1 VALUES (3);
PREPARE TRANSACTION 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
ROLLBACK PREPARED 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
INSERT INTO test_prepared1 VALUES (4);
@@ -27,24 +31,83 @@ ALTER TABLE test_prepared1 ADD COLUMN data text;
INSERT INTO test_prepared1 VALUES (6, 'frakbar');
PREPARE TRANSACTION 'test_prepared#3';
--- test that we decode correctly while an uncommitted prepared xact
--- with ddl exists.
+SELECT 'test_prepared_1' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'test_prepared1'::regclass;
+
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
--- separate table because of the lock from the ALTER
--- this will come before the '5' row above, as this commits before it.
+-- Test that we decode correctly while an uncommitted prepared xact
+-- with ddl exists.
+--
+-- Use a separate table for the concurrent transaction because the lock from
+-- the ALTER will stop us inserting into the other one.
+--
+-- We should see '7' before '5' in our results since it commits first.
+--
INSERT INTO test_prepared2 VALUES (7);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
COMMIT PREPARED 'test_prepared#3';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
-- make sure stuff still works
INSERT INTO test_prepared1 VALUES (8);
INSERT INTO test_prepared2 VALUES (9);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- Check `CLUSTER` (as operation that hold exclusive lock) doesn't block
+-- logical decoding.
+BEGIN;
+INSERT INTO test_prepared1 VALUES (10, 'othercol');
+CLUSTER test_prepared1 USING test_prepared1_pkey;
+INSERT INTO test_prepared1 VALUES (11, 'othercol2');
+PREPARE TRANSACTION 'test_prepared_lock';
+
+BEGIN;
+insert into test_prepared2 values (12);
+PREPARE TRANSACTION 'test_prepared_lock2';
+COMMIT PREPARED 'test_prepared_lock2';
+
+SELECT 'pg_class' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'pg_class'::regclass;
+
+-- Shouldn't timeout on 2pc decoding.
+SET statement_timeout = '1s';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+RESET statement_timeout;
+
+COMMIT PREPARED 'test_prepared_lock';
+
+-- will work normally after we commit
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- test savepoints
+BEGIN;
+SAVEPOINT test_savepoint;
+CREATE TABLE test_prepared_savepoint (a int);
+PREPARE TRANSACTION 'test_prepared_savepoint';
+COMMIT PREPARED 'test_prepared_savepoint';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- test that a GID containing "nodecode" gets decoded at commit prepared time
+BEGIN;
+INSERT INTO test_prepared1 VALUES (20);
+PREPARE TRANSACTION 'test_prepared_nodecode';
+-- should show nothing
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+COMMIT PREPARED 'test_prepared_nodecode';
+-- should be decoded now
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
-- cleanup
DROP TABLE test_prepared1;
DROP TABLE test_prepared2;
--- show results
+-- show results. There should be nothing to show
SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
SELECT pg_drop_replication_slot('regression_slot');
diff --git a/contrib/test_decoding/test_decoding.c b/contrib/test_decoding/test_decoding.c
index 3474515..d5438c5 100644
--- a/contrib/test_decoding/test_decoding.c
+++ b/contrib/test_decoding/test_decoding.c
@@ -11,12 +11,16 @@
*-------------------------------------------------------------------------
*/
#include "postgres.h"
+#include "miscadmin.h"
+#include "access/transam.h"
#include "catalog/pg_type.h"
#include "replication/logical.h"
#include "replication/origin.h"
+#include "storage/procarray.h"
+
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -36,6 +40,7 @@ typedef struct
bool skip_empty_xacts;
bool xact_wrote_changes;
bool only_local;
+ TransactionId check_xid; /* track abort of this txid */
} TestDecodingData;
static void pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
@@ -49,6 +54,8 @@ static void pg_output_begin(LogicalDecodingContext *ctx,
bool last_write);
static void pg_decode_commit_txn(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn, XLogRecPtr commit_lsn);
+static void pg_decode_abort_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, XLogRecPtr abort_lsn);
static void pg_decode_change(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn, Relation rel,
ReorderBufferChange *change);
@@ -84,6 +91,19 @@ static void pg_decode_stream_truncate(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
int nrelations, Relation relations[],
ReorderBufferChange *change);
+static bool pg_decode_filter_prepare(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid);
+static void pg_decode_prepare_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+static void pg_decode_commit_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+static void pg_decode_abort_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+
void
_PG_init(void)
@@ -102,6 +122,7 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb)
cb->change_cb = pg_decode_change;
cb->truncate_cb = pg_decode_truncate;
cb->commit_cb = pg_decode_commit_txn;
+ cb->abort_cb = pg_decode_abort_txn;
cb->filter_by_origin_cb = pg_decode_filter;
cb->shutdown_cb = pg_decode_shutdown;
cb->message_cb = pg_decode_message;
@@ -112,6 +133,11 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb)
cb->stream_change_cb = pg_decode_stream_change;
cb->stream_message_cb = pg_decode_stream_message;
cb->stream_truncate_cb = pg_decode_stream_truncate;
+ cb->filter_prepare_cb = pg_decode_filter_prepare;
+ cb->prepare_cb = pg_decode_prepare_txn;
+ cb->commit_prepared_cb = pg_decode_commit_prepared_txn;
+ cb->abort_prepared_cb = pg_decode_abort_prepared_txn;
+
}
@@ -132,11 +158,14 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
data->include_timestamp = false;
data->skip_empty_xacts = false;
data->only_local = false;
+ data->check_xid = InvalidTransactionId;
ctx->output_plugin_private = data;
opt->output_type = OUTPUT_PLUGIN_TEXTUAL_OUTPUT;
opt->receive_rewrites = false;
+ /* this plugin supports decoding of 2pc */
+ opt->enable_twophase = true;
foreach(option, ctx->output_plugin_options)
{
@@ -223,6 +252,32 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
errmsg("could not parse value \"%s\" for parameter \"%s\"",
strVal(elem->arg), elem->defname)));
}
+ else if (strcmp(elem->defname, "check-xid") == 0)
+ {
+ if (elem->arg)
+ {
+ errno = 0;
+ data->check_xid = (TransactionId)
+ strtoul(strVal(elem->arg), NULL, 0);
+
+ if (errno == EINVAL || errno == ERANGE)
+ ereport(FATAL,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("check-xid is not a valid number: \"%s\"",
+ strVal(elem->arg))));
+ }
+ else
+ ereport(FATAL,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("check-xid needs an input value")));
+
+ if (data->check_xid <= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("Specify positive value for parameter \"%s\","
+ " you specified \"%s\"",
+ elem->defname, strVal(elem->arg))));
+ }
else
{
ereport(ERROR,
@@ -293,6 +348,116 @@ pg_decode_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
OutputPluginWrite(ctx, true);
}
+/* ABORT callback */
+static void
+pg_decode_abort_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ if (data->skip_empty_xacts && !data->xact_wrote_changes)
+ return;
+
+ OutputPluginPrepareWrite(ctx, true);
+ if (data->include_xids)
+ appendStringInfo(ctx->out, "ABORT %u", txn->xid);
+ else
+ appendStringInfoString(ctx->out, "ABORT");
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
+/*
+ * Filter out two-phase transactions.
+ *
+ * Each plugin can implement its own filtering logic. Here
+ * we demonstrate a simple logic by checking the GID. If the
+ * GID contains the "_nodecode" substring, then we filter
+ * it out.
+ */
+static bool
+pg_decode_filter_prepare(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid)
+{
+ if (strstr(gid, "_nodecode") != NULL)
+ return true;
+
+ return false;
+}
+
+/* PREPARE callback */
+static void
+pg_decode_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ if (data->skip_empty_xacts && !data->xact_wrote_changes)
+ return;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ appendStringInfo(ctx->out, "PREPARE TRANSACTION %s",
+ quote_literal_cstr(txn->gid));
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, " %u", txn->xid);
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
+/* COMMIT PREPARED callback */
+static void
+pg_decode_commit_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ appendStringInfo(ctx->out, "COMMIT PREPARED %s",
+ quote_literal_cstr(txn->gid));
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, " %u", txn->xid);
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
+/* ABORT PREPARED callback */
+static void
+pg_decode_abort_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ appendStringInfo(ctx->out, "ROLLBACK PREPARED %s",
+ quote_literal_cstr(txn->gid));
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, " %u", txn->xid);
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
static bool
pg_decode_filter(LogicalDecodingContext *ctx,
RepOriginId origin_id)
@@ -451,6 +616,22 @@ pg_decode_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
}
data->xact_wrote_changes = true;
+ /* if check_xid is specified */
+ if (TransactionIdIsValid(data->check_xid))
+ {
+ elog(LOG, "waiting for %u to abort", data->check_xid);
+ while (TransactionIdIsInProgress(data->check_xid))
+ {
+ CHECK_FOR_INTERRUPTS();
+ pg_usleep(10000L);
+ }
+ if (!TransactionIdIsInProgress(data->check_xid) &&
+ !TransactionIdDidCommit(data->check_xid))
+ elog(LOG, "%u aborted", data->check_xid);
+
+ Assert(TransactionIdDidAbort(data->check_xid));
+ }
+
class_form = RelationGetForm(relation);
tupdesc = RelationGetDescr(relation);
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index 8d4fdf6..22ee70f 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -386,7 +386,12 @@ typedef struct OutputPluginCallbacks
LogicalDecodeChangeCB change_cb;
LogicalDecodeTruncateCB truncate_cb;
LogicalDecodeCommitCB commit_cb;
+ LogicalDecodeAbortCB abort_cb;
LogicalDecodeMessageCB message_cb;
+ LogicalDecodeFilterPrepareCB filter_prepare_cb;
+ LogicalDecodePrepareCB prepare_cb;
+ LogicalDecodeCommitPreparedCB commit_prepared_cb;
+ LogicalDecodeAbortPreparedCB abort_prepared_cb;
LogicalDecodeFilterByOriginCB filter_by_origin_cb;
LogicalDecodeShutdownCB shutdown_cb;
LogicalDecodeStreamStartCB stream_start_cb;
@@ -477,7 +482,13 @@ CREATE TABLE another_catalog_table(data text) WITH (user_catalog_table = true);
never get
decoded. Successful savepoints are
folded into the transaction containing them in the order they were
- executed within that transaction.
+ executed within that transaction. A transaction that is prepared for
+ a two-phase commit using <command>PREPARE TRANSACTION</command> will
+ also be decoded if the output plugin callbacks needed for decoding
+ them are provided. It is possible that the current transaction which
+ is being decoded is aborted concurrently via a <command>ROLLBACK PREPARED</command>
+ command. In that case, the logical decoding of this transaction will
+ be aborted too.
</para>
<note>
@@ -578,6 +589,71 @@ typedef void (*LogicalDecodeCommitCB) (struct LogicalDecodingContext *ctx,
</para>
</sect3>
+ <sect3 id="logicaldecoding-output-plugin-prepare">
+ <title>Transaction Prepare Callback</title>
+
+ <para>
+ The optional <function>prepare_cb</function> callback is called whenever
+ a transaction which is prepared for two-phase commit has been
+ decoded. The <function>change_cb</function> callbacks for all modified
+ rows will have been called before this, if there have been any modified
+ rows.
+<programlisting>
+typedef void (*LogicalDecodePrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
+ <sect3 id="logicaldecoding-output-plugin-commit-prepared">
+ <title>Commit Prepared Transaction Callback</title>
+
+ <para>
+ The optional <function>commit_prepared_cb</function> callback is called whenever
+ a commit prepared transaction has been decoded. The <parameter>gid</parameter> field,
+ which is part of the <parameter>txn</parameter> parameter can be used in this
+ callback.
+<programlisting>
+typedef void (*LogicalDecodeCommitPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
+ <sect3 id="logicaldecoding-output-plugin-abort-prepared">
+ <title>Rollback Prepared Transaction Callback</title>
+
+ <para>
+ The optional <function>abort_prepared_cb</function> callback is called whenever
+ a rollback prepared transaction has been decoded. The <parameter>gid</parameter> field,
+ which is part of the <parameter>txn</parameter> parameter can be used in this
+ callback.
+<programlisting>
+typedef void (*LogicalDecodeAbortPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
+ <sect3 id="logicaldecoding-output-plugin-abort">
+ <title>Transaction Abort Callback</title>
+
+ <para>
+ The required <function>abort_cb</function> callback is called whenever
+ a transaction abort has to be initiated. This can happen if we are
+ decoding a transaction that has been prepared for two-phase commit and
+ a concurrent rollback happens while we are decoding it.
+<programlisting>
+typedef void (*LogicalDecodeAbortCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
<sect3 id="logicaldecoding-output-plugin-change">
<title>Change Callback</title>
@@ -587,7 +663,13 @@ typedef void (*LogicalDecodeCommitCB) (struct LogicalDecodingContext *ctx,
an <command>INSERT</command>, <command>UPDATE</command>,
or <command>DELETE</command>. Even if the original command modified
several rows at once the callback will be called individually for each
- row.
+ row. The <function>change_cb</function> callback may access system or
+ user catalog tables to aid in the process of outputting the row
+ modification details. In case of decoding a prepared (but yet
+ uncommitted) transaction or decoding of an uncommitted transaction, this
+ change callback might also error out due to simultaneous rollback of
+ this very same transaction. In that case, the logical decoding of this
+ aborted transaction is stopped gracefully.
<programlisting>
typedef void (*LogicalDecodeChangeCB) (struct LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
@@ -664,6 +746,39 @@ typedef bool (*LogicalDecodeFilterByOriginCB) (struct LogicalDecodingContext *ct
</para>
</sect3>
+ <sect3 id="logicaldecoding-output-plugin-filter-prepare">
+ <title>Prepare Filter Callback</title>
+
+ <para>
+ The optional <function>filter_prepare_cb</function> callback
+ is called to determine whether data that is part of the current
+ two-phase commit transaction should be considered for decode
+ at this prepare stage or as a regular one-phase transaction at
+ <command>COMMIT PREPARED</command> time later. To signal that
+ decoding should be skipped, return <literal>true</literal>;
+ <literal>false</literal> otherwise. When the callback is not
+ defined, <literal>false</literal> is assumed (i.e. nothing is
+ filtered).
+<programlisting>
+typedef bool (*LogicalDecodeFilterPrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ TransactionId xid,
+ const char *gid);
+</programlisting>
+ The <parameter>ctx</parameter> parameter has the same contents
+ as for the other callbacks. The <parameter>txn</parameter> parameter
+ contains meta information about the transaction. The <parameter>xid</parameter>
+ contains the XID because <parameter>txn</parameter> can be NULL in some cases.
+ The <parameter>gid</parameter> is the identifier that later identifies this
+ transaction for <command>COMMIT PREPARED</command> or <command>ROLLBACK PREPARED</command>.
+ </para>
+ <para>
+ The callback has to provide the same static answer for a given combination of
+ <parameter>xid</parameter> and <parameter>gid</parameter> every time it is
+ called.
+ </para>
+ </sect3>
+
<sect3 id="logicaldecoding-output-plugin-message">
<title>Generic Message Callback</title>
@@ -685,7 +800,13 @@ typedef void (*LogicalDecodeMessageCB) (struct LogicalDecodingContext *ctx,
non-transactional and the XID was not assigned yet in the transaction
which logged the message. The <parameter>lsn</parameter> has WAL
location of the message. The <parameter>transactional</parameter> says
- if the message was sent as transactional or not.
+ if the message was sent as transactional or not. Similar to the change
+ callback, in case of decoding a prepared (but yet uncommitted)
+ transaction or decoding of an uncommitted transaction, this message
+ callback might also error out due to simultaneous rollback of
+ this very same transaction. In that case, the logical decoding of this
+ aborted transaction is stopped gracefully.
+
The <parameter>prefix</parameter> is arbitrary null-terminated prefix
which can be used for identifying interesting messages for the current
plugin. And finally the <parameter>message</parameter> parameter holds
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index f21f61d..63d5acf 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -70,6 +70,9 @@ static void DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
xl_xact_parsed_commit *parsed, TransactionId xid);
static void DecodeAbort(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
xl_xact_parsed_abort *parsed, TransactionId xid);
+static void DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_prepare * parsed);
+
/* common function to decode tuples */
static void DecodeXLogTuple(char *data, Size len, ReorderBufferTupleBuf *tup);
@@ -312,17 +315,34 @@ DecodeXactOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
}
break;
case XLOG_XACT_PREPARE:
+ {
+ xl_xact_parsed_prepare parsed;
+ xl_xact_prepare *xlrec;
+ /* check that output plugin is capable of twophase decoding */
+ if (!ctx->options.enable_twophase)
+ {
+ ReorderBufferProcessXid(reorder, XLogRecGetXid(r), buf->origptr);
+ break;
+ }
- /*
- * Currently decoding ignores PREPARE TRANSACTION and will just
- * decode the transaction when the COMMIT PREPARED is sent or
- * throw away the transaction's contents when a ROLLBACK PREPARED
- * is received. In the future we could add code to expose prepared
- * transactions in the changestream allowing for a kind of
- * distributed 2PC.
- */
- ReorderBufferProcessXid(reorder, XLogRecGetXid(r), buf->origptr);
- break;
+ /* ok, parse it */
+ xlrec = (xl_xact_prepare *)XLogRecGetData(r);
+ ParsePrepareRecord(XLogRecGetInfo(buf->record),
+ xlrec, &parsed);
+
+ /* does output plugin want this particular transaction? */
+ if (ctx->callbacks.filter_prepare_cb &&
+ ReorderBufferPrepareNeedSkip(reorder, parsed.twophase_xid,
+ parsed.twophase_gid))
+ {
+ ReorderBufferProcessXid(reorder, parsed.twophase_xid,
+ buf->origptr);
+ break;
+ }
+
+ DecodePrepare(ctx, buf, &parsed);
+ break;
+ }
default:
elog(ERROR, "unexpected RM_XACT_ID record type: %u", info);
}
@@ -647,9 +667,82 @@ DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
buf->origptr, buf->endptr);
}
+ /*
+ * Decide if we're processing COMMIT PREPARED, or a regular COMMIT.
+ * Regular commit simply triggers a replay of transaction changes from the
+ * reorder buffer. For COMMIT PREPARED that however already happened at
+ * PREPARE time, and so we only need to notify the subscriber that the GID
+ * finally committed.
+ *
+ * For output plugins that do not support PREPARE-time decoding of
+ * two-phase transactions, we never even see the PREPARE and all two-phase
+ * transactions simply fall through to the second branch.
+ */
+ if (TransactionIdIsValid(parsed->twophase_xid) &&
+ ReorderBufferTxnIsPrepared(ctx->reorder,
+ parsed->twophase_xid, parsed->twophase_gid))
+ {
+ Assert(xid == parsed->twophase_xid);
+ /* we are processing COMMIT PREPARED */
+ ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn,
+ parsed->twophase_gid, true);
+ }
+ else
+ {
+ /* replay actions of all transaction + subtransactions in order */
+ ReorderBufferCommit(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn);
+ }
+}
+
+/*
+ * Decode PREPARE record. Similar logic as in COMMIT
+ */
+static void
+DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_prepare * parsed)
+{
+ XLogRecPtr origin_lsn = parsed->origin_lsn;
+ TimestampTz commit_time = parsed->origin_timestamp;
+ XLogRecPtr origin_id = XLogRecGetOrigin(buf->record);
+ int i;
+ TransactionId xid = parsed->twophase_xid;
+
+ /*
+ * Process invalidation messages, even if we're not interested in the
+ * transaction's contents, since the various caches need to always be
+ * consistent.
+ */
+ if (parsed->nmsgs > 0)
+ {
+ if (!ctx->fast_forward)
+ ReorderBufferAddInvalidations(ctx->reorder, xid, buf->origptr,
+ parsed->nmsgs, parsed->msgs);
+ ReorderBufferXidSetCatalogChanges(ctx->reorder, xid, buf->origptr);
+ }
+
+ /*
+ * Tell the reorderbuffer about the surviving subtransactions. We need to
+ * do this because the main transaction itself has not committed since we
+ * are in the prepare phase right now. So we need to be sure the snapshot
+ * is setup correctly for the main transaction in case all changes
+ * happened in subtransanctions
+ */
+ for (i = 0; i < parsed->nsubxacts; i++)
+ {
+ ReorderBufferCommitChild(ctx->reorder, xid, parsed->subxacts[i],
+ buf->origptr, buf->endptr);
+ }
+
+ if (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
+ (parsed->dbId != InvalidOid && parsed->dbId != ctx->slot->data.database) ||
+ ctx->fast_forward || FilterByOrigin(ctx, origin_id))
+ return;
+
/* replay actions of all transaction + subtransactions in order */
- ReorderBufferCommit(ctx->reorder, xid, buf->origptr, buf->endptr,
- commit_time, origin_id, origin_lsn);
+ ReorderBufferPrepare(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn, parsed->twophase_gid);
}
/*
@@ -661,6 +754,30 @@ DecodeAbort(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
xl_xact_parsed_abort *parsed, TransactionId xid)
{
int i;
+ XLogRecPtr origin_lsn = InvalidXLogRecPtr;
+ TimestampTz commit_time = 0;
+ XLogRecPtr origin_id = XLogRecGetOrigin(buf->record);
+
+ if (parsed->xinfo & XACT_XINFO_HAS_ORIGIN)
+ {
+ origin_lsn = parsed->origin_lsn;
+ commit_time = parsed->origin_timestamp;
+ }
+
+ /*
+ * If it's ROLLBACK PREPARED then handle it via callbacks.
+ */
+ if (TransactionIdIsValid(xid) &&
+ !SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) &&
+ parsed->dbId == ctx->slot->data.database &&
+ !FilterByOrigin(ctx, origin_id) &&
+ ReorderBufferTxnIsPrepared(ctx->reorder, xid, parsed->twophase_gid))
+ {
+ ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn,
+ parsed->twophase_gid, false);
+ return;
+ }
for (i = 0; i < parsed->nsubxacts; i++)
{
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 0f6af95..02b726c 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -58,6 +58,16 @@ static void shutdown_cb_wrapper(LogicalDecodingContext *ctx);
static void begin_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn);
static void commit_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
+static void abort_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+static bool filter_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid);
+static void prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+static void commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+static void abort_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
static void change_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
Relation relation, ReorderBufferChange *change);
static void truncate_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
@@ -206,6 +216,11 @@ StartupDecodingContext(List *output_plugin_options,
ctx->reorder->apply_change = change_cb_wrapper;
ctx->reorder->apply_truncate = truncate_cb_wrapper;
ctx->reorder->commit = commit_cb_wrapper;
+ ctx->reorder->abort = abort_cb_wrapper;
+ ctx->reorder->filter_prepare = filter_prepare_cb_wrapper;
+ ctx->reorder->prepare = prepare_cb_wrapper;
+ ctx->reorder->commit_prepared = commit_prepared_cb_wrapper;
+ ctx->reorder->abort_prepared = abort_prepared_cb_wrapper;
ctx->reorder->message = message_cb_wrapper;
/*
@@ -684,6 +699,33 @@ startup_cb_wrapper(LogicalDecodingContext *ctx, OutputPluginOptions *opt, bool i
/* do the actual work: call callback */
ctx->callbacks.startup_cb(ctx, opt, is_init);
+ /*
+ * If the plugin claims to support two-phase transactions, then
+ * check that the plugin implements all callbacks necessary to decode
+ * two-phase transactions - we either have to have all of them or none.
+ * The filter_prepare callback is optional, but can only be defined when
+ * two-phase decoding is enabled (i.e. the three other callbacks are
+ * defined).
+ */
+ if (opt->enable_twophase)
+ {
+ int twophase_callbacks = (ctx->callbacks.prepare_cb != NULL) +
+ (ctx->callbacks.commit_prepared_cb != NULL) +
+ (ctx->callbacks.abort_prepared_cb != NULL);
+
+ /* Plugins with incorrect number of two-phase callbacks are broken. */
+ if ((twophase_callbacks != 3) && (twophase_callbacks != 0))
+ ereport(ERROR,
+ (errmsg("Output plugin registered only %d twophase callbacks. ",
+ twophase_callbacks)));
+ }
+
+ /* filter_prepare is optional, but requires two-phase decoding */
+ if ((ctx->callbacks.filter_prepare_cb != NULL) && (!opt->enable_twophase))
+ ereport(ERROR,
+ (errmsg("Output plugin does not support two-phase decoding, but "
+ "registered filter_prepared callback.")));
+
/* Pop the error context stack */
error_context_stack = errcallback.previous;
}
@@ -782,6 +824,122 @@ commit_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
}
static void
+abort_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "abort";
+ state.report_location = txn->final_lsn; /* beginning of abort record */
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn; /* points to the end of the record */
+
+ /* do the actual work: call callback */
+ ctx->callbacks.abort_cb(ctx, txn, abort_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
+prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "prepare";
+ state.report_location = txn->final_lsn; /* beginning of commit record */
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn; /* points to the end of the record */
+
+ /* do the actual work: call callback */
+ ctx->callbacks.prepare_cb(ctx, txn, prepare_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
+commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "commit_prepared";
+ state.report_location = txn->final_lsn; /* beginning of commit record */
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn; /* points to the end of the record */
+
+ /* do the actual work: call callback */
+ ctx->callbacks.commit_prepared_cb(ctx, txn, commit_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
+abort_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "abort_prepared";
+ state.report_location = txn->final_lsn; /* beginning of commit record */
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn; /* points to the end of the record */
+
+ /* do the actual work: call callback */
+ ctx->callbacks.abort_prepared_cb(ctx, txn, abort_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
change_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
Relation relation, ReorderBufferChange *change)
{
@@ -858,6 +1016,51 @@ truncate_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
error_context_stack = errcallback.previous;
}
+static bool
+filter_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+ bool ret;
+
+ /*
+ * Skip if decoding of twophase at PREPARE time is not enabled. In that
+ * case all twophase transactions are considered filtered out and will be
+ * applied as regular transactions at COMMIT PREPARED.
+ */
+ if (!ctx->options.enable_twophase)
+ return true;
+
+ /*
+ * The filter_prepare callback is optional. When not supplied, all
+ * prepared transactions should go through.
+ */
+ if (!ctx->callbacks.filter_prepare_cb)
+ return false;
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "filter_prepare";
+ state.report_location = InvalidXLogRecPtr;
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = false;
+
+ /* do the actual work: call callback */
+ ret = ctx->callbacks.filter_prepare_cb(ctx, txn, xid, gid);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+
+ return ret;
+}
+
bool
filter_by_origin_cb_wrapper(LogicalDecodingContext *ctx, RepOriginId origin_id)
{
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 1975d62..6d9cfbd 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -413,6 +413,11 @@ ReorderBufferReturnTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
}
/* free data that's contained */
+ if (txn->gid != NULL)
+ {
+ pfree(txn->gid);
+ txn->gid = NULL;
+ }
if (txn->tuplecid_hash != NULL)
{
@@ -1987,7 +1992,7 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
prev_lsn = change->lsn;
/* Set the current xid to detect concurrent aborts. */
- if (streaming)
+ if (streaming || rbtxn_prepared(change->txn))
{
curtxn = change->txn;
SetupCheckXidLive(curtxn->xid);
@@ -2249,7 +2254,6 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
break;
}
}
-
/*
* There's a speculative insertion remaining, just clean in up, it
* can't have been successful, otherwise we'd gotten a confirmation
@@ -2278,7 +2282,24 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
}
}
else
- rb->commit(rb, txn, commit_lsn);
+ {
+ /*
+ * Call abort/commit/prepare callback, depending on the transaction
+ * state.
+ *
+ * If the transaction aborted during apply (which currently can happen
+ * only for prepared transactions), simply call the abort callback.
+ *
+ * Otherwise call either PREPARE (for twophase transactions) or COMMIT
+ * (for regular ones).
+ */
+ if (rbtxn_rollback(txn))
+ rb->abort(rb, txn, commit_lsn);
+ else if (rbtxn_prepared(txn))
+ rb->prepare(rb, txn, commit_lsn);
+ else
+ rb->commit(rb, txn, commit_lsn);
+ }
/* this is just a sanity check against bad output plugin behaviour */
if (GetCurrentTransactionIdIfAny() != InvalidTransactionId)
@@ -2395,23 +2416,16 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
* This interface is called once a toplevel commit is read for both streamed
* as well as non-streamed transactions.
*/
-void
-ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
- XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
- TimestampTz commit_time,
- RepOriginId origin_id, XLogRecPtr origin_lsn)
+static void
+ReorderBufferCommitInternal(ReorderBufferTXN *txn,
+ ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn)
{
- ReorderBufferTXN *txn;
Snapshot snapshot_now;
CommandId command_id = FirstCommandId;
- txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
- false);
-
- /* unknown transaction, nothing to replay */
- if (txn == NULL)
- return;
-
txn->final_lsn = commit_lsn;
txn->end_lsn = end_lsn;
txn->commit_time = commit_time;
@@ -2453,6 +2467,140 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
}
/*
+ * Ask output plugin whether we want to skip this PREPARE and send
+ * this transaction as a regular commit later.
+ */
+bool
+ReorderBufferPrepareNeedSkip(ReorderBuffer *rb, TransactionId xid, const char *gid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr, false);
+
+ return rb->filter_prepare(rb, txn, xid, gid);
+}
+
+
+/*
+ * Commit a transaction.
+ *
+ * See comments for ReorderBufferCommitInternal()
+ */
+void
+ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+
+ /* unknown transaction, nothing to replay */
+ if (txn == NULL)
+ return;
+
+ ReorderBufferCommitInternal(txn, rb, xid, commit_lsn, end_lsn,
+ commit_time, origin_id, origin_lsn);
+}
+
+/*
+ * Prepare a twophase transaction. It calls ReorderBufferCommitInternal()
+ * since all prepared transactions need to be decoded at PREPARE time.
+ */
+void
+ReorderBufferPrepare(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+
+ /* unknown transaction, nothing to replay */
+ if (txn == NULL)
+ return;
+
+ txn->txn_flags |= RBTXN_PREPARE;
+ txn->gid = palloc(strlen(gid) + 1); /* trailing '\0' */
+ strcpy(txn->gid, gid);
+
+ ReorderBufferCommitInternal(txn, rb, xid, commit_lsn, end_lsn,
+ commit_time, origin_id, origin_lsn);
+}
+
+/*
+ * Check whether this transaction was sent as prepared to subscribers.
+ * Called while handling commit|abort prepared.
+ */
+bool
+ReorderBufferTxnIsPrepared(ReorderBuffer *rb, TransactionId xid,
+ const char *gid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+
+ /*
+ * Always call the prepare filter. It's the job of the prepare filter to
+ * give us the *same* response for a given xid across multiple calls
+ * (including ones on restart)
+ */
+ return !(rb->filter_prepare(rb, txn, xid, gid));
+}
+
+/*
+ * Send standalone xact event. This is used to handle COMMIT/ABORT PREPARED.
+ */
+void
+ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid, bool is_commit)
+{
+ ReorderBufferTXN *txn;
+
+ /*
+ * The transaction may or may not exist (during restarts for example).
+ * Anyways, 2PC transactions do not contain any reorderbuffers. So allow
+ * it to be created below.
+ */
+ txn = ReorderBufferTXNByXid(rb, xid, true, NULL, commit_lsn,
+ true);
+
+ txn->final_lsn = commit_lsn;
+ txn->end_lsn = end_lsn;
+ txn->commit_time = commit_time;
+ txn->origin_id = origin_id;
+ txn->origin_lsn = origin_lsn;
+ /* this txn is obviously prepared */
+ txn->txn_flags |= RBTXN_PREPARE;
+ txn->gid = palloc(strlen(gid) + 1); /* trailing '\0' */
+ strcpy(txn->gid, gid);
+
+ if (is_commit)
+ {
+ txn->txn_flags |= RBTXN_COMMIT_PREPARED;
+ rb->commit_prepared(rb, txn, commit_lsn);
+ }
+ else
+ {
+ txn->txn_flags |= RBTXN_ROLLBACK_PREPARED;
+ rb->abort_prepared(rb, txn, commit_lsn);
+ }
+
+ /* cleanup: make sure there's no cache pollution */
+ ReorderBufferExecuteInvalidations(rb, txn);
+ ReorderBufferCleanupTXN(rb, txn);
+}
+
+/*
* Abort a transaction that possibly has previous changes. Needs to be first
* called for subtransactions and then for the toplevel xid.
*
@@ -2495,7 +2643,13 @@ ReorderBufferAbort(ReorderBuffer *rb, TransactionId xid, XLogRecPtr lsn)
/* cosmetic... */
txn->final_lsn = lsn;
- /* remove potential on-disk data, and deallocate */
+ /*
+ * remove potential on-disk data, and deallocate.
+ *
+ * We remove it even for prepared transactions (GID is enough to
+ * commit/abort those later).
+ */
+
ReorderBufferCleanupTXN(rb, txn);
}
diff --git a/src/include/replication/output_plugin.h b/src/include/replication/output_plugin.h
index b78c796..f6ca87f 100644
--- a/src/include/replication/output_plugin.h
+++ b/src/include/replication/output_plugin.h
@@ -27,6 +27,7 @@ typedef struct OutputPluginOptions
{
OutputPluginOutputType output_type;
bool receive_rewrites;
+ bool enable_twophase;
} OutputPluginOptions;
/*
@@ -78,6 +79,46 @@ typedef void (*LogicalDecodeCommitCB) (struct LogicalDecodingContext *ctx,
XLogRecPtr commit_lsn);
/*
+ * Called for an implicit ABORT of a transaction.
+ */
+typedef void (*LogicalDecodeAbortCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+
+ /*
+ * Called before decoding of PREPARE record to decide whether this
+ * transaction should be decoded with separate calls to prepare and
+ * commit_prepared/abort_prepared callbacks or wait till COMMIT PREPARED and
+ * sent as usual transaction.
+ */
+typedef bool (*LogicalDecodeFilterPrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ TransactionId xid,
+ const char *gid);
+
+/*
+ * Called for PREPARE record unless it was filtered by filter_prepare()
+ * callback.
+ */
+typedef void (*LogicalDecodePrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+
+/*
+ * Called for COMMIT PREPARED.
+ */
+typedef void (*LogicalDecodeCommitPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+/*
+ * Called for ROLLBACK PREPARED.
+ */
+typedef void (*LogicalDecodeAbortPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+
+/*
* Called for the generic logical decoding messages.
*/
typedef void (*LogicalDecodeMessageCB) (struct LogicalDecodingContext *ctx,
@@ -170,7 +211,12 @@ typedef struct OutputPluginCallbacks
LogicalDecodeChangeCB change_cb;
LogicalDecodeTruncateCB truncate_cb;
LogicalDecodeCommitCB commit_cb;
+ LogicalDecodeAbortCB abort_cb;
LogicalDecodeMessageCB message_cb;
+ LogicalDecodeFilterPrepareCB filter_prepare_cb;
+ LogicalDecodePrepareCB prepare_cb;
+ LogicalDecodeCommitPreparedCB commit_prepared_cb;
+ LogicalDecodeAbortPreparedCB abort_prepared_cb;
LogicalDecodeFilterByOriginCB filter_by_origin_cb;
LogicalDecodeShutdownCB shutdown_cb;
/* streaming of changes */
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 1ae17d5..820840a 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -10,6 +10,7 @@
#define REORDERBUFFER_H
#include "access/htup_details.h"
+#include "access/twophase.h"
#include "lib/ilist.h"
#include "storage/sinval.h"
#include "utils/hsearch.h"
@@ -162,9 +163,14 @@ typedef struct ReorderBufferChange
#define RBTXN_HAS_CATALOG_CHANGES 0x0001
#define RBTXN_IS_SUBXACT 0x0002
#define RBTXN_IS_SERIALIZED 0x0004
-#define RBTXN_IS_STREAMED 0x0008
-#define RBTXN_HAS_TOAST_INSERT 0x0010
-#define RBTXN_HAS_SPEC_INSERT 0x0020
+#define RBTXN_PREPARE 0x0008
+#define RBTXN_COMMIT_PREPARED 0x0010
+#define RBTXN_ROLLBACK_PREPARED 0x0020
+#define RBTXN_COMMIT 0x0040
+#define RBTXN_ROLLBACK 0x0080
+#define RBTXN_IS_STREAMED 0x0100
+#define RBTXN_HAS_TOAST_INSERT 0x0200
+#define RBTXN_HAS_SPEC_INSERT 0x0400
/* Does the transaction have catalog changes? */
#define rbtxn_has_catalog_changes(txn) \
@@ -218,6 +224,17 @@ typedef struct ReorderBufferChange
((txn)->txn_flags & RBTXN_IS_STREAMED) != 0 \
)
+/* is this txn prepared? */
+#define rbtxn_prepared(txn) (txn->txn_flags & RBTXN_PREPARE)
+/* was this prepared txn committed in the meanwhile? */
+#define rbtxn_commit_prepared(txn) (txn->txn_flags & RBTXN_COMMIT_PREPARED)
+/* was this prepared txn aborted in the meanwhile? */
+#define rbtxn_rollback_prepared(txn) (txn->txn_flags & RBTXN_ROLLBACK_PREPARED)
+/* was this txn committed in the meanwhile? */
+#define rbtxn_commit(txn) (txn->txn_flags & RBTXN_COMMIT)
+/* was this prepared txn aborted in the meanwhile? */
+#define rbtxn_rollback(txn) (txn->txn_flags & RBTXN_ROLLBACK)
+
typedef struct ReorderBufferTXN
{
/* See above */
@@ -229,6 +246,9 @@ typedef struct ReorderBufferTXN
/* Xid of top-level transaction, if known */
TransactionId toplevel_xid;
+ /* In case of 2PC we need to pass GID to output plugin */
+ char *gid;
+
/*
* LSN of the first data carrying, WAL record with knowledge about this
* xid. This is allowed to *not* be first record adorned with this xid, if
@@ -390,6 +410,39 @@ typedef void (*ReorderBufferCommitCB) (ReorderBuffer *rb,
ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
+/* abort callback signature */
+typedef void (*ReorderBufferAbortCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+
+typedef bool (*ReorderBufferFilterPrepareCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ TransactionId xid,
+ const char *gid);
+
+/* prepare callback signature */
+typedef void (*ReorderBufferPrepareCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+
+/* commit prepared callback signature */
+typedef void (*ReorderBufferCommitPreparedCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+/* abort prepared callback signature */
+typedef void (*ReorderBufferAbortPreparedCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+
+
+
+
/* message callback signature */
typedef void (*ReorderBufferMessageCB) (ReorderBuffer *rb,
ReorderBufferTXN *txn,
@@ -482,6 +535,11 @@ struct ReorderBuffer
ReorderBufferApplyChangeCB apply_change;
ReorderBufferApplyTruncateCB apply_truncate;
ReorderBufferCommitCB commit;
+ ReorderBufferAbortCB abort;
+ ReorderBufferFilterPrepareCB filter_prepare;
+ ReorderBufferPrepareCB prepare;
+ ReorderBufferCommitPreparedCB commit_prepared;
+ ReorderBufferAbortPreparedCB abort_prepared;
ReorderBufferMessageCB message;
/*
@@ -548,6 +606,11 @@ void ReorderBufferQueueMessage(ReorderBuffer *, TransactionId, Snapshot snapsho
void ReorderBufferCommit(ReorderBuffer *, TransactionId,
XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
TimestampTz commit_time, RepOriginId origin_id, XLogRecPtr origin_lsn);
+void ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid, bool is_commit);
void ReorderBufferAssignChild(ReorderBuffer *, TransactionId, TransactionId, XLogRecPtr commit_lsn);
void ReorderBufferCommitChild(ReorderBuffer *, TransactionId, TransactionId,
XLogRecPtr commit_lsn, XLogRecPtr end_lsn);
@@ -571,6 +634,15 @@ void ReorderBufferXidSetCatalogChanges(ReorderBuffer *, TransactionId xid, XLog
bool ReorderBufferXidHasCatalogChanges(ReorderBuffer *, TransactionId xid);
bool ReorderBufferXidHasBaseSnapshot(ReorderBuffer *, TransactionId xid);
+bool ReorderBufferPrepareNeedSkip(ReorderBuffer *rb, TransactionId xid,
+ const char *gid);
+bool ReorderBufferTxnIsPrepared(ReorderBuffer *rb, TransactionId xid,
+ const char *gid);
+void ReorderBufferPrepare(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid);
ReorderBufferTXN *ReorderBufferGetOldestTXN(ReorderBuffer *);
TransactionId ReorderBufferGetOldestXmin(ReorderBuffer *rb);
--
1.8.3.1
On Mon, Sep 7, 2020 at 10:54 AM Ajin Cherian <itsajin@gmail.com> wrote:
Hello,
Trying to revive this patch which attempts to support logical decoding of two phase transactions. I've rebased and polished Nikhil's patch on the current HEAD. Some of the logic in the previous patchset has already been committed as part of large-in-progress transaction commits, like the handling of concurrent aborts, so all that logic has been left out.
I am not sure about your point related to concurrent aborts. I think
we need some changes related to this patch. Have you tried to test
this behavior? Basically, we have the below code in
ReorderBufferProcessTXN() which will be hit for concurrent aborts, and
currently, the Asserts shown below will fail.
if (errdata->sqlerrcode == ERRCODE_TRANSACTION_ROLLBACK)
{
/*
* This error can only occur when we are sending the data in
* streaming mode and the streaming is not finished yet.
*/
Assert(streaming);
Assert(stream_started);
Nikhil has a test for the same
(0004-Teach-test_decoding-plugin-to-work-with-2PC.Jan4) in his last
email [1]. You might want to use it to test this behavior. I think you
can also keep the tests as a separate patch as Nikhil had.
One other comment:
===================
@@ -27,6 +27,7 @@ typedef struct OutputPluginOptions
{
OutputPluginOutputType output_type;
bool receive_rewrites;
+ bool enable_twophase;
} OutputPluginOptions;
..
..
@@ -684,6 +699,33 @@ startup_cb_wrapper(LogicalDecodingContext *ctx,
OutputPluginOptions *opt, bool i
/* do the actual work: call callback */
ctx->callbacks.startup_cb(ctx, opt, is_init);
+ /*
+ * If the plugin claims to support two-phase transactions, then
+ * check that the plugin implements all callbacks necessary to decode
+ * two-phase transactions - we either have to have all of them or none.
+ * The filter_prepare callback is optional, but can only be defined when
+ * two-phase decoding is enabled (i.e. the three other callbacks are
+ * defined).
+ */
+ if (opt->enable_twophase)
+ {
+ int twophase_callbacks = (ctx->callbacks.prepare_cb != NULL) +
+ (ctx->callbacks.commit_prepared_cb != NULL) +
+ (ctx->callbacks.abort_prepared_cb != NULL);
+
+ /* Plugins with incorrect number of two-phase callbacks are broken. */
+ if ((twophase_callbacks != 3) && (twophase_callbacks != 0))
+ ereport(ERROR,
+ (errmsg("Output plugin registered only %d twophase callbacks. ",
+ twophase_callbacks)));
+ }
I don't know why the patch has used this way to implement an option to
enable two-phase. Can't we use how we implement 'stream-changes'
option in commit 7259736a6e? Just refer how we set ctx->streaming and
you can use a similar way to set this parameter.
--
With Regards,
Amit Kapila.
On Mon, Sep 7, 2020 at 11:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Nikhil has a test for the same
(0004-Teach-test_decoding-plugin-to-work-with-2PC.Jan4) in his last
email [1]. You might want to use it to test this behavior. I think you
can also keep the tests as a separate patch as Nikhil had.Done. I've added the tests and also tweaked code to make sure that the
aborts during 2 phase commits are also handled.
I don't know why the patch has used this way to implement an option to
enable two-phase. Can't we use how we implement 'stream-changes'
option in commit 7259736a6e? Just refer how we set ctx->streaming and
you can use a similar way to set this parameter.
Done, I've moved the checks for callbacks to inside the corresponding
wrappers.
Regards,
Ajin Cherian
Fujitsu Australia
Attachments:
0001-Support-decoding-of-two-phase-transactions-at-PREPAR.patchapplication/octet-stream; name=0001-Support-decoding-of-two-phase-transactions-at-PREPAR.patchDownload
From 3716f5eee7130f9d40b42a59440fefe13849bb8a Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Wed, 9 Sep 2020 05:33:35 -0400
Subject: [PATCH 1/2] Support decoding of two-phase transactions at PREPARE
Until now two-phase transactions were decoded at COMMIT, just like
regular transaction. During replay, two-phase transactions were
translated into regular transactions on the subscriber, and the GID
was not forwarded to it.
This patch allows PREPARE-time decoding two-phase transactions (if
the output plugin supports this capability), in which case the
transactions are replayed at PREPARE and then committed later when
COMMIT PREPARED arrives.
On the subscriber, the transactions will be executed as two-phase
transactions, with the same GID. This is important for various
external transaction managers, that often encode information into
the GID itself.
Includes documentation changes.
---
contrib/test_decoding/expected/prepared.out | 185 ++++++++++++++++++---
contrib/test_decoding/sql/prepared.sql | 77 ++++++++-
contrib/test_decoding/test_decoding.c | 181 ++++++++++++++++++++
doc/src/sgml/logicaldecoding.sgml | 127 +++++++++++++-
src/backend/replication/logical/decode.c | 141 ++++++++++++++--
src/backend/replication/logical/logical.c | 194 ++++++++++++++++++++++
src/backend/replication/logical/reorderbuffer.c | 209 +++++++++++++++++++++---
src/include/replication/output_plugin.h | 46 ++++++
src/include/replication/reorderbuffer.h | 78 ++++++++-
9 files changed, 1165 insertions(+), 73 deletions(-)
diff --git a/contrib/test_decoding/expected/prepared.out b/contrib/test_decoding/expected/prepared.out
index 46e915d..94fb0c9 100644
--- a/contrib/test_decoding/expected/prepared.out
+++ b/contrib/test_decoding/expected/prepared.out
@@ -6,19 +6,50 @@ SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_d
init
(1 row)
-CREATE TABLE test_prepared1(id int);
-CREATE TABLE test_prepared2(id int);
+CREATE TABLE test_prepared1(id integer primary key);
+CREATE TABLE test_prepared2(id integer primary key);
-- test simple successful use of a prepared xact
BEGIN;
INSERT INTO test_prepared1 VALUES (1);
PREPARE TRANSACTION 'test_prepared#1';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+----------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:1
+ PREPARE TRANSACTION 'test_prepared#1'
+(3 rows)
+
COMMIT PREPARED 'test_prepared#1';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-----------------------------------
+ COMMIT PREPARED 'test_prepared#1'
+(1 row)
+
INSERT INTO test_prepared1 VALUES (2);
-- test abort of a prepared xact
BEGIN;
INSERT INTO test_prepared1 VALUES (3);
PREPARE TRANSACTION 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+----------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:2
+ COMMIT
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:3
+ PREPARE TRANSACTION 'test_prepared#2'
+(6 rows)
+
ROLLBACK PREPARED 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-------------------------------------
+ ROLLBACK PREPARED 'test_prepared#2'
+(1 row)
+
INSERT INTO test_prepared1 VALUES (4);
-- test prepared xact containing ddl
BEGIN;
@@ -26,45 +57,149 @@ INSERT INTO test_prepared1 VALUES (5);
ALTER TABLE test_prepared1 ADD COLUMN data text;
INSERT INTO test_prepared1 VALUES (6, 'frakbar');
PREPARE TRANSACTION 'test_prepared#3';
--- test that we decode correctly while an uncommitted prepared xact
--- with ddl exists.
--- separate table because of the lock from the ALTER
--- this will come before the '5' row above, as this commits before it.
-INSERT INTO test_prepared2 VALUES (7);
-COMMIT PREPARED 'test_prepared#3';
--- make sure stuff still works
-INSERT INTO test_prepared1 VALUES (8);
-INSERT INTO test_prepared2 VALUES (9);
--- cleanup
-DROP TABLE test_prepared1;
-DROP TABLE test_prepared2;
--- show results
+SELECT 'test_prepared_1' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'test_prepared1'::regclass;
+ relation | locktype | mode
+-----------------+----------+---------------------
+ test_prepared_1 | relation | RowExclusiveLock
+ test_prepared_1 | relation | AccessExclusiveLock
+(2 rows)
+
SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
data
-------------------------------------------------------------------------
BEGIN
- table public.test_prepared1: INSERT: id[integer]:1
- COMMIT
- BEGIN
- table public.test_prepared1: INSERT: id[integer]:2
- COMMIT
- BEGIN
table public.test_prepared1: INSERT: id[integer]:4
COMMIT
BEGIN
- table public.test_prepared2: INSERT: id[integer]:7
- COMMIT
- BEGIN
table public.test_prepared1: INSERT: id[integer]:5
table public.test_prepared1: INSERT: id[integer]:6 data[text]:'frakbar'
+ PREPARE TRANSACTION 'test_prepared#3'
+(7 rows)
+
+-- Test that we decode correctly while an uncommitted prepared xact
+-- with ddl exists.
+--
+-- Use a separate table for the concurrent transaction because the lock from
+-- the ALTER will stop us inserting into the other one.
+--
+-- We should see '7' before '5' in our results since it commits first.
+--
+INSERT INTO test_prepared2 VALUES (7);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+----------------------------------------------------
+ BEGIN
+ table public.test_prepared2: INSERT: id[integer]:7
COMMIT
+(3 rows)
+
+COMMIT PREPARED 'test_prepared#3';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-----------------------------------
+ COMMIT PREPARED 'test_prepared#3'
+(1 row)
+
+-- make sure stuff still works
+INSERT INTO test_prepared1 VALUES (8);
+INSERT INTO test_prepared2 VALUES (9);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+--------------------------------------------------------------------
BEGIN
table public.test_prepared1: INSERT: id[integer]:8 data[text]:null
COMMIT
BEGIN
table public.test_prepared2: INSERT: id[integer]:9
COMMIT
-(22 rows)
+(6 rows)
+
+-- Check `CLUSTER` (as operation that hold exclusive lock) doesn't block
+-- logical decoding.
+BEGIN;
+INSERT INTO test_prepared1 VALUES (10, 'othercol');
+CLUSTER test_prepared1 USING test_prepared1_pkey;
+INSERT INTO test_prepared1 VALUES (11, 'othercol2');
+PREPARE TRANSACTION 'test_prepared_lock';
+BEGIN;
+insert into test_prepared2 values (12);
+PREPARE TRANSACTION 'test_prepared_lock2';
+COMMIT PREPARED 'test_prepared_lock2';
+SELECT 'pg_class' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'pg_class'::regclass;
+ relation | locktype | mode
+----------+----------+------
+(0 rows)
+
+-- Shouldn't timeout on 2pc decoding.
+SET statement_timeout = '1s';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+----------------------------------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:10 data[text]:'othercol'
+ table public.test_prepared1: INSERT: id[integer]:11 data[text]:'othercol2'
+ PREPARE TRANSACTION 'test_prepared_lock'
+ BEGIN
+ table public.test_prepared2: INSERT: id[integer]:12
+ PREPARE TRANSACTION 'test_prepared_lock2'
+ COMMIT PREPARED 'test_prepared_lock2'
+(8 rows)
+
+RESET statement_timeout;
+COMMIT PREPARED 'test_prepared_lock';
+-- will work normally after we commit
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+--------------------------------------
+ COMMIT PREPARED 'test_prepared_lock'
+(1 row)
+
+-- test savepoints
+BEGIN;
+SAVEPOINT test_savepoint;
+CREATE TABLE test_prepared_savepoint (a int);
+PREPARE TRANSACTION 'test_prepared_savepoint';
+COMMIT PREPARED 'test_prepared_savepoint';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-------------------------------------------
+ COMMIT PREPARED 'test_prepared_savepoint'
+(1 row)
+
+-- test that a GID containing "nodecode" gets decoded at commit prepared time
+BEGIN;
+INSERT INTO test_prepared1 VALUES (20);
+PREPARE TRANSACTION 'test_prepared_nodecode';
+-- should show nothing
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+------
+(0 rows)
+
+COMMIT PREPARED 'test_prepared_nodecode';
+-- should be decoded now
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+---------------------------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:20 data[text]:null
+ COMMIT
+(3 rows)
+
+-- cleanup
+DROP TABLE test_prepared1;
+DROP TABLE test_prepared2;
+-- show results. There should be nothing to show
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+------
+(0 rows)
SELECT pg_drop_replication_slot('regression_slot');
pg_drop_replication_slot
diff --git a/contrib/test_decoding/sql/prepared.sql b/contrib/test_decoding/sql/prepared.sql
index e726397..ca801e4 100644
--- a/contrib/test_decoding/sql/prepared.sql
+++ b/contrib/test_decoding/sql/prepared.sql
@@ -2,21 +2,25 @@
SET synchronous_commit = on;
SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
-CREATE TABLE test_prepared1(id int);
-CREATE TABLE test_prepared2(id int);
+CREATE TABLE test_prepared1(id integer primary key);
+CREATE TABLE test_prepared2(id integer primary key);
-- test simple successful use of a prepared xact
BEGIN;
INSERT INTO test_prepared1 VALUES (1);
PREPARE TRANSACTION 'test_prepared#1';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
COMMIT PREPARED 'test_prepared#1';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
INSERT INTO test_prepared1 VALUES (2);
-- test abort of a prepared xact
BEGIN;
INSERT INTO test_prepared1 VALUES (3);
PREPARE TRANSACTION 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
ROLLBACK PREPARED 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
INSERT INTO test_prepared1 VALUES (4);
@@ -27,24 +31,83 @@ ALTER TABLE test_prepared1 ADD COLUMN data text;
INSERT INTO test_prepared1 VALUES (6, 'frakbar');
PREPARE TRANSACTION 'test_prepared#3';
--- test that we decode correctly while an uncommitted prepared xact
--- with ddl exists.
+SELECT 'test_prepared_1' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'test_prepared1'::regclass;
+
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
--- separate table because of the lock from the ALTER
--- this will come before the '5' row above, as this commits before it.
+-- Test that we decode correctly while an uncommitted prepared xact
+-- with ddl exists.
+--
+-- Use a separate table for the concurrent transaction because the lock from
+-- the ALTER will stop us inserting into the other one.
+--
+-- We should see '7' before '5' in our results since it commits first.
+--
INSERT INTO test_prepared2 VALUES (7);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
COMMIT PREPARED 'test_prepared#3';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
-- make sure stuff still works
INSERT INTO test_prepared1 VALUES (8);
INSERT INTO test_prepared2 VALUES (9);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- Check `CLUSTER` (as operation that hold exclusive lock) doesn't block
+-- logical decoding.
+BEGIN;
+INSERT INTO test_prepared1 VALUES (10, 'othercol');
+CLUSTER test_prepared1 USING test_prepared1_pkey;
+INSERT INTO test_prepared1 VALUES (11, 'othercol2');
+PREPARE TRANSACTION 'test_prepared_lock';
+
+BEGIN;
+insert into test_prepared2 values (12);
+PREPARE TRANSACTION 'test_prepared_lock2';
+COMMIT PREPARED 'test_prepared_lock2';
+
+SELECT 'pg_class' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'pg_class'::regclass;
+
+-- Shouldn't timeout on 2pc decoding.
+SET statement_timeout = '1s';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+RESET statement_timeout;
+
+COMMIT PREPARED 'test_prepared_lock';
+
+-- will work normally after we commit
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- test savepoints
+BEGIN;
+SAVEPOINT test_savepoint;
+CREATE TABLE test_prepared_savepoint (a int);
+PREPARE TRANSACTION 'test_prepared_savepoint';
+COMMIT PREPARED 'test_prepared_savepoint';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- test that a GID containing "nodecode" gets decoded at commit prepared time
+BEGIN;
+INSERT INTO test_prepared1 VALUES (20);
+PREPARE TRANSACTION 'test_prepared_nodecode';
+-- should show nothing
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+COMMIT PREPARED 'test_prepared_nodecode';
+-- should be decoded now
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
-- cleanup
DROP TABLE test_prepared1;
DROP TABLE test_prepared2;
--- show results
+-- show results. There should be nothing to show
SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
SELECT pg_drop_replication_slot('regression_slot');
diff --git a/contrib/test_decoding/test_decoding.c b/contrib/test_decoding/test_decoding.c
index 3474515..d5438c5 100644
--- a/contrib/test_decoding/test_decoding.c
+++ b/contrib/test_decoding/test_decoding.c
@@ -11,12 +11,16 @@
*-------------------------------------------------------------------------
*/
#include "postgres.h"
+#include "miscadmin.h"
+#include "access/transam.h"
#include "catalog/pg_type.h"
#include "replication/logical.h"
#include "replication/origin.h"
+#include "storage/procarray.h"
+
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -36,6 +40,7 @@ typedef struct
bool skip_empty_xacts;
bool xact_wrote_changes;
bool only_local;
+ TransactionId check_xid; /* track abort of this txid */
} TestDecodingData;
static void pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
@@ -49,6 +54,8 @@ static void pg_output_begin(LogicalDecodingContext *ctx,
bool last_write);
static void pg_decode_commit_txn(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn, XLogRecPtr commit_lsn);
+static void pg_decode_abort_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, XLogRecPtr abort_lsn);
static void pg_decode_change(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn, Relation rel,
ReorderBufferChange *change);
@@ -84,6 +91,19 @@ static void pg_decode_stream_truncate(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
int nrelations, Relation relations[],
ReorderBufferChange *change);
+static bool pg_decode_filter_prepare(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid);
+static void pg_decode_prepare_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+static void pg_decode_commit_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+static void pg_decode_abort_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+
void
_PG_init(void)
@@ -102,6 +122,7 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb)
cb->change_cb = pg_decode_change;
cb->truncate_cb = pg_decode_truncate;
cb->commit_cb = pg_decode_commit_txn;
+ cb->abort_cb = pg_decode_abort_txn;
cb->filter_by_origin_cb = pg_decode_filter;
cb->shutdown_cb = pg_decode_shutdown;
cb->message_cb = pg_decode_message;
@@ -112,6 +133,11 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb)
cb->stream_change_cb = pg_decode_stream_change;
cb->stream_message_cb = pg_decode_stream_message;
cb->stream_truncate_cb = pg_decode_stream_truncate;
+ cb->filter_prepare_cb = pg_decode_filter_prepare;
+ cb->prepare_cb = pg_decode_prepare_txn;
+ cb->commit_prepared_cb = pg_decode_commit_prepared_txn;
+ cb->abort_prepared_cb = pg_decode_abort_prepared_txn;
+
}
@@ -132,11 +158,14 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
data->include_timestamp = false;
data->skip_empty_xacts = false;
data->only_local = false;
+ data->check_xid = InvalidTransactionId;
ctx->output_plugin_private = data;
opt->output_type = OUTPUT_PLUGIN_TEXTUAL_OUTPUT;
opt->receive_rewrites = false;
+ /* this plugin supports decoding of 2pc */
+ opt->enable_twophase = true;
foreach(option, ctx->output_plugin_options)
{
@@ -223,6 +252,32 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
errmsg("could not parse value \"%s\" for parameter \"%s\"",
strVal(elem->arg), elem->defname)));
}
+ else if (strcmp(elem->defname, "check-xid") == 0)
+ {
+ if (elem->arg)
+ {
+ errno = 0;
+ data->check_xid = (TransactionId)
+ strtoul(strVal(elem->arg), NULL, 0);
+
+ if (errno == EINVAL || errno == ERANGE)
+ ereport(FATAL,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("check-xid is not a valid number: \"%s\"",
+ strVal(elem->arg))));
+ }
+ else
+ ereport(FATAL,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("check-xid needs an input value")));
+
+ if (data->check_xid <= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("Specify positive value for parameter \"%s\","
+ " you specified \"%s\"",
+ elem->defname, strVal(elem->arg))));
+ }
else
{
ereport(ERROR,
@@ -293,6 +348,116 @@ pg_decode_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
OutputPluginWrite(ctx, true);
}
+/* ABORT callback */
+static void
+pg_decode_abort_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ if (data->skip_empty_xacts && !data->xact_wrote_changes)
+ return;
+
+ OutputPluginPrepareWrite(ctx, true);
+ if (data->include_xids)
+ appendStringInfo(ctx->out, "ABORT %u", txn->xid);
+ else
+ appendStringInfoString(ctx->out, "ABORT");
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
+/*
+ * Filter out two-phase transactions.
+ *
+ * Each plugin can implement its own filtering logic. Here
+ * we demonstrate a simple logic by checking the GID. If the
+ * GID contains the "_nodecode" substring, then we filter
+ * it out.
+ */
+static bool
+pg_decode_filter_prepare(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid)
+{
+ if (strstr(gid, "_nodecode") != NULL)
+ return true;
+
+ return false;
+}
+
+/* PREPARE callback */
+static void
+pg_decode_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ if (data->skip_empty_xacts && !data->xact_wrote_changes)
+ return;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ appendStringInfo(ctx->out, "PREPARE TRANSACTION %s",
+ quote_literal_cstr(txn->gid));
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, " %u", txn->xid);
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
+/* COMMIT PREPARED callback */
+static void
+pg_decode_commit_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ appendStringInfo(ctx->out, "COMMIT PREPARED %s",
+ quote_literal_cstr(txn->gid));
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, " %u", txn->xid);
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
+/* ABORT PREPARED callback */
+static void
+pg_decode_abort_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ appendStringInfo(ctx->out, "ROLLBACK PREPARED %s",
+ quote_literal_cstr(txn->gid));
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, " %u", txn->xid);
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
static bool
pg_decode_filter(LogicalDecodingContext *ctx,
RepOriginId origin_id)
@@ -451,6 +616,22 @@ pg_decode_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
}
data->xact_wrote_changes = true;
+ /* if check_xid is specified */
+ if (TransactionIdIsValid(data->check_xid))
+ {
+ elog(LOG, "waiting for %u to abort", data->check_xid);
+ while (TransactionIdIsInProgress(data->check_xid))
+ {
+ CHECK_FOR_INTERRUPTS();
+ pg_usleep(10000L);
+ }
+ if (!TransactionIdIsInProgress(data->check_xid) &&
+ !TransactionIdDidCommit(data->check_xid))
+ elog(LOG, "%u aborted", data->check_xid);
+
+ Assert(TransactionIdDidAbort(data->check_xid));
+ }
+
class_form = RelationGetForm(relation);
tupdesc = RelationGetDescr(relation);
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index 8d4fdf6..22ee70f 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -386,7 +386,12 @@ typedef struct OutputPluginCallbacks
LogicalDecodeChangeCB change_cb;
LogicalDecodeTruncateCB truncate_cb;
LogicalDecodeCommitCB commit_cb;
+ LogicalDecodeAbortCB abort_cb;
LogicalDecodeMessageCB message_cb;
+ LogicalDecodeFilterPrepareCB filter_prepare_cb;
+ LogicalDecodePrepareCB prepare_cb;
+ LogicalDecodeCommitPreparedCB commit_prepared_cb;
+ LogicalDecodeAbortPreparedCB abort_prepared_cb;
LogicalDecodeFilterByOriginCB filter_by_origin_cb;
LogicalDecodeShutdownCB shutdown_cb;
LogicalDecodeStreamStartCB stream_start_cb;
@@ -477,7 +482,13 @@ CREATE TABLE another_catalog_table(data text) WITH (user_catalog_table = true);
never get
decoded. Successful savepoints are
folded into the transaction containing them in the order they were
- executed within that transaction.
+ executed within that transaction. A transaction that is prepared for
+ a two-phase commit using <command>PREPARE TRANSACTION</command> will
+ also be decoded if the output plugin callbacks needed for decoding
+ them are provided. It is possible that the current transaction which
+ is being decoded is aborted concurrently via a <command>ROLLBACK PREPARED</command>
+ command. In that case, the logical decoding of this transaction will
+ be aborted too.
</para>
<note>
@@ -578,6 +589,71 @@ typedef void (*LogicalDecodeCommitCB) (struct LogicalDecodingContext *ctx,
</para>
</sect3>
+ <sect3 id="logicaldecoding-output-plugin-prepare">
+ <title>Transaction Prepare Callback</title>
+
+ <para>
+ The optional <function>prepare_cb</function> callback is called whenever
+ a transaction which is prepared for two-phase commit has been
+ decoded. The <function>change_cb</function> callbacks for all modified
+ rows will have been called before this, if there have been any modified
+ rows.
+<programlisting>
+typedef void (*LogicalDecodePrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
+ <sect3 id="logicaldecoding-output-plugin-commit-prepared">
+ <title>Commit Prepared Transaction Callback</title>
+
+ <para>
+ The optional <function>commit_prepared_cb</function> callback is called whenever
+ a commit prepared transaction has been decoded. The <parameter>gid</parameter> field,
+ which is part of the <parameter>txn</parameter> parameter can be used in this
+ callback.
+<programlisting>
+typedef void (*LogicalDecodeCommitPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
+ <sect3 id="logicaldecoding-output-plugin-abort-prepared">
+ <title>Rollback Prepared Transaction Callback</title>
+
+ <para>
+ The optional <function>abort_prepared_cb</function> callback is called whenever
+ a rollback prepared transaction has been decoded. The <parameter>gid</parameter> field,
+ which is part of the <parameter>txn</parameter> parameter can be used in this
+ callback.
+<programlisting>
+typedef void (*LogicalDecodeAbortPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
+ <sect3 id="logicaldecoding-output-plugin-abort">
+ <title>Transaction Abort Callback</title>
+
+ <para>
+ The required <function>abort_cb</function> callback is called whenever
+ a transaction abort has to be initiated. This can happen if we are
+ decoding a transaction that has been prepared for two-phase commit and
+ a concurrent rollback happens while we are decoding it.
+<programlisting>
+typedef void (*LogicalDecodeAbortCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
<sect3 id="logicaldecoding-output-plugin-change">
<title>Change Callback</title>
@@ -587,7 +663,13 @@ typedef void (*LogicalDecodeCommitCB) (struct LogicalDecodingContext *ctx,
an <command>INSERT</command>, <command>UPDATE</command>,
or <command>DELETE</command>. Even if the original command modified
several rows at once the callback will be called individually for each
- row.
+ row. The <function>change_cb</function> callback may access system or
+ user catalog tables to aid in the process of outputting the row
+ modification details. In case of decoding a prepared (but yet
+ uncommitted) transaction or decoding of an uncommitted transaction, this
+ change callback might also error out due to simultaneous rollback of
+ this very same transaction. In that case, the logical decoding of this
+ aborted transaction is stopped gracefully.
<programlisting>
typedef void (*LogicalDecodeChangeCB) (struct LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
@@ -664,6 +746,39 @@ typedef bool (*LogicalDecodeFilterByOriginCB) (struct LogicalDecodingContext *ct
</para>
</sect3>
+ <sect3 id="logicaldecoding-output-plugin-filter-prepare">
+ <title>Prepare Filter Callback</title>
+
+ <para>
+ The optional <function>filter_prepare_cb</function> callback
+ is called to determine whether data that is part of the current
+ two-phase commit transaction should be considered for decode
+ at this prepare stage or as a regular one-phase transaction at
+ <command>COMMIT PREPARED</command> time later. To signal that
+ decoding should be skipped, return <literal>true</literal>;
+ <literal>false</literal> otherwise. When the callback is not
+ defined, <literal>false</literal> is assumed (i.e. nothing is
+ filtered).
+<programlisting>
+typedef bool (*LogicalDecodeFilterPrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ TransactionId xid,
+ const char *gid);
+</programlisting>
+ The <parameter>ctx</parameter> parameter has the same contents
+ as for the other callbacks. The <parameter>txn</parameter> parameter
+ contains meta information about the transaction. The <parameter>xid</parameter>
+ contains the XID because <parameter>txn</parameter> can be NULL in some cases.
+ The <parameter>gid</parameter> is the identifier that later identifies this
+ transaction for <command>COMMIT PREPARED</command> or <command>ROLLBACK PREPARED</command>.
+ </para>
+ <para>
+ The callback has to provide the same static answer for a given combination of
+ <parameter>xid</parameter> and <parameter>gid</parameter> every time it is
+ called.
+ </para>
+ </sect3>
+
<sect3 id="logicaldecoding-output-plugin-message">
<title>Generic Message Callback</title>
@@ -685,7 +800,13 @@ typedef void (*LogicalDecodeMessageCB) (struct LogicalDecodingContext *ctx,
non-transactional and the XID was not assigned yet in the transaction
which logged the message. The <parameter>lsn</parameter> has WAL
location of the message. The <parameter>transactional</parameter> says
- if the message was sent as transactional or not.
+ if the message was sent as transactional or not. Similar to the change
+ callback, in case of decoding a prepared (but yet uncommitted)
+ transaction or decoding of an uncommitted transaction, this message
+ callback might also error out due to simultaneous rollback of
+ this very same transaction. In that case, the logical decoding of this
+ aborted transaction is stopped gracefully.
+
The <parameter>prefix</parameter> is arbitrary null-terminated prefix
which can be used for identifying interesting messages for the current
plugin. And finally the <parameter>message</parameter> parameter holds
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index f21f61d..63d5acf 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -70,6 +70,9 @@ static void DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
xl_xact_parsed_commit *parsed, TransactionId xid);
static void DecodeAbort(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
xl_xact_parsed_abort *parsed, TransactionId xid);
+static void DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_prepare * parsed);
+
/* common function to decode tuples */
static void DecodeXLogTuple(char *data, Size len, ReorderBufferTupleBuf *tup);
@@ -312,17 +315,34 @@ DecodeXactOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
}
break;
case XLOG_XACT_PREPARE:
+ {
+ xl_xact_parsed_prepare parsed;
+ xl_xact_prepare *xlrec;
+ /* check that output plugin is capable of twophase decoding */
+ if (!ctx->options.enable_twophase)
+ {
+ ReorderBufferProcessXid(reorder, XLogRecGetXid(r), buf->origptr);
+ break;
+ }
- /*
- * Currently decoding ignores PREPARE TRANSACTION and will just
- * decode the transaction when the COMMIT PREPARED is sent or
- * throw away the transaction's contents when a ROLLBACK PREPARED
- * is received. In the future we could add code to expose prepared
- * transactions in the changestream allowing for a kind of
- * distributed 2PC.
- */
- ReorderBufferProcessXid(reorder, XLogRecGetXid(r), buf->origptr);
- break;
+ /* ok, parse it */
+ xlrec = (xl_xact_prepare *)XLogRecGetData(r);
+ ParsePrepareRecord(XLogRecGetInfo(buf->record),
+ xlrec, &parsed);
+
+ /* does output plugin want this particular transaction? */
+ if (ctx->callbacks.filter_prepare_cb &&
+ ReorderBufferPrepareNeedSkip(reorder, parsed.twophase_xid,
+ parsed.twophase_gid))
+ {
+ ReorderBufferProcessXid(reorder, parsed.twophase_xid,
+ buf->origptr);
+ break;
+ }
+
+ DecodePrepare(ctx, buf, &parsed);
+ break;
+ }
default:
elog(ERROR, "unexpected RM_XACT_ID record type: %u", info);
}
@@ -647,9 +667,82 @@ DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
buf->origptr, buf->endptr);
}
+ /*
+ * Decide if we're processing COMMIT PREPARED, or a regular COMMIT.
+ * Regular commit simply triggers a replay of transaction changes from the
+ * reorder buffer. For COMMIT PREPARED that however already happened at
+ * PREPARE time, and so we only need to notify the subscriber that the GID
+ * finally committed.
+ *
+ * For output plugins that do not support PREPARE-time decoding of
+ * two-phase transactions, we never even see the PREPARE and all two-phase
+ * transactions simply fall through to the second branch.
+ */
+ if (TransactionIdIsValid(parsed->twophase_xid) &&
+ ReorderBufferTxnIsPrepared(ctx->reorder,
+ parsed->twophase_xid, parsed->twophase_gid))
+ {
+ Assert(xid == parsed->twophase_xid);
+ /* we are processing COMMIT PREPARED */
+ ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn,
+ parsed->twophase_gid, true);
+ }
+ else
+ {
+ /* replay actions of all transaction + subtransactions in order */
+ ReorderBufferCommit(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn);
+ }
+}
+
+/*
+ * Decode PREPARE record. Similar logic as in COMMIT
+ */
+static void
+DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_prepare * parsed)
+{
+ XLogRecPtr origin_lsn = parsed->origin_lsn;
+ TimestampTz commit_time = parsed->origin_timestamp;
+ XLogRecPtr origin_id = XLogRecGetOrigin(buf->record);
+ int i;
+ TransactionId xid = parsed->twophase_xid;
+
+ /*
+ * Process invalidation messages, even if we're not interested in the
+ * transaction's contents, since the various caches need to always be
+ * consistent.
+ */
+ if (parsed->nmsgs > 0)
+ {
+ if (!ctx->fast_forward)
+ ReorderBufferAddInvalidations(ctx->reorder, xid, buf->origptr,
+ parsed->nmsgs, parsed->msgs);
+ ReorderBufferXidSetCatalogChanges(ctx->reorder, xid, buf->origptr);
+ }
+
+ /*
+ * Tell the reorderbuffer about the surviving subtransactions. We need to
+ * do this because the main transaction itself has not committed since we
+ * are in the prepare phase right now. So we need to be sure the snapshot
+ * is setup correctly for the main transaction in case all changes
+ * happened in subtransanctions
+ */
+ for (i = 0; i < parsed->nsubxacts; i++)
+ {
+ ReorderBufferCommitChild(ctx->reorder, xid, parsed->subxacts[i],
+ buf->origptr, buf->endptr);
+ }
+
+ if (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
+ (parsed->dbId != InvalidOid && parsed->dbId != ctx->slot->data.database) ||
+ ctx->fast_forward || FilterByOrigin(ctx, origin_id))
+ return;
+
/* replay actions of all transaction + subtransactions in order */
- ReorderBufferCommit(ctx->reorder, xid, buf->origptr, buf->endptr,
- commit_time, origin_id, origin_lsn);
+ ReorderBufferPrepare(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn, parsed->twophase_gid);
}
/*
@@ -661,6 +754,30 @@ DecodeAbort(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
xl_xact_parsed_abort *parsed, TransactionId xid)
{
int i;
+ XLogRecPtr origin_lsn = InvalidXLogRecPtr;
+ TimestampTz commit_time = 0;
+ XLogRecPtr origin_id = XLogRecGetOrigin(buf->record);
+
+ if (parsed->xinfo & XACT_XINFO_HAS_ORIGIN)
+ {
+ origin_lsn = parsed->origin_lsn;
+ commit_time = parsed->origin_timestamp;
+ }
+
+ /*
+ * If it's ROLLBACK PREPARED then handle it via callbacks.
+ */
+ if (TransactionIdIsValid(xid) &&
+ !SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) &&
+ parsed->dbId == ctx->slot->data.database &&
+ !FilterByOrigin(ctx, origin_id) &&
+ ReorderBufferTxnIsPrepared(ctx->reorder, xid, parsed->twophase_gid))
+ {
+ ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn,
+ parsed->twophase_gid, false);
+ return;
+ }
for (i = 0; i < parsed->nsubxacts; i++)
{
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 0f6af95..a569ab8 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -58,6 +58,16 @@ static void shutdown_cb_wrapper(LogicalDecodingContext *ctx);
static void begin_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn);
static void commit_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
+static void abort_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+static bool filter_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid);
+static void prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+static void commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+static void abort_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
static void change_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
Relation relation, ReorderBufferChange *change);
static void truncate_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
@@ -206,6 +216,11 @@ StartupDecodingContext(List *output_plugin_options,
ctx->reorder->apply_change = change_cb_wrapper;
ctx->reorder->apply_truncate = truncate_cb_wrapper;
ctx->reorder->commit = commit_cb_wrapper;
+ ctx->reorder->abort = abort_cb_wrapper;
+ ctx->reorder->filter_prepare = filter_prepare_cb_wrapper;
+ ctx->reorder->prepare = prepare_cb_wrapper;
+ ctx->reorder->commit_prepared = commit_prepared_cb_wrapper;
+ ctx->reorder->abort_prepared = abort_prepared_cb_wrapper;
ctx->reorder->message = message_cb_wrapper;
/*
@@ -782,6 +797,140 @@ commit_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
}
static void
+abort_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "abort";
+ state.report_location = txn->final_lsn; /* beginning of abort record */
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn; /* points to the end of the record */
+
+ /* do the actual work: call callback */
+ ctx->callbacks.abort_cb(ctx, txn, abort_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
+prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "prepare";
+ state.report_location = txn->final_lsn; /* beginning of commit record */
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn; /* points to the end of the record */
+
+ /* If the plugin support 2 phase commits then prepare callback is mandatory */
+ if (ctx->options.enable_twophase && ctx->callbacks.prepare_cb == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("Output plugin did not register prepare_cb callback")));
+
+ /* do the actual work: call callback */
+ ctx->callbacks.prepare_cb(ctx, txn, prepare_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
+commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "commit_prepared";
+ state.report_location = txn->final_lsn; /* beginning of commit record */
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn; /* points to the end of the record */
+
+ /* If the plugin support 2 phase commits then commit prepared callback is mandatory */
+ if (ctx->options.enable_twophase && ctx->callbacks.commit_prepared_cb == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("Output plugin did not register commit_prepared_cb callback")));
+
+ /* do the actual work: call callback */
+ ctx->callbacks.commit_prepared_cb(ctx, txn, commit_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
+abort_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "abort_prepared";
+ state.report_location = txn->final_lsn; /* beginning of commit record */
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn; /* points to the end of the record */
+
+ /* If the plugin support 2 phase commits then abort prepared callback is mandatory */
+ if (ctx->options.enable_twophase && ctx->callbacks.abort_prepared_cb == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("Output plugin did not register abort_prepared_cb callback")));
+
+ /* do the actual work: call callback */
+ ctx->callbacks.abort_prepared_cb(ctx, txn, abort_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
change_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
Relation relation, ReorderBufferChange *change)
{
@@ -858,6 +1007,51 @@ truncate_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
error_context_stack = errcallback.previous;
}
+static bool
+filter_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+ bool ret;
+
+ /*
+ * Skip if decoding of twophase at PREPARE time is not enabled. In that
+ * case all twophase transactions are considered filtered out and will be
+ * applied as regular transactions at COMMIT PREPARED.
+ */
+ if (!ctx->options.enable_twophase)
+ return true;
+
+ /*
+ * The filter_prepare callback is optional. When not supplied, all
+ * prepared transactions should go through.
+ */
+ if (!ctx->callbacks.filter_prepare_cb)
+ return false;
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "filter_prepare";
+ state.report_location = InvalidXLogRecPtr;
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = false;
+
+ /* do the actual work: call callback */
+ ret = ctx->callbacks.filter_prepare_cb(ctx, txn, xid, gid);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+
+ return ret;
+}
+
bool
filter_by_origin_cb_wrapper(LogicalDecodingContext *ctx, RepOriginId origin_id)
{
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 1975d62..d6556be 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -413,6 +413,11 @@ ReorderBufferReturnTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
}
/* free data that's contained */
+ if (txn->gid != NULL)
+ {
+ pfree(txn->gid);
+ txn->gid = NULL;
+ }
if (txn->tuplecid_hash != NULL)
{
@@ -1987,7 +1992,7 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
prev_lsn = change->lsn;
/* Set the current xid to detect concurrent aborts. */
- if (streaming)
+ if (streaming || rbtxn_prepared(change->txn))
{
curtxn = change->txn;
SetupCheckXidLive(curtxn->xid);
@@ -2249,7 +2254,6 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
break;
}
}
-
/*
* There's a speculative insertion remaining, just clean in up, it
* can't have been successful, otherwise we'd gotten a confirmation
@@ -2278,7 +2282,24 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
}
}
else
- rb->commit(rb, txn, commit_lsn);
+ {
+ /*
+ * Call abort/commit/prepare callback, depending on the transaction
+ * state.
+ *
+ * If the transaction aborted during apply (which currently can happen
+ * only for prepared transactions), simply call the abort callback.
+ *
+ * Otherwise call either PREPARE (for twophase transactions) or COMMIT
+ * (for regular ones).
+ */
+ if (rbtxn_rollback(txn))
+ rb->abort(rb, txn, commit_lsn);
+ else if (rbtxn_prepared(txn))
+ rb->prepare(rb, txn, commit_lsn);
+ else
+ rb->commit(rb, txn, commit_lsn);
+ }
/* this is just a sanity check against bad output plugin behaviour */
if (GetCurrentTransactionIdIfAny() != InvalidTransactionId)
@@ -2361,8 +2382,8 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
* This error can only occur when we are sending the data in
* streaming mode and the streaming is not finished yet.
*/
- Assert(streaming);
- Assert(stream_started);
+ Assert(streaming || rbtxn_prepared(txn));
+ Assert(stream_started || rbtxn_prepared(txn));
/* Cleanup the temporary error state. */
FlushErrorState();
@@ -2370,10 +2391,19 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
errdata = NULL;
curtxn->concurrent_abort = true;
- /* Reset the TXN so that it is allowed to stream remaining data. */
- ReorderBufferResetTXN(rb, txn, snapshot_now,
- command_id, prev_lsn,
- specinsert);
+ /* If streaming, reset the TXN so that it is allowed to stream remaining data. */
+ if (streaming && stream_started)
+ {
+ ReorderBufferResetTXN(rb, txn, snapshot_now,
+ command_id, prev_lsn,
+ specinsert);
+ }
+ else
+ {
+ elog(LOG, "stopping decoding of %s (%u)",
+ txn->gid[0] != '\0'? txn->gid:"", txn->xid);
+ rb->abort(rb, txn, commit_lsn);
+ }
}
else
{
@@ -2395,23 +2425,16 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
* This interface is called once a toplevel commit is read for both streamed
* as well as non-streamed transactions.
*/
-void
-ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
- XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
- TimestampTz commit_time,
- RepOriginId origin_id, XLogRecPtr origin_lsn)
+static void
+ReorderBufferCommitInternal(ReorderBufferTXN *txn,
+ ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn)
{
- ReorderBufferTXN *txn;
Snapshot snapshot_now;
CommandId command_id = FirstCommandId;
- txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
- false);
-
- /* unknown transaction, nothing to replay */
- if (txn == NULL)
- return;
-
txn->final_lsn = commit_lsn;
txn->end_lsn = end_lsn;
txn->commit_time = commit_time;
@@ -2453,6 +2476,140 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
}
/*
+ * Ask output plugin whether we want to skip this PREPARE and send
+ * this transaction as a regular commit later.
+ */
+bool
+ReorderBufferPrepareNeedSkip(ReorderBuffer *rb, TransactionId xid, const char *gid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr, false);
+
+ return rb->filter_prepare(rb, txn, xid, gid);
+}
+
+
+/*
+ * Commit a transaction.
+ *
+ * See comments for ReorderBufferCommitInternal()
+ */
+void
+ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+
+ /* unknown transaction, nothing to replay */
+ if (txn == NULL)
+ return;
+
+ ReorderBufferCommitInternal(txn, rb, xid, commit_lsn, end_lsn,
+ commit_time, origin_id, origin_lsn);
+}
+
+/*
+ * Prepare a twophase transaction. It calls ReorderBufferCommitInternal()
+ * since all prepared transactions need to be decoded at PREPARE time.
+ */
+void
+ReorderBufferPrepare(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+
+ /* unknown transaction, nothing to replay */
+ if (txn == NULL)
+ return;
+
+ txn->txn_flags |= RBTXN_PREPARE;
+ txn->gid = palloc(strlen(gid) + 1); /* trailing '\0' */
+ strcpy(txn->gid, gid);
+
+ ReorderBufferCommitInternal(txn, rb, xid, commit_lsn, end_lsn,
+ commit_time, origin_id, origin_lsn);
+}
+
+/*
+ * Check whether this transaction was sent as prepared to subscribers.
+ * Called while handling commit|abort prepared.
+ */
+bool
+ReorderBufferTxnIsPrepared(ReorderBuffer *rb, TransactionId xid,
+ const char *gid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+
+ /*
+ * Always call the prepare filter. It's the job of the prepare filter to
+ * give us the *same* response for a given xid across multiple calls
+ * (including ones on restart)
+ */
+ return !(rb->filter_prepare(rb, txn, xid, gid));
+}
+
+/*
+ * Send standalone xact event. This is used to handle COMMIT/ABORT PREPARED.
+ */
+void
+ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid, bool is_commit)
+{
+ ReorderBufferTXN *txn;
+
+ /*
+ * The transaction may or may not exist (during restarts for example).
+ * Anyways, 2PC transactions do not contain any reorderbuffers. So allow
+ * it to be created below.
+ */
+ txn = ReorderBufferTXNByXid(rb, xid, true, NULL, commit_lsn,
+ true);
+
+ txn->final_lsn = commit_lsn;
+ txn->end_lsn = end_lsn;
+ txn->commit_time = commit_time;
+ txn->origin_id = origin_id;
+ txn->origin_lsn = origin_lsn;
+ /* this txn is obviously prepared */
+ txn->txn_flags |= RBTXN_PREPARE;
+ txn->gid = palloc(strlen(gid) + 1); /* trailing '\0' */
+ strcpy(txn->gid, gid);
+
+ if (is_commit)
+ {
+ txn->txn_flags |= RBTXN_COMMIT_PREPARED;
+ rb->commit_prepared(rb, txn, commit_lsn);
+ }
+ else
+ {
+ txn->txn_flags |= RBTXN_ROLLBACK_PREPARED;
+ rb->abort_prepared(rb, txn, commit_lsn);
+ }
+
+ /* cleanup: make sure there's no cache pollution */
+ ReorderBufferExecuteInvalidations(rb, txn);
+ ReorderBufferCleanupTXN(rb, txn);
+}
+
+/*
* Abort a transaction that possibly has previous changes. Needs to be first
* called for subtransactions and then for the toplevel xid.
*
@@ -2495,7 +2652,13 @@ ReorderBufferAbort(ReorderBuffer *rb, TransactionId xid, XLogRecPtr lsn)
/* cosmetic... */
txn->final_lsn = lsn;
- /* remove potential on-disk data, and deallocate */
+ /*
+ * remove potential on-disk data, and deallocate.
+ *
+ * We remove it even for prepared transactions (GID is enough to
+ * commit/abort those later).
+ */
+
ReorderBufferCleanupTXN(rb, txn);
}
diff --git a/src/include/replication/output_plugin.h b/src/include/replication/output_plugin.h
index b78c796..f6ca87f 100644
--- a/src/include/replication/output_plugin.h
+++ b/src/include/replication/output_plugin.h
@@ -27,6 +27,7 @@ typedef struct OutputPluginOptions
{
OutputPluginOutputType output_type;
bool receive_rewrites;
+ bool enable_twophase;
} OutputPluginOptions;
/*
@@ -78,6 +79,46 @@ typedef void (*LogicalDecodeCommitCB) (struct LogicalDecodingContext *ctx,
XLogRecPtr commit_lsn);
/*
+ * Called for an implicit ABORT of a transaction.
+ */
+typedef void (*LogicalDecodeAbortCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+
+ /*
+ * Called before decoding of PREPARE record to decide whether this
+ * transaction should be decoded with separate calls to prepare and
+ * commit_prepared/abort_prepared callbacks or wait till COMMIT PREPARED and
+ * sent as usual transaction.
+ */
+typedef bool (*LogicalDecodeFilterPrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ TransactionId xid,
+ const char *gid);
+
+/*
+ * Called for PREPARE record unless it was filtered by filter_prepare()
+ * callback.
+ */
+typedef void (*LogicalDecodePrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+
+/*
+ * Called for COMMIT PREPARED.
+ */
+typedef void (*LogicalDecodeCommitPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+/*
+ * Called for ROLLBACK PREPARED.
+ */
+typedef void (*LogicalDecodeAbortPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+
+/*
* Called for the generic logical decoding messages.
*/
typedef void (*LogicalDecodeMessageCB) (struct LogicalDecodingContext *ctx,
@@ -170,7 +211,12 @@ typedef struct OutputPluginCallbacks
LogicalDecodeChangeCB change_cb;
LogicalDecodeTruncateCB truncate_cb;
LogicalDecodeCommitCB commit_cb;
+ LogicalDecodeAbortCB abort_cb;
LogicalDecodeMessageCB message_cb;
+ LogicalDecodeFilterPrepareCB filter_prepare_cb;
+ LogicalDecodePrepareCB prepare_cb;
+ LogicalDecodeCommitPreparedCB commit_prepared_cb;
+ LogicalDecodeAbortPreparedCB abort_prepared_cb;
LogicalDecodeFilterByOriginCB filter_by_origin_cb;
LogicalDecodeShutdownCB shutdown_cb;
/* streaming of changes */
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 1ae17d5..820840a 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -10,6 +10,7 @@
#define REORDERBUFFER_H
#include "access/htup_details.h"
+#include "access/twophase.h"
#include "lib/ilist.h"
#include "storage/sinval.h"
#include "utils/hsearch.h"
@@ -162,9 +163,14 @@ typedef struct ReorderBufferChange
#define RBTXN_HAS_CATALOG_CHANGES 0x0001
#define RBTXN_IS_SUBXACT 0x0002
#define RBTXN_IS_SERIALIZED 0x0004
-#define RBTXN_IS_STREAMED 0x0008
-#define RBTXN_HAS_TOAST_INSERT 0x0010
-#define RBTXN_HAS_SPEC_INSERT 0x0020
+#define RBTXN_PREPARE 0x0008
+#define RBTXN_COMMIT_PREPARED 0x0010
+#define RBTXN_ROLLBACK_PREPARED 0x0020
+#define RBTXN_COMMIT 0x0040
+#define RBTXN_ROLLBACK 0x0080
+#define RBTXN_IS_STREAMED 0x0100
+#define RBTXN_HAS_TOAST_INSERT 0x0200
+#define RBTXN_HAS_SPEC_INSERT 0x0400
/* Does the transaction have catalog changes? */
#define rbtxn_has_catalog_changes(txn) \
@@ -218,6 +224,17 @@ typedef struct ReorderBufferChange
((txn)->txn_flags & RBTXN_IS_STREAMED) != 0 \
)
+/* is this txn prepared? */
+#define rbtxn_prepared(txn) (txn->txn_flags & RBTXN_PREPARE)
+/* was this prepared txn committed in the meanwhile? */
+#define rbtxn_commit_prepared(txn) (txn->txn_flags & RBTXN_COMMIT_PREPARED)
+/* was this prepared txn aborted in the meanwhile? */
+#define rbtxn_rollback_prepared(txn) (txn->txn_flags & RBTXN_ROLLBACK_PREPARED)
+/* was this txn committed in the meanwhile? */
+#define rbtxn_commit(txn) (txn->txn_flags & RBTXN_COMMIT)
+/* was this prepared txn aborted in the meanwhile? */
+#define rbtxn_rollback(txn) (txn->txn_flags & RBTXN_ROLLBACK)
+
typedef struct ReorderBufferTXN
{
/* See above */
@@ -229,6 +246,9 @@ typedef struct ReorderBufferTXN
/* Xid of top-level transaction, if known */
TransactionId toplevel_xid;
+ /* In case of 2PC we need to pass GID to output plugin */
+ char *gid;
+
/*
* LSN of the first data carrying, WAL record with knowledge about this
* xid. This is allowed to *not* be first record adorned with this xid, if
@@ -390,6 +410,39 @@ typedef void (*ReorderBufferCommitCB) (ReorderBuffer *rb,
ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
+/* abort callback signature */
+typedef void (*ReorderBufferAbortCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+
+typedef bool (*ReorderBufferFilterPrepareCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ TransactionId xid,
+ const char *gid);
+
+/* prepare callback signature */
+typedef void (*ReorderBufferPrepareCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+
+/* commit prepared callback signature */
+typedef void (*ReorderBufferCommitPreparedCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+/* abort prepared callback signature */
+typedef void (*ReorderBufferAbortPreparedCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+
+
+
+
/* message callback signature */
typedef void (*ReorderBufferMessageCB) (ReorderBuffer *rb,
ReorderBufferTXN *txn,
@@ -482,6 +535,11 @@ struct ReorderBuffer
ReorderBufferApplyChangeCB apply_change;
ReorderBufferApplyTruncateCB apply_truncate;
ReorderBufferCommitCB commit;
+ ReorderBufferAbortCB abort;
+ ReorderBufferFilterPrepareCB filter_prepare;
+ ReorderBufferPrepareCB prepare;
+ ReorderBufferCommitPreparedCB commit_prepared;
+ ReorderBufferAbortPreparedCB abort_prepared;
ReorderBufferMessageCB message;
/*
@@ -548,6 +606,11 @@ void ReorderBufferQueueMessage(ReorderBuffer *, TransactionId, Snapshot snapsho
void ReorderBufferCommit(ReorderBuffer *, TransactionId,
XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
TimestampTz commit_time, RepOriginId origin_id, XLogRecPtr origin_lsn);
+void ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid, bool is_commit);
void ReorderBufferAssignChild(ReorderBuffer *, TransactionId, TransactionId, XLogRecPtr commit_lsn);
void ReorderBufferCommitChild(ReorderBuffer *, TransactionId, TransactionId,
XLogRecPtr commit_lsn, XLogRecPtr end_lsn);
@@ -571,6 +634,15 @@ void ReorderBufferXidSetCatalogChanges(ReorderBuffer *, TransactionId xid, XLog
bool ReorderBufferXidHasCatalogChanges(ReorderBuffer *, TransactionId xid);
bool ReorderBufferXidHasBaseSnapshot(ReorderBuffer *, TransactionId xid);
+bool ReorderBufferPrepareNeedSkip(ReorderBuffer *rb, TransactionId xid,
+ const char *gid);
+bool ReorderBufferTxnIsPrepared(ReorderBuffer *rb, TransactionId xid,
+ const char *gid);
+void ReorderBufferPrepare(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid);
ReorderBufferTXN *ReorderBufferGetOldestTXN(ReorderBuffer *);
TransactionId ReorderBufferGetOldestXmin(ReorderBuffer *rb);
--
1.8.3.1
0002-Tap-test-to-test-concurrent-aborts-during-2-phase-co.patchapplication/octet-stream; name=0002-Tap-test-to-test-concurrent-aborts-during-2-phase-co.patchDownload
From 5ded273d7e9005f102474bc773b249dfb544a75b Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Wed, 9 Sep 2020 05:50:11 -0400
Subject: [PATCH 2/2] Tap test to test concurrent aborts during 2 phase commits
This test is specifically for testing concurrent abort while logical decode
is ongoing. Pass in the xid of the 2PC to the plugin as an option.
On the receipt of a valid "check-xid", the change API in the test decoding
plugin will wait for it to be aborted.
---
contrib/test_decoding/Makefile | 1 +
contrib/test_decoding/t/001_twophase.pl | 119 ++++++++++++++++++++++++++++++++
2 files changed, 120 insertions(+)
create mode 100644 contrib/test_decoding/t/001_twophase.pl
diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index ed9a3d6..175b971 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -8,6 +8,7 @@ REGRESS = ddl xact rewrite toast permissions decoding_in_xact \
spill slot truncate stream
ISOLATION = mxact delayed_startup ondisk_startup concurrent_ddl_dml \
oldest_xmin snapshot_transfer subxact_without_top
+TAP_TESTS = 1
REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
ISOLATION_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
diff --git a/contrib/test_decoding/t/001_twophase.pl b/contrib/test_decoding/t/001_twophase.pl
new file mode 100644
index 0000000..99a9249
--- /dev/null
+++ b/contrib/test_decoding/t/001_twophase.pl
@@ -0,0 +1,119 @@
+# logical replication of 2PC test
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 2;
+use Time::HiRes qw(usleep);
+use Scalar::Util qw(looks_like_number);
+
+# Initialize node
+my $node_logical = get_new_node('logical');
+$node_logical->init(allows_streaming => 'logical');
+$node_logical->append_conf(
+ 'postgresql.conf', qq(
+ max_prepared_transactions = 10
+));
+$node_logical->start;
+
+# Create some pre-existing content on logical
+$node_logical->safe_psql('postgres', "CREATE TABLE tab (a int PRIMARY KEY)");
+$node_logical->safe_psql('postgres',
+ "INSERT INTO tab SELECT generate_series(1,10)");
+$node_logical->safe_psql('postgres',
+ "SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');");
+
+# This test is specifically for testing concurrent abort while logical decode
+# is ongoing. We will pass in the xid of the 2PC to the plugin as an option.
+# On the receipt of a valid "check-xid", the change API in the test decoding
+# plugin will wait for it to be aborted.
+#
+# We will fire off a ROLLBACK from another session when this decode
+# is waiting.
+#
+# The status of "check-xid" will change from in-progress to not-committed
+# (hence aborted) and we will stop decoding because the subsequent
+# system catalog scan will error out.
+
+$node_logical->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab VALUES (11);
+ INSERT INTO tab VALUES (12);
+ ALTER TABLE tab ADD COLUMN b INT;
+ INSERT INTO tab VALUES (13,14);
+ PREPARE TRANSACTION 'test_prepared_tab';");
+# get XID of the above two-phase transaction
+my $xid2pc = $node_logical->safe_psql('postgres', "SELECT transaction FROM pg_prepared_xacts WHERE gid = 'test_prepared_tab'");
+is(looks_like_number($xid2pc), qq(1), 'Got a valid two-phase XID');
+
+# start decoding the above by passing the "check-xid"
+my $logical_connstr = $node_logical->connstr . ' dbname=postgres';
+
+# decode now, it should include an ABORT entry because of the ROLLBACK below
+system_log("psql -d \"$logical_connstr\" -c \"SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'check-xid', '$xid2pc');\" \&");
+
+# check that decode starts waiting for this $xid2pc
+poll_output_until("waiting for $xid2pc to abort")
+ or die "no wait happened for the abort";
+
+# rollback the prepared transaction
+$node_logical->safe_psql('postgres', "ROLLBACK PREPARED 'test_prepared_tab';");
+
+# check for occurrence of the log about stopping this decoding
+poll_output_until("stopping decoding of test_prepared_tab")
+ or die "no decoding stop for the rollback";
+
+# consume any remaining changes
+$node_logical->safe_psql('postgres', "SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');");
+
+# Check that commit prepared is decoded properly on immediate restart
+$node_logical->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab VALUES (11);
+ INSERT INTO tab VALUES (12);
+ ALTER TABLE tab ADD COLUMN b INT;
+ INSERT INTO tab VALUES (13, 11);
+ PREPARE TRANSACTION 'test_prepared_tab';");
+# consume changes
+$node_logical->safe_psql('postgres', "SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');");
+$node_logical->stop('immediate');
+$node_logical->start;
+
+# commit post the restart
+$node_logical->safe_psql('postgres', "COMMIT PREPARED 'test_prepared_tab';");
+$node_logical->safe_psql('postgres', "SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');");
+
+# check inserts are visible
+my $result = $node_logical->safe_psql('postgres', "SELECT count(*) FROM tab where a IN (11,12) OR b IN (11);");
+is($result, qq(3), 'Rows inserted via 2PC are visible on restart');
+
+$node_logical->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot');");
+$node_logical->stop('fast');
+
+sub poll_output_until
+{
+ my ($expected) = @_;
+
+ $expected = 'xxxxxx' unless defined($expected); # default junk value
+
+ my $max_attempts = 180 * 10;
+ my $attempts = 0;
+
+ my $output_file = '';
+ while ($attempts < $max_attempts)
+ {
+ $output_file = slurp_file($node_logical->logfile());
+
+ if ($output_file =~ $expected)
+ {
+ return 1;
+ }
+
+ # Wait 0.1 second before retrying.
+ usleep(100_000);
+ $attempts++;
+ }
+
+ # The output result didn't change in 180 seconds. Give up
+ return 0;
+}
--
1.8.3.1
On Wed, Sep 9, 2020 at 3:33 PM Ajin Cherian <itsajin@gmail.com> wrote:
On Mon, Sep 7, 2020 at 11:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Nikhil has a test for the same
(0004-Teach-test_decoding-plugin-to-work-with-2PC.Jan4) in his last
email [1]. You might want to use it to test this behavior. I think you
can also keep the tests as a separate patch as Nikhil had.Done. I've added the tests and also tweaked code to make sure that the aborts during 2 phase commits are also handled.
Okay, I'll look into your changes but before that today, I have gone
through this entire thread to check if there are any design problems
and found that there were two major issues in the original proposal,
(a) one was to handle concurrent aborts which I think we should be
able to deal in a way similar to what we have done for decoding of
in-progress transactions and (b) what if someone specifically locks
pg_class or pg_attribute in exclusive mode (say be Lock pg_attribute
...), it seems the deadlock can happen in that case [0]/messages/by-id/20170328012546.473psm6546bgsi2c@alap3.anarazel.de. AFAIU, people
seem to think if there is no realistic scenario where deadlock can
happen apart from user explicitly locking the system catalog then we
might be able to get away by just ignoring such xacts to be decoded at
prepare time or would block it in some other way as any way that will
block the entire system. I am not sure what is the right thing but
something has to be done to avoid any sort of deadlock for this.
Another thing, I noticed is that originally we have subscriber-side
support as well, see [1]/messages/by-id/CAMGcDxchx=0PeQBVLzrgYG2AQ49QSRxHj5DCp7yy0QrJR0S0nA@mail.gmail.com (see *pgoutput* patch) but later dropped it
due to some reasons [2]/messages/by-id/CAMGcDxc-kuO9uq0zRCRwbHWBj_rePY9=raR7M9pZGWoj9EOGdg@mail.gmail.com. I think we should have pgoutput support as
well, so see what is required to get that incorporated.
I would also like to summarize my thinking on the usefulness of this
feature. One of the authors of this patch Stats wants this for a
conflict-free logical replication, see more details [3]/messages/by-id/CAMsr+YHQzGxnR-peT4SbX2-xiG2uApJMTgZ4a3TiRBM6COyfqg@mail.gmail.com. Craig seems
to suggest [3]/messages/by-id/CAMsr+YHQzGxnR-peT4SbX2-xiG2uApJMTgZ4a3TiRBM6COyfqg@mail.gmail.com that this will allow us to avoid conflicting schema
changes at different nodes though it is not clear to me if that is
possible without some external code support because we don't send
schema changes in logical replication, maybe Craig can shed some light
on this. Another use-case, I am thinking is if this can be used for
scaling-out reads as well. Because of 2PC, we can ensure that on
subscribers we have all the data committed on the master. Now, we can
design a system where different nodes are owners of some set of tables
and we can always get the data of those tables reliably from those
nodes, and then one can have some external process that will route the
reads accordingly. I know that the last idea is a bit of a hand-waving
but it seems to be possible after this feature.
[0]: /messages/by-id/20170328012546.473psm6546bgsi2c@alap3.anarazel.de
[1]: /messages/by-id/CAMGcDxchx=0PeQBVLzrgYG2AQ49QSRxHj5DCp7yy0QrJR0S0nA@mail.gmail.com
[2]: /messages/by-id/CAMGcDxc-kuO9uq0zRCRwbHWBj_rePY9=raR7M9pZGWoj9EOGdg@mail.gmail.com
[3]: /messages/by-id/CAMsr+YHQzGxnR-peT4SbX2-xiG2uApJMTgZ4a3TiRBM6COyfqg@mail.gmail.com
--
With Regards,
Amit Kapila.
On Sat, Sep 12, 2020 at 9:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Another thing, I noticed is that originally we have subscriber-side
support as well, see [1] (see *pgoutput* patch) but later dropped it
due to some reasons [2]. I think we should have pgoutput support as
well, so see what is required to get that incorporated.I have added the rebased patch-set for pgoutput and subscriber side
changes as well. This also includes a test case in subscriber.
regards,
Ajin Cherian
Attachments:
0001-Support-decoding-of-two-phase-transactions.patchapplication/octet-stream; name=0001-Support-decoding-of-two-phase-transactions.patchDownload
From 68d5222520c245427d6e3a9ecc95657a36212f09 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Tue, 15 Sep 2020 07:17:26 -0400
Subject: [PATCH 1/3] Support decoding of two-phase transactions
Until now two-phase transactions were decoded at COMMIT, just like
regular transaction. During replay, two-phase transactions were
translated into regular transactions on the subscriber, and the GID
was not forwarded to it.
This patch allows PREPARE-time decoding two-phase transactions (if
the output plugin supports this capability), in which case the
transactions are replayed at PREPARE and then committed later when
COMMIT PREPARED arrives.
Includes documentation changes.
---
contrib/test_decoding/expected/prepared.out | 185 ++++++++++++++++++---
contrib/test_decoding/sql/prepared.sql | 77 ++++++++-
contrib/test_decoding/test_decoding.c | 181 ++++++++++++++++++++
doc/src/sgml/logicaldecoding.sgml | 127 +++++++++++++-
src/backend/replication/logical/decode.c | 141 ++++++++++++++--
src/backend/replication/logical/logical.c | 194 ++++++++++++++++++++++
src/backend/replication/logical/reorderbuffer.c | 209 +++++++++++++++++++++---
src/include/replication/output_plugin.h | 46 ++++++
src/include/replication/reorderbuffer.h | 78 ++++++++-
9 files changed, 1165 insertions(+), 73 deletions(-)
diff --git a/contrib/test_decoding/expected/prepared.out b/contrib/test_decoding/expected/prepared.out
index 46e915d..94fb0c9 100644
--- a/contrib/test_decoding/expected/prepared.out
+++ b/contrib/test_decoding/expected/prepared.out
@@ -6,19 +6,50 @@ SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_d
init
(1 row)
-CREATE TABLE test_prepared1(id int);
-CREATE TABLE test_prepared2(id int);
+CREATE TABLE test_prepared1(id integer primary key);
+CREATE TABLE test_prepared2(id integer primary key);
-- test simple successful use of a prepared xact
BEGIN;
INSERT INTO test_prepared1 VALUES (1);
PREPARE TRANSACTION 'test_prepared#1';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+----------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:1
+ PREPARE TRANSACTION 'test_prepared#1'
+(3 rows)
+
COMMIT PREPARED 'test_prepared#1';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-----------------------------------
+ COMMIT PREPARED 'test_prepared#1'
+(1 row)
+
INSERT INTO test_prepared1 VALUES (2);
-- test abort of a prepared xact
BEGIN;
INSERT INTO test_prepared1 VALUES (3);
PREPARE TRANSACTION 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+----------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:2
+ COMMIT
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:3
+ PREPARE TRANSACTION 'test_prepared#2'
+(6 rows)
+
ROLLBACK PREPARED 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-------------------------------------
+ ROLLBACK PREPARED 'test_prepared#2'
+(1 row)
+
INSERT INTO test_prepared1 VALUES (4);
-- test prepared xact containing ddl
BEGIN;
@@ -26,45 +57,149 @@ INSERT INTO test_prepared1 VALUES (5);
ALTER TABLE test_prepared1 ADD COLUMN data text;
INSERT INTO test_prepared1 VALUES (6, 'frakbar');
PREPARE TRANSACTION 'test_prepared#3';
--- test that we decode correctly while an uncommitted prepared xact
--- with ddl exists.
--- separate table because of the lock from the ALTER
--- this will come before the '5' row above, as this commits before it.
-INSERT INTO test_prepared2 VALUES (7);
-COMMIT PREPARED 'test_prepared#3';
--- make sure stuff still works
-INSERT INTO test_prepared1 VALUES (8);
-INSERT INTO test_prepared2 VALUES (9);
--- cleanup
-DROP TABLE test_prepared1;
-DROP TABLE test_prepared2;
--- show results
+SELECT 'test_prepared_1' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'test_prepared1'::regclass;
+ relation | locktype | mode
+-----------------+----------+---------------------
+ test_prepared_1 | relation | RowExclusiveLock
+ test_prepared_1 | relation | AccessExclusiveLock
+(2 rows)
+
SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
data
-------------------------------------------------------------------------
BEGIN
- table public.test_prepared1: INSERT: id[integer]:1
- COMMIT
- BEGIN
- table public.test_prepared1: INSERT: id[integer]:2
- COMMIT
- BEGIN
table public.test_prepared1: INSERT: id[integer]:4
COMMIT
BEGIN
- table public.test_prepared2: INSERT: id[integer]:7
- COMMIT
- BEGIN
table public.test_prepared1: INSERT: id[integer]:5
table public.test_prepared1: INSERT: id[integer]:6 data[text]:'frakbar'
+ PREPARE TRANSACTION 'test_prepared#3'
+(7 rows)
+
+-- Test that we decode correctly while an uncommitted prepared xact
+-- with ddl exists.
+--
+-- Use a separate table for the concurrent transaction because the lock from
+-- the ALTER will stop us inserting into the other one.
+--
+-- We should see '7' before '5' in our results since it commits first.
+--
+INSERT INTO test_prepared2 VALUES (7);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+----------------------------------------------------
+ BEGIN
+ table public.test_prepared2: INSERT: id[integer]:7
COMMIT
+(3 rows)
+
+COMMIT PREPARED 'test_prepared#3';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-----------------------------------
+ COMMIT PREPARED 'test_prepared#3'
+(1 row)
+
+-- make sure stuff still works
+INSERT INTO test_prepared1 VALUES (8);
+INSERT INTO test_prepared2 VALUES (9);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+--------------------------------------------------------------------
BEGIN
table public.test_prepared1: INSERT: id[integer]:8 data[text]:null
COMMIT
BEGIN
table public.test_prepared2: INSERT: id[integer]:9
COMMIT
-(22 rows)
+(6 rows)
+
+-- Check `CLUSTER` (as operation that hold exclusive lock) doesn't block
+-- logical decoding.
+BEGIN;
+INSERT INTO test_prepared1 VALUES (10, 'othercol');
+CLUSTER test_prepared1 USING test_prepared1_pkey;
+INSERT INTO test_prepared1 VALUES (11, 'othercol2');
+PREPARE TRANSACTION 'test_prepared_lock';
+BEGIN;
+insert into test_prepared2 values (12);
+PREPARE TRANSACTION 'test_prepared_lock2';
+COMMIT PREPARED 'test_prepared_lock2';
+SELECT 'pg_class' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'pg_class'::regclass;
+ relation | locktype | mode
+----------+----------+------
+(0 rows)
+
+-- Shouldn't timeout on 2pc decoding.
+SET statement_timeout = '1s';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+----------------------------------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:10 data[text]:'othercol'
+ table public.test_prepared1: INSERT: id[integer]:11 data[text]:'othercol2'
+ PREPARE TRANSACTION 'test_prepared_lock'
+ BEGIN
+ table public.test_prepared2: INSERT: id[integer]:12
+ PREPARE TRANSACTION 'test_prepared_lock2'
+ COMMIT PREPARED 'test_prepared_lock2'
+(8 rows)
+
+RESET statement_timeout;
+COMMIT PREPARED 'test_prepared_lock';
+-- will work normally after we commit
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+--------------------------------------
+ COMMIT PREPARED 'test_prepared_lock'
+(1 row)
+
+-- test savepoints
+BEGIN;
+SAVEPOINT test_savepoint;
+CREATE TABLE test_prepared_savepoint (a int);
+PREPARE TRANSACTION 'test_prepared_savepoint';
+COMMIT PREPARED 'test_prepared_savepoint';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-------------------------------------------
+ COMMIT PREPARED 'test_prepared_savepoint'
+(1 row)
+
+-- test that a GID containing "nodecode" gets decoded at commit prepared time
+BEGIN;
+INSERT INTO test_prepared1 VALUES (20);
+PREPARE TRANSACTION 'test_prepared_nodecode';
+-- should show nothing
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+------
+(0 rows)
+
+COMMIT PREPARED 'test_prepared_nodecode';
+-- should be decoded now
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+---------------------------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:20 data[text]:null
+ COMMIT
+(3 rows)
+
+-- cleanup
+DROP TABLE test_prepared1;
+DROP TABLE test_prepared2;
+-- show results. There should be nothing to show
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+------
+(0 rows)
SELECT pg_drop_replication_slot('regression_slot');
pg_drop_replication_slot
diff --git a/contrib/test_decoding/sql/prepared.sql b/contrib/test_decoding/sql/prepared.sql
index e726397..ca801e4 100644
--- a/contrib/test_decoding/sql/prepared.sql
+++ b/contrib/test_decoding/sql/prepared.sql
@@ -2,21 +2,25 @@
SET synchronous_commit = on;
SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
-CREATE TABLE test_prepared1(id int);
-CREATE TABLE test_prepared2(id int);
+CREATE TABLE test_prepared1(id integer primary key);
+CREATE TABLE test_prepared2(id integer primary key);
-- test simple successful use of a prepared xact
BEGIN;
INSERT INTO test_prepared1 VALUES (1);
PREPARE TRANSACTION 'test_prepared#1';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
COMMIT PREPARED 'test_prepared#1';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
INSERT INTO test_prepared1 VALUES (2);
-- test abort of a prepared xact
BEGIN;
INSERT INTO test_prepared1 VALUES (3);
PREPARE TRANSACTION 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
ROLLBACK PREPARED 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
INSERT INTO test_prepared1 VALUES (4);
@@ -27,24 +31,83 @@ ALTER TABLE test_prepared1 ADD COLUMN data text;
INSERT INTO test_prepared1 VALUES (6, 'frakbar');
PREPARE TRANSACTION 'test_prepared#3';
--- test that we decode correctly while an uncommitted prepared xact
--- with ddl exists.
+SELECT 'test_prepared_1' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'test_prepared1'::regclass;
+
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
--- separate table because of the lock from the ALTER
--- this will come before the '5' row above, as this commits before it.
+-- Test that we decode correctly while an uncommitted prepared xact
+-- with ddl exists.
+--
+-- Use a separate table for the concurrent transaction because the lock from
+-- the ALTER will stop us inserting into the other one.
+--
+-- We should see '7' before '5' in our results since it commits first.
+--
INSERT INTO test_prepared2 VALUES (7);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
COMMIT PREPARED 'test_prepared#3';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
-- make sure stuff still works
INSERT INTO test_prepared1 VALUES (8);
INSERT INTO test_prepared2 VALUES (9);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- Check `CLUSTER` (as operation that hold exclusive lock) doesn't block
+-- logical decoding.
+BEGIN;
+INSERT INTO test_prepared1 VALUES (10, 'othercol');
+CLUSTER test_prepared1 USING test_prepared1_pkey;
+INSERT INTO test_prepared1 VALUES (11, 'othercol2');
+PREPARE TRANSACTION 'test_prepared_lock';
+
+BEGIN;
+insert into test_prepared2 values (12);
+PREPARE TRANSACTION 'test_prepared_lock2';
+COMMIT PREPARED 'test_prepared_lock2';
+
+SELECT 'pg_class' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'pg_class'::regclass;
+
+-- Shouldn't timeout on 2pc decoding.
+SET statement_timeout = '1s';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+RESET statement_timeout;
+
+COMMIT PREPARED 'test_prepared_lock';
+
+-- will work normally after we commit
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- test savepoints
+BEGIN;
+SAVEPOINT test_savepoint;
+CREATE TABLE test_prepared_savepoint (a int);
+PREPARE TRANSACTION 'test_prepared_savepoint';
+COMMIT PREPARED 'test_prepared_savepoint';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- test that a GID containing "nodecode" gets decoded at commit prepared time
+BEGIN;
+INSERT INTO test_prepared1 VALUES (20);
+PREPARE TRANSACTION 'test_prepared_nodecode';
+-- should show nothing
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+COMMIT PREPARED 'test_prepared_nodecode';
+-- should be decoded now
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
-- cleanup
DROP TABLE test_prepared1;
DROP TABLE test_prepared2;
--- show results
+-- show results. There should be nothing to show
SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
SELECT pg_drop_replication_slot('regression_slot');
diff --git a/contrib/test_decoding/test_decoding.c b/contrib/test_decoding/test_decoding.c
index e60ab34..149e7f6 100644
--- a/contrib/test_decoding/test_decoding.c
+++ b/contrib/test_decoding/test_decoding.c
@@ -11,12 +11,16 @@
*-------------------------------------------------------------------------
*/
#include "postgres.h"
+#include "miscadmin.h"
+#include "access/transam.h"
#include "catalog/pg_type.h"
#include "replication/logical.h"
#include "replication/origin.h"
+#include "storage/procarray.h"
+
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -36,6 +40,7 @@ typedef struct
bool skip_empty_xacts;
bool xact_wrote_changes;
bool only_local;
+ TransactionId check_xid; /* track abort of this txid */
} TestDecodingData;
static void pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
@@ -49,6 +54,8 @@ static void pg_output_begin(LogicalDecodingContext *ctx,
bool last_write);
static void pg_decode_commit_txn(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn, XLogRecPtr commit_lsn);
+static void pg_decode_abort_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, XLogRecPtr abort_lsn);
static void pg_decode_change(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn, Relation rel,
ReorderBufferChange *change);
@@ -88,6 +95,19 @@ static void pg_decode_stream_truncate(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
int nrelations, Relation relations[],
ReorderBufferChange *change);
+static bool pg_decode_filter_prepare(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid);
+static void pg_decode_prepare_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+static void pg_decode_commit_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+static void pg_decode_abort_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+
void
_PG_init(void)
@@ -106,6 +126,7 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb)
cb->change_cb = pg_decode_change;
cb->truncate_cb = pg_decode_truncate;
cb->commit_cb = pg_decode_commit_txn;
+ cb->abort_cb = pg_decode_abort_txn;
cb->filter_by_origin_cb = pg_decode_filter;
cb->shutdown_cb = pg_decode_shutdown;
cb->message_cb = pg_decode_message;
@@ -116,6 +137,11 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb)
cb->stream_change_cb = pg_decode_stream_change;
cb->stream_message_cb = pg_decode_stream_message;
cb->stream_truncate_cb = pg_decode_stream_truncate;
+ cb->filter_prepare_cb = pg_decode_filter_prepare;
+ cb->prepare_cb = pg_decode_prepare_txn;
+ cb->commit_prepared_cb = pg_decode_commit_prepared_txn;
+ cb->abort_prepared_cb = pg_decode_abort_prepared_txn;
+
}
@@ -136,11 +162,14 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
data->include_timestamp = false;
data->skip_empty_xacts = false;
data->only_local = false;
+ data->check_xid = InvalidTransactionId;
ctx->output_plugin_private = data;
opt->output_type = OUTPUT_PLUGIN_TEXTUAL_OUTPUT;
opt->receive_rewrites = false;
+ /* this plugin supports decoding of 2pc */
+ opt->enable_twophase = true;
foreach(option, ctx->output_plugin_options)
{
@@ -227,6 +256,32 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
errmsg("could not parse value \"%s\" for parameter \"%s\"",
strVal(elem->arg), elem->defname)));
}
+ else if (strcmp(elem->defname, "check-xid") == 0)
+ {
+ if (elem->arg)
+ {
+ errno = 0;
+ data->check_xid = (TransactionId)
+ strtoul(strVal(elem->arg), NULL, 0);
+
+ if (errno == EINVAL || errno == ERANGE)
+ ereport(FATAL,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("check-xid is not a valid number: \"%s\"",
+ strVal(elem->arg))));
+ }
+ else
+ ereport(FATAL,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("check-xid needs an input value")));
+
+ if (data->check_xid <= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("Specify positive value for parameter \"%s\","
+ " you specified \"%s\"",
+ elem->defname, strVal(elem->arg))));
+ }
else
{
ereport(ERROR,
@@ -297,6 +352,116 @@ pg_decode_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
OutputPluginWrite(ctx, true);
}
+/* ABORT callback */
+static void
+pg_decode_abort_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ if (data->skip_empty_xacts && !data->xact_wrote_changes)
+ return;
+
+ OutputPluginPrepareWrite(ctx, true);
+ if (data->include_xids)
+ appendStringInfo(ctx->out, "ABORT %u", txn->xid);
+ else
+ appendStringInfoString(ctx->out, "ABORT");
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
+/*
+ * Filter out two-phase transactions.
+ *
+ * Each plugin can implement its own filtering logic. Here
+ * we demonstrate a simple logic by checking the GID. If the
+ * GID contains the "_nodecode" substring, then we filter
+ * it out.
+ */
+static bool
+pg_decode_filter_prepare(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid)
+{
+ if (strstr(gid, "_nodecode") != NULL)
+ return true;
+
+ return false;
+}
+
+/* PREPARE callback */
+static void
+pg_decode_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ if (data->skip_empty_xacts && !data->xact_wrote_changes)
+ return;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ appendStringInfo(ctx->out, "PREPARE TRANSACTION %s",
+ quote_literal_cstr(txn->gid));
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, " %u", txn->xid);
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
+/* COMMIT PREPARED callback */
+static void
+pg_decode_commit_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ appendStringInfo(ctx->out, "COMMIT PREPARED %s",
+ quote_literal_cstr(txn->gid));
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, " %u", txn->xid);
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
+/* ABORT PREPARED callback */
+static void
+pg_decode_abort_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ appendStringInfo(ctx->out, "ROLLBACK PREPARED %s",
+ quote_literal_cstr(txn->gid));
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, " %u", txn->xid);
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
static bool
pg_decode_filter(LogicalDecodingContext *ctx,
RepOriginId origin_id)
@@ -455,6 +620,22 @@ pg_decode_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
}
data->xact_wrote_changes = true;
+ /* if check_xid is specified */
+ if (TransactionIdIsValid(data->check_xid))
+ {
+ elog(LOG, "waiting for %u to abort", data->check_xid);
+ while (TransactionIdIsInProgress(data->check_xid))
+ {
+ CHECK_FOR_INTERRUPTS();
+ pg_usleep(10000L);
+ }
+ if (!TransactionIdIsInProgress(data->check_xid) &&
+ !TransactionIdDidCommit(data->check_xid))
+ elog(LOG, "%u aborted", data->check_xid);
+
+ Assert(TransactionIdDidAbort(data->check_xid));
+ }
+
class_form = RelationGetForm(relation);
tupdesc = RelationGetDescr(relation);
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index 813a037..1cddfeb 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -386,7 +386,12 @@ typedef struct OutputPluginCallbacks
LogicalDecodeChangeCB change_cb;
LogicalDecodeTruncateCB truncate_cb;
LogicalDecodeCommitCB commit_cb;
+ LogicalDecodeAbortCB abort_cb;
LogicalDecodeMessageCB message_cb;
+ LogicalDecodeFilterPrepareCB filter_prepare_cb;
+ LogicalDecodePrepareCB prepare_cb;
+ LogicalDecodeCommitPreparedCB commit_prepared_cb;
+ LogicalDecodeAbortPreparedCB abort_prepared_cb;
LogicalDecodeFilterByOriginCB filter_by_origin_cb;
LogicalDecodeShutdownCB shutdown_cb;
LogicalDecodeStreamStartCB stream_start_cb;
@@ -477,7 +482,13 @@ CREATE TABLE another_catalog_table(data text) WITH (user_catalog_table = true);
never get
decoded. Successful savepoints are
folded into the transaction containing them in the order they were
- executed within that transaction.
+ executed within that transaction. A transaction that is prepared for
+ a two-phase commit using <command>PREPARE TRANSACTION</command> will
+ also be decoded if the output plugin callbacks needed for decoding
+ them are provided. It is possible that the current transaction which
+ is being decoded is aborted concurrently via a <command>ROLLBACK PREPARED</command>
+ command. In that case, the logical decoding of this transaction will
+ be aborted too.
</para>
<note>
@@ -578,6 +589,71 @@ typedef void (*LogicalDecodeCommitCB) (struct LogicalDecodingContext *ctx,
</para>
</sect3>
+ <sect3 id="logicaldecoding-output-plugin-prepare">
+ <title>Transaction Prepare Callback</title>
+
+ <para>
+ The optional <function>prepare_cb</function> callback is called whenever
+ a transaction which is prepared for two-phase commit has been
+ decoded. The <function>change_cb</function> callbacks for all modified
+ rows will have been called before this, if there have been any modified
+ rows.
+<programlisting>
+typedef void (*LogicalDecodePrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
+ <sect3 id="logicaldecoding-output-plugin-commit-prepared">
+ <title>Commit Prepared Transaction Callback</title>
+
+ <para>
+ The optional <function>commit_prepared_cb</function> callback is called whenever
+ a commit prepared transaction has been decoded. The <parameter>gid</parameter> field,
+ which is part of the <parameter>txn</parameter> parameter can be used in this
+ callback.
+<programlisting>
+typedef void (*LogicalDecodeCommitPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
+ <sect3 id="logicaldecoding-output-plugin-abort-prepared">
+ <title>Rollback Prepared Transaction Callback</title>
+
+ <para>
+ The optional <function>abort_prepared_cb</function> callback is called whenever
+ a rollback prepared transaction has been decoded. The <parameter>gid</parameter> field,
+ which is part of the <parameter>txn</parameter> parameter can be used in this
+ callback.
+<programlisting>
+typedef void (*LogicalDecodeAbortPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
+ <sect3 id="logicaldecoding-output-plugin-abort">
+ <title>Transaction Abort Callback</title>
+
+ <para>
+ The required <function>abort_cb</function> callback is called whenever
+ a transaction abort has to be initiated. This can happen if we are
+ decoding a transaction that has been prepared for two-phase commit and
+ a concurrent rollback happens while we are decoding it.
+<programlisting>
+typedef void (*LogicalDecodeAbortCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
<sect3 id="logicaldecoding-output-plugin-change">
<title>Change Callback</title>
@@ -587,7 +663,13 @@ typedef void (*LogicalDecodeCommitCB) (struct LogicalDecodingContext *ctx,
an <command>INSERT</command>, <command>UPDATE</command>,
or <command>DELETE</command>. Even if the original command modified
several rows at once the callback will be called individually for each
- row.
+ row. The <function>change_cb</function> callback may access system or
+ user catalog tables to aid in the process of outputting the row
+ modification details. In case of decoding a prepared (but yet
+ uncommitted) transaction or decoding of an uncommitted transaction, this
+ change callback might also error out due to simultaneous rollback of
+ this very same transaction. In that case, the logical decoding of this
+ aborted transaction is stopped gracefully.
<programlisting>
typedef void (*LogicalDecodeChangeCB) (struct LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
@@ -664,6 +746,39 @@ typedef bool (*LogicalDecodeFilterByOriginCB) (struct LogicalDecodingContext *ct
</para>
</sect3>
+ <sect3 id="logicaldecoding-output-plugin-filter-prepare">
+ <title>Prepare Filter Callback</title>
+
+ <para>
+ The optional <function>filter_prepare_cb</function> callback
+ is called to determine whether data that is part of the current
+ two-phase commit transaction should be considered for decode
+ at this prepare stage or as a regular one-phase transaction at
+ <command>COMMIT PREPARED</command> time later. To signal that
+ decoding should be skipped, return <literal>true</literal>;
+ <literal>false</literal> otherwise. When the callback is not
+ defined, <literal>false</literal> is assumed (i.e. nothing is
+ filtered).
+<programlisting>
+typedef bool (*LogicalDecodeFilterPrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ TransactionId xid,
+ const char *gid);
+</programlisting>
+ The <parameter>ctx</parameter> parameter has the same contents
+ as for the other callbacks. The <parameter>txn</parameter> parameter
+ contains meta information about the transaction. The <parameter>xid</parameter>
+ contains the XID because <parameter>txn</parameter> can be NULL in some cases.
+ The <parameter>gid</parameter> is the identifier that later identifies this
+ transaction for <command>COMMIT PREPARED</command> or <command>ROLLBACK PREPARED</command>.
+ </para>
+ <para>
+ The callback has to provide the same static answer for a given combination of
+ <parameter>xid</parameter> and <parameter>gid</parameter> every time it is
+ called.
+ </para>
+ </sect3>
+
<sect3 id="logicaldecoding-output-plugin-message">
<title>Generic Message Callback</title>
@@ -685,7 +800,13 @@ typedef void (*LogicalDecodeMessageCB) (struct LogicalDecodingContext *ctx,
non-transactional and the XID was not assigned yet in the transaction
which logged the message. The <parameter>lsn</parameter> has WAL
location of the message. The <parameter>transactional</parameter> says
- if the message was sent as transactional or not.
+ if the message was sent as transactional or not. Similar to the change
+ callback, in case of decoding a prepared (but yet uncommitted)
+ transaction or decoding of an uncommitted transaction, this message
+ callback might also error out due to simultaneous rollback of
+ this very same transaction. In that case, the logical decoding of this
+ aborted transaction is stopped gracefully.
+
The <parameter>prefix</parameter> is arbitrary null-terminated prefix
which can be used for identifying interesting messages for the current
plugin. And finally the <parameter>message</parameter> parameter holds
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index f21f61d..63d5acf 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -70,6 +70,9 @@ static void DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
xl_xact_parsed_commit *parsed, TransactionId xid);
static void DecodeAbort(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
xl_xact_parsed_abort *parsed, TransactionId xid);
+static void DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_prepare * parsed);
+
/* common function to decode tuples */
static void DecodeXLogTuple(char *data, Size len, ReorderBufferTupleBuf *tup);
@@ -312,17 +315,34 @@ DecodeXactOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
}
break;
case XLOG_XACT_PREPARE:
+ {
+ xl_xact_parsed_prepare parsed;
+ xl_xact_prepare *xlrec;
+ /* check that output plugin is capable of twophase decoding */
+ if (!ctx->options.enable_twophase)
+ {
+ ReorderBufferProcessXid(reorder, XLogRecGetXid(r), buf->origptr);
+ break;
+ }
- /*
- * Currently decoding ignores PREPARE TRANSACTION and will just
- * decode the transaction when the COMMIT PREPARED is sent or
- * throw away the transaction's contents when a ROLLBACK PREPARED
- * is received. In the future we could add code to expose prepared
- * transactions in the changestream allowing for a kind of
- * distributed 2PC.
- */
- ReorderBufferProcessXid(reorder, XLogRecGetXid(r), buf->origptr);
- break;
+ /* ok, parse it */
+ xlrec = (xl_xact_prepare *)XLogRecGetData(r);
+ ParsePrepareRecord(XLogRecGetInfo(buf->record),
+ xlrec, &parsed);
+
+ /* does output plugin want this particular transaction? */
+ if (ctx->callbacks.filter_prepare_cb &&
+ ReorderBufferPrepareNeedSkip(reorder, parsed.twophase_xid,
+ parsed.twophase_gid))
+ {
+ ReorderBufferProcessXid(reorder, parsed.twophase_xid,
+ buf->origptr);
+ break;
+ }
+
+ DecodePrepare(ctx, buf, &parsed);
+ break;
+ }
default:
elog(ERROR, "unexpected RM_XACT_ID record type: %u", info);
}
@@ -647,9 +667,82 @@ DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
buf->origptr, buf->endptr);
}
+ /*
+ * Decide if we're processing COMMIT PREPARED, or a regular COMMIT.
+ * Regular commit simply triggers a replay of transaction changes from the
+ * reorder buffer. For COMMIT PREPARED that however already happened at
+ * PREPARE time, and so we only need to notify the subscriber that the GID
+ * finally committed.
+ *
+ * For output plugins that do not support PREPARE-time decoding of
+ * two-phase transactions, we never even see the PREPARE and all two-phase
+ * transactions simply fall through to the second branch.
+ */
+ if (TransactionIdIsValid(parsed->twophase_xid) &&
+ ReorderBufferTxnIsPrepared(ctx->reorder,
+ parsed->twophase_xid, parsed->twophase_gid))
+ {
+ Assert(xid == parsed->twophase_xid);
+ /* we are processing COMMIT PREPARED */
+ ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn,
+ parsed->twophase_gid, true);
+ }
+ else
+ {
+ /* replay actions of all transaction + subtransactions in order */
+ ReorderBufferCommit(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn);
+ }
+}
+
+/*
+ * Decode PREPARE record. Similar logic as in COMMIT
+ */
+static void
+DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_prepare * parsed)
+{
+ XLogRecPtr origin_lsn = parsed->origin_lsn;
+ TimestampTz commit_time = parsed->origin_timestamp;
+ XLogRecPtr origin_id = XLogRecGetOrigin(buf->record);
+ int i;
+ TransactionId xid = parsed->twophase_xid;
+
+ /*
+ * Process invalidation messages, even if we're not interested in the
+ * transaction's contents, since the various caches need to always be
+ * consistent.
+ */
+ if (parsed->nmsgs > 0)
+ {
+ if (!ctx->fast_forward)
+ ReorderBufferAddInvalidations(ctx->reorder, xid, buf->origptr,
+ parsed->nmsgs, parsed->msgs);
+ ReorderBufferXidSetCatalogChanges(ctx->reorder, xid, buf->origptr);
+ }
+
+ /*
+ * Tell the reorderbuffer about the surviving subtransactions. We need to
+ * do this because the main transaction itself has not committed since we
+ * are in the prepare phase right now. So we need to be sure the snapshot
+ * is setup correctly for the main transaction in case all changes
+ * happened in subtransanctions
+ */
+ for (i = 0; i < parsed->nsubxacts; i++)
+ {
+ ReorderBufferCommitChild(ctx->reorder, xid, parsed->subxacts[i],
+ buf->origptr, buf->endptr);
+ }
+
+ if (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
+ (parsed->dbId != InvalidOid && parsed->dbId != ctx->slot->data.database) ||
+ ctx->fast_forward || FilterByOrigin(ctx, origin_id))
+ return;
+
/* replay actions of all transaction + subtransactions in order */
- ReorderBufferCommit(ctx->reorder, xid, buf->origptr, buf->endptr,
- commit_time, origin_id, origin_lsn);
+ ReorderBufferPrepare(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn, parsed->twophase_gid);
}
/*
@@ -661,6 +754,30 @@ DecodeAbort(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
xl_xact_parsed_abort *parsed, TransactionId xid)
{
int i;
+ XLogRecPtr origin_lsn = InvalidXLogRecPtr;
+ TimestampTz commit_time = 0;
+ XLogRecPtr origin_id = XLogRecGetOrigin(buf->record);
+
+ if (parsed->xinfo & XACT_XINFO_HAS_ORIGIN)
+ {
+ origin_lsn = parsed->origin_lsn;
+ commit_time = parsed->origin_timestamp;
+ }
+
+ /*
+ * If it's ROLLBACK PREPARED then handle it via callbacks.
+ */
+ if (TransactionIdIsValid(xid) &&
+ !SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) &&
+ parsed->dbId == ctx->slot->data.database &&
+ !FilterByOrigin(ctx, origin_id) &&
+ ReorderBufferTxnIsPrepared(ctx->reorder, xid, parsed->twophase_gid))
+ {
+ ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn,
+ parsed->twophase_gid, false);
+ return;
+ }
for (i = 0; i < parsed->nsubxacts; i++)
{
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 0f6af95..0ae5da7 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -58,6 +58,16 @@ static void shutdown_cb_wrapper(LogicalDecodingContext *ctx);
static void begin_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn);
static void commit_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
+static void abort_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+static bool filter_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid);
+static void prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+static void commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+static void abort_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
static void change_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
Relation relation, ReorderBufferChange *change);
static void truncate_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
@@ -206,6 +216,11 @@ StartupDecodingContext(List *output_plugin_options,
ctx->reorder->apply_change = change_cb_wrapper;
ctx->reorder->apply_truncate = truncate_cb_wrapper;
ctx->reorder->commit = commit_cb_wrapper;
+ ctx->reorder->abort = abort_cb_wrapper;
+ ctx->reorder->filter_prepare = filter_prepare_cb_wrapper;
+ ctx->reorder->prepare = prepare_cb_wrapper;
+ ctx->reorder->commit_prepared = commit_prepared_cb_wrapper;
+ ctx->reorder->abort_prepared = abort_prepared_cb_wrapper;
ctx->reorder->message = message_cb_wrapper;
/*
@@ -782,6 +797,140 @@ commit_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
}
static void
+abort_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "abort";
+ state.report_location = txn->final_lsn; /* beginning of abort record */
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn; /* points to the end of the record */
+
+ /* do the actual work: call callback */
+ ctx->callbacks.abort_cb(ctx, txn, abort_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
+prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "prepare";
+ state.report_location = txn->final_lsn; /* beginning of commit record */
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn; /* points to the end of the record */
+
+ /* If the plugin support 2 phase commits then prepare callback is mandatory */
+ if (ctx->options.enable_twophase && ctx->callbacks.prepare_cb == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("Output plugin did not register prepare_cb callback")));
+
+ /* do the actual work: call callback */
+ ctx->callbacks.prepare_cb(ctx, txn, prepare_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
+commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "commit_prepared";
+ state.report_location = txn->final_lsn; /* beginning of commit record */
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn; /* points to the end of the record */
+
+ /* If the plugin support 2 phase commits then commit prepared callback is mandatory */
+ if (ctx->options.enable_twophase && ctx->callbacks.commit_prepared_cb == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("Output plugin did not register commit_prepared_cb callback")));
+
+ /* do the actual work: call callback */
+ ctx->callbacks.commit_prepared_cb(ctx, txn, commit_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
+abort_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "abort_prepared";
+ state.report_location = txn->final_lsn; /* beginning of commit record */
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn; /* points to the end of the record */
+
+ /* If the plugin support 2 phase commits then abort prepared callback is mandatory */
+ if (ctx->options.enable_twophase && ctx->callbacks.abort_prepared_cb == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("Output plugin did not register abort_prepared_cb callback")));
+
+ /* do the actual work: call callback */
+ ctx->callbacks.abort_prepared_cb(ctx, txn, abort_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
change_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
Relation relation, ReorderBufferChange *change)
{
@@ -858,6 +1007,51 @@ truncate_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
error_context_stack = errcallback.previous;
}
+static bool
+filter_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+ bool ret;
+
+ /*
+ * Skip if decoding of twophase at PREPARE time is not enabled. In that
+ * case all twophase transactions are considered filtered out and will be
+ * applied as regular transactions at COMMIT PREPARED.
+ */
+ if (!ctx->options.enable_twophase)
+ return true;
+
+ /*
+ * The filter_prepare callback is optional. When not supplied, all
+ * prepared transactions should go through.
+ */
+ if (!ctx->callbacks.filter_prepare_cb)
+ return false;
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "filter_prepare";
+ state.report_location = InvalidXLogRecPtr;
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = false;
+
+ /* do the actual work: call callback */
+ ret = ctx->callbacks.filter_prepare_cb(ctx, txn, xid, gid);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+
+ return ret;
+}
+
bool
filter_by_origin_cb_wrapper(LogicalDecodingContext *ctx, RepOriginId origin_id)
{
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 1975d62..d6556be 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -413,6 +413,11 @@ ReorderBufferReturnTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
}
/* free data that's contained */
+ if (txn->gid != NULL)
+ {
+ pfree(txn->gid);
+ txn->gid = NULL;
+ }
if (txn->tuplecid_hash != NULL)
{
@@ -1987,7 +1992,7 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
prev_lsn = change->lsn;
/* Set the current xid to detect concurrent aborts. */
- if (streaming)
+ if (streaming || rbtxn_prepared(change->txn))
{
curtxn = change->txn;
SetupCheckXidLive(curtxn->xid);
@@ -2249,7 +2254,6 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
break;
}
}
-
/*
* There's a speculative insertion remaining, just clean in up, it
* can't have been successful, otherwise we'd gotten a confirmation
@@ -2278,7 +2282,24 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
}
}
else
- rb->commit(rb, txn, commit_lsn);
+ {
+ /*
+ * Call abort/commit/prepare callback, depending on the transaction
+ * state.
+ *
+ * If the transaction aborted during apply (which currently can happen
+ * only for prepared transactions), simply call the abort callback.
+ *
+ * Otherwise call either PREPARE (for twophase transactions) or COMMIT
+ * (for regular ones).
+ */
+ if (rbtxn_rollback(txn))
+ rb->abort(rb, txn, commit_lsn);
+ else if (rbtxn_prepared(txn))
+ rb->prepare(rb, txn, commit_lsn);
+ else
+ rb->commit(rb, txn, commit_lsn);
+ }
/* this is just a sanity check against bad output plugin behaviour */
if (GetCurrentTransactionIdIfAny() != InvalidTransactionId)
@@ -2361,8 +2382,8 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
* This error can only occur when we are sending the data in
* streaming mode and the streaming is not finished yet.
*/
- Assert(streaming);
- Assert(stream_started);
+ Assert(streaming || rbtxn_prepared(txn));
+ Assert(stream_started || rbtxn_prepared(txn));
/* Cleanup the temporary error state. */
FlushErrorState();
@@ -2370,10 +2391,19 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
errdata = NULL;
curtxn->concurrent_abort = true;
- /* Reset the TXN so that it is allowed to stream remaining data. */
- ReorderBufferResetTXN(rb, txn, snapshot_now,
- command_id, prev_lsn,
- specinsert);
+ /* If streaming, reset the TXN so that it is allowed to stream remaining data. */
+ if (streaming && stream_started)
+ {
+ ReorderBufferResetTXN(rb, txn, snapshot_now,
+ command_id, prev_lsn,
+ specinsert);
+ }
+ else
+ {
+ elog(LOG, "stopping decoding of %s (%u)",
+ txn->gid[0] != '\0'? txn->gid:"", txn->xid);
+ rb->abort(rb, txn, commit_lsn);
+ }
}
else
{
@@ -2395,23 +2425,16 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
* This interface is called once a toplevel commit is read for both streamed
* as well as non-streamed transactions.
*/
-void
-ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
- XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
- TimestampTz commit_time,
- RepOriginId origin_id, XLogRecPtr origin_lsn)
+static void
+ReorderBufferCommitInternal(ReorderBufferTXN *txn,
+ ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn)
{
- ReorderBufferTXN *txn;
Snapshot snapshot_now;
CommandId command_id = FirstCommandId;
- txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
- false);
-
- /* unknown transaction, nothing to replay */
- if (txn == NULL)
- return;
-
txn->final_lsn = commit_lsn;
txn->end_lsn = end_lsn;
txn->commit_time = commit_time;
@@ -2453,6 +2476,140 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
}
/*
+ * Ask output plugin whether we want to skip this PREPARE and send
+ * this transaction as a regular commit later.
+ */
+bool
+ReorderBufferPrepareNeedSkip(ReorderBuffer *rb, TransactionId xid, const char *gid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr, false);
+
+ return rb->filter_prepare(rb, txn, xid, gid);
+}
+
+
+/*
+ * Commit a transaction.
+ *
+ * See comments for ReorderBufferCommitInternal()
+ */
+void
+ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+
+ /* unknown transaction, nothing to replay */
+ if (txn == NULL)
+ return;
+
+ ReorderBufferCommitInternal(txn, rb, xid, commit_lsn, end_lsn,
+ commit_time, origin_id, origin_lsn);
+}
+
+/*
+ * Prepare a twophase transaction. It calls ReorderBufferCommitInternal()
+ * since all prepared transactions need to be decoded at PREPARE time.
+ */
+void
+ReorderBufferPrepare(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+
+ /* unknown transaction, nothing to replay */
+ if (txn == NULL)
+ return;
+
+ txn->txn_flags |= RBTXN_PREPARE;
+ txn->gid = palloc(strlen(gid) + 1); /* trailing '\0' */
+ strcpy(txn->gid, gid);
+
+ ReorderBufferCommitInternal(txn, rb, xid, commit_lsn, end_lsn,
+ commit_time, origin_id, origin_lsn);
+}
+
+/*
+ * Check whether this transaction was sent as prepared to subscribers.
+ * Called while handling commit|abort prepared.
+ */
+bool
+ReorderBufferTxnIsPrepared(ReorderBuffer *rb, TransactionId xid,
+ const char *gid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+
+ /*
+ * Always call the prepare filter. It's the job of the prepare filter to
+ * give us the *same* response for a given xid across multiple calls
+ * (including ones on restart)
+ */
+ return !(rb->filter_prepare(rb, txn, xid, gid));
+}
+
+/*
+ * Send standalone xact event. This is used to handle COMMIT/ABORT PREPARED.
+ */
+void
+ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid, bool is_commit)
+{
+ ReorderBufferTXN *txn;
+
+ /*
+ * The transaction may or may not exist (during restarts for example).
+ * Anyways, 2PC transactions do not contain any reorderbuffers. So allow
+ * it to be created below.
+ */
+ txn = ReorderBufferTXNByXid(rb, xid, true, NULL, commit_lsn,
+ true);
+
+ txn->final_lsn = commit_lsn;
+ txn->end_lsn = end_lsn;
+ txn->commit_time = commit_time;
+ txn->origin_id = origin_id;
+ txn->origin_lsn = origin_lsn;
+ /* this txn is obviously prepared */
+ txn->txn_flags |= RBTXN_PREPARE;
+ txn->gid = palloc(strlen(gid) + 1); /* trailing '\0' */
+ strcpy(txn->gid, gid);
+
+ if (is_commit)
+ {
+ txn->txn_flags |= RBTXN_COMMIT_PREPARED;
+ rb->commit_prepared(rb, txn, commit_lsn);
+ }
+ else
+ {
+ txn->txn_flags |= RBTXN_ROLLBACK_PREPARED;
+ rb->abort_prepared(rb, txn, commit_lsn);
+ }
+
+ /* cleanup: make sure there's no cache pollution */
+ ReorderBufferExecuteInvalidations(rb, txn);
+ ReorderBufferCleanupTXN(rb, txn);
+}
+
+/*
* Abort a transaction that possibly has previous changes. Needs to be first
* called for subtransactions and then for the toplevel xid.
*
@@ -2495,7 +2652,13 @@ ReorderBufferAbort(ReorderBuffer *rb, TransactionId xid, XLogRecPtr lsn)
/* cosmetic... */
txn->final_lsn = lsn;
- /* remove potential on-disk data, and deallocate */
+ /*
+ * remove potential on-disk data, and deallocate.
+ *
+ * We remove it even for prepared transactions (GID is enough to
+ * commit/abort those later).
+ */
+
ReorderBufferCleanupTXN(rb, txn);
}
diff --git a/src/include/replication/output_plugin.h b/src/include/replication/output_plugin.h
index b78c796..f6ca87f 100644
--- a/src/include/replication/output_plugin.h
+++ b/src/include/replication/output_plugin.h
@@ -27,6 +27,7 @@ typedef struct OutputPluginOptions
{
OutputPluginOutputType output_type;
bool receive_rewrites;
+ bool enable_twophase;
} OutputPluginOptions;
/*
@@ -78,6 +79,46 @@ typedef void (*LogicalDecodeCommitCB) (struct LogicalDecodingContext *ctx,
XLogRecPtr commit_lsn);
/*
+ * Called for an implicit ABORT of a transaction.
+ */
+typedef void (*LogicalDecodeAbortCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+
+ /*
+ * Called before decoding of PREPARE record to decide whether this
+ * transaction should be decoded with separate calls to prepare and
+ * commit_prepared/abort_prepared callbacks or wait till COMMIT PREPARED and
+ * sent as usual transaction.
+ */
+typedef bool (*LogicalDecodeFilterPrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ TransactionId xid,
+ const char *gid);
+
+/*
+ * Called for PREPARE record unless it was filtered by filter_prepare()
+ * callback.
+ */
+typedef void (*LogicalDecodePrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+
+/*
+ * Called for COMMIT PREPARED.
+ */
+typedef void (*LogicalDecodeCommitPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+/*
+ * Called for ROLLBACK PREPARED.
+ */
+typedef void (*LogicalDecodeAbortPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+
+/*
* Called for the generic logical decoding messages.
*/
typedef void (*LogicalDecodeMessageCB) (struct LogicalDecodingContext *ctx,
@@ -170,7 +211,12 @@ typedef struct OutputPluginCallbacks
LogicalDecodeChangeCB change_cb;
LogicalDecodeTruncateCB truncate_cb;
LogicalDecodeCommitCB commit_cb;
+ LogicalDecodeAbortCB abort_cb;
LogicalDecodeMessageCB message_cb;
+ LogicalDecodeFilterPrepareCB filter_prepare_cb;
+ LogicalDecodePrepareCB prepare_cb;
+ LogicalDecodeCommitPreparedCB commit_prepared_cb;
+ LogicalDecodeAbortPreparedCB abort_prepared_cb;
LogicalDecodeFilterByOriginCB filter_by_origin_cb;
LogicalDecodeShutdownCB shutdown_cb;
/* streaming of changes */
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 1ae17d5..820840a 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -10,6 +10,7 @@
#define REORDERBUFFER_H
#include "access/htup_details.h"
+#include "access/twophase.h"
#include "lib/ilist.h"
#include "storage/sinval.h"
#include "utils/hsearch.h"
@@ -162,9 +163,14 @@ typedef struct ReorderBufferChange
#define RBTXN_HAS_CATALOG_CHANGES 0x0001
#define RBTXN_IS_SUBXACT 0x0002
#define RBTXN_IS_SERIALIZED 0x0004
-#define RBTXN_IS_STREAMED 0x0008
-#define RBTXN_HAS_TOAST_INSERT 0x0010
-#define RBTXN_HAS_SPEC_INSERT 0x0020
+#define RBTXN_PREPARE 0x0008
+#define RBTXN_COMMIT_PREPARED 0x0010
+#define RBTXN_ROLLBACK_PREPARED 0x0020
+#define RBTXN_COMMIT 0x0040
+#define RBTXN_ROLLBACK 0x0080
+#define RBTXN_IS_STREAMED 0x0100
+#define RBTXN_HAS_TOAST_INSERT 0x0200
+#define RBTXN_HAS_SPEC_INSERT 0x0400
/* Does the transaction have catalog changes? */
#define rbtxn_has_catalog_changes(txn) \
@@ -218,6 +224,17 @@ typedef struct ReorderBufferChange
((txn)->txn_flags & RBTXN_IS_STREAMED) != 0 \
)
+/* is this txn prepared? */
+#define rbtxn_prepared(txn) (txn->txn_flags & RBTXN_PREPARE)
+/* was this prepared txn committed in the meanwhile? */
+#define rbtxn_commit_prepared(txn) (txn->txn_flags & RBTXN_COMMIT_PREPARED)
+/* was this prepared txn aborted in the meanwhile? */
+#define rbtxn_rollback_prepared(txn) (txn->txn_flags & RBTXN_ROLLBACK_PREPARED)
+/* was this txn committed in the meanwhile? */
+#define rbtxn_commit(txn) (txn->txn_flags & RBTXN_COMMIT)
+/* was this prepared txn aborted in the meanwhile? */
+#define rbtxn_rollback(txn) (txn->txn_flags & RBTXN_ROLLBACK)
+
typedef struct ReorderBufferTXN
{
/* See above */
@@ -229,6 +246,9 @@ typedef struct ReorderBufferTXN
/* Xid of top-level transaction, if known */
TransactionId toplevel_xid;
+ /* In case of 2PC we need to pass GID to output plugin */
+ char *gid;
+
/*
* LSN of the first data carrying, WAL record with knowledge about this
* xid. This is allowed to *not* be first record adorned with this xid, if
@@ -390,6 +410,39 @@ typedef void (*ReorderBufferCommitCB) (ReorderBuffer *rb,
ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
+/* abort callback signature */
+typedef void (*ReorderBufferAbortCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+
+typedef bool (*ReorderBufferFilterPrepareCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ TransactionId xid,
+ const char *gid);
+
+/* prepare callback signature */
+typedef void (*ReorderBufferPrepareCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+
+/* commit prepared callback signature */
+typedef void (*ReorderBufferCommitPreparedCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+/* abort prepared callback signature */
+typedef void (*ReorderBufferAbortPreparedCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+
+
+
+
/* message callback signature */
typedef void (*ReorderBufferMessageCB) (ReorderBuffer *rb,
ReorderBufferTXN *txn,
@@ -482,6 +535,11 @@ struct ReorderBuffer
ReorderBufferApplyChangeCB apply_change;
ReorderBufferApplyTruncateCB apply_truncate;
ReorderBufferCommitCB commit;
+ ReorderBufferAbortCB abort;
+ ReorderBufferFilterPrepareCB filter_prepare;
+ ReorderBufferPrepareCB prepare;
+ ReorderBufferCommitPreparedCB commit_prepared;
+ ReorderBufferAbortPreparedCB abort_prepared;
ReorderBufferMessageCB message;
/*
@@ -548,6 +606,11 @@ void ReorderBufferQueueMessage(ReorderBuffer *, TransactionId, Snapshot snapsho
void ReorderBufferCommit(ReorderBuffer *, TransactionId,
XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
TimestampTz commit_time, RepOriginId origin_id, XLogRecPtr origin_lsn);
+void ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid, bool is_commit);
void ReorderBufferAssignChild(ReorderBuffer *, TransactionId, TransactionId, XLogRecPtr commit_lsn);
void ReorderBufferCommitChild(ReorderBuffer *, TransactionId, TransactionId,
XLogRecPtr commit_lsn, XLogRecPtr end_lsn);
@@ -571,6 +634,15 @@ void ReorderBufferXidSetCatalogChanges(ReorderBuffer *, TransactionId xid, XLog
bool ReorderBufferXidHasCatalogChanges(ReorderBuffer *, TransactionId xid);
bool ReorderBufferXidHasBaseSnapshot(ReorderBuffer *, TransactionId xid);
+bool ReorderBufferPrepareNeedSkip(ReorderBuffer *rb, TransactionId xid,
+ const char *gid);
+bool ReorderBufferTxnIsPrepared(ReorderBuffer *rb, TransactionId xid,
+ const char *gid);
+void ReorderBufferPrepare(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid);
ReorderBufferTXN *ReorderBufferGetOldestTXN(ReorderBuffer *);
TransactionId ReorderBufferGetOldestXmin(ReorderBuffer *rb);
--
1.8.3.1
0002-Tap-test-to-test-concurrent-aborts-during-2-phase-co.patchapplication/octet-stream; name=0002-Tap-test-to-test-concurrent-aborts-during-2-phase-co.patchDownload
From 96b856e08857a6eda57bebdd6f656e3aff7aa933 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Tue, 15 Sep 2020 07:41:38 -0400
Subject: [PATCH 2/3] Tap test to test concurrent aborts during 2 phase commits
This test is specifically for testing concurrent abort while logical decode
is ongoing. Pass in the xid of the 2PC to the plugin as an option.
On the receipt of a valid "check-xid", the change API in the test decoding
plugin will wait for it to be aborted.
---
contrib/test_decoding/Makefile | 2 +
contrib/test_decoding/t/001_twophase.pl | 119 ++++++++++++++++++++++++++++++++
2 files changed, 121 insertions(+)
create mode 100644 contrib/test_decoding/t/001_twophase.pl
diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index f23f15b..4905a0a 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -9,6 +9,8 @@ REGRESS = ddl xact rewrite toast permissions decoding_in_xact \
ISOLATION = mxact delayed_startup ondisk_startup concurrent_ddl_dml \
oldest_xmin snapshot_transfer subxact_without_top concurrent_stream
+TAP_TESTS = 1
+
REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
ISOLATION_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
diff --git a/contrib/test_decoding/t/001_twophase.pl b/contrib/test_decoding/t/001_twophase.pl
new file mode 100644
index 0000000..99a9249
--- /dev/null
+++ b/contrib/test_decoding/t/001_twophase.pl
@@ -0,0 +1,119 @@
+# logical replication of 2PC test
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 2;
+use Time::HiRes qw(usleep);
+use Scalar::Util qw(looks_like_number);
+
+# Initialize node
+my $node_logical = get_new_node('logical');
+$node_logical->init(allows_streaming => 'logical');
+$node_logical->append_conf(
+ 'postgresql.conf', qq(
+ max_prepared_transactions = 10
+));
+$node_logical->start;
+
+# Create some pre-existing content on logical
+$node_logical->safe_psql('postgres', "CREATE TABLE tab (a int PRIMARY KEY)");
+$node_logical->safe_psql('postgres',
+ "INSERT INTO tab SELECT generate_series(1,10)");
+$node_logical->safe_psql('postgres',
+ "SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');");
+
+# This test is specifically for testing concurrent abort while logical decode
+# is ongoing. We will pass in the xid of the 2PC to the plugin as an option.
+# On the receipt of a valid "check-xid", the change API in the test decoding
+# plugin will wait for it to be aborted.
+#
+# We will fire off a ROLLBACK from another session when this decode
+# is waiting.
+#
+# The status of "check-xid" will change from in-progress to not-committed
+# (hence aborted) and we will stop decoding because the subsequent
+# system catalog scan will error out.
+
+$node_logical->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab VALUES (11);
+ INSERT INTO tab VALUES (12);
+ ALTER TABLE tab ADD COLUMN b INT;
+ INSERT INTO tab VALUES (13,14);
+ PREPARE TRANSACTION 'test_prepared_tab';");
+# get XID of the above two-phase transaction
+my $xid2pc = $node_logical->safe_psql('postgres', "SELECT transaction FROM pg_prepared_xacts WHERE gid = 'test_prepared_tab'");
+is(looks_like_number($xid2pc), qq(1), 'Got a valid two-phase XID');
+
+# start decoding the above by passing the "check-xid"
+my $logical_connstr = $node_logical->connstr . ' dbname=postgres';
+
+# decode now, it should include an ABORT entry because of the ROLLBACK below
+system_log("psql -d \"$logical_connstr\" -c \"SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'check-xid', '$xid2pc');\" \&");
+
+# check that decode starts waiting for this $xid2pc
+poll_output_until("waiting for $xid2pc to abort")
+ or die "no wait happened for the abort";
+
+# rollback the prepared transaction
+$node_logical->safe_psql('postgres', "ROLLBACK PREPARED 'test_prepared_tab';");
+
+# check for occurrence of the log about stopping this decoding
+poll_output_until("stopping decoding of test_prepared_tab")
+ or die "no decoding stop for the rollback";
+
+# consume any remaining changes
+$node_logical->safe_psql('postgres', "SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');");
+
+# Check that commit prepared is decoded properly on immediate restart
+$node_logical->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab VALUES (11);
+ INSERT INTO tab VALUES (12);
+ ALTER TABLE tab ADD COLUMN b INT;
+ INSERT INTO tab VALUES (13, 11);
+ PREPARE TRANSACTION 'test_prepared_tab';");
+# consume changes
+$node_logical->safe_psql('postgres', "SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');");
+$node_logical->stop('immediate');
+$node_logical->start;
+
+# commit post the restart
+$node_logical->safe_psql('postgres', "COMMIT PREPARED 'test_prepared_tab';");
+$node_logical->safe_psql('postgres', "SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');");
+
+# check inserts are visible
+my $result = $node_logical->safe_psql('postgres', "SELECT count(*) FROM tab where a IN (11,12) OR b IN (11);");
+is($result, qq(3), 'Rows inserted via 2PC are visible on restart');
+
+$node_logical->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot');");
+$node_logical->stop('fast');
+
+sub poll_output_until
+{
+ my ($expected) = @_;
+
+ $expected = 'xxxxxx' unless defined($expected); # default junk value
+
+ my $max_attempts = 180 * 10;
+ my $attempts = 0;
+
+ my $output_file = '';
+ while ($attempts < $max_attempts)
+ {
+ $output_file = slurp_file($node_logical->logfile());
+
+ if ($output_file =~ $expected)
+ {
+ return 1;
+ }
+
+ # Wait 0.1 second before retrying.
+ usleep(100_000);
+ $attempts++;
+ }
+
+ # The output result didn't change in 180 seconds. Give up
+ return 0;
+}
--
1.8.3.1
0003-pgoutput-output-plugin-support-for-logical-decoding-.patchapplication/octet-stream; name=0003-pgoutput-output-plugin-support-for-logical-decoding-.patchDownload
From decf87e1595f9c7bc9580f7959ee40c38cd0fdfa Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Tue, 15 Sep 2020 07:43:07 -0400
Subject: [PATCH 3/3] pgoutput output plugin support for logical decoding of
2pc
---
src/backend/access/transam/twophase.c | 31 ++++++
src/backend/replication/logical/proto.c | 90 ++++++++++++++-
src/backend/replication/logical/worker.c | 147 ++++++++++++++++++++++++-
src/backend/replication/pgoutput/pgoutput.c | 72 +++++++++++-
src/include/access/twophase.h | 1 +
src/include/replication/logicalproto.h | 37 ++++++-
src/test/subscription/t/020_twophase.pl | 163 ++++++++++++++++++++++++++++
7 files changed, 532 insertions(+), 9 deletions(-)
create mode 100644 src/test/subscription/t/020_twophase.pl
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index ef4f998..bed87d5 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -548,6 +548,37 @@ MarkAsPrepared(GlobalTransaction gxact, bool lock_held)
}
/*
+ * LookupGXact
+ * Check if the prepared transaction with the given GID is around
+ */
+bool
+LookupGXact(const char *gid)
+{
+ int i;
+
+ LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE);
+
+ for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
+ {
+ GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
+
+ /* Ignore not-yet-valid GIDs */
+ if (!gxact->valid)
+ continue;
+ if (strcmp(gxact->gid, gid) != 0)
+ continue;
+
+ LWLockRelease(TwoPhaseStateLock);
+
+ return true;
+ }
+
+ LWLockRelease(TwoPhaseStateLock);
+
+ return false;
+}
+
+/*
* LockGXact
* Locate the prepared transaction and mark it busy for COMMIT or PREPARE.
*/
diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index eb19142..291ed10 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -72,12 +72,17 @@ logicalrep_read_begin(StringInfo in, LogicalRepBeginData *begin_data)
*/
void
logicalrep_write_commit(StringInfo out, ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn)
+ XLogRecPtr commit_lsn, bool is_commit)
{
uint8 flags = 0;
pq_sendbyte(out, 'C'); /* sending COMMIT */
+ if (is_commit)
+ flags |= LOGICALREP_IS_COMMIT;
+ else
+ flags |= LOGICALREP_IS_ABORT;
+
/* send the flags field (unused for now) */
pq_sendbyte(out, flags);
@@ -88,16 +93,20 @@ logicalrep_write_commit(StringInfo out, ReorderBufferTXN *txn,
}
/*
- * Read transaction COMMIT from the stream.
+ * Read transaction COMMIT|ABORT from the stream.
*/
void
logicalrep_read_commit(StringInfo in, LogicalRepCommitData *commit_data)
{
- /* read flags (unused for now) */
+ /* read flags */
uint8 flags = pq_getmsgbyte(in);
- if (flags != 0)
- elog(ERROR, "unrecognized flags %u in commit message", flags);
+ if (!CommitFlagsAreValid(flags))
+ elog(ERROR, "unrecognized flags %u in commit|abort message",
+ flags);
+
+ /* the flag is either commit or abort */
+ commit_data->is_commit = (flags == LOGICALREP_IS_COMMIT);
/* read fields */
commit_data->commit_lsn = pq_getmsgint64(in);
@@ -106,6 +115,77 @@ logicalrep_read_commit(StringInfo in, LogicalRepCommitData *commit_data)
}
/*
+ * Write PREPARE to the output stream.
+ */
+void
+logicalrep_write_prepare(StringInfo out, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ uint8 flags = 0;
+
+ pq_sendbyte(out, 'P'); /* sending PREPARE protocol */
+
+ /*
+ * This should only ever happen for 2PC transactions. In which case we
+ * expect to have a non-empty GID.
+ */
+ Assert(rbtxn_prepared(txn));
+ Assert(strlen(txn->gid) > 0);
+
+ /*
+ * Flags are determined from the state of the transaction. We know we
+ * always get PREPARE first and then [COMMIT|ROLLBACK] PREPARED, so if
+ * it's already marked as committed then it has to be COMMIT PREPARED (and
+ * likewise for abort / ROLLBACK PREPARED).
+ */
+ if (rbtxn_commit_prepared(txn))
+ flags |= LOGICALREP_IS_COMMIT_PREPARED;
+ else if (rbtxn_rollback_prepared(txn))
+ flags |= LOGICALREP_IS_ROLLBACK_PREPARED;
+ else
+ flags |= LOGICALREP_IS_PREPARE;
+
+ /* Make sure exactly one of the expected flags is set. */
+ if (!PrepareFlagsAreValid(flags))
+ elog(ERROR, "unrecognized flags %u in prepare message", flags);
+
+ /* send the flags field */
+ pq_sendbyte(out, flags);
+
+ /* send fields */
+ pq_sendint64(out, prepare_lsn);
+ pq_sendint64(out, txn->end_lsn);
+ pq_sendint64(out, txn->commit_time);
+
+ /* send gid */
+ pq_sendstring(out, txn->gid);
+}
+
+/*
+ * Read transaction PREPARE from the stream.
+ */
+void
+logicalrep_read_prepare(StringInfo in, LogicalRepPrepareData * prepare_data)
+{
+ /* read flags */
+ uint8 flags = pq_getmsgbyte(in);
+
+ if (!PrepareFlagsAreValid(flags))
+ elog(ERROR, "unrecognized flags %u in prepare message", flags);
+
+ /* set the action (reuse the constants used for the flags) */
+ prepare_data->prepare_type = flags;
+
+ /* read fields */
+ prepare_data->prepare_lsn = pq_getmsgint64(in);
+ prepare_data->end_lsn = pq_getmsgint64(in);
+ prepare_data->preparetime = pq_getmsgint64(in);
+
+ /* read gid (copy it into a pre-allocated buffer) */
+ strcpy(prepare_data->gid, pq_getmsgstring(in));
+}
+
+/*
* Write ORIGIN to the output stream.
*/
void
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index c37aafe..d8944a8 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -729,7 +729,11 @@ apply_handle_commit(StringInfo s)
replorigin_session_origin_lsn = commit_data.end_lsn;
replorigin_session_origin_timestamp = commit_data.committime;
- CommitTransactionCommand();
+ if (commit_data.is_commit)
+ CommitTransactionCommand();
+ else
+ AbortCurrentTransaction();
+
pgstat_report_stat(false);
store_flush_position(commit_data.end_lsn);
@@ -749,6 +753,141 @@ apply_handle_commit(StringInfo s)
pgstat_report_activity(STATE_IDLE, NULL);
}
+static void
+apply_handle_prepare_txn(LogicalRepPrepareData * prepare_data)
+{
+ Assert(prepare_data->prepare_lsn == remote_final_lsn);
+
+ /* The synchronization worker runs in single transaction. */
+ if (IsTransactionState() && !am_tablesync_worker())
+ {
+ /* End the earlier transaction and start a new one */
+ BeginTransactionBlock();
+ CommitTransactionCommand();
+ StartTransactionCommand();
+
+ /*
+ * Update origin state so we can restart streaming from correct
+ * position in case of crash.
+ */
+ replorigin_session_origin_lsn = prepare_data->end_lsn;
+ replorigin_session_origin_timestamp = prepare_data->preparetime;
+
+ PrepareTransactionBlock(prepare_data->gid);
+ CommitTransactionCommand();
+ pgstat_report_stat(false);
+
+ store_flush_position(prepare_data->end_lsn);
+ }
+ else
+ {
+ /* Process any invalidation messages that might have accumulated. */
+ AcceptInvalidationMessages();
+ maybe_reread_subscription();
+ }
+
+ in_remote_transaction = false;
+
+ /* Process any tables that are being synchronized in parallel. */
+ process_syncing_tables(prepare_data->end_lsn);
+
+ pgstat_report_activity(STATE_IDLE, NULL);
+}
+
+static void
+apply_handle_commit_prepared_txn(LogicalRepPrepareData * prepare_data)
+{
+ /* there is no transaction when COMMIT PREPARED is called */
+ ensure_transaction();
+
+ /*
+ * Update origin state so we can restart streaming from correct position
+ * in case of crash.
+ */
+ replorigin_session_origin_lsn = prepare_data->end_lsn;
+ replorigin_session_origin_timestamp = prepare_data->preparetime;
+
+ FinishPreparedTransaction(prepare_data->gid, true);
+ CommitTransactionCommand();
+ pgstat_report_stat(false);
+
+ store_flush_position(prepare_data->end_lsn);
+ in_remote_transaction = false;
+
+ /* Process any tables that are being synchronized in parallel. */
+ process_syncing_tables(prepare_data->end_lsn);
+
+ pgstat_report_activity(STATE_IDLE, NULL);
+}
+
+static void
+apply_handle_rollback_prepared_txn(LogicalRepPrepareData * prepare_data)
+{
+ /*
+ * Update origin state so we can restart streaming from correct position
+ * in case of crash.
+ */
+ replorigin_session_origin_lsn = prepare_data->end_lsn;
+ replorigin_session_origin_timestamp = prepare_data->preparetime;
+
+ /*
+ * During logical decoding, on the apply side, it's possible that a
+ * prepared transaction got aborted while decoding. In that case, we stop
+ * the decoding and abort the transaction immediately. However the
+ * ROLLBACK prepared processing still reaches the subscriber. In that case
+ * it's ok to have a missing gid
+ */
+ if (LookupGXact(prepare_data->gid))
+ {
+ /* there is no transaction when ABORT/ROLLBACK PREPARED is called */
+ ensure_transaction();
+ FinishPreparedTransaction(prepare_data->gid, false);
+ CommitTransactionCommand();
+ }
+
+ pgstat_report_stat(false);
+
+ store_flush_position(prepare_data->end_lsn);
+ in_remote_transaction = false;
+
+ /* Process any tables that are being synchronized in parallel. */
+ process_syncing_tables(prepare_data->end_lsn);
+
+ pgstat_report_activity(STATE_IDLE, NULL);
+}
+
+/*
+ * Handle PREPARE message.
+ */
+static void
+apply_handle_prepare(StringInfo s)
+{
+ LogicalRepPrepareData prepare_data;
+
+ logicalrep_read_prepare(s, &prepare_data);
+
+ switch (prepare_data.prepare_type)
+ {
+ case LOGICALREP_IS_PREPARE:
+ apply_handle_prepare_txn(&prepare_data);
+ break;
+
+ case LOGICALREP_IS_COMMIT_PREPARED:
+ apply_handle_commit_prepared_txn(&prepare_data);
+ break;
+
+ case LOGICALREP_IS_ROLLBACK_PREPARED:
+ apply_handle_rollback_prepared_txn(&prepare_data);
+ break;
+
+ default:
+ ereport(ERROR,
+ (errcode(ERRCODE_PROTOCOL_VIOLATION),
+ errmsg("unexpected type of prepare message: %d",
+ prepare_data.prepare_type)));
+ }
+}
+
/*
* Handle ORIGIN message.
*
@@ -1909,10 +2048,14 @@ apply_dispatch(StringInfo s)
case 'B':
apply_handle_begin(s);
break;
- /* COMMIT */
+ /* COMMIT/ABORT */
case 'C':
apply_handle_commit(s);
break;
+ /* PREPARE and [COMMIT|ROLLBACK] PREPARED */
+ case 'P':
+ apply_handle_prepare(s);
+ break;
/* INSERT */
case 'I':
apply_handle_insert(s);
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 343f031..3dc5264 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -39,6 +39,8 @@ static void pgoutput_begin_txn(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn);
static void pgoutput_commit_txn(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn, XLogRecPtr commit_lsn);
+static void pgoutput_abort_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, XLogRecPtr abort_lsn);
static void pgoutput_change(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn, Relation rel,
ReorderBufferChange *change);
@@ -47,6 +49,12 @@ static void pgoutput_truncate(LogicalDecodingContext *ctx,
ReorderBufferChange *change);
static bool pgoutput_origin_filter(LogicalDecodingContext *ctx,
RepOriginId origin_id);
+static void pgoutput_prepare_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, XLogRecPtr prepare_lsn);
+static void pgoutput_commit_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, XLogRecPtr prepare_lsn);
+static void pgoutput_abort_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, XLogRecPtr prepare_lsn);
static void pgoutput_stream_start(struct LogicalDecodingContext *ctx,
ReorderBufferTXN *txn);
static void pgoutput_stream_stop(struct LogicalDecodingContext *ctx,
@@ -143,6 +151,11 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb)
cb->change_cb = pgoutput_change;
cb->truncate_cb = pgoutput_truncate;
cb->commit_cb = pgoutput_commit_txn;
+ cb->abort_cb = pgoutput_abort_txn;
+
+ cb->prepare_cb = pgoutput_prepare_txn;
+ cb->commit_prepared_cb = pgoutput_commit_prepared_txn;
+ cb->abort_prepared_cb = pgoutput_abort_prepared_txn;
cb->filter_by_origin_cb = pgoutput_origin_filter;
cb->shutdown_cb = pgoutput_shutdown;
@@ -256,6 +269,7 @@ pgoutput_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
/* This plugin uses binary protocol. */
opt->output_type = OUTPUT_PLUGIN_BINARY_OUTPUT;
+ opt->enable_twophase = true;
/*
* This is replication start and not slot initialization.
@@ -373,7 +387,63 @@ pgoutput_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
OutputPluginUpdateProgress(ctx);
OutputPluginPrepareWrite(ctx, true);
- logicalrep_write_commit(ctx->out, txn, commit_lsn);
+ logicalrep_write_commit(ctx->out, txn, commit_lsn, true);
+ OutputPluginWrite(ctx, true);
+}
+
+/*
+ * ABORT callback
+ */
+static void
+pgoutput_abort_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn)
+{
+ OutputPluginUpdateProgress(ctx);
+
+ OutputPluginPrepareWrite(ctx, true);
+ logicalrep_write_commit(ctx->out, txn, abort_lsn, false);
+ OutputPluginWrite(ctx, true);
+}
+
+/*
+ * PREPARE callback
+ */
+static void
+pgoutput_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ OutputPluginUpdateProgress(ctx);
+
+ OutputPluginPrepareWrite(ctx, true);
+ logicalrep_write_prepare(ctx->out, txn, prepare_lsn);
+ OutputPluginWrite(ctx, true);
+}
+
+/*
+ * COMMIT PREPARED callback
+ */
+static void
+pgoutput_commit_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ OutputPluginUpdateProgress(ctx);
+
+ OutputPluginPrepareWrite(ctx, true);
+ logicalrep_write_prepare(ctx->out, txn, prepare_lsn);
+ OutputPluginWrite(ctx, true);
+}
+
+/*
+ * PREPARE callback
+ */
+static void
+pgoutput_abort_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ OutputPluginUpdateProgress(ctx);
+
+ OutputPluginPrepareWrite(ctx, true);
+ logicalrep_write_prepare(ctx->out, txn, prepare_lsn);
OutputPluginWrite(ctx, true);
}
diff --git a/src/include/access/twophase.h b/src/include/access/twophase.h
index 2ca71c3..b2628ea 100644
--- a/src/include/access/twophase.h
+++ b/src/include/access/twophase.h
@@ -44,6 +44,7 @@ extern GlobalTransaction MarkAsPreparing(TransactionId xid, const char *gid,
extern void StartPrepare(GlobalTransaction gxact);
extern void EndPrepare(GlobalTransaction gxact);
extern bool StandbyTransactionIdIsPrepared(TransactionId xid);
+extern bool LookupGXact(const char *gid);
extern TransactionId PrescanPreparedTransactions(TransactionId **xids_p,
int *nxids_p);
diff --git a/src/include/replication/logicalproto.h b/src/include/replication/logicalproto.h
index 607a728..fb07580 100644
--- a/src/include/replication/logicalproto.h
+++ b/src/include/replication/logicalproto.h
@@ -85,20 +85,55 @@ typedef struct LogicalRepBeginData
TransactionId xid;
} LogicalRepBeginData;
+/* Commit (and abort) information */
typedef struct LogicalRepCommitData
{
+ bool is_commit;
XLogRecPtr commit_lsn;
XLogRecPtr end_lsn;
TimestampTz committime;
} LogicalRepCommitData;
+/* types of the commit protocol message */
+#define LOGICALREP_IS_COMMIT 0x01
+#define LOGICALREP_IS_ABORT 0x02
+
+/* commit message is COMMIT or ABORT, and there is nothing else */
+#define CommitFlagsAreValid(flags) \
+ ((flags == LOGICALREP_IS_COMMIT) || (flags == LOGICALREP_IS_ABORT))
+
+/* Prepare protocol information */
+typedef struct LogicalRepPrepareData
+{
+ uint8 prepare_type;
+ XLogRecPtr prepare_lsn;
+ XLogRecPtr end_lsn;
+ TimestampTz preparetime;
+ char gid[GIDSIZE];
+} LogicalRepPrepareData;
+
+/* types of the prepare protocol message */
+#define LOGICALREP_IS_PREPARE 0x01
+#define LOGICALREP_IS_COMMIT_PREPARED 0x02
+#define LOGICALREP_IS_ROLLBACK_PREPARED 0x04
+
+/* prepare can be exactly one of PREPARE, [COMMIT|ABORT] PREPARED*/
+#define PrepareFlagsAreValid(flags) \
+ ((flags == LOGICALREP_IS_PREPARE) || \
+ (flags == LOGICALREP_IS_COMMIT_PREPARED) || \
+ (flags == LOGICALREP_IS_ROLLBACK_PREPARED))
+
extern void logicalrep_write_begin(StringInfo out, ReorderBufferTXN *txn);
extern void logicalrep_read_begin(StringInfo in,
LogicalRepBeginData *begin_data);
extern void logicalrep_write_commit(StringInfo out, ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn);
+ XLogRecPtr commit_lsn, bool is_commit);
extern void logicalrep_read_commit(StringInfo in,
LogicalRepCommitData *commit_data);
+extern void logicalrep_write_prepare(StringInfo out, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+extern void logicalrep_read_prepare(StringInfo in,
+ LogicalRepPrepareData * prepare_data);
extern void logicalrep_write_origin(StringInfo out, const char *origin,
XLogRecPtr origin_lsn);
extern char *logicalrep_read_origin(StringInfo in, XLogRecPtr *origin_lsn);
diff --git a/src/test/subscription/t/020_twophase.pl b/src/test/subscription/t/020_twophase.pl
new file mode 100644
index 0000000..c7f373d
--- /dev/null
+++ b/src/test/subscription/t/020_twophase.pl
@@ -0,0 +1,163 @@
+# logical replication of 2PC test
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 12;
+
+# Initialize publisher node
+my $node_publisher = get_new_node('publisher');
+$node_publisher->init(allows_streaming => 'logical');
+$node_publisher->append_conf(
+ 'postgresql.conf', qq(
+ max_prepared_transactions = 10
+ ));
+$node_publisher->start;
+
+# Create subscriber node
+my $node_subscriber = get_new_node('subscriber');
+$node_subscriber->init(allows_streaming => 'logical');
+$node_subscriber->append_conf(
+ 'postgresql.conf', qq(max_prepared_transactions = 10));
+$node_subscriber->start;
+
+# Create some pre-existing content on publisher
+$node_publisher->safe_psql('postgres', "CREATE TABLE tab_full (a int PRIMARY KEY)");
+$node_publisher->safe_psql('postgres',
+ "INSERT INTO tab_full SELECT generate_series(1,10)");
+$node_publisher->safe_psql('postgres', "CREATE TABLE tab_full2 (x text)");
+$node_publisher->safe_psql('postgres',
+ "INSERT INTO tab_full2 VALUES ('a'), ('b'), ('b')");
+
+# Setup structure on subscriber
+$node_subscriber->safe_psql('postgres', "CREATE TABLE tab_full (a int PRIMARY KEY)");
+$node_subscriber->safe_psql('postgres', "CREATE TABLE tab_full2 (x text)");
+
+# Setup logical replication
+my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
+$node_publisher->safe_psql('postgres', "CREATE PUBLICATION tap_pub");
+$node_publisher->safe_psql('postgres',
+"ALTER PUBLICATION tap_pub ADD TABLE tab_full, tab_full2"
+);
+
+my $appname = 'tap_sub';
+$node_subscriber->safe_psql('postgres',
+"CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr application_name=$appname' PUBLICATION tap_pub"
+);
+
+# Wait for subscriber to finish initialization
+my $caughtup_query =
+"SELECT pg_current_wal_lsn() <= replay_lsn FROM pg_stat_replication WHERE application_name = '$appname';";
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# Also wait for initial table sync to finish
+my $synced_query =
+"SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT IN ('r', 's');";
+$node_subscriber->poll_query_until('postgres', $synced_query)
+ or die "Timed out while waiting for subscriber to synchronize data";
+
+# check that 2PC gets replicated to subscriber
+$node_publisher->safe_psql('postgres',
+ "BEGIN;INSERT INTO tab_full VALUES (11);PREPARE TRANSACTION 'test_prepared_tab_full';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is in prepared state on subscriber
+my $result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab_full';");
+ is($result, qq(1), 'transaction is prepared on subscriber');
+
+# check that 2PC gets committed on subscriber
+$node_publisher->safe_psql('postgres',
+ "COMMIT PREPARED 'test_prepared_tab_full';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is committed on subscriber
+$result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM tab_full where a = 11;");
+ is($result, qq(1), 'Row inserted via 2PC has committed on subscriber');
+$result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab_full';");
+ is($result, qq(0), 'transaction is committed on subscriber');
+
+# check that 2PC gets replicated to subscriber
+$node_publisher->safe_psql('postgres',
+ "BEGIN;INSERT INTO tab_full VALUES (12);PREPARE TRANSACTION 'test_prepared_tab_full';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is in prepared state on subscriber
+$result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab_full';");
+ is($result, qq(1), 'transaction is prepared on subscriber');
+
+# check that 2PC gets aborted on subscriber
+$node_publisher->safe_psql('postgres',
+ "ROLLBACK PREPARED 'test_prepared_tab_full';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is aborted on subscriber
+$result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM tab_full where a = 12;");
+ is($result, qq(0), 'Row inserted via 2PC is not present on subscriber');
+
+$result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab_full';");
+ is($result, qq(0), 'transaction is aborted on subscriber');
+
+# Check that commit prepared is decoded properly on crash restart
+$node_publisher->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab_full VALUES (12);
+ INSERT INTO tab_full VALUES (13);
+ PREPARE TRANSACTION 'test_prepared_tab';");
+$node_subscriber->stop('immediate');
+$node_publisher->stop('immediate');
+$node_publisher->start;
+$node_subscriber->start;
+
+# commit post the restart
+$node_publisher->safe_psql('postgres', "COMMIT PREPARED 'test_prepared_tab';");
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check inserts are visible
+$result = $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM tab_full where a IN (11,12);");
+is($result, qq(2), 'Rows inserted via 2PC are visible on the subscriber');
+
+# TODO add test cases involving DDL. This can be added after we add functionality
+# to replicate DDL changes to subscriber.
+
+# check all the cleanup
+$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_subscription");
+is($result, qq(0), 'check subscription was dropped on subscriber');
+
+$result = $node_publisher->safe_psql('postgres',
+ "SELECT count(*) FROM pg_replication_slots");
+is($result, qq(0), 'check replication slot was dropped on publisher');
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_subscription_rel");
+is($result, qq(0),
+ 'check subscription relation status was dropped on subscriber');
+
+$result = $node_publisher->safe_psql('postgres',
+ "SELECT count(*) FROM pg_replication_slots");
+is($result, qq(0), 'check replication slot was dropped on publisher');
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_replication_origin");
+is($result, qq(0), 'check replication origin was dropped on subscriber');
+
+$node_subscriber->stop('fast');
+$node_publisher->stop('fast');
--
1.8.3.1
On Wed, Sep 9, 2020 at 3:33 PM Ajin Cherian <itsajin@gmail.com> wrote:
On Mon, Sep 7, 2020 at 11:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Nikhil has a test for the same
(0004-Teach-test_decoding-plugin-to-work-with-2PC.Jan4) in his last
email [1]. You might want to use it to test this behavior. I think you
can also keep the tests as a separate patch as Nikhil had.Done. I've added the tests and also tweaked code to make sure that the aborts during 2 phase commits are also handled.
I don't think it is complete yet.
*
* This error can only occur when we are sending the data in
* streaming mode and the streaming is not finished yet.
*/
- Assert(streaming);
- Assert(stream_started);
+ Assert(streaming || rbtxn_prepared(txn));
+ Assert(stream_started || rbtxn_prepared(txn));
Here, you have updated the code but comments are still not updated.
*
@@ -2370,10 +2391,19 @@ ReorderBufferProcessTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn,
errdata = NULL;
curtxn->concurrent_abort = true;
- /* Reset the TXN so that it is allowed to stream remaining data. */
- ReorderBufferResetTXN(rb, txn, snapshot_now,
- command_id, prev_lsn,
- specinsert);
+ /* If streaming, reset the TXN so that it is allowed to stream
remaining data. */
+ if (streaming && stream_started)
+ {
+ ReorderBufferResetTXN(rb, txn, snapshot_now,
+ command_id, prev_lsn,
+ specinsert);
+ }
+ else
+ {
+ elog(LOG, "stopping decoding of %s (%u)",
+ txn->gid[0] != '\0'? txn->gid:"", txn->xid);
+ rb->abort(rb, txn, commit_lsn);
+ }
I don't think we need to perform abort here. Later we will anyway
encounter the WAL for Rollback Prepared for which we will call
abort_prepared_cb. As we have set the 'concurrent_abort' flag, it will
allow us to skip all the intermediate records. Here, we need only
enough state in ReorderBufferTxn that it can be later used for
ReorderBufferFinishPrepared(). Basically, you need functionality
similar to ReorderBufferTruncateTXN where except for invalidations you
can free memory for everything else. You can either write a new
function ReorderBufferTruncatePreparedTxn or pass another bool
parameter in ReorderBufferTruncateTXN to indicate it is prepared_xact
and then clean up additional things that are not required for prepared
xact.
*
Similarly, I don't understand why we need below code:
ReorderBufferProcessTXN()
{
..
+ if (rbtxn_rollback(txn))
+ rb->abort(rb, txn, commit_lsn);
..
}
There is nowhere we are setting the RBTXN_ROLLBACK flag, so how will
this check be true? If we decide to remove this code then don't forget
to update the comments.
*
If my previous two comments are correct then I don't think we need the
below interface.
+ <sect3 id="logicaldecoding-output-plugin-abort">
+ <title>Transaction Abort Callback</title>
+
+ <para>
+ The required <function>abort_cb</function> callback is called whenever
+ a transaction abort has to be initiated. This can happen if we are
+ decoding a transaction that has been prepared for two-phase commit and
+ a concurrent rollback happens while we are decoding it.
+<programlisting>
+typedef void (*LogicalDecodeAbortCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
I don't know why the patch has used this way to implement an option to
enable two-phase. Can't we use how we implement 'stream-changes'
option in commit 7259736a6e? Just refer how we set ctx->streaming and
you can use a similar way to set this parameter.Done, I've moved the checks for callbacks to inside the corresponding wrappers.
This is not what I suggested. Please study the commit 7259736a6e and
see how streaming option is implemented. I want later subscribers can
specify whether they want transactions to be decoded at prepare time
similar to what we have done for streaming. Also, search for
ctx->streaming in the code and see how it is set to get the idea.
Note: Please use version number while sending patches, you can use
something like git format-patch -N -v n to do that. It makes easier
for the reviewer to compare it with the previous version.
Few other comments:
===================
1.
ReorderBufferProcessTXN()
{
..
if (streaming)
{
ReorderBufferTruncateTXN(rb, txn);
/* Reset the CheckXidAlive */
CheckXidAlive = InvalidTransactionId;
}
else
ReorderBufferCleanupTXN(rb, txn);
..
}
I don't think we can perform ReorderBufferCleanupTXN for the prepared
transactions because if we have removed the ReorderBufferTxn before
commit, the later code might not consider such a transaction in the
system and compute the wrong value of restart_lsn for a slot.
Basically, in SnapBuildProcessRunningXacts() when we call
ReorderBufferGetOldestTXN(), it should show the ReorderBufferTxn of
the prepared transaction which is not yet committed but because we
have removed it after prepare, it won't get that TXN and then that
leads to wrong computation of restart_lsn. Once we start from a wrong
point in WAL, the snapshot built was incorrect which will lead to the
wrong result. This is the same reason why the patch is not doing
ReorderBufferForget in DecodePrepare when we decide to skip the
transaction. Also, here, we need to set CheckXidAlive =
InvalidTransactionId; for prepared xact as well.
2. Have you thought about the interaction of streaming with prepared
transactions? You can try writing some tests using pg_logical* APIs
and see the behaviour. For ex. there is no handling in
ReorderBufferStreamCommit for the same. I think you need to introduce
stream_prepare API similar to stream_commit and then use the same.
3.
- if (streaming)
+ if (streaming || rbtxn_prepared(change->txn))
{
curtxn = change->txn;
SetupCheckXidLive(curtxn->xid);
@@ -2249,7 +2254,6 @@ ReorderBufferProcessTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn,
break;
}
}
-
/*
Spurious line removal.
--
With Regards,
Amit Kapila.
On Tue, Sep 15, 2020 at 5:27 PM Ajin Cherian <itsajin@gmail.com> wrote:
On Sat, Sep 12, 2020 at 9:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Another thing, I noticed is that originally we have subscriber-side
support as well, see [1] (see *pgoutput* patch) but later dropped it
due to some reasons [2]. I think we should have pgoutput support as
well, so see what is required to get that incorporated.I have added the rebased patch-set for pgoutput and subscriber side changes as well. This also includes a test case in subscriber.
As mentioned in my email there were some reasons due to which the
support has been left for later, have you checked those and if so, can
you please explain how you have addressed those or why they are not
relevant now if that is the case?
--
With Regards,
Amit Kapila.
On Tue, Sep 15, 2020 at 10:43 PM Amit Kapila <amit.kapila16@gmail.com>
wrote:
Few other comments:
===================
1.
ReorderBufferProcessTXN()
{
..
if (streaming)
{
ReorderBufferTruncateTXN(rb, txn);/* Reset the CheckXidAlive */
CheckXidAlive = InvalidTransactionId;
}
else
ReorderBufferCleanupTXN(rb, txn);
..
}I don't think we can perform ReorderBufferCleanupTXN for the prepared
transactions because if we have removed the ReorderBufferTxn before
commit, the later code might not consider such a transaction in the
system and compute the wrong value of restart_lsn for a slot.
Basically, in SnapBuildProcessRunningXacts() when we call
ReorderBufferGetOldestTXN(), it should show the ReorderBufferTxn of
the prepared transaction which is not yet committed but because we
have removed it after prepare, it won't get that TXN and then that
leads to wrong computation of restart_lsn. Once we start from a wrong
point in WAL, the snapshot built was incorrect which will lead to the
wrong result. This is the same reason why the patch is not doing
ReorderBufferForget in DecodePrepare when we decide to skip the
transaction. Also, here, we need to set CheckXidAlive =
InvalidTransactionId; for prepared xact as well.
Just to confirm what you are expecting here. so after we send out the
prepare transaction to the plugin, you are suggesting to NOT do a
ReorderBufferCleanupTXN, but what to do instead?. Are you suggesting to do
what you suggested
as part of concurrent abort handling? Something equivalent
to ReorderBufferTruncateTXN()? remove all changes of the transaction but
keep the invalidations and tuplecids etc? Do you think we should have a new
flag in txn to indicate that this transaction has already been decoded?
(prepare_decoded?) Any other special handling you think is required?
regards,
Ajin Cherian
Fujitsu Australia
On Thu, Sep 17, 2020 at 2:02 PM Ajin Cherian <itsajin@gmail.com> wrote:
On Tue, Sep 15, 2020 at 10:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Few other comments:
===================
1.
ReorderBufferProcessTXN()
{
..
if (streaming)
{
ReorderBufferTruncateTXN(rb, txn);/* Reset the CheckXidAlive */
CheckXidAlive = InvalidTransactionId;
}
else
ReorderBufferCleanupTXN(rb, txn);
..
}I don't think we can perform ReorderBufferCleanupTXN for the prepared
transactions because if we have removed the ReorderBufferTxn before
commit, the later code might not consider such a transaction in the
system and compute the wrong value of restart_lsn for a slot.
Basically, in SnapBuildProcessRunningXacts() when we call
ReorderBufferGetOldestTXN(), it should show the ReorderBufferTxn of
the prepared transaction which is not yet committed but because we
have removed it after prepare, it won't get that TXN and then that
leads to wrong computation of restart_lsn. Once we start from a wrong
point in WAL, the snapshot built was incorrect which will lead to the
wrong result. This is the same reason why the patch is not doing
ReorderBufferForget in DecodePrepare when we decide to skip the
transaction. Also, here, we need to set CheckXidAlive =
InvalidTransactionId; for prepared xact as well.Just to confirm what you are expecting here. so after we send out the prepare transaction to the plugin, you are suggesting to NOT do a ReorderBufferCleanupTXN, but what to do instead?. Are you suggesting to do what you suggested
as part of concurrent abort handling?
Yes.
Something equivalent to ReorderBufferTruncateTXN()? remove all changes of the transaction but keep the invalidations and tuplecids etc?
I don't think you don't need tuplecids. I have checked
ReorderBufferFinishPrepared() and that seems to require only
invalidations, check if anything else is required.
Do you think we should have a new flag in txn to indicate that this transaction has already been decoded? (prepare_decoded?)
Yeah, I think that would be better. How about if name the new variable
as cleanup_prepared?
--
With Regards,
Amit Kapila.
On Tue, Sep 15, 2020 at 10:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
I don't think it is complete yet. * * This error can only occur when we are sending the data in * streaming mode and the streaming is not finished yet. */ - Assert(streaming); - Assert(stream_started); + Assert(streaming || rbtxn_prepared(txn)); + Assert(stream_started || rbtxn_prepared(txn));Here, you have updated the code but comments are still not updated.
Updated the comments.
I don't think we need to perform abort here. Later we will anyway
encounter the WAL for Rollback Prepared for which we will call
abort_prepared_cb. As we have set the 'concurrent_abort' flag, it will
allow us to skip all the intermediate records. Here, we need only
enough state in ReorderBufferTxn that it can be later used for
ReorderBufferFinishPrepared(). Basically, you need functionality
similar to ReorderBufferTruncateTXN where except for invalidations you
can free memory for everything else. You can either write a new
function ReorderBufferTruncatePreparedTxn or pass another bool
parameter in ReorderBufferTruncateTXN to indicate it is prepared_xact
and then clean up additional things that are not required for prepared
xact.
Added a new parameter to ReorderBufferTruncatePreparedTxn for
prepared transactions and did cleanup of tupulecids as well, I have
left snapshots and transactions.
As a result of this, I also had to create a new function
ReorderBufferCleanupPreparedTXN which will clean up the rest as part
of FinishPrepared handling as we can't call
ReorderBufferCleanupTXN again after this.
*
Similarly, I don't understand why we need below code:
ReorderBufferProcessTXN()
{
..
+ if (rbtxn_rollback(txn))
+ rb->abort(rb, txn, commit_lsn);
..
}There is nowhere we are setting the RBTXN_ROLLBACK flag, so how will
this check be true? If we decide to remove this code then don't forget
to update the comments.
Removed.
* If my previous two comments are correct then I don't think we need the below interface. + <sect3 id="logicaldecoding-output-plugin-abort"> + <title>Transaction Abort Callback</title> + + <para> + The required <function>abort_cb</function> callback is called whenever + a transaction abort has to be initiated. This can happen if we are + decoding a transaction that has been prepared for two-phase commit and + a concurrent rollback happens while we are decoding it. +<programlisting> +typedef void (*LogicalDecodeAbortCB) (struct LogicalDecodingContext *ctx, + ReorderBufferTXN *txn, + XLogRecPtr abort_lsn);
Removed.
I don't know why the patch has used this way to implement an option to
enable two-phase. Can't we use how we implement 'stream-changes'
option in commit 7259736a6e? Just refer how we set ctx->streaming and
you can use a similar way to set this parameter.Done, I've moved the checks for callbacks to inside the corresponding wrappers.
This is not what I suggested. Please study the commit 7259736a6e and
see how streaming option is implemented. I want later subscribers can
specify whether they want transactions to be decoded at prepare time
similar to what we have done for streaming. Also, search for
ctx->streaming in the code and see how it is set to get the idea.
Changed it similar to ctx->streaming logic.
Note: Please use version number while sending patches, you can use
something like git format-patch -N -v n to do that. It makes easier
for the reviewer to compare it with the previous version.
Done.
Few other comments:
===================
1.
ReorderBufferProcessTXN()
{
..
if (streaming)
{
ReorderBufferTruncateTXN(rb, txn);/* Reset the CheckXidAlive */
CheckXidAlive = InvalidTransactionId;
}
else
ReorderBufferCleanupTXN(rb, txn);
..
}I don't think we can perform ReorderBufferCleanupTXN for the prepared
transactions because if we have removed the ReorderBufferTxn before
commit, the later code might not consider such a transaction in the
system and compute the wrong value of restart_lsn for a slot.
Basically, in SnapBuildProcessRunningXacts() when we call
ReorderBufferGetOldestTXN(), it should show the ReorderBufferTxn of
the prepared transaction which is not yet committed but because we
have removed it after prepare, it won't get that TXN and then that
leads to wrong computation of restart_lsn. Once we start from a wrong
point in WAL, the snapshot built was incorrect which will lead to the
wrong result. This is the same reason why the patch is not doing
ReorderBufferForget in DecodePrepare when we decide to skip the
transaction. Also, here, we need to set CheckXidAlive =
InvalidTransactionId; for prepared xact as well.
Updated as suggested above.
2. Have you thought about the interaction of streaming with prepared
transactions? You can try writing some tests using pg_logical* APIs
and see the behaviour. For ex. there is no handling in
ReorderBufferStreamCommit for the same. I think you need to introduce
stream_prepare API similar to stream_commit and then use the same.
This is pending. I will look at it in the next iteration. Also pending
is the investigation as to why the pgoutput changes were not added
initially.
regards,
Ajin Cherian
Fujitsu Australia
Attachments:
v4-0002-Tap-test-to-test-concurrent-aborts-during-2-phase.patchapplication/octet-stream; name=v4-0002-Tap-test-to-test-concurrent-aborts-during-2-phase.patchDownload
From 2e778c81095acd808079379f7d51c77e5fba9c7a Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Fri, 18 Sep 2020 07:56:07 -0400
Subject: [PATCH v4] Tap test to test concurrent aborts during 2 phase commits
This test is specifically for testing concurrent abort while logical decode
is ongoing. Pass in the xid of the 2PC to the plugin as an option.
On the receipt of a valid "check-xid", the change API in the test decoding
plugin will wait for it to be aborted.
---
contrib/test_decoding/Makefile | 2 +
contrib/test_decoding/t/001_twophase.pl | 119 ++++++++++++++++++++++++++++++++
2 files changed, 121 insertions(+)
create mode 100644 contrib/test_decoding/t/001_twophase.pl
diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index f23f15b..4905a0a 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -9,6 +9,8 @@ REGRESS = ddl xact rewrite toast permissions decoding_in_xact \
ISOLATION = mxact delayed_startup ondisk_startup concurrent_ddl_dml \
oldest_xmin snapshot_transfer subxact_without_top concurrent_stream
+TAP_TESTS = 1
+
REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
ISOLATION_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
diff --git a/contrib/test_decoding/t/001_twophase.pl b/contrib/test_decoding/t/001_twophase.pl
new file mode 100644
index 0000000..a7eb65e
--- /dev/null
+++ b/contrib/test_decoding/t/001_twophase.pl
@@ -0,0 +1,119 @@
+# logical replication of 2PC test
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 2;
+use Time::HiRes qw(usleep);
+use Scalar::Util qw(looks_like_number);
+
+# Initialize node
+my $node_logical = get_new_node('logical');
+$node_logical->init(allows_streaming => 'logical');
+$node_logical->append_conf(
+ 'postgresql.conf', qq(
+ max_prepared_transactions = 10
+));
+$node_logical->start;
+
+# Create some pre-existing content on logical
+$node_logical->safe_psql('postgres', "CREATE TABLE tab (a int PRIMARY KEY)");
+$node_logical->safe_psql('postgres',
+ "INSERT INTO tab SELECT generate_series(1,10)");
+$node_logical->safe_psql('postgres',
+ "SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');");
+
+# This test is specifically for testing concurrent abort while logical decode
+# is ongoing. We will pass in the xid of the 2PC to the plugin as an option.
+# On the receipt of a valid "check-xid", the change API in the test decoding
+# plugin will wait for it to be aborted.
+#
+# We will fire off a ROLLBACK from another session when this decode
+# is waiting.
+#
+# The status of "check-xid" will change from in-progress to not-committed
+# (hence aborted) and we will stop decoding because the subsequent
+# system catalog scan will error out.
+
+$node_logical->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab VALUES (11);
+ INSERT INTO tab VALUES (12);
+ ALTER TABLE tab ADD COLUMN b INT;
+ INSERT INTO tab VALUES (13,14);
+ PREPARE TRANSACTION 'test_prepared_tab';");
+# get XID of the above two-phase transaction
+my $xid2pc = $node_logical->safe_psql('postgres', "SELECT transaction FROM pg_prepared_xacts WHERE gid = 'test_prepared_tab'");
+is(looks_like_number($xid2pc), qq(1), 'Got a valid two-phase XID');
+
+# start decoding the above by passing the "check-xid"
+my $logical_connstr = $node_logical->connstr . ' dbname=postgres';
+
+# decode now, it should include an ABORT entry because of the ROLLBACK below
+system_log("psql -d \"$logical_connstr\" -c \"SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'check-xid', '$xid2pc');\" \&");
+
+# check that decode starts waiting for this $xid2pc
+poll_output_until("waiting for $xid2pc to abort")
+ or die "no wait happened for the abort";
+
+# rollback the prepared transaction
+$node_logical->safe_psql('postgres', "ROLLBACK PREPARED 'test_prepared_tab';");
+
+# check for occurrence of the log about stopping this decoding
+poll_output_until("stopping decoding of test_prepared_tab")
+ or die "no decoding stop for the rollback";
+
+# consume any remaining changes
+$node_logical->safe_psql('postgres', "SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');");
+
+# Check that commit prepared is decoded properly on immediate restart
+$node_logical->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab VALUES (11);
+ INSERT INTO tab VALUES (12);
+ ALTER TABLE tab ADD COLUMN b INT;
+ INSERT INTO tab VALUES (13, 11);
+ PREPARE TRANSACTION 'test_prepared_tab';");
+# consume changes
+$node_logical->safe_psql('postgres', "SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');");
+$node_logical->stop('immediate');
+$node_logical->start;
+
+# commit post the restart
+$node_logical->safe_psql('postgres', "COMMIT PREPARED 'test_prepared_tab';");
+$node_logical->safe_psql('postgres', "SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');");
+
+# check inserts are visible
+my $result = $node_logical->safe_psql('postgres', "SELECT count(*) FROM tab where a IN (11,12) OR b IN (11);");
+is($result, qq(3), 'Rows inserted via 2PC are visible on restart');
+
+$node_logical->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot');");
+$node_logical->stop('fast');
+
+sub poll_output_until
+{
+ my ($expected) = @_;
+
+ $expected = 'xxxxxx' unless defined($expected); # default junk value
+
+ my $max_attempts = 180 * 10;
+ my $attempts = 0;
+
+ my $output_file = '';
+ while ($attempts < $max_attempts)
+ {
+ $output_file = slurp_file($node_logical->logfile());
+
+ if ($output_file =~ $expected)
+ {
+ return 1;
+ }
+
+ # Wait 0.1 second before retrying.
+ usleep(100_000);
+ $attempts++;
+ }
+
+ # The output result didn't change in 180 seconds. Give up
+ return 0;
+}
--
1.8.3.1
v4-0001-Support-decoding-of-two-phase-transactions.patchapplication/octet-stream; name=v4-0001-Support-decoding-of-two-phase-transactions.patchDownload
From f175d1360bec3646315ccbf5699748448704a38b Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Fri, 18 Sep 2020 07:51:49 -0400
Subject: [PATCH v4] Support decoding of two-phase transactions
Until now two-phase transactions were decoded at COMMIT, just like
regular transaction. During replay, two-phase transactions were
translated into regular transactions on the subscriber, and the GID
was not forwarded to it.
This patch allows PREPARE-time decoding two-phase transactions (if
the output plugin supports this capability), in which case the
transactions are replayed at PREPARE and then committed later when
COMMIT PREPARED arrives.
Includes documentation changes.
---
contrib/test_decoding/expected/prepared.out | 185 ++++++++++++--
contrib/test_decoding/sql/prepared.sql | 77 +++++-
contrib/test_decoding/test_decoding.c | 154 ++++++++++++
doc/src/sgml/logicaldecoding.sgml | 110 ++++++++-
src/backend/replication/logical/decode.c | 141 ++++++++++-
src/backend/replication/logical/logical.c | 175 ++++++++++++++
src/backend/replication/logical/reorderbuffer.c | 309 +++++++++++++++++++++---
src/include/replication/logical.h | 5 +
src/include/replication/output_plugin.h | 37 +++
src/include/replication/reorderbuffer.h | 75 +++++-
10 files changed, 1183 insertions(+), 85 deletions(-)
diff --git a/contrib/test_decoding/expected/prepared.out b/contrib/test_decoding/expected/prepared.out
index 46e915d..94fb0c9 100644
--- a/contrib/test_decoding/expected/prepared.out
+++ b/contrib/test_decoding/expected/prepared.out
@@ -6,19 +6,50 @@ SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_d
init
(1 row)
-CREATE TABLE test_prepared1(id int);
-CREATE TABLE test_prepared2(id int);
+CREATE TABLE test_prepared1(id integer primary key);
+CREATE TABLE test_prepared2(id integer primary key);
-- test simple successful use of a prepared xact
BEGIN;
INSERT INTO test_prepared1 VALUES (1);
PREPARE TRANSACTION 'test_prepared#1';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+----------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:1
+ PREPARE TRANSACTION 'test_prepared#1'
+(3 rows)
+
COMMIT PREPARED 'test_prepared#1';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-----------------------------------
+ COMMIT PREPARED 'test_prepared#1'
+(1 row)
+
INSERT INTO test_prepared1 VALUES (2);
-- test abort of a prepared xact
BEGIN;
INSERT INTO test_prepared1 VALUES (3);
PREPARE TRANSACTION 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+----------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:2
+ COMMIT
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:3
+ PREPARE TRANSACTION 'test_prepared#2'
+(6 rows)
+
ROLLBACK PREPARED 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-------------------------------------
+ ROLLBACK PREPARED 'test_prepared#2'
+(1 row)
+
INSERT INTO test_prepared1 VALUES (4);
-- test prepared xact containing ddl
BEGIN;
@@ -26,45 +57,149 @@ INSERT INTO test_prepared1 VALUES (5);
ALTER TABLE test_prepared1 ADD COLUMN data text;
INSERT INTO test_prepared1 VALUES (6, 'frakbar');
PREPARE TRANSACTION 'test_prepared#3';
--- test that we decode correctly while an uncommitted prepared xact
--- with ddl exists.
--- separate table because of the lock from the ALTER
--- this will come before the '5' row above, as this commits before it.
-INSERT INTO test_prepared2 VALUES (7);
-COMMIT PREPARED 'test_prepared#3';
--- make sure stuff still works
-INSERT INTO test_prepared1 VALUES (8);
-INSERT INTO test_prepared2 VALUES (9);
--- cleanup
-DROP TABLE test_prepared1;
-DROP TABLE test_prepared2;
--- show results
+SELECT 'test_prepared_1' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'test_prepared1'::regclass;
+ relation | locktype | mode
+-----------------+----------+---------------------
+ test_prepared_1 | relation | RowExclusiveLock
+ test_prepared_1 | relation | AccessExclusiveLock
+(2 rows)
+
SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
data
-------------------------------------------------------------------------
BEGIN
- table public.test_prepared1: INSERT: id[integer]:1
- COMMIT
- BEGIN
- table public.test_prepared1: INSERT: id[integer]:2
- COMMIT
- BEGIN
table public.test_prepared1: INSERT: id[integer]:4
COMMIT
BEGIN
- table public.test_prepared2: INSERT: id[integer]:7
- COMMIT
- BEGIN
table public.test_prepared1: INSERT: id[integer]:5
table public.test_prepared1: INSERT: id[integer]:6 data[text]:'frakbar'
+ PREPARE TRANSACTION 'test_prepared#3'
+(7 rows)
+
+-- Test that we decode correctly while an uncommitted prepared xact
+-- with ddl exists.
+--
+-- Use a separate table for the concurrent transaction because the lock from
+-- the ALTER will stop us inserting into the other one.
+--
+-- We should see '7' before '5' in our results since it commits first.
+--
+INSERT INTO test_prepared2 VALUES (7);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+----------------------------------------------------
+ BEGIN
+ table public.test_prepared2: INSERT: id[integer]:7
COMMIT
+(3 rows)
+
+COMMIT PREPARED 'test_prepared#3';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-----------------------------------
+ COMMIT PREPARED 'test_prepared#3'
+(1 row)
+
+-- make sure stuff still works
+INSERT INTO test_prepared1 VALUES (8);
+INSERT INTO test_prepared2 VALUES (9);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+--------------------------------------------------------------------
BEGIN
table public.test_prepared1: INSERT: id[integer]:8 data[text]:null
COMMIT
BEGIN
table public.test_prepared2: INSERT: id[integer]:9
COMMIT
-(22 rows)
+(6 rows)
+
+-- Check `CLUSTER` (as operation that hold exclusive lock) doesn't block
+-- logical decoding.
+BEGIN;
+INSERT INTO test_prepared1 VALUES (10, 'othercol');
+CLUSTER test_prepared1 USING test_prepared1_pkey;
+INSERT INTO test_prepared1 VALUES (11, 'othercol2');
+PREPARE TRANSACTION 'test_prepared_lock';
+BEGIN;
+insert into test_prepared2 values (12);
+PREPARE TRANSACTION 'test_prepared_lock2';
+COMMIT PREPARED 'test_prepared_lock2';
+SELECT 'pg_class' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'pg_class'::regclass;
+ relation | locktype | mode
+----------+----------+------
+(0 rows)
+
+-- Shouldn't timeout on 2pc decoding.
+SET statement_timeout = '1s';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+----------------------------------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:10 data[text]:'othercol'
+ table public.test_prepared1: INSERT: id[integer]:11 data[text]:'othercol2'
+ PREPARE TRANSACTION 'test_prepared_lock'
+ BEGIN
+ table public.test_prepared2: INSERT: id[integer]:12
+ PREPARE TRANSACTION 'test_prepared_lock2'
+ COMMIT PREPARED 'test_prepared_lock2'
+(8 rows)
+
+RESET statement_timeout;
+COMMIT PREPARED 'test_prepared_lock';
+-- will work normally after we commit
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+--------------------------------------
+ COMMIT PREPARED 'test_prepared_lock'
+(1 row)
+
+-- test savepoints
+BEGIN;
+SAVEPOINT test_savepoint;
+CREATE TABLE test_prepared_savepoint (a int);
+PREPARE TRANSACTION 'test_prepared_savepoint';
+COMMIT PREPARED 'test_prepared_savepoint';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-------------------------------------------
+ COMMIT PREPARED 'test_prepared_savepoint'
+(1 row)
+
+-- test that a GID containing "nodecode" gets decoded at commit prepared time
+BEGIN;
+INSERT INTO test_prepared1 VALUES (20);
+PREPARE TRANSACTION 'test_prepared_nodecode';
+-- should show nothing
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+------
+(0 rows)
+
+COMMIT PREPARED 'test_prepared_nodecode';
+-- should be decoded now
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+---------------------------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:20 data[text]:null
+ COMMIT
+(3 rows)
+
+-- cleanup
+DROP TABLE test_prepared1;
+DROP TABLE test_prepared2;
+-- show results. There should be nothing to show
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+------
+(0 rows)
SELECT pg_drop_replication_slot('regression_slot');
pg_drop_replication_slot
diff --git a/contrib/test_decoding/sql/prepared.sql b/contrib/test_decoding/sql/prepared.sql
index e726397..ca801e4 100644
--- a/contrib/test_decoding/sql/prepared.sql
+++ b/contrib/test_decoding/sql/prepared.sql
@@ -2,21 +2,25 @@
SET synchronous_commit = on;
SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
-CREATE TABLE test_prepared1(id int);
-CREATE TABLE test_prepared2(id int);
+CREATE TABLE test_prepared1(id integer primary key);
+CREATE TABLE test_prepared2(id integer primary key);
-- test simple successful use of a prepared xact
BEGIN;
INSERT INTO test_prepared1 VALUES (1);
PREPARE TRANSACTION 'test_prepared#1';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
COMMIT PREPARED 'test_prepared#1';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
INSERT INTO test_prepared1 VALUES (2);
-- test abort of a prepared xact
BEGIN;
INSERT INTO test_prepared1 VALUES (3);
PREPARE TRANSACTION 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
ROLLBACK PREPARED 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
INSERT INTO test_prepared1 VALUES (4);
@@ -27,24 +31,83 @@ ALTER TABLE test_prepared1 ADD COLUMN data text;
INSERT INTO test_prepared1 VALUES (6, 'frakbar');
PREPARE TRANSACTION 'test_prepared#3';
--- test that we decode correctly while an uncommitted prepared xact
--- with ddl exists.
+SELECT 'test_prepared_1' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'test_prepared1'::regclass;
+
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
--- separate table because of the lock from the ALTER
--- this will come before the '5' row above, as this commits before it.
+-- Test that we decode correctly while an uncommitted prepared xact
+-- with ddl exists.
+--
+-- Use a separate table for the concurrent transaction because the lock from
+-- the ALTER will stop us inserting into the other one.
+--
+-- We should see '7' before '5' in our results since it commits first.
+--
INSERT INTO test_prepared2 VALUES (7);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
COMMIT PREPARED 'test_prepared#3';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
-- make sure stuff still works
INSERT INTO test_prepared1 VALUES (8);
INSERT INTO test_prepared2 VALUES (9);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- Check `CLUSTER` (as operation that hold exclusive lock) doesn't block
+-- logical decoding.
+BEGIN;
+INSERT INTO test_prepared1 VALUES (10, 'othercol');
+CLUSTER test_prepared1 USING test_prepared1_pkey;
+INSERT INTO test_prepared1 VALUES (11, 'othercol2');
+PREPARE TRANSACTION 'test_prepared_lock';
+
+BEGIN;
+insert into test_prepared2 values (12);
+PREPARE TRANSACTION 'test_prepared_lock2';
+COMMIT PREPARED 'test_prepared_lock2';
+
+SELECT 'pg_class' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'pg_class'::regclass;
+
+-- Shouldn't timeout on 2pc decoding.
+SET statement_timeout = '1s';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+RESET statement_timeout;
+
+COMMIT PREPARED 'test_prepared_lock';
+
+-- will work normally after we commit
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- test savepoints
+BEGIN;
+SAVEPOINT test_savepoint;
+CREATE TABLE test_prepared_savepoint (a int);
+PREPARE TRANSACTION 'test_prepared_savepoint';
+COMMIT PREPARED 'test_prepared_savepoint';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- test that a GID containing "nodecode" gets decoded at commit prepared time
+BEGIN;
+INSERT INTO test_prepared1 VALUES (20);
+PREPARE TRANSACTION 'test_prepared_nodecode';
+-- should show nothing
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+COMMIT PREPARED 'test_prepared_nodecode';
+-- should be decoded now
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
-- cleanup
DROP TABLE test_prepared1;
DROP TABLE test_prepared2;
--- show results
+-- show results. There should be nothing to show
SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
SELECT pg_drop_replication_slot('regression_slot');
diff --git a/contrib/test_decoding/test_decoding.c b/contrib/test_decoding/test_decoding.c
index e60ab34..185a70e 100644
--- a/contrib/test_decoding/test_decoding.c
+++ b/contrib/test_decoding/test_decoding.c
@@ -11,12 +11,16 @@
*-------------------------------------------------------------------------
*/
#include "postgres.h"
+#include "miscadmin.h"
+#include "access/transam.h"
#include "catalog/pg_type.h"
#include "replication/logical.h"
#include "replication/origin.h"
+#include "storage/procarray.h"
+
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -36,6 +40,7 @@ typedef struct
bool skip_empty_xacts;
bool xact_wrote_changes;
bool only_local;
+ TransactionId check_xid; /* track abort of this txid */
} TestDecodingData;
static void pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
@@ -88,6 +93,19 @@ static void pg_decode_stream_truncate(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
int nrelations, Relation relations[],
ReorderBufferChange *change);
+static bool pg_decode_filter_prepare(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid);
+static void pg_decode_prepare_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+static void pg_decode_commit_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+static void pg_decode_abort_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+
void
_PG_init(void)
@@ -116,6 +134,11 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb)
cb->stream_change_cb = pg_decode_stream_change;
cb->stream_message_cb = pg_decode_stream_message;
cb->stream_truncate_cb = pg_decode_stream_truncate;
+ cb->filter_prepare_cb = pg_decode_filter_prepare;
+ cb->prepare_cb = pg_decode_prepare_txn;
+ cb->commit_prepared_cb = pg_decode_commit_prepared_txn;
+ cb->abort_prepared_cb = pg_decode_abort_prepared_txn;
+
}
@@ -136,6 +159,7 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
data->include_timestamp = false;
data->skip_empty_xacts = false;
data->only_local = false;
+ data->check_xid = InvalidTransactionId;
ctx->output_plugin_private = data;
@@ -227,6 +251,32 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
errmsg("could not parse value \"%s\" for parameter \"%s\"",
strVal(elem->arg), elem->defname)));
}
+ else if (strcmp(elem->defname, "check-xid") == 0)
+ {
+ if (elem->arg)
+ {
+ errno = 0;
+ data->check_xid = (TransactionId)
+ strtoul(strVal(elem->arg), NULL, 0);
+
+ if (errno == EINVAL || errno == ERANGE)
+ ereport(FATAL,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("check-xid is not a valid number: \"%s\"",
+ strVal(elem->arg))));
+ }
+ else
+ ereport(FATAL,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("check-xid needs an input value")));
+
+ if (data->check_xid <= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("Specify positive value for parameter \"%s\","
+ " you specified \"%s\"",
+ elem->defname, strVal(elem->arg))));
+ }
else
{
ereport(ERROR,
@@ -297,6 +347,94 @@ pg_decode_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
OutputPluginWrite(ctx, true);
}
+
+/*
+ * Filter out two-phase transactions.
+ *
+ * Each plugin can implement its own filtering logic. Here
+ * we demonstrate a simple logic by checking the GID. If the
+ * GID contains the "_nodecode" substring, then we filter
+ * it out.
+ */
+static bool
+pg_decode_filter_prepare(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid)
+{
+ if (strstr(gid, "_nodecode") != NULL)
+ return true;
+
+ return false;
+}
+
+/* PREPARE callback */
+static void
+pg_decode_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ if (data->skip_empty_xacts && !data->xact_wrote_changes)
+ return;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ appendStringInfo(ctx->out, "PREPARE TRANSACTION %s",
+ quote_literal_cstr(txn->gid));
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, " %u", txn->xid);
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
+/* COMMIT PREPARED callback */
+static void
+pg_decode_commit_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ appendStringInfo(ctx->out, "COMMIT PREPARED %s",
+ quote_literal_cstr(txn->gid));
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, " %u", txn->xid);
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
+/* ABORT PREPARED callback */
+static void
+pg_decode_abort_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ appendStringInfo(ctx->out, "ROLLBACK PREPARED %s",
+ quote_literal_cstr(txn->gid));
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, " %u", txn->xid);
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
static bool
pg_decode_filter(LogicalDecodingContext *ctx,
RepOriginId origin_id)
@@ -455,6 +593,22 @@ pg_decode_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
}
data->xact_wrote_changes = true;
+ /* if check_xid is specified */
+ if (TransactionIdIsValid(data->check_xid))
+ {
+ elog(LOG, "waiting for %u to abort", data->check_xid);
+ while (TransactionIdIsInProgress(data->check_xid))
+ {
+ CHECK_FOR_INTERRUPTS();
+ pg_usleep(10000L);
+ }
+ if (!TransactionIdIsInProgress(data->check_xid) &&
+ !TransactionIdDidCommit(data->check_xid))
+ elog(LOG, "%u aborted", data->check_xid);
+
+ Assert(TransactionIdDidAbort(data->check_xid));
+ }
+
class_form = RelationGetForm(relation);
tupdesc = RelationGetDescr(relation);
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index 813a037..bd4542e 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -387,6 +387,10 @@ typedef struct OutputPluginCallbacks
LogicalDecodeTruncateCB truncate_cb;
LogicalDecodeCommitCB commit_cb;
LogicalDecodeMessageCB message_cb;
+ LogicalDecodeFilterPrepareCB filter_prepare_cb;
+ LogicalDecodePrepareCB prepare_cb;
+ LogicalDecodeCommitPreparedCB commit_prepared_cb;
+ LogicalDecodeAbortPreparedCB abort_prepared_cb;
LogicalDecodeFilterByOriginCB filter_by_origin_cb;
LogicalDecodeShutdownCB shutdown_cb;
LogicalDecodeStreamStartCB stream_start_cb;
@@ -477,7 +481,13 @@ CREATE TABLE another_catalog_table(data text) WITH (user_catalog_table = true);
never get
decoded. Successful savepoints are
folded into the transaction containing them in the order they were
- executed within that transaction.
+ executed within that transaction. A transaction that is prepared for
+ a two-phase commit using <command>PREPARE TRANSACTION</command> will
+ also be decoded if the output plugin callbacks needed for decoding
+ them are provided. It is possible that the current transaction which
+ is being decoded is aborted concurrently via a <command>ROLLBACK PREPARED</command>
+ command. In that case, the logical decoding of this transaction will
+ be aborted too.
</para>
<note>
@@ -578,6 +588,55 @@ typedef void (*LogicalDecodeCommitCB) (struct LogicalDecodingContext *ctx,
</para>
</sect3>
+ <sect3 id="logicaldecoding-output-plugin-prepare">
+ <title>Transaction Prepare Callback</title>
+
+ <para>
+ The optional <function>prepare_cb</function> callback is called whenever
+ a transaction which is prepared for two-phase commit has been
+ decoded. The <function>change_cb</function> callbacks for all modified
+ rows will have been called before this, if there have been any modified
+ rows.
+<programlisting>
+typedef void (*LogicalDecodePrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
+ <sect3 id="logicaldecoding-output-plugin-commit-prepared">
+ <title>Commit Prepared Transaction Callback</title>
+
+ <para>
+ The optional <function>commit_prepared_cb</function> callback is called whenever
+ a commit prepared transaction has been decoded. The <parameter>gid</parameter> field,
+ which is part of the <parameter>txn</parameter> parameter can be used in this
+ callback.
+<programlisting>
+typedef void (*LogicalDecodeCommitPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
+ <sect3 id="logicaldecoding-output-plugin-abort-prepared">
+ <title>Rollback Prepared Transaction Callback</title>
+
+ <para>
+ The optional <function>abort_prepared_cb</function> callback is called whenever
+ a rollback prepared transaction has been decoded. The <parameter>gid</parameter> field,
+ which is part of the <parameter>txn</parameter> parameter can be used in this
+ callback.
+<programlisting>
+typedef void (*LogicalDecodeAbortPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
<sect3 id="logicaldecoding-output-plugin-change">
<title>Change Callback</title>
@@ -587,7 +646,13 @@ typedef void (*LogicalDecodeCommitCB) (struct LogicalDecodingContext *ctx,
an <command>INSERT</command>, <command>UPDATE</command>,
or <command>DELETE</command>. Even if the original command modified
several rows at once the callback will be called individually for each
- row.
+ row. The <function>change_cb</function> callback may access system or
+ user catalog tables to aid in the process of outputting the row
+ modification details. In case of decoding a prepared (but yet
+ uncommitted) transaction or decoding of an uncommitted transaction, this
+ change callback might also error out due to simultaneous rollback of
+ this very same transaction. In that case, the logical decoding of this
+ aborted transaction is stopped gracefully.
<programlisting>
typedef void (*LogicalDecodeChangeCB) (struct LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
@@ -664,6 +729,39 @@ typedef bool (*LogicalDecodeFilterByOriginCB) (struct LogicalDecodingContext *ct
</para>
</sect3>
+ <sect3 id="logicaldecoding-output-plugin-filter-prepare">
+ <title>Prepare Filter Callback</title>
+
+ <para>
+ The optional <function>filter_prepare_cb</function> callback
+ is called to determine whether data that is part of the current
+ two-phase commit transaction should be considered for decode
+ at this prepare stage or as a regular one-phase transaction at
+ <command>COMMIT PREPARED</command> time later. To signal that
+ decoding should be skipped, return <literal>true</literal>;
+ <literal>false</literal> otherwise. When the callback is not
+ defined, <literal>false</literal> is assumed (i.e. nothing is
+ filtered).
+<programlisting>
+typedef bool (*LogicalDecodeFilterPrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ TransactionId xid,
+ const char *gid);
+</programlisting>
+ The <parameter>ctx</parameter> parameter has the same contents
+ as for the other callbacks. The <parameter>txn</parameter> parameter
+ contains meta information about the transaction. The <parameter>xid</parameter>
+ contains the XID because <parameter>txn</parameter> can be NULL in some cases.
+ The <parameter>gid</parameter> is the identifier that later identifies this
+ transaction for <command>COMMIT PREPARED</command> or <command>ROLLBACK PREPARED</command>.
+ </para>
+ <para>
+ The callback has to provide the same static answer for a given combination of
+ <parameter>xid</parameter> and <parameter>gid</parameter> every time it is
+ called.
+ </para>
+ </sect3>
+
<sect3 id="logicaldecoding-output-plugin-message">
<title>Generic Message Callback</title>
@@ -685,7 +783,13 @@ typedef void (*LogicalDecodeMessageCB) (struct LogicalDecodingContext *ctx,
non-transactional and the XID was not assigned yet in the transaction
which logged the message. The <parameter>lsn</parameter> has WAL
location of the message. The <parameter>transactional</parameter> says
- if the message was sent as transactional or not.
+ if the message was sent as transactional or not. Similar to the change
+ callback, in case of decoding a prepared (but yet uncommitted)
+ transaction or decoding of an uncommitted transaction, this message
+ callback might also error out due to simultaneous rollback of
+ this very same transaction. In that case, the logical decoding of this
+ aborted transaction is stopped gracefully.
+
The <parameter>prefix</parameter> is arbitrary null-terminated prefix
which can be used for identifying interesting messages for the current
plugin. And finally the <parameter>message</parameter> parameter holds
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index f21f61d..c0b0bce 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -70,6 +70,9 @@ static void DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
xl_xact_parsed_commit *parsed, TransactionId xid);
static void DecodeAbort(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
xl_xact_parsed_abort *parsed, TransactionId xid);
+static void DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_prepare * parsed);
+
/* common function to decode tuples */
static void DecodeXLogTuple(char *data, Size len, ReorderBufferTupleBuf *tup);
@@ -312,17 +315,34 @@ DecodeXactOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
}
break;
case XLOG_XACT_PREPARE:
+ {
+ xl_xact_parsed_prepare parsed;
+ xl_xact_prepare *xlrec;
+ /* check that output plugin is capable of twophase decoding */
+ if (!ctx->enable_twophase)
+ {
+ ReorderBufferProcessXid(reorder, XLogRecGetXid(r), buf->origptr);
+ break;
+ }
- /*
- * Currently decoding ignores PREPARE TRANSACTION and will just
- * decode the transaction when the COMMIT PREPARED is sent or
- * throw away the transaction's contents when a ROLLBACK PREPARED
- * is received. In the future we could add code to expose prepared
- * transactions in the changestream allowing for a kind of
- * distributed 2PC.
- */
- ReorderBufferProcessXid(reorder, XLogRecGetXid(r), buf->origptr);
- break;
+ /* ok, parse it */
+ xlrec = (xl_xact_prepare *)XLogRecGetData(r);
+ ParsePrepareRecord(XLogRecGetInfo(buf->record),
+ xlrec, &parsed);
+
+ /* does output plugin want this particular transaction? */
+ if (ctx->callbacks.filter_prepare_cb &&
+ ReorderBufferPrepareNeedSkip(reorder, parsed.twophase_xid,
+ parsed.twophase_gid))
+ {
+ ReorderBufferProcessXid(reorder, parsed.twophase_xid,
+ buf->origptr);
+ break;
+ }
+
+ DecodePrepare(ctx, buf, &parsed);
+ break;
+ }
default:
elog(ERROR, "unexpected RM_XACT_ID record type: %u", info);
}
@@ -647,9 +667,82 @@ DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
buf->origptr, buf->endptr);
}
+ /*
+ * Decide if we're processing COMMIT PREPARED, or a regular COMMIT.
+ * Regular commit simply triggers a replay of transaction changes from the
+ * reorder buffer. For COMMIT PREPARED that however already happened at
+ * PREPARE time, and so we only need to notify the subscriber that the GID
+ * finally committed.
+ *
+ * For output plugins that do not support PREPARE-time decoding of
+ * two-phase transactions, we never even see the PREPARE and all two-phase
+ * transactions simply fall through to the second branch.
+ */
+ if (TransactionIdIsValid(parsed->twophase_xid) &&
+ ReorderBufferTxnIsPrepared(ctx->reorder,
+ parsed->twophase_xid, parsed->twophase_gid))
+ {
+ Assert(xid == parsed->twophase_xid);
+ /* we are processing COMMIT PREPARED */
+ ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn,
+ parsed->twophase_gid, true);
+ }
+ else
+ {
+ /* replay actions of all transaction + subtransactions in order */
+ ReorderBufferCommit(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn);
+ }
+}
+
+/*
+ * Decode PREPARE record. Similar logic as in COMMIT
+ */
+static void
+DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_prepare * parsed)
+{
+ XLogRecPtr origin_lsn = parsed->origin_lsn;
+ TimestampTz commit_time = parsed->origin_timestamp;
+ XLogRecPtr origin_id = XLogRecGetOrigin(buf->record);
+ int i;
+ TransactionId xid = parsed->twophase_xid;
+
+ /*
+ * Process invalidation messages, even if we're not interested in the
+ * transaction's contents, since the various caches need to always be
+ * consistent.
+ */
+ if (parsed->nmsgs > 0)
+ {
+ if (!ctx->fast_forward)
+ ReorderBufferAddInvalidations(ctx->reorder, xid, buf->origptr,
+ parsed->nmsgs, parsed->msgs);
+ ReorderBufferXidSetCatalogChanges(ctx->reorder, xid, buf->origptr);
+ }
+
+ /*
+ * Tell the reorderbuffer about the surviving subtransactions. We need to
+ * do this because the main transaction itself has not committed since we
+ * are in the prepare phase right now. So we need to be sure the snapshot
+ * is setup correctly for the main transaction in case all changes
+ * happened in subtransanctions
+ */
+ for (i = 0; i < parsed->nsubxacts; i++)
+ {
+ ReorderBufferCommitChild(ctx->reorder, xid, parsed->subxacts[i],
+ buf->origptr, buf->endptr);
+ }
+
+ if (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
+ (parsed->dbId != InvalidOid && parsed->dbId != ctx->slot->data.database) ||
+ ctx->fast_forward || FilterByOrigin(ctx, origin_id))
+ return;
+
/* replay actions of all transaction + subtransactions in order */
- ReorderBufferCommit(ctx->reorder, xid, buf->origptr, buf->endptr,
- commit_time, origin_id, origin_lsn);
+ ReorderBufferPrepare(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn, parsed->twophase_gid);
}
/*
@@ -661,6 +754,30 @@ DecodeAbort(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
xl_xact_parsed_abort *parsed, TransactionId xid)
{
int i;
+ XLogRecPtr origin_lsn = InvalidXLogRecPtr;
+ TimestampTz commit_time = 0;
+ XLogRecPtr origin_id = XLogRecGetOrigin(buf->record);
+
+ if (parsed->xinfo & XACT_XINFO_HAS_ORIGIN)
+ {
+ origin_lsn = parsed->origin_lsn;
+ commit_time = parsed->origin_timestamp;
+ }
+
+ /*
+ * If it's ROLLBACK PREPARED then handle it via callbacks.
+ */
+ if (TransactionIdIsValid(xid) &&
+ !SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) &&
+ parsed->dbId == ctx->slot->data.database &&
+ !FilterByOrigin(ctx, origin_id) &&
+ ReorderBufferTxnIsPrepared(ctx->reorder, xid, parsed->twophase_gid))
+ {
+ ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn,
+ parsed->twophase_gid, false);
+ return;
+ }
for (i = 0; i < parsed->nsubxacts; i++)
{
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 0f6af95..4e95337 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -58,6 +58,14 @@ static void shutdown_cb_wrapper(LogicalDecodingContext *ctx);
static void begin_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn);
static void commit_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
+static bool filter_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid);
+static void prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+static void commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+static void abort_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
static void change_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
Relation relation, ReorderBufferChange *change);
static void truncate_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
@@ -206,6 +214,10 @@ StartupDecodingContext(List *output_plugin_options,
ctx->reorder->apply_change = change_cb_wrapper;
ctx->reorder->apply_truncate = truncate_cb_wrapper;
ctx->reorder->commit = commit_cb_wrapper;
+ ctx->reorder->filter_prepare = filter_prepare_cb_wrapper;
+ ctx->reorder->prepare = prepare_cb_wrapper;
+ ctx->reorder->commit_prepared = commit_prepared_cb_wrapper;
+ ctx->reorder->abort_prepared = abort_prepared_cb_wrapper;
ctx->reorder->message = message_cb_wrapper;
/*
@@ -225,6 +237,19 @@ StartupDecodingContext(List *output_plugin_options,
(ctx->callbacks.stream_message_cb != NULL) ||
(ctx->callbacks.stream_truncate_cb != NULL);
+ /*
+ * To support two phase logical decoding, we require prepare/commit-prepare/abort-prepare
+ * callbacks. The filter-prepare callback is optional. We however enable two phase logical
+ * decoding when at least one of the methods is enabled so that we can easily identify
+ * missing methods.
+ *
+ * We decide it here, but only check it later in the wrappers.
+ */
+ ctx->enable_twophase = (ctx->callbacks.prepare_cb != NULL) ||
+ (ctx->callbacks.commit_prepared_cb != NULL) ||
+ (ctx->callbacks.abort_prepared_cb != NULL) ||
+ (ctx->callbacks.filter_prepare_cb != NULL);
+
/*
* streaming callbacks
*
@@ -782,6 +807,111 @@ commit_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
}
static void
+prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "prepare";
+ state.report_location = txn->final_lsn; /* beginning of commit record */
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn; /* points to the end of the record */
+
+ /* If the plugin support 2 phase commits then prepare callback is mandatory */
+ if (ctx->enable_twophase && ctx->callbacks.prepare_cb == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("Output plugin did not register prepare_cb callback")));
+
+ /* do the actual work: call callback */
+ ctx->callbacks.prepare_cb(ctx, txn, prepare_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
+commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "commit_prepared";
+ state.report_location = txn->final_lsn; /* beginning of commit record */
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn; /* points to the end of the record */
+
+ /* If the plugin support 2 phase commits then commit prepared callback is mandatory */
+ if (ctx->enable_twophase && ctx->callbacks.commit_prepared_cb == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("Output plugin did not register commit_prepared_cb callback")));
+
+ /* do the actual work: call callback */
+ ctx->callbacks.commit_prepared_cb(ctx, txn, commit_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
+abort_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "abort_prepared";
+ state.report_location = txn->final_lsn; /* beginning of commit record */
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn; /* points to the end of the record */
+
+ /* If the plugin support 2 phase commits then abort prepared callback is mandatory */
+ if (ctx->enable_twophase && ctx->callbacks.abort_prepared_cb == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("Output plugin did not register abort_prepared_cb callback")));
+
+ /* do the actual work: call callback */
+ ctx->callbacks.abort_prepared_cb(ctx, txn, abort_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
change_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
Relation relation, ReorderBufferChange *change)
{
@@ -858,6 +988,51 @@ truncate_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
error_context_stack = errcallback.previous;
}
+static bool
+filter_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+ bool ret;
+
+ /*
+ * Skip if decoding of twophase at PREPARE time is not enabled. In that
+ * case all twophase transactions are considered filtered out and will be
+ * applied as regular transactions at COMMIT PREPARED.
+ */
+ if (!ctx->enable_twophase)
+ return true;
+
+ /*
+ * The filter_prepare callback is optional. When not supplied, all
+ * prepared transactions should go through.
+ */
+ if (!ctx->callbacks.filter_prepare_cb)
+ return false;
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "filter_prepare";
+ state.report_location = InvalidXLogRecPtr;
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = false;
+
+ /* do the actual work: call callback */
+ ret = ctx->callbacks.filter_prepare_cb(ctx, txn, xid, gid);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+
+ return ret;
+}
+
bool
filter_by_origin_cb_wrapper(LogicalDecodingContext *ctx, RepOriginId origin_id)
{
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 1975d62..d96be77 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -251,7 +251,8 @@ static Size ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn
static void ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
char *change);
static void ReorderBufferRestoreCleanup(ReorderBuffer *rb, ReorderBufferTXN *txn);
-static void ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn);
+static void ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
+ bool txn_prepared);
static void ReorderBufferCleanupSerializedTXNs(const char *slotname);
static void ReorderBufferSerializedPath(char *path, ReplicationSlot *slot,
TransactionId xid, XLogSegNo segno);
@@ -413,6 +414,11 @@ ReorderBufferReturnTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
}
/* free data that's contained */
+ if (txn->gid != NULL)
+ {
+ pfree(txn->gid);
+ txn->gid = NULL;
+ }
if (txn->tuplecid_hash != NULL)
{
@@ -1401,6 +1407,59 @@ ReorderBufferIterTXNFinish(ReorderBuffer *rb,
}
/*
+ * Cleanup the leftover contents of a transaction, usually after the transaction
+ * has been COMMIT PREPARED or ROLLBACK PREPARED. This does the rest of the cleanup
+ * that was not done when the transaction was PREPARED
+ */
+static void
+ReorderBufferCleanupPreparedTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
+{
+ bool found;
+
+ /*
+ * Cleanup the base snapshot, if set.
+ */
+ if (txn->base_snapshot != NULL)
+ {
+ SnapBuildSnapDecRefcount(txn->base_snapshot);
+ dlist_delete(&txn->base_snapshot_node);
+ }
+
+ /*
+ * Cleanup the snapshot for the last streamed run.
+ */
+ if (txn->snapshot_now != NULL)
+ {
+ Assert(rbtxn_is_streamed(txn));
+ ReorderBufferFreeSnap(rb, txn->snapshot_now);
+ }
+
+ /*
+ * Remove TXN from its containing list.
+ *
+ * Note: if txn is known as subxact, we are deleting the TXN from its
+ * parent's list of known subxacts; this leaves the parent's nsubxacts
+ * count too high, but we don't care. Otherwise, we are deleting the TXN
+ * from the LSN-ordered list of toplevel TXNs.
+ */
+ dlist_delete(&txn->node);
+
+ /* now remove reference from buffer */
+ hash_search(rb->by_txn,
+ (void *) &txn->xid,
+ HASH_REMOVE,
+ &found);
+ Assert(found);
+
+ /* remove entries spilled to disk */
+ if (rbtxn_is_serialized(txn))
+ ReorderBufferRestoreCleanup(rb, txn);
+
+ /* deallocate */
+ ReorderBufferReturnTXN(rb, txn);
+}
+
+/*
* Cleanup the contents of a transaction, usually after the transaction
* committed or aborted.
*/
@@ -1502,12 +1561,14 @@ ReorderBufferCleanupTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
}
/*
- * Discard changes from a transaction (and subtransactions), after streaming
- * them. Keep the remaining info - transactions, tuplecids, invalidations and
- * snapshots.
+ * Discard changes from a transaction (and subtransactions), either after streaming or
+ * after a PREPARE.
+ * The flag txn_prepared indicates if this is called after a PREPARE.
+ * If streaming, keep the remaining info - transactions, tuplecids, invalidations and
+ * snapshots.If after a PREPARE, keep only the invalidations and snapshots.
*/
static void
-ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
+ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, bool txn_prepared)
{
dlist_mutable_iter iter;
@@ -1526,7 +1587,7 @@ ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
Assert(rbtxn_is_known_subxact(subtxn));
Assert(subtxn->nsubtxns == 0);
- ReorderBufferTruncateTXN(rb, subtxn);
+ ReorderBufferTruncateTXN(rb, subtxn, txn_prepared);
}
/* cleanup changes in the toplevel txn */
@@ -1560,9 +1621,30 @@ ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
* about the toplevel xact (we send the XID in all messages), but we never
* stream XIDs of empty subxacts.
*/
- if ((!txn->toptxn) || (txn->nentries_mem != 0))
+ if ((!txn_prepared) && ((!txn->toptxn) || (txn->nentries_mem != 0)))
txn->txn_flags |= RBTXN_IS_STREAMED;
+ if (txn_prepared)
+ {
+ /*
+ * If this is a prepared txn, cleanup the tuplecids we stored for decoding
+ * catalog snapshot access.
+ * They are always stored in the toplevel transaction.
+ */
+ dlist_foreach_modify(iter, &txn->tuplecids)
+ {
+ ReorderBufferChange *change;
+
+ change = dlist_container(ReorderBufferChange, node, iter.cur);
+
+ /* Check we're not mixing changes from different transactions. */
+ Assert(change->txn == txn);
+ Assert(change->action == REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID);
+
+ ReorderBufferReturnChange(rb, change, true);
+ }
+ }
+
/*
* Destroy the (relfilenode, ctid) hashtable, so that we don't leak any
* memory. We could also keep the hash table and update it with new ctid
@@ -1880,7 +1962,7 @@ ReorderBufferResetTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
ReorderBufferChange *specinsert)
{
/* Discard the changes that we just streamed */
- ReorderBufferTruncateTXN(rb, txn);
+ ReorderBufferTruncateTXN(rb, txn, false);
/* Free all resources allocated for toast reconstruction */
ReorderBufferToastReset(rb, txn);
@@ -1987,7 +2069,7 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
prev_lsn = change->lsn;
/* Set the current xid to detect concurrent aborts. */
- if (streaming)
+ if (streaming || rbtxn_prepared(change->txn))
{
curtxn = change->txn;
SetupCheckXidLive(curtxn->xid);
@@ -2249,7 +2331,6 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
break;
}
}
-
/*
* There's a speculative insertion remaining, just clean in up, it
* can't have been successful, otherwise we'd gotten a confirmation
@@ -2278,7 +2359,16 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
}
}
else
- rb->commit(rb, txn, commit_lsn);
+ {
+ /*
+ * Call either PREPARE (for twophase transactions) or COMMIT
+ * (for regular ones).
+ */
+ if (rbtxn_prepared(txn))
+ rb->prepare(rb, txn, commit_lsn);
+ else
+ rb->commit(rb, txn, commit_lsn);
+ }
/* this is just a sanity check against bad output plugin behaviour */
if (GetCurrentTransactionIdIfAny() != InvalidTransactionId)
@@ -2319,11 +2409,17 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
*/
if (streaming)
{
- ReorderBufferTruncateTXN(rb, txn);
+ ReorderBufferTruncateTXN(rb, txn, false);
/* Reset the CheckXidAlive */
CheckXidAlive = InvalidTransactionId;
}
+ else if (rbtxn_prepared(txn))
+ {
+ ReorderBufferTruncateTXN(rb, txn, true);
+ /* Reset the CheckXidAlive */
+ CheckXidAlive = InvalidTransactionId;
+ }
else
ReorderBufferCleanupTXN(rb, txn);
}
@@ -2352,17 +2448,18 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
/*
* The error code ERRCODE_TRANSACTION_ROLLBACK indicates a concurrent
- * abort of the (sub)transaction we are streaming. We need to do the
+ * abort of the (sub)transaction we are streaming or preparing. We need to do the
* cleanup and return gracefully on this error, see SetupCheckXidLive.
*/
if (errdata->sqlerrcode == ERRCODE_TRANSACTION_ROLLBACK)
{
/*
- * This error can only occur when we are sending the data in
- * streaming mode and the streaming is not finished yet.
+ * This error can only occur either when we are sending the data in
+ * streaming mode and the streaming is not finished yet or when we are
+ * sending the data out on a PREPARE during a twoi phase commit.
*/
- Assert(streaming);
- Assert(stream_started);
+ Assert(streaming || rbtxn_prepared(txn));
+ Assert(stream_started || rbtxn_prepared(txn));
/* Cleanup the temporary error state. */
FlushErrorState();
@@ -2370,10 +2467,19 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
errdata = NULL;
curtxn->concurrent_abort = true;
- /* Reset the TXN so that it is allowed to stream remaining data. */
- ReorderBufferResetTXN(rb, txn, snapshot_now,
- command_id, prev_lsn,
- specinsert);
+ /* If streaming, reset the TXN so that it is allowed to stream remaining data. */
+ if (streaming && stream_started)
+ {
+ ReorderBufferResetTXN(rb, txn, snapshot_now,
+ command_id, prev_lsn,
+ specinsert);
+ }
+ else
+ {
+ elog(LOG, "stopping decoding of %s (%u)",
+ txn->gid[0] != '\0'? txn->gid:"", txn->xid);
+ ReorderBufferTruncateTXN(rb, txn, true);
+ }
}
else
{
@@ -2395,23 +2501,16 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
* This interface is called once a toplevel commit is read for both streamed
* as well as non-streamed transactions.
*/
-void
-ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
- XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
- TimestampTz commit_time,
- RepOriginId origin_id, XLogRecPtr origin_lsn)
+static void
+ReorderBufferCommitInternal(ReorderBufferTXN *txn,
+ ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn)
{
- ReorderBufferTXN *txn;
Snapshot snapshot_now;
CommandId command_id = FirstCommandId;
- txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
- false);
-
- /* unknown transaction, nothing to replay */
- if (txn == NULL)
- return;
-
txn->final_lsn = commit_lsn;
txn->end_lsn = end_lsn;
txn->commit_time = commit_time;
@@ -2453,6 +2552,140 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
}
/*
+ * Ask output plugin whether we want to skip this PREPARE and send
+ * this transaction as a regular commit later.
+ */
+bool
+ReorderBufferPrepareNeedSkip(ReorderBuffer *rb, TransactionId xid, const char *gid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr, false);
+
+ return rb->filter_prepare(rb, txn, xid, gid);
+}
+
+
+/*
+ * Commit a transaction.
+ *
+ * See comments for ReorderBufferCommitInternal()
+ */
+void
+ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+
+ /* unknown transaction, nothing to replay */
+ if (txn == NULL)
+ return;
+
+ ReorderBufferCommitInternal(txn, rb, xid, commit_lsn, end_lsn,
+ commit_time, origin_id, origin_lsn);
+}
+
+/*
+ * Prepare a twophase transaction. It calls ReorderBufferCommitInternal()
+ * since all prepared transactions need to be decoded at PREPARE time.
+ */
+void
+ReorderBufferPrepare(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+
+ /* unknown transaction, nothing to replay */
+ if (txn == NULL)
+ return;
+
+ txn->txn_flags |= RBTXN_PREPARE;
+ txn->gid = palloc(strlen(gid) + 1); /* trailing '\0' */
+ strcpy(txn->gid, gid);
+
+ ReorderBufferCommitInternal(txn, rb, xid, commit_lsn, end_lsn,
+ commit_time, origin_id, origin_lsn);
+}
+
+/*
+ * Check whether this transaction was sent as prepared to subscribers.
+ * Called while handling commit|abort prepared.
+ */
+bool
+ReorderBufferTxnIsPrepared(ReorderBuffer *rb, TransactionId xid,
+ const char *gid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+
+ /*
+ * Always call the prepare filter. It's the job of the prepare filter to
+ * give us the *same* response for a given xid across multiple calls
+ * (including ones on restart)
+ */
+ return !(rb->filter_prepare(rb, txn, xid, gid));
+}
+
+/*
+ * Send standalone xact event. This is used to handle COMMIT/ABORT PREPARED.
+ */
+void
+ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid, bool is_commit)
+{
+ ReorderBufferTXN *txn;
+
+ /*
+ * The transaction may or may not exist (during restarts for example).
+ * Anyways, 2PC transactions do not contain any reorderbuffers. So allow
+ * it to be created below.
+ */
+ txn = ReorderBufferTXNByXid(rb, xid, true, NULL, commit_lsn,
+ true);
+
+ txn->final_lsn = commit_lsn;
+ txn->end_lsn = end_lsn;
+ txn->commit_time = commit_time;
+ txn->origin_id = origin_id;
+ txn->origin_lsn = origin_lsn;
+ /* this txn is obviously prepared */
+ txn->txn_flags |= RBTXN_PREPARE;
+ txn->gid = palloc(strlen(gid) + 1); /* trailing '\0' */
+ strcpy(txn->gid, gid);
+
+ if (is_commit)
+ {
+ txn->txn_flags |= RBTXN_COMMIT_PREPARED;
+ rb->commit_prepared(rb, txn, commit_lsn);
+ }
+ else
+ {
+ txn->txn_flags |= RBTXN_ROLLBACK_PREPARED;
+ rb->abort_prepared(rb, txn, commit_lsn);
+ }
+
+ /* cleanup: make sure there's no cache pollution */
+ ReorderBufferExecuteInvalidations(rb, txn);
+ ReorderBufferCleanupPreparedTXN(rb, txn);
+}
+
+/*
* Abort a transaction that possibly has previous changes. Needs to be first
* called for subtransactions and then for the toplevel xid.
*
@@ -2495,7 +2728,13 @@ ReorderBufferAbort(ReorderBuffer *rb, TransactionId xid, XLogRecPtr lsn)
/* cosmetic... */
txn->final_lsn = lsn;
- /* remove potential on-disk data, and deallocate */
+ /*
+ * remove potential on-disk data, and deallocate.
+ *
+ * We remove it even for prepared transactions (GID is enough to
+ * commit/abort those later).
+ */
+
ReorderBufferCleanupTXN(rb, txn);
}
diff --git a/src/include/replication/logical.h b/src/include/replication/logical.h
index 45abc44..ee63e7b 100644
--- a/src/include/replication/logical.h
+++ b/src/include/replication/logical.h
@@ -84,6 +84,11 @@ typedef struct LogicalDecodingContext
*/
bool streaming;
+ /*
+ * Does the output plugin support two phase decoding, and is it enabled?
+ */
+ bool enable_twophase;
+
/*
* State for writing output.
*/
diff --git a/src/include/replication/output_plugin.h b/src/include/replication/output_plugin.h
index b78c796..96e269b 100644
--- a/src/include/replication/output_plugin.h
+++ b/src/include/replication/output_plugin.h
@@ -77,6 +77,39 @@ typedef void (*LogicalDecodeCommitCB) (struct LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
+ /*
+ * Called before decoding of PREPARE record to decide whether this
+ * transaction should be decoded with separate calls to prepare and
+ * commit_prepared/abort_prepared callbacks or wait till COMMIT PREPARED and
+ * sent as usual transaction.
+ */
+typedef bool (*LogicalDecodeFilterPrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ TransactionId xid,
+ const char *gid);
+
+/*
+ * Called for PREPARE record unless it was filtered by filter_prepare()
+ * callback.
+ */
+typedef void (*LogicalDecodePrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+
+/*
+ * Called for COMMIT PREPARED.
+ */
+typedef void (*LogicalDecodeCommitPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+/*
+ * Called for ROLLBACK PREPARED.
+ */
+typedef void (*LogicalDecodeAbortPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+
/*
* Called for the generic logical decoding messages.
*/
@@ -171,6 +204,10 @@ typedef struct OutputPluginCallbacks
LogicalDecodeTruncateCB truncate_cb;
LogicalDecodeCommitCB commit_cb;
LogicalDecodeMessageCB message_cb;
+ LogicalDecodeFilterPrepareCB filter_prepare_cb;
+ LogicalDecodePrepareCB prepare_cb;
+ LogicalDecodeCommitPreparedCB commit_prepared_cb;
+ LogicalDecodeAbortPreparedCB abort_prepared_cb;
LogicalDecodeFilterByOriginCB filter_by_origin_cb;
LogicalDecodeShutdownCB shutdown_cb;
/* streaming of changes */
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 1ae17d5..4d4e35d 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -10,6 +10,7 @@
#define REORDERBUFFER_H
#include "access/htup_details.h"
+#include "access/twophase.h"
#include "lib/ilist.h"
#include "storage/sinval.h"
#include "utils/hsearch.h"
@@ -162,9 +163,13 @@ typedef struct ReorderBufferChange
#define RBTXN_HAS_CATALOG_CHANGES 0x0001
#define RBTXN_IS_SUBXACT 0x0002
#define RBTXN_IS_SERIALIZED 0x0004
-#define RBTXN_IS_STREAMED 0x0008
-#define RBTXN_HAS_TOAST_INSERT 0x0010
-#define RBTXN_HAS_SPEC_INSERT 0x0020
+#define RBTXN_PREPARE 0x0008
+#define RBTXN_COMMIT_PREPARED 0x0010
+#define RBTXN_ROLLBACK_PREPARED 0x0020
+#define RBTXN_COMMIT 0x0040
+#define RBTXN_IS_STREAMED 0x0080
+#define RBTXN_HAS_TOAST_INSERT 0x0100
+#define RBTXN_HAS_SPEC_INSERT 0x0200
/* Does the transaction have catalog changes? */
#define rbtxn_has_catalog_changes(txn) \
@@ -218,6 +223,15 @@ typedef struct ReorderBufferChange
((txn)->txn_flags & RBTXN_IS_STREAMED) != 0 \
)
+/* is this txn prepared? */
+#define rbtxn_prepared(txn) (txn->txn_flags & RBTXN_PREPARE)
+/* was this prepared txn committed in the meanwhile? */
+#define rbtxn_commit_prepared(txn) (txn->txn_flags & RBTXN_COMMIT_PREPARED)
+/* was this prepared txn aborted in the meanwhile? */
+#define rbtxn_rollback_prepared(txn) (txn->txn_flags & RBTXN_ROLLBACK_PREPARED)
+/* was this txn committed in the meanwhile? */
+#define rbtxn_commit(txn) (txn->txn_flags & RBTXN_COMMIT)
+
typedef struct ReorderBufferTXN
{
/* See above */
@@ -229,6 +243,9 @@ typedef struct ReorderBufferTXN
/* Xid of top-level transaction, if known */
TransactionId toplevel_xid;
+ /* In case of 2PC we need to pass GID to output plugin */
+ char *gid;
+
/*
* LSN of the first data carrying, WAL record with knowledge about this
* xid. This is allowed to *not* be first record adorned with this xid, if
@@ -390,6 +407,39 @@ typedef void (*ReorderBufferCommitCB) (ReorderBuffer *rb,
ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
+/* abort callback signature */
+typedef void (*ReorderBufferAbortCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+
+typedef bool (*ReorderBufferFilterPrepareCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ TransactionId xid,
+ const char *gid);
+
+/* prepare callback signature */
+typedef void (*ReorderBufferPrepareCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+
+/* commit prepared callback signature */
+typedef void (*ReorderBufferCommitPreparedCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+/* abort prepared callback signature */
+typedef void (*ReorderBufferAbortPreparedCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+
+
+
+
/* message callback signature */
typedef void (*ReorderBufferMessageCB) (ReorderBuffer *rb,
ReorderBufferTXN *txn,
@@ -482,6 +532,11 @@ struct ReorderBuffer
ReorderBufferApplyChangeCB apply_change;
ReorderBufferApplyTruncateCB apply_truncate;
ReorderBufferCommitCB commit;
+ ReorderBufferAbortCB abort;
+ ReorderBufferFilterPrepareCB filter_prepare;
+ ReorderBufferPrepareCB prepare;
+ ReorderBufferCommitPreparedCB commit_prepared;
+ ReorderBufferAbortPreparedCB abort_prepared;
ReorderBufferMessageCB message;
/*
@@ -548,6 +603,11 @@ void ReorderBufferQueueMessage(ReorderBuffer *, TransactionId, Snapshot snapsho
void ReorderBufferCommit(ReorderBuffer *, TransactionId,
XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
TimestampTz commit_time, RepOriginId origin_id, XLogRecPtr origin_lsn);
+void ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid, bool is_commit);
void ReorderBufferAssignChild(ReorderBuffer *, TransactionId, TransactionId, XLogRecPtr commit_lsn);
void ReorderBufferCommitChild(ReorderBuffer *, TransactionId, TransactionId,
XLogRecPtr commit_lsn, XLogRecPtr end_lsn);
@@ -571,6 +631,15 @@ void ReorderBufferXidSetCatalogChanges(ReorderBuffer *, TransactionId xid, XLog
bool ReorderBufferXidHasCatalogChanges(ReorderBuffer *, TransactionId xid);
bool ReorderBufferXidHasBaseSnapshot(ReorderBuffer *, TransactionId xid);
+bool ReorderBufferPrepareNeedSkip(ReorderBuffer *rb, TransactionId xid,
+ const char *gid);
+bool ReorderBufferTxnIsPrepared(ReorderBuffer *rb, TransactionId xid,
+ const char *gid);
+void ReorderBufferPrepare(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid);
ReorderBufferTXN *ReorderBufferGetOldestTXN(ReorderBuffer *);
TransactionId ReorderBufferGetOldestXmin(ReorderBuffer *rb);
--
1.8.3.1
v4-0003-pgoutput-output-plugin-support-for-logical-decodi.patchapplication/octet-stream; name=v4-0003-pgoutput-output-plugin-support-for-logical-decodi.patchDownload
From f0cf243ef74b28248d068f43adeed0404fa39fec Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Fri, 18 Sep 2020 08:01:14 -0400
Subject: [PATCH v4] pgoutput output plugin support for logical decoding of 2pc
---
src/backend/access/transam/twophase.c | 31 ++++++
src/backend/replication/logical/proto.c | 90 ++++++++++++++-
src/backend/replication/logical/worker.c | 147 ++++++++++++++++++++++++-
src/backend/replication/pgoutput/pgoutput.c | 54 ++++++++-
src/include/access/twophase.h | 1 +
src/include/replication/logicalproto.h | 37 ++++++-
src/test/subscription/t/020_twophase.pl | 163 ++++++++++++++++++++++++++++
7 files changed, 514 insertions(+), 9 deletions(-)
create mode 100644 src/test/subscription/t/020_twophase.pl
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index ef4f998..bed87d5 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -548,6 +548,37 @@ MarkAsPrepared(GlobalTransaction gxact, bool lock_held)
}
/*
+ * LookupGXact
+ * Check if the prepared transaction with the given GID is around
+ */
+bool
+LookupGXact(const char *gid)
+{
+ int i;
+
+ LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE);
+
+ for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
+ {
+ GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
+
+ /* Ignore not-yet-valid GIDs */
+ if (!gxact->valid)
+ continue;
+ if (strcmp(gxact->gid, gid) != 0)
+ continue;
+
+ LWLockRelease(TwoPhaseStateLock);
+
+ return true;
+ }
+
+ LWLockRelease(TwoPhaseStateLock);
+
+ return false;
+}
+
+/*
* LockGXact
* Locate the prepared transaction and mark it busy for COMMIT or PREPARE.
*/
diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index eb19142..291ed10 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -72,12 +72,17 @@ logicalrep_read_begin(StringInfo in, LogicalRepBeginData *begin_data)
*/
void
logicalrep_write_commit(StringInfo out, ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn)
+ XLogRecPtr commit_lsn, bool is_commit)
{
uint8 flags = 0;
pq_sendbyte(out, 'C'); /* sending COMMIT */
+ if (is_commit)
+ flags |= LOGICALREP_IS_COMMIT;
+ else
+ flags |= LOGICALREP_IS_ABORT;
+
/* send the flags field (unused for now) */
pq_sendbyte(out, flags);
@@ -88,16 +93,20 @@ logicalrep_write_commit(StringInfo out, ReorderBufferTXN *txn,
}
/*
- * Read transaction COMMIT from the stream.
+ * Read transaction COMMIT|ABORT from the stream.
*/
void
logicalrep_read_commit(StringInfo in, LogicalRepCommitData *commit_data)
{
- /* read flags (unused for now) */
+ /* read flags */
uint8 flags = pq_getmsgbyte(in);
- if (flags != 0)
- elog(ERROR, "unrecognized flags %u in commit message", flags);
+ if (!CommitFlagsAreValid(flags))
+ elog(ERROR, "unrecognized flags %u in commit|abort message",
+ flags);
+
+ /* the flag is either commit or abort */
+ commit_data->is_commit = (flags == LOGICALREP_IS_COMMIT);
/* read fields */
commit_data->commit_lsn = pq_getmsgint64(in);
@@ -106,6 +115,77 @@ logicalrep_read_commit(StringInfo in, LogicalRepCommitData *commit_data)
}
/*
+ * Write PREPARE to the output stream.
+ */
+void
+logicalrep_write_prepare(StringInfo out, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ uint8 flags = 0;
+
+ pq_sendbyte(out, 'P'); /* sending PREPARE protocol */
+
+ /*
+ * This should only ever happen for 2PC transactions. In which case we
+ * expect to have a non-empty GID.
+ */
+ Assert(rbtxn_prepared(txn));
+ Assert(strlen(txn->gid) > 0);
+
+ /*
+ * Flags are determined from the state of the transaction. We know we
+ * always get PREPARE first and then [COMMIT|ROLLBACK] PREPARED, so if
+ * it's already marked as committed then it has to be COMMIT PREPARED (and
+ * likewise for abort / ROLLBACK PREPARED).
+ */
+ if (rbtxn_commit_prepared(txn))
+ flags |= LOGICALREP_IS_COMMIT_PREPARED;
+ else if (rbtxn_rollback_prepared(txn))
+ flags |= LOGICALREP_IS_ROLLBACK_PREPARED;
+ else
+ flags |= LOGICALREP_IS_PREPARE;
+
+ /* Make sure exactly one of the expected flags is set. */
+ if (!PrepareFlagsAreValid(flags))
+ elog(ERROR, "unrecognized flags %u in prepare message", flags);
+
+ /* send the flags field */
+ pq_sendbyte(out, flags);
+
+ /* send fields */
+ pq_sendint64(out, prepare_lsn);
+ pq_sendint64(out, txn->end_lsn);
+ pq_sendint64(out, txn->commit_time);
+
+ /* send gid */
+ pq_sendstring(out, txn->gid);
+}
+
+/*
+ * Read transaction PREPARE from the stream.
+ */
+void
+logicalrep_read_prepare(StringInfo in, LogicalRepPrepareData * prepare_data)
+{
+ /* read flags */
+ uint8 flags = pq_getmsgbyte(in);
+
+ if (!PrepareFlagsAreValid(flags))
+ elog(ERROR, "unrecognized flags %u in prepare message", flags);
+
+ /* set the action (reuse the constants used for the flags) */
+ prepare_data->prepare_type = flags;
+
+ /* read fields */
+ prepare_data->prepare_lsn = pq_getmsgint64(in);
+ prepare_data->end_lsn = pq_getmsgint64(in);
+ prepare_data->preparetime = pq_getmsgint64(in);
+
+ /* read gid (copy it into a pre-allocated buffer) */
+ strcpy(prepare_data->gid, pq_getmsgstring(in));
+}
+
+/*
* Write ORIGIN to the output stream.
*/
void
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index d239d28..62c571e 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -729,7 +729,11 @@ apply_handle_commit(StringInfo s)
replorigin_session_origin_lsn = commit_data.end_lsn;
replorigin_session_origin_timestamp = commit_data.committime;
- CommitTransactionCommand();
+ if (commit_data.is_commit)
+ CommitTransactionCommand();
+ else
+ AbortCurrentTransaction();
+
pgstat_report_stat(false);
store_flush_position(commit_data.end_lsn);
@@ -749,6 +753,141 @@ apply_handle_commit(StringInfo s)
pgstat_report_activity(STATE_IDLE, NULL);
}
+static void
+apply_handle_prepare_txn(LogicalRepPrepareData * prepare_data)
+{
+ Assert(prepare_data->prepare_lsn == remote_final_lsn);
+
+ /* The synchronization worker runs in single transaction. */
+ if (IsTransactionState() && !am_tablesync_worker())
+ {
+ /* End the earlier transaction and start a new one */
+ BeginTransactionBlock();
+ CommitTransactionCommand();
+ StartTransactionCommand();
+
+ /*
+ * Update origin state so we can restart streaming from correct
+ * position in case of crash.
+ */
+ replorigin_session_origin_lsn = prepare_data->end_lsn;
+ replorigin_session_origin_timestamp = prepare_data->preparetime;
+
+ PrepareTransactionBlock(prepare_data->gid);
+ CommitTransactionCommand();
+ pgstat_report_stat(false);
+
+ store_flush_position(prepare_data->end_lsn);
+ }
+ else
+ {
+ /* Process any invalidation messages that might have accumulated. */
+ AcceptInvalidationMessages();
+ maybe_reread_subscription();
+ }
+
+ in_remote_transaction = false;
+
+ /* Process any tables that are being synchronized in parallel. */
+ process_syncing_tables(prepare_data->end_lsn);
+
+ pgstat_report_activity(STATE_IDLE, NULL);
+}
+
+static void
+apply_handle_commit_prepared_txn(LogicalRepPrepareData * prepare_data)
+{
+ /* there is no transaction when COMMIT PREPARED is called */
+ ensure_transaction();
+
+ /*
+ * Update origin state so we can restart streaming from correct position
+ * in case of crash.
+ */
+ replorigin_session_origin_lsn = prepare_data->end_lsn;
+ replorigin_session_origin_timestamp = prepare_data->preparetime;
+
+ FinishPreparedTransaction(prepare_data->gid, true);
+ CommitTransactionCommand();
+ pgstat_report_stat(false);
+
+ store_flush_position(prepare_data->end_lsn);
+ in_remote_transaction = false;
+
+ /* Process any tables that are being synchronized in parallel. */
+ process_syncing_tables(prepare_data->end_lsn);
+
+ pgstat_report_activity(STATE_IDLE, NULL);
+}
+
+static void
+apply_handle_rollback_prepared_txn(LogicalRepPrepareData * prepare_data)
+{
+ /*
+ * Update origin state so we can restart streaming from correct position
+ * in case of crash.
+ */
+ replorigin_session_origin_lsn = prepare_data->end_lsn;
+ replorigin_session_origin_timestamp = prepare_data->preparetime;
+
+ /*
+ * During logical decoding, on the apply side, it's possible that a
+ * prepared transaction got aborted while decoding. In that case, we stop
+ * the decoding and abort the transaction immediately. However the
+ * ROLLBACK prepared processing still reaches the subscriber. In that case
+ * it's ok to have a missing gid
+ */
+ if (LookupGXact(prepare_data->gid))
+ {
+ /* there is no transaction when ABORT/ROLLBACK PREPARED is called */
+ ensure_transaction();
+ FinishPreparedTransaction(prepare_data->gid, false);
+ CommitTransactionCommand();
+ }
+
+ pgstat_report_stat(false);
+
+ store_flush_position(prepare_data->end_lsn);
+ in_remote_transaction = false;
+
+ /* Process any tables that are being synchronized in parallel. */
+ process_syncing_tables(prepare_data->end_lsn);
+
+ pgstat_report_activity(STATE_IDLE, NULL);
+}
+
+/*
+ * Handle PREPARE message.
+ */
+static void
+apply_handle_prepare(StringInfo s)
+{
+ LogicalRepPrepareData prepare_data;
+
+ logicalrep_read_prepare(s, &prepare_data);
+
+ switch (prepare_data.prepare_type)
+ {
+ case LOGICALREP_IS_PREPARE:
+ apply_handle_prepare_txn(&prepare_data);
+ break;
+
+ case LOGICALREP_IS_COMMIT_PREPARED:
+ apply_handle_commit_prepared_txn(&prepare_data);
+ break;
+
+ case LOGICALREP_IS_ROLLBACK_PREPARED:
+ apply_handle_rollback_prepared_txn(&prepare_data);
+ break;
+
+ default:
+ ereport(ERROR,
+ (errcode(ERRCODE_PROTOCOL_VIOLATION),
+ errmsg("unexpected type of prepare message: %d",
+ prepare_data.prepare_type)));
+ }
+}
+
/*
* Handle ORIGIN message.
*
@@ -1909,10 +2048,14 @@ apply_dispatch(StringInfo s)
case 'B':
apply_handle_begin(s);
break;
- /* COMMIT */
+ /* COMMIT/ABORT */
case 'C':
apply_handle_commit(s);
break;
+ /* PREPARE and [COMMIT|ROLLBACK] PREPARED */
+ case 'P':
+ apply_handle_prepare(s);
+ break;
/* INSERT */
case 'I':
apply_handle_insert(s);
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index eb1f230..729b655 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -47,6 +47,12 @@ static void pgoutput_truncate(LogicalDecodingContext *ctx,
ReorderBufferChange *change);
static bool pgoutput_origin_filter(LogicalDecodingContext *ctx,
RepOriginId origin_id);
+static void pgoutput_prepare_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, XLogRecPtr prepare_lsn);
+static void pgoutput_commit_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, XLogRecPtr prepare_lsn);
+static void pgoutput_abort_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, XLogRecPtr prepare_lsn);
static void pgoutput_stream_start(struct LogicalDecodingContext *ctx,
ReorderBufferTXN *txn);
static void pgoutput_stream_stop(struct LogicalDecodingContext *ctx,
@@ -143,6 +149,10 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb)
cb->change_cb = pgoutput_change;
cb->truncate_cb = pgoutput_truncate;
cb->commit_cb = pgoutput_commit_txn;
+
+ cb->prepare_cb = pgoutput_prepare_txn;
+ cb->commit_prepared_cb = pgoutput_commit_prepared_txn;
+ cb->abort_prepared_cb = pgoutput_abort_prepared_txn;
cb->filter_by_origin_cb = pgoutput_origin_filter;
cb->shutdown_cb = pgoutput_shutdown;
@@ -373,7 +383,49 @@ pgoutput_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
OutputPluginUpdateProgress(ctx);
OutputPluginPrepareWrite(ctx, true);
- logicalrep_write_commit(ctx->out, txn, commit_lsn);
+ logicalrep_write_commit(ctx->out, txn, commit_lsn, true);
+ OutputPluginWrite(ctx, true);
+}
+
+/*
+ * PREPARE callback
+ */
+static void
+pgoutput_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ OutputPluginUpdateProgress(ctx);
+
+ OutputPluginPrepareWrite(ctx, true);
+ logicalrep_write_prepare(ctx->out, txn, prepare_lsn);
+ OutputPluginWrite(ctx, true);
+}
+
+/*
+ * COMMIT PREPARED callback
+ */
+static void
+pgoutput_commit_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ OutputPluginUpdateProgress(ctx);
+
+ OutputPluginPrepareWrite(ctx, true);
+ logicalrep_write_prepare(ctx->out, txn, prepare_lsn);
+ OutputPluginWrite(ctx, true);
+}
+
+/*
+ * PREPARE callback
+ */
+static void
+pgoutput_abort_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ OutputPluginUpdateProgress(ctx);
+
+ OutputPluginPrepareWrite(ctx, true);
+ logicalrep_write_prepare(ctx->out, txn, prepare_lsn);
OutputPluginWrite(ctx, true);
}
diff --git a/src/include/access/twophase.h b/src/include/access/twophase.h
index 2ca71c3..b2628ea 100644
--- a/src/include/access/twophase.h
+++ b/src/include/access/twophase.h
@@ -44,6 +44,7 @@ extern GlobalTransaction MarkAsPreparing(TransactionId xid, const char *gid,
extern void StartPrepare(GlobalTransaction gxact);
extern void EndPrepare(GlobalTransaction gxact);
extern bool StandbyTransactionIdIsPrepared(TransactionId xid);
+extern bool LookupGXact(const char *gid);
extern TransactionId PrescanPreparedTransactions(TransactionId **xids_p,
int *nxids_p);
diff --git a/src/include/replication/logicalproto.h b/src/include/replication/logicalproto.h
index 607a728..fb07580 100644
--- a/src/include/replication/logicalproto.h
+++ b/src/include/replication/logicalproto.h
@@ -85,20 +85,55 @@ typedef struct LogicalRepBeginData
TransactionId xid;
} LogicalRepBeginData;
+/* Commit (and abort) information */
typedef struct LogicalRepCommitData
{
+ bool is_commit;
XLogRecPtr commit_lsn;
XLogRecPtr end_lsn;
TimestampTz committime;
} LogicalRepCommitData;
+/* types of the commit protocol message */
+#define LOGICALREP_IS_COMMIT 0x01
+#define LOGICALREP_IS_ABORT 0x02
+
+/* commit message is COMMIT or ABORT, and there is nothing else */
+#define CommitFlagsAreValid(flags) \
+ ((flags == LOGICALREP_IS_COMMIT) || (flags == LOGICALREP_IS_ABORT))
+
+/* Prepare protocol information */
+typedef struct LogicalRepPrepareData
+{
+ uint8 prepare_type;
+ XLogRecPtr prepare_lsn;
+ XLogRecPtr end_lsn;
+ TimestampTz preparetime;
+ char gid[GIDSIZE];
+} LogicalRepPrepareData;
+
+/* types of the prepare protocol message */
+#define LOGICALREP_IS_PREPARE 0x01
+#define LOGICALREP_IS_COMMIT_PREPARED 0x02
+#define LOGICALREP_IS_ROLLBACK_PREPARED 0x04
+
+/* prepare can be exactly one of PREPARE, [COMMIT|ABORT] PREPARED*/
+#define PrepareFlagsAreValid(flags) \
+ ((flags == LOGICALREP_IS_PREPARE) || \
+ (flags == LOGICALREP_IS_COMMIT_PREPARED) || \
+ (flags == LOGICALREP_IS_ROLLBACK_PREPARED))
+
extern void logicalrep_write_begin(StringInfo out, ReorderBufferTXN *txn);
extern void logicalrep_read_begin(StringInfo in,
LogicalRepBeginData *begin_data);
extern void logicalrep_write_commit(StringInfo out, ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn);
+ XLogRecPtr commit_lsn, bool is_commit);
extern void logicalrep_read_commit(StringInfo in,
LogicalRepCommitData *commit_data);
+extern void logicalrep_write_prepare(StringInfo out, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+extern void logicalrep_read_prepare(StringInfo in,
+ LogicalRepPrepareData * prepare_data);
extern void logicalrep_write_origin(StringInfo out, const char *origin,
XLogRecPtr origin_lsn);
extern char *logicalrep_read_origin(StringInfo in, XLogRecPtr *origin_lsn);
diff --git a/src/test/subscription/t/020_twophase.pl b/src/test/subscription/t/020_twophase.pl
new file mode 100644
index 0000000..c7f373d
--- /dev/null
+++ b/src/test/subscription/t/020_twophase.pl
@@ -0,0 +1,163 @@
+# logical replication of 2PC test
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 12;
+
+# Initialize publisher node
+my $node_publisher = get_new_node('publisher');
+$node_publisher->init(allows_streaming => 'logical');
+$node_publisher->append_conf(
+ 'postgresql.conf', qq(
+ max_prepared_transactions = 10
+ ));
+$node_publisher->start;
+
+# Create subscriber node
+my $node_subscriber = get_new_node('subscriber');
+$node_subscriber->init(allows_streaming => 'logical');
+$node_subscriber->append_conf(
+ 'postgresql.conf', qq(max_prepared_transactions = 10));
+$node_subscriber->start;
+
+# Create some pre-existing content on publisher
+$node_publisher->safe_psql('postgres', "CREATE TABLE tab_full (a int PRIMARY KEY)");
+$node_publisher->safe_psql('postgres',
+ "INSERT INTO tab_full SELECT generate_series(1,10)");
+$node_publisher->safe_psql('postgres', "CREATE TABLE tab_full2 (x text)");
+$node_publisher->safe_psql('postgres',
+ "INSERT INTO tab_full2 VALUES ('a'), ('b'), ('b')");
+
+# Setup structure on subscriber
+$node_subscriber->safe_psql('postgres', "CREATE TABLE tab_full (a int PRIMARY KEY)");
+$node_subscriber->safe_psql('postgres', "CREATE TABLE tab_full2 (x text)");
+
+# Setup logical replication
+my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
+$node_publisher->safe_psql('postgres', "CREATE PUBLICATION tap_pub");
+$node_publisher->safe_psql('postgres',
+"ALTER PUBLICATION tap_pub ADD TABLE tab_full, tab_full2"
+);
+
+my $appname = 'tap_sub';
+$node_subscriber->safe_psql('postgres',
+"CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr application_name=$appname' PUBLICATION tap_pub"
+);
+
+# Wait for subscriber to finish initialization
+my $caughtup_query =
+"SELECT pg_current_wal_lsn() <= replay_lsn FROM pg_stat_replication WHERE application_name = '$appname';";
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# Also wait for initial table sync to finish
+my $synced_query =
+"SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT IN ('r', 's');";
+$node_subscriber->poll_query_until('postgres', $synced_query)
+ or die "Timed out while waiting for subscriber to synchronize data";
+
+# check that 2PC gets replicated to subscriber
+$node_publisher->safe_psql('postgres',
+ "BEGIN;INSERT INTO tab_full VALUES (11);PREPARE TRANSACTION 'test_prepared_tab_full';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is in prepared state on subscriber
+my $result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab_full';");
+ is($result, qq(1), 'transaction is prepared on subscriber');
+
+# check that 2PC gets committed on subscriber
+$node_publisher->safe_psql('postgres',
+ "COMMIT PREPARED 'test_prepared_tab_full';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is committed on subscriber
+$result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM tab_full where a = 11;");
+ is($result, qq(1), 'Row inserted via 2PC has committed on subscriber');
+$result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab_full';");
+ is($result, qq(0), 'transaction is committed on subscriber');
+
+# check that 2PC gets replicated to subscriber
+$node_publisher->safe_psql('postgres',
+ "BEGIN;INSERT INTO tab_full VALUES (12);PREPARE TRANSACTION 'test_prepared_tab_full';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is in prepared state on subscriber
+$result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab_full';");
+ is($result, qq(1), 'transaction is prepared on subscriber');
+
+# check that 2PC gets aborted on subscriber
+$node_publisher->safe_psql('postgres',
+ "ROLLBACK PREPARED 'test_prepared_tab_full';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is aborted on subscriber
+$result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM tab_full where a = 12;");
+ is($result, qq(0), 'Row inserted via 2PC is not present on subscriber');
+
+$result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab_full';");
+ is($result, qq(0), 'transaction is aborted on subscriber');
+
+# Check that commit prepared is decoded properly on crash restart
+$node_publisher->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab_full VALUES (12);
+ INSERT INTO tab_full VALUES (13);
+ PREPARE TRANSACTION 'test_prepared_tab';");
+$node_subscriber->stop('immediate');
+$node_publisher->stop('immediate');
+$node_publisher->start;
+$node_subscriber->start;
+
+# commit post the restart
+$node_publisher->safe_psql('postgres', "COMMIT PREPARED 'test_prepared_tab';");
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check inserts are visible
+$result = $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM tab_full where a IN (11,12);");
+is($result, qq(2), 'Rows inserted via 2PC are visible on the subscriber');
+
+# TODO add test cases involving DDL. This can be added after we add functionality
+# to replicate DDL changes to subscriber.
+
+# check all the cleanup
+$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_subscription");
+is($result, qq(0), 'check subscription was dropped on subscriber');
+
+$result = $node_publisher->safe_psql('postgres',
+ "SELECT count(*) FROM pg_replication_slots");
+is($result, qq(0), 'check replication slot was dropped on publisher');
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_subscription_rel");
+is($result, qq(0),
+ 'check subscription relation status was dropped on subscriber');
+
+$result = $node_publisher->safe_psql('postgres',
+ "SELECT count(*) FROM pg_replication_slots");
+is($result, qq(0), 'check replication slot was dropped on publisher');
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_replication_origin");
+is($result, qq(0), 'check replication origin was dropped on subscriber');
+
+$node_subscriber->stop('fast');
+$node_publisher->stop('fast');
--
1.8.3.1
On Thu, Sep 17, 2020 at 10:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Yeah, I think that would be better. How about if name the new variable
as cleanup_prepared?
I haven't added a new flag to indicate that the prepare was cleaned
up, as that wasn' really necessary. Instead I used a new function to
do partial cleanup to do whatever was not done in the truncate. If you
think, using a flag and doing special handling in
ReorderBufferCleanupTXN was a better idea, let me know.
regards,
Ajin Cherian
Fujitsu Australia
On Fri, Sep 18, 2020 at 6:02 PM Ajin Cherian <itsajin@gmail.com> wrote:
I have reviewed v4-0001 patch and I have a few comments. I haven't
yet completely reviewed the patch.
1.
+ /*
+ * Process invalidation messages, even if we're not interested in the
+ * transaction's contents, since the various caches need to always be
+ * consistent.
+ */
+ if (parsed->nmsgs > 0)
+ {
+ if (!ctx->fast_forward)
+ ReorderBufferAddInvalidations(ctx->reorder, xid, buf->origptr,
+ parsed->nmsgs, parsed->msgs);
+ ReorderBufferXidSetCatalogChanges(ctx->reorder, xid, buf->origptr);
+ }
+
I think we don't need to add prepare time invalidation messages as we now we
are already logging the invalidations at the command level and adding them to
reorder buffer.
2.
+ /*
+ * Tell the reorderbuffer about the surviving subtransactions. We need to
+ * do this because the main transaction itself has not committed since we
+ * are in the prepare phase right now. So we need to be sure the snapshot
+ * is setup correctly for the main transaction in case all changes
+ * happened in subtransanctions
+ */
+ for (i = 0; i < parsed->nsubxacts; i++)
+ {
+ ReorderBufferCommitChild(ctx->reorder, xid, parsed->subxacts[i],
+ buf->origptr, buf->endptr);
+ }
+
+ if (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
+ (parsed->dbId != InvalidOid && parsed->dbId != ctx->slot->data.database) ||
+ ctx->fast_forward || FilterByOrigin(ctx, origin_id))
+ return;
Do we need to call ReorderBufferCommitChild if we are skiping this transaction?
I think the below check should be before calling ReorderBufferCommitChild.
3.
+ /*
+ * If it's ROLLBACK PREPARED then handle it via callbacks.
+ */
+ if (TransactionIdIsValid(xid) &&
+ !SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) &&
+ parsed->dbId == ctx->slot->data.database &&
+ !FilterByOrigin(ctx, origin_id) &&
+ ReorderBufferTxnIsPrepared(ctx->reorder, xid, parsed->twophase_gid))
+ {
+ ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn,
+ parsed->twophase_gid, false);
+ return;
+ }
I think we have already checked !SnapBuildXactNeedsSkip, parsed->dbId
== ctx->slot->data.database and !FilterByOrigin in DecodePrepare
so if those are not true then we wouldn't have prepared this
transaction i.e. ReorderBufferTxnIsPrepared will be false so why do we
need
to recheck this conditions.
4.
+ /* If streaming, reset the TXN so that it is allowed to stream
remaining data. */
+ if (streaming && stream_started)
+ {
+ ReorderBufferResetTXN(rb, txn, snapshot_now,
+ command_id, prev_lsn,
+ specinsert);
+ }
+ else
+ {
+ elog(LOG, "stopping decoding of %s (%u)",
+ txn->gid[0] != '\0'? txn->gid:"", txn->xid);
+ ReorderBufferTruncateTXN(rb, txn, true);
+ }
Why only if (streaming) is not enough? I agree if we are coming here
and it is streaming mode then streaming started must be true
but we already have an assert for that.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Fri, Sep 18, 2020 at 6:02 PM Ajin Cherian <itsajin@gmail.com> wrote:
On Tue, Sep 15, 2020 at 10:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
I don't think it is complete yet. * * This error can only occur when we are sending the data in * streaming mode and the streaming is not finished yet. */ - Assert(streaming); - Assert(stream_started); + Assert(streaming || rbtxn_prepared(txn)); + Assert(stream_started || rbtxn_prepared(txn));Here, you have updated the code but comments are still not updated.
Updated the comments.
I don't think we need to perform abort here. Later we will anyway
encounter the WAL for Rollback Prepared for which we will call
abort_prepared_cb. As we have set the 'concurrent_abort' flag, it will
allow us to skip all the intermediate records. Here, we need only
enough state in ReorderBufferTxn that it can be later used for
ReorderBufferFinishPrepared(). Basically, you need functionality
similar to ReorderBufferTruncateTXN where except for invalidations you
can free memory for everything else. You can either write a new
function ReorderBufferTruncatePreparedTxn or pass another bool
parameter in ReorderBufferTruncateTXN to indicate it is prepared_xact
and then clean up additional things that are not required for prepared
xact.Added a new parameter to ReorderBufferTruncatePreparedTxn for
prepared transactions and did cleanup of tupulecids as well, I have
left snapshots and transactions.
As a result of this, I also had to create a new function
ReorderBufferCleanupPreparedTXN which will clean up the rest as part
of FinishPrepared handling as we can't call
ReorderBufferCleanupTXN again after this.
Why can't we call ReorderBufferCleanupTXN() from
ReorderBufferFinishPrepared after your changes?
+ * If streaming, keep the remaining info - transactions, tuplecids,
invalidations and
+ * snapshots.If after a PREPARE, keep only the invalidations and snapshots.
*/
static void
-ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
+ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
bool txn_prepared)
Why do we need even snapshot for Prepared transactions? Also, note
that in the comment there is no space before you start a new line.
I don't know why the patch has used this way to implement an option to
enable two-phase. Can't we use how we implement 'stream-changes'
option in commit 7259736a6e? Just refer how we set ctx->streaming and
you can use a similar way to set this parameter.Done, I've moved the checks for callbacks to inside the corresponding wrappers.
This is not what I suggested. Please study the commit 7259736a6e and
see how streaming option is implemented. I want later subscribers can
specify whether they want transactions to be decoded at prepare time
similar to what we have done for streaming. Also, search for
ctx->streaming in the code and see how it is set to get the idea.Changed it similar to ctx->streaming logic.
Hmm, I still don't see changes relevant changes in pg_decode_startup().
--
With Regards,
Amit Kapila.
On Sun, Sep 20, 2020 at 11:01 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Fri, Sep 18, 2020 at 6:02 PM Ajin Cherian <itsajin@gmail.com> wrote:
3.
+ /* + * If it's ROLLBACK PREPARED then handle it via callbacks. + */ + if (TransactionIdIsValid(xid) && + !SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) && + parsed->dbId == ctx->slot->data.database && + !FilterByOrigin(ctx, origin_id) && + ReorderBufferTxnIsPrepared(ctx->reorder, xid, parsed->twophase_gid)) + { + ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, buf->endptr, + commit_time, origin_id, origin_lsn, + parsed->twophase_gid, false); + return; + }I think we have already checked !SnapBuildXactNeedsSkip, parsed->dbId
== ctx->slot->data.database and !FilterByOrigin in DecodePrepare
so if those are not true then we wouldn't have prepared this
transaction i.e. ReorderBufferTxnIsPrepared will be false so why do we
need
to recheck this conditions.
Yeah, probably we should have Assert for below three conditions:
+ !SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) &&
+ parsed->dbId == ctx->slot->data.database &&
+ !FilterByOrigin(ctx, origin_id) &&
Your other comments make sense to me.
--
With Regards,
Amit Kapila.
Why can't we call ReorderBufferCleanupTXN() from
ReorderBufferFinishPrepared after your changes?
Since the truncate already removed the changes, it would fail on the
below Assert in ReorderBufferCleanupTXN()
/* Check we're not mixing changes from different transactions. */
Assert(change->txn == txn);
regards.
Ajin Cherian
Fujitsu Australia
On Mon, Sep 21, 2020 at 12:36 PM Ajin Cherian <itsajin@gmail.com> wrote:
Why can't we call ReorderBufferCleanupTXN() from
ReorderBufferFinishPrepared after your changes?Since the truncate already removed the changes, it would fail on the
below Assert in ReorderBufferCleanupTXN()
/* Check we're not mixing changes from different transactions. */
Assert(change->txn == txn);
The changes list should be empty by that time because we removing each
change from the list:, see code "dlist_delete(&change->node);" in
ReorderBufferTruncateTXN. If you are hitting the Assert as you
mentioned then I think the problem is something else.
--
With Regards,
Amit Kapila.
On Mon, Sep 21, 2020 at 10:20 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sun, Sep 20, 2020 at 11:01 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Fri, Sep 18, 2020 at 6:02 PM Ajin Cherian <itsajin@gmail.com> wrote:
3.
+ /* + * If it's ROLLBACK PREPARED then handle it via callbacks. + */ + if (TransactionIdIsValid(xid) && + !SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) && + parsed->dbId == ctx->slot->data.database && + !FilterByOrigin(ctx, origin_id) && + ReorderBufferTxnIsPrepared(ctx->reorder, xid, parsed->twophase_gid)) + { + ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, buf->endptr, + commit_time, origin_id, origin_lsn, + parsed->twophase_gid, false); + return; + }I think we have already checked !SnapBuildXactNeedsSkip, parsed->dbId
== ctx->slot->data.database and !FilterByOrigin in DecodePrepare
so if those are not true then we wouldn't have prepared this
transaction i.e. ReorderBufferTxnIsPrepared will be false so why do we
need
to recheck this conditions.Yeah, probably we should have Assert for below three conditions: + !SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) && + parsed->dbId == ctx->slot->data.database && + !FilterByOrigin(ctx, origin_id) &&
+1
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Sun, Sep 20, 2020 at 3:31 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
+ /* + * If it's ROLLBACK PREPARED then handle it via callbacks. + */ + if (TransactionIdIsValid(xid) && + !SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) && + parsed->dbId == ctx->slot->data.database && + !FilterByOrigin(ctx, origin_id) && + ReorderBufferTxnIsPrepared(ctx->reorder, xid, parsed->twophase_gid)) + { + ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, buf->endptr, + commit_time, origin_id, origin_lsn, + parsed->twophase_gid, false); + return; + }I think we have already checked !SnapBuildXactNeedsSkip, parsed->dbId
== ctx->slot->data.database and !FilterByOrigin in DecodePrepare
so if those are not true then we wouldn't have prepared this
transaction i.e. ReorderBufferTxnIsPrepared will be false so why do we
need
to recheck this conditions.
We could enter DecodeAbort even without a prepare, as the code is
common for both XLOG_XACT_ABORT and XLOG_XACT_ABORT_PREPARED. So, the
conditions !SnapBuildXactNeedsSkip, parsed->dbId
== ctx->slot->data.database and !FilterByOrigin could be true but the transaction is not prepared, then we dont need to do a ReorderBufferFinishPrepared (with commit flag false) but called ReorderBufferAbort. But I think there is a problem, if those conditions are in fact false, then we should return without trying to Abort using ReorderBufferAbort, what do you think?
I agree with all your other comments.
regards,
Ajin
Fujitsu Australia
On Mon, Sep 21, 2020 at 3:45 PM Ajin Cherian <itsajin@gmail.com> wrote:
On Sun, Sep 20, 2020 at 3:31 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
+ /* + * If it's ROLLBACK PREPARED then handle it via callbacks. + */ + if (TransactionIdIsValid(xid) && + !SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) && + parsed->dbId == ctx->slot->data.database && + !FilterByOrigin(ctx, origin_id) && + ReorderBufferTxnIsPrepared(ctx->reorder, xid, parsed->twophase_gid)) + { + ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, buf->endptr, + commit_time, origin_id, origin_lsn, + parsed->twophase_gid, false); + return; + }I think we have already checked !SnapBuildXactNeedsSkip, parsed->dbId
== ctx->slot->data.database and !FilterByOrigin in DecodePrepare
so if those are not true then we wouldn't have prepared this
transaction i.e. ReorderBufferTxnIsPrepared will be false so why do we
need
to recheck this conditions.We could enter DecodeAbort even without a prepare, as the code is
common for both XLOG_XACT_ABORT and XLOG_XACT_ABORT_PREPARED. So, the
conditions !SnapBuildXactNeedsSkip, parsed->dbId== ctx->slot->data.database and !FilterByOrigin could be true but the transaction is not prepared, then we dont need to do a ReorderBufferFinishPrepared (with commit flag false) but called ReorderBufferAbort. But I think there is a problem, if those conditions are in fact false, then we should return without trying to Abort using ReorderBufferAbort, what do you think?
I think we need to call ReorderBufferAbort at least to clean up the
TXN. Also, if what you are saying is correct then that should be true
without this patch as well, no? If so, we don't need to worry about it
as far as this patch is concerned.
--
With Regards,
Amit Kapila.
On Mon, Sep 21, 2020 at 9:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
I think we need to call ReorderBufferAbort at least to clean up the
TXN. Also, if what you are saying is correct then that should be true
without this patch as well, no? If so, we don't need to worry about it
as far as this patch is concerned.
Yes, that is true. So will change this check to:
if (TransactionIdIsValid(xid) &&
ReorderBufferTxnIsPrepared(ctx->reorder, xid, parsed->twophase_gid)
regards,
Ajin Cherian
Fujitsu Australia
On Mon, Sep 21, 2020 at 5:23 PM Ajin Cherian <itsajin@gmail.com> wrote:
On Mon, Sep 21, 2020 at 9:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
I think we need to call ReorderBufferAbort at least to clean up the
TXN. Also, if what you are saying is correct then that should be true
without this patch as well, no? If so, we don't need to worry about it
as far as this patch is concerned.Yes, that is true. So will change this check to:
if (TransactionIdIsValid(xid) &&
ReorderBufferTxnIsPrepared(ctx->reorder, xid, parsed->twophase_gid)
Yeah and add the Assert for skip conditions as asked above.
--
With Regards,
Amit Kapila.
On Sun, Sep 20, 2020 at 3:31 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
1. + /* + * Process invalidation messages, even if we're not interested in the + * transaction's contents, since the various caches need to always be + * consistent. + */ + if (parsed->nmsgs > 0) + { + if (!ctx->fast_forward) + ReorderBufferAddInvalidations(ctx->reorder, xid, buf->origptr, + parsed->nmsgs, parsed->msgs); + ReorderBufferXidSetCatalogChanges(ctx->reorder, xid, buf->origptr); + } +I think we don't need to add prepare time invalidation messages as we now we
are already logging the invalidations at the command level and adding them to
reorder buffer.
Removed.
2.
+ /* + * Tell the reorderbuffer about the surviving subtransactions. We need to + * do this because the main transaction itself has not committed since we + * are in the prepare phase right now. So we need to be sure the snapshot + * is setup correctly for the main transaction in case all changes + * happened in subtransanctions + */ + for (i = 0; i < parsed->nsubxacts; i++) + { + ReorderBufferCommitChild(ctx->reorder, xid, parsed->subxacts[i], + buf->origptr, buf->endptr); + } + + if (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) || + (parsed->dbId != InvalidOid && parsed->dbId != ctx->slot->data.database) || + ctx->fast_forward || FilterByOrigin(ctx, origin_id)) + return;Do we need to call ReorderBufferCommitChild if we are skiping this transaction?
I think the below check should be before calling ReorderBufferCommitChild.
Done.
3.
+ /* + * If it's ROLLBACK PREPARED then handle it via callbacks. + */ + if (TransactionIdIsValid(xid) && + !SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) && + parsed->dbId == ctx->slot->data.database && + !FilterByOrigin(ctx, origin_id) && + ReorderBufferTxnIsPrepared(ctx->reorder, xid, parsed->twophase_gid)) + { + ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, buf->endptr, + commit_time, origin_id, origin_lsn, + parsed->twophase_gid, false); + return; + }I think we have already checked !SnapBuildXactNeedsSkip, parsed->dbId
== ctx->slot->data.database and !FilterByOrigin in DecodePrepare
so if those are not true then we wouldn't have prepared this
transaction i.e. ReorderBufferTxnIsPrepared will be false so why do we
need
to recheck this conditions.
I didnt change this, as I am seeing cases where the Abort is getting
called for transactions that needs to be skipped. I also see that the
same check is there both in DecodePrepare and DecodeCommit.
So, while the same transactions were not getting prepared or
committed, it tries to get ROLLBACK PREPARED (as part of finish
prepared handling). The check in if ReorderBufferTxnIsPrepared() is
also not proper. I will need to relook
this logic again in a future patch.
4.
+ /* If streaming, reset the TXN so that it is allowed to stream remaining data. */ + if (streaming && stream_started) + { + ReorderBufferResetTXN(rb, txn, snapshot_now, + command_id, prev_lsn, + specinsert); + } + else + { + elog(LOG, "stopping decoding of %s (%u)", + txn->gid[0] != '\0'? txn->gid:"", txn->xid); + ReorderBufferTruncateTXN(rb, txn, true); + }Why only if (streaming) is not enough? I agree if we are coming here
and it is streaming mode then streaming started must be true
but we already have an assert for that.
Changed.
Amit,
I have also changed test_decode startup to support two_phase commits
only if specified similar to how it was done for streaming. I have
also changed the test cases accordingly. However, I have not added it
to the pgoutput startup as that would require create subscription
changes. I will do that in a future patch. Some other pending changes
are:
1. Remove snapshots on prepare truncate.
2. Look at why ReorderBufferCleanupTXN is failing after a
ReorderBufferTruncateTXN
3. Add prepare support to streaming
regards,
Ajin Cherian
Fujitsu Australia
Attachments:
v5-0001-Support-decoding-of-two-phase-transactions.patchapplication/octet-stream; name=v5-0001-Support-decoding-of-two-phase-transactions.patchDownload
From 996e9dbc78f61729cfb1295363620cbc95e974cc Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Tue, 22 Sep 2020 06:17:43 -0400
Subject: [PATCH v5] Support decoding of two-phase transactions
Until now two-phase transactions were decoded at COMMIT, just like
regular transaction. During replay, two-phase transactions were
translated into regular transactions on the subscriber, and the GID
was not forwarded to it.
This patch allows PREPARE-time decoding two-phase transactions (if
the output plugin supports this capability), in which case the
transactions are replayed at PREPARE and then committed later when
COMMIT PREPARED arrives.
Includes documentation changes.
---
contrib/test_decoding/expected/prepared.out | 187 ++++++++++++--
contrib/test_decoding/sql/prepared.sql | 79 +++++-
contrib/test_decoding/test_decoding.c | 166 +++++++++++++
doc/src/sgml/logicaldecoding.sgml | 110 ++++++++-
src/backend/replication/logical/decode.c | 129 +++++++++-
src/backend/replication/logical/logical.c | 175 ++++++++++++++
src/backend/replication/logical/reorderbuffer.c | 309 +++++++++++++++++++++---
src/include/replication/logical.h | 5 +
src/include/replication/output_plugin.h | 37 +++
src/include/replication/reorderbuffer.h | 75 +++++-
10 files changed, 1185 insertions(+), 87 deletions(-)
diff --git a/contrib/test_decoding/expected/prepared.out b/contrib/test_decoding/expected/prepared.out
index 46e915d..fd0e8a4 100644
--- a/contrib/test_decoding/expected/prepared.out
+++ b/contrib/test_decoding/expected/prepared.out
@@ -6,19 +6,50 @@ SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_d
init
(1 row)
-CREATE TABLE test_prepared1(id int);
-CREATE TABLE test_prepared2(id int);
+CREATE TABLE test_prepared1(id integer primary key);
+CREATE TABLE test_prepared2(id integer primary key);
-- test simple successful use of a prepared xact
BEGIN;
INSERT INTO test_prepared1 VALUES (1);
PREPARE TRANSACTION 'test_prepared#1';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+----------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:1
+ PREPARE TRANSACTION 'test_prepared#1'
+(3 rows)
+
COMMIT PREPARED 'test_prepared#1';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-----------------------------------
+ COMMIT PREPARED 'test_prepared#1'
+(1 row)
+
INSERT INTO test_prepared1 VALUES (2);
-- test abort of a prepared xact
BEGIN;
INSERT INTO test_prepared1 VALUES (3);
PREPARE TRANSACTION 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+----------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:2
+ COMMIT
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:3
+ PREPARE TRANSACTION 'test_prepared#2'
+(6 rows)
+
ROLLBACK PREPARED 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-------------------------------------
+ ROLLBACK PREPARED 'test_prepared#2'
+(1 row)
+
INSERT INTO test_prepared1 VALUES (4);
-- test prepared xact containing ddl
BEGIN;
@@ -26,45 +57,149 @@ INSERT INTO test_prepared1 VALUES (5);
ALTER TABLE test_prepared1 ADD COLUMN data text;
INSERT INTO test_prepared1 VALUES (6, 'frakbar');
PREPARE TRANSACTION 'test_prepared#3';
--- test that we decode correctly while an uncommitted prepared xact
--- with ddl exists.
--- separate table because of the lock from the ALTER
--- this will come before the '5' row above, as this commits before it.
-INSERT INTO test_prepared2 VALUES (7);
-COMMIT PREPARED 'test_prepared#3';
--- make sure stuff still works
-INSERT INTO test_prepared1 VALUES (8);
-INSERT INTO test_prepared2 VALUES (9);
--- cleanup
-DROP TABLE test_prepared1;
-DROP TABLE test_prepared2;
--- show results
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT 'test_prepared_1' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'test_prepared1'::regclass;
+ relation | locktype | mode
+-----------------+----------+---------------------
+ test_prepared_1 | relation | RowExclusiveLock
+ test_prepared_1 | relation | AccessExclusiveLock
+(2 rows)
+
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
data
-------------------------------------------------------------------------
BEGIN
- table public.test_prepared1: INSERT: id[integer]:1
- COMMIT
- BEGIN
- table public.test_prepared1: INSERT: id[integer]:2
- COMMIT
- BEGIN
table public.test_prepared1: INSERT: id[integer]:4
COMMIT
BEGIN
- table public.test_prepared2: INSERT: id[integer]:7
- COMMIT
- BEGIN
table public.test_prepared1: INSERT: id[integer]:5
table public.test_prepared1: INSERT: id[integer]:6 data[text]:'frakbar'
+ PREPARE TRANSACTION 'test_prepared#3'
+(7 rows)
+
+-- Test that we decode correctly while an uncommitted prepared xact
+-- with ddl exists.
+--
+-- Use a separate table for the concurrent transaction because the lock from
+-- the ALTER will stop us inserting into the other one.
+--
+-- We should see '7' before '5' in our results since it commits first.
+--
+INSERT INTO test_prepared2 VALUES (7);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+----------------------------------------------------
+ BEGIN
+ table public.test_prepared2: INSERT: id[integer]:7
COMMIT
+(3 rows)
+
+COMMIT PREPARED 'test_prepared#3';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-----------------------------------
+ COMMIT PREPARED 'test_prepared#3'
+(1 row)
+
+-- make sure stuff still works
+INSERT INTO test_prepared1 VALUES (8);
+INSERT INTO test_prepared2 VALUES (9);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+--------------------------------------------------------------------
BEGIN
table public.test_prepared1: INSERT: id[integer]:8 data[text]:null
COMMIT
BEGIN
table public.test_prepared2: INSERT: id[integer]:9
COMMIT
-(22 rows)
+(6 rows)
+
+-- Check `CLUSTER` (as operation that hold exclusive lock) doesn't block
+-- logical decoding.
+BEGIN;
+INSERT INTO test_prepared1 VALUES (10, 'othercol');
+CLUSTER test_prepared1 USING test_prepared1_pkey;
+INSERT INTO test_prepared1 VALUES (11, 'othercol2');
+PREPARE TRANSACTION 'test_prepared_lock';
+BEGIN;
+insert into test_prepared2 values (12);
+PREPARE TRANSACTION 'test_prepared_lock2';
+COMMIT PREPARED 'test_prepared_lock2';
+SELECT 'pg_class' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'pg_class'::regclass;
+ relation | locktype | mode
+----------+----------+------
+(0 rows)
+
+-- Shouldn't timeout on 2pc decoding.
+SET statement_timeout = '1s';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+----------------------------------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:10 data[text]:'othercol'
+ table public.test_prepared1: INSERT: id[integer]:11 data[text]:'othercol2'
+ PREPARE TRANSACTION 'test_prepared_lock'
+ BEGIN
+ table public.test_prepared2: INSERT: id[integer]:12
+ PREPARE TRANSACTION 'test_prepared_lock2'
+ COMMIT PREPARED 'test_prepared_lock2'
+(8 rows)
+
+RESET statement_timeout;
+COMMIT PREPARED 'test_prepared_lock';
+-- will work normally after we commit
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+--------------------------------------
+ COMMIT PREPARED 'test_prepared_lock'
+(1 row)
+
+-- test savepoints
+BEGIN;
+SAVEPOINT test_savepoint;
+CREATE TABLE test_prepared_savepoint (a int);
+PREPARE TRANSACTION 'test_prepared_savepoint';
+COMMIT PREPARED 'test_prepared_savepoint';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-------------------------------------------
+ COMMIT PREPARED 'test_prepared_savepoint'
+(1 row)
+
+-- test that a GID containing "nodecode" gets decoded at commit prepared time
+BEGIN;
+INSERT INTO test_prepared1 VALUES (20);
+PREPARE TRANSACTION 'test_prepared_nodecode';
+-- should show nothing
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+------
+(0 rows)
+
+COMMIT PREPARED 'test_prepared_nodecode';
+-- should be decoded now
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+---------------------------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:20 data[text]:null
+ COMMIT
+(3 rows)
+
+-- cleanup
+DROP TABLE test_prepared1;
+DROP TABLE test_prepared2;
+-- show results. There should be nothing to show
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+------
+(0 rows)
SELECT pg_drop_replication_slot('regression_slot');
pg_drop_replication_slot
diff --git a/contrib/test_decoding/sql/prepared.sql b/contrib/test_decoding/sql/prepared.sql
index e726397..162fe43 100644
--- a/contrib/test_decoding/sql/prepared.sql
+++ b/contrib/test_decoding/sql/prepared.sql
@@ -2,21 +2,25 @@
SET synchronous_commit = on;
SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
-CREATE TABLE test_prepared1(id int);
-CREATE TABLE test_prepared2(id int);
+CREATE TABLE test_prepared1(id integer primary key);
+CREATE TABLE test_prepared2(id integer primary key);
-- test simple successful use of a prepared xact
BEGIN;
INSERT INTO test_prepared1 VALUES (1);
PREPARE TRANSACTION 'test_prepared#1';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
COMMIT PREPARED 'test_prepared#1';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
INSERT INTO test_prepared1 VALUES (2);
-- test abort of a prepared xact
BEGIN;
INSERT INTO test_prepared1 VALUES (3);
PREPARE TRANSACTION 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
ROLLBACK PREPARED 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
INSERT INTO test_prepared1 VALUES (4);
@@ -27,24 +31,83 @@ ALTER TABLE test_prepared1 ADD COLUMN data text;
INSERT INTO test_prepared1 VALUES (6, 'frakbar');
PREPARE TRANSACTION 'test_prepared#3';
--- test that we decode correctly while an uncommitted prepared xact
--- with ddl exists.
+SELECT 'test_prepared_1' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'test_prepared1'::regclass;
+
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
--- separate table because of the lock from the ALTER
--- this will come before the '5' row above, as this commits before it.
+-- Test that we decode correctly while an uncommitted prepared xact
+-- with ddl exists.
+--
+-- Use a separate table for the concurrent transaction because the lock from
+-- the ALTER will stop us inserting into the other one.
+--
+-- We should see '7' before '5' in our results since it commits first.
+--
INSERT INTO test_prepared2 VALUES (7);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
COMMIT PREPARED 'test_prepared#3';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
-- make sure stuff still works
INSERT INTO test_prepared1 VALUES (8);
INSERT INTO test_prepared2 VALUES (9);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- Check `CLUSTER` (as operation that hold exclusive lock) doesn't block
+-- logical decoding.
+BEGIN;
+INSERT INTO test_prepared1 VALUES (10, 'othercol');
+CLUSTER test_prepared1 USING test_prepared1_pkey;
+INSERT INTO test_prepared1 VALUES (11, 'othercol2');
+PREPARE TRANSACTION 'test_prepared_lock';
+
+BEGIN;
+insert into test_prepared2 values (12);
+PREPARE TRANSACTION 'test_prepared_lock2';
+COMMIT PREPARED 'test_prepared_lock2';
+
+SELECT 'pg_class' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'pg_class'::regclass;
+
+-- Shouldn't timeout on 2pc decoding.
+SET statement_timeout = '1s';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+RESET statement_timeout;
+
+COMMIT PREPARED 'test_prepared_lock';
+
+-- will work normally after we commit
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- test savepoints
+BEGIN;
+SAVEPOINT test_savepoint;
+CREATE TABLE test_prepared_savepoint (a int);
+PREPARE TRANSACTION 'test_prepared_savepoint';
+COMMIT PREPARED 'test_prepared_savepoint';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- test that a GID containing "nodecode" gets decoded at commit prepared time
+BEGIN;
+INSERT INTO test_prepared1 VALUES (20);
+PREPARE TRANSACTION 'test_prepared_nodecode';
+-- should show nothing
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+COMMIT PREPARED 'test_prepared_nodecode';
+-- should be decoded now
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
-- cleanup
DROP TABLE test_prepared1;
DROP TABLE test_prepared2;
--- show results
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+-- show results. There should be nothing to show
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
SELECT pg_drop_replication_slot('regression_slot');
diff --git a/contrib/test_decoding/test_decoding.c b/contrib/test_decoding/test_decoding.c
index e60ab34..1bb17a6 100644
--- a/contrib/test_decoding/test_decoding.c
+++ b/contrib/test_decoding/test_decoding.c
@@ -11,12 +11,16 @@
*-------------------------------------------------------------------------
*/
#include "postgres.h"
+#include "miscadmin.h"
+#include "access/transam.h"
#include "catalog/pg_type.h"
#include "replication/logical.h"
#include "replication/origin.h"
+#include "storage/procarray.h"
+
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -36,6 +40,7 @@ typedef struct
bool skip_empty_xacts;
bool xact_wrote_changes;
bool only_local;
+ TransactionId check_xid; /* track abort of this txid */
} TestDecodingData;
static void pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
@@ -88,6 +93,19 @@ static void pg_decode_stream_truncate(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
int nrelations, Relation relations[],
ReorderBufferChange *change);
+static bool pg_decode_filter_prepare(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid);
+static void pg_decode_prepare_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+static void pg_decode_commit_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+static void pg_decode_abort_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+
void
_PG_init(void)
@@ -116,6 +134,11 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb)
cb->stream_change_cb = pg_decode_stream_change;
cb->stream_message_cb = pg_decode_stream_message;
cb->stream_truncate_cb = pg_decode_stream_truncate;
+ cb->filter_prepare_cb = pg_decode_filter_prepare;
+ cb->prepare_cb = pg_decode_prepare_txn;
+ cb->commit_prepared_cb = pg_decode_commit_prepared_txn;
+ cb->abort_prepared_cb = pg_decode_abort_prepared_txn;
+
}
@@ -127,6 +150,7 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
ListCell *option;
TestDecodingData *data;
bool enable_streaming = false;
+ bool enable_2pc = false;
data = palloc0(sizeof(TestDecodingData));
data->context = AllocSetContextCreate(ctx->context,
@@ -136,6 +160,7 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
data->include_timestamp = false;
data->skip_empty_xacts = false;
data->only_local = false;
+ data->check_xid = InvalidTransactionId;
ctx->output_plugin_private = data;
@@ -227,6 +252,42 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
errmsg("could not parse value \"%s\" for parameter \"%s\"",
strVal(elem->arg), elem->defname)));
}
+ else if (strcmp(elem->defname, "two-phase-commit") == 0)
+ {
+ if (elem->arg == NULL)
+ continue;
+ else if (!parse_bool(strVal(elem->arg), &enable_2pc))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("could not parse value \"%s\" for parameter \"%s\"",
+ strVal(elem->arg), elem->defname)));
+ }
+ else if (strcmp(elem->defname, "check-xid") == 0)
+ {
+ if (elem->arg)
+ {
+ errno = 0;
+ data->check_xid = (TransactionId)
+ strtoul(strVal(elem->arg), NULL, 0);
+
+ if (errno == EINVAL || errno == ERANGE)
+ ereport(FATAL,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("check-xid is not a valid number: \"%s\"",
+ strVal(elem->arg))));
+ }
+ else
+ ereport(FATAL,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("check-xid needs an input value")));
+
+ if (data->check_xid <= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("Specify positive value for parameter \"%s\","
+ " you specified \"%s\"",
+ elem->defname, strVal(elem->arg))));
+ }
else
{
ereport(ERROR,
@@ -238,6 +299,7 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
}
ctx->streaming &= enable_streaming;
+ ctx->enable_twophase &= enable_2pc;
}
/* cleanup this plugin's resources */
@@ -297,6 +359,94 @@ pg_decode_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
OutputPluginWrite(ctx, true);
}
+
+/*
+ * Filter out two-phase transactions.
+ *
+ * Each plugin can implement its own filtering logic. Here
+ * we demonstrate a simple logic by checking the GID. If the
+ * GID contains the "_nodecode" substring, then we filter
+ * it out.
+ */
+static bool
+pg_decode_filter_prepare(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid)
+{
+ if (strstr(gid, "_nodecode") != NULL)
+ return true;
+
+ return false;
+}
+
+/* PREPARE callback */
+static void
+pg_decode_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ if (data->skip_empty_xacts && !data->xact_wrote_changes)
+ return;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ appendStringInfo(ctx->out, "PREPARE TRANSACTION %s",
+ quote_literal_cstr(txn->gid));
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, " %u", txn->xid);
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
+/* COMMIT PREPARED callback */
+static void
+pg_decode_commit_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ appendStringInfo(ctx->out, "COMMIT PREPARED %s",
+ quote_literal_cstr(txn->gid));
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, " %u", txn->xid);
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
+/* ABORT PREPARED callback */
+static void
+pg_decode_abort_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ appendStringInfo(ctx->out, "ROLLBACK PREPARED %s",
+ quote_literal_cstr(txn->gid));
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, " %u", txn->xid);
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
static bool
pg_decode_filter(LogicalDecodingContext *ctx,
RepOriginId origin_id)
@@ -455,6 +605,22 @@ pg_decode_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
}
data->xact_wrote_changes = true;
+ /* if check_xid is specified */
+ if (TransactionIdIsValid(data->check_xid))
+ {
+ elog(LOG, "waiting for %u to abort", data->check_xid);
+ while (TransactionIdIsInProgress(data->check_xid))
+ {
+ CHECK_FOR_INTERRUPTS();
+ pg_usleep(10000L);
+ }
+ if (!TransactionIdIsInProgress(data->check_xid) &&
+ !TransactionIdDidCommit(data->check_xid))
+ elog(LOG, "%u aborted", data->check_xid);
+
+ Assert(TransactionIdDidAbort(data->check_xid));
+ }
+
class_form = RelationGetForm(relation);
tupdesc = RelationGetDescr(relation);
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index 813a037..bd4542e 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -387,6 +387,10 @@ typedef struct OutputPluginCallbacks
LogicalDecodeTruncateCB truncate_cb;
LogicalDecodeCommitCB commit_cb;
LogicalDecodeMessageCB message_cb;
+ LogicalDecodeFilterPrepareCB filter_prepare_cb;
+ LogicalDecodePrepareCB prepare_cb;
+ LogicalDecodeCommitPreparedCB commit_prepared_cb;
+ LogicalDecodeAbortPreparedCB abort_prepared_cb;
LogicalDecodeFilterByOriginCB filter_by_origin_cb;
LogicalDecodeShutdownCB shutdown_cb;
LogicalDecodeStreamStartCB stream_start_cb;
@@ -477,7 +481,13 @@ CREATE TABLE another_catalog_table(data text) WITH (user_catalog_table = true);
never get
decoded. Successful savepoints are
folded into the transaction containing them in the order they were
- executed within that transaction.
+ executed within that transaction. A transaction that is prepared for
+ a two-phase commit using <command>PREPARE TRANSACTION</command> will
+ also be decoded if the output plugin callbacks needed for decoding
+ them are provided. It is possible that the current transaction which
+ is being decoded is aborted concurrently via a <command>ROLLBACK PREPARED</command>
+ command. In that case, the logical decoding of this transaction will
+ be aborted too.
</para>
<note>
@@ -578,6 +588,55 @@ typedef void (*LogicalDecodeCommitCB) (struct LogicalDecodingContext *ctx,
</para>
</sect3>
+ <sect3 id="logicaldecoding-output-plugin-prepare">
+ <title>Transaction Prepare Callback</title>
+
+ <para>
+ The optional <function>prepare_cb</function> callback is called whenever
+ a transaction which is prepared for two-phase commit has been
+ decoded. The <function>change_cb</function> callbacks for all modified
+ rows will have been called before this, if there have been any modified
+ rows.
+<programlisting>
+typedef void (*LogicalDecodePrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
+ <sect3 id="logicaldecoding-output-plugin-commit-prepared">
+ <title>Commit Prepared Transaction Callback</title>
+
+ <para>
+ The optional <function>commit_prepared_cb</function> callback is called whenever
+ a commit prepared transaction has been decoded. The <parameter>gid</parameter> field,
+ which is part of the <parameter>txn</parameter> parameter can be used in this
+ callback.
+<programlisting>
+typedef void (*LogicalDecodeCommitPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
+ <sect3 id="logicaldecoding-output-plugin-abort-prepared">
+ <title>Rollback Prepared Transaction Callback</title>
+
+ <para>
+ The optional <function>abort_prepared_cb</function> callback is called whenever
+ a rollback prepared transaction has been decoded. The <parameter>gid</parameter> field,
+ which is part of the <parameter>txn</parameter> parameter can be used in this
+ callback.
+<programlisting>
+typedef void (*LogicalDecodeAbortPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
<sect3 id="logicaldecoding-output-plugin-change">
<title>Change Callback</title>
@@ -587,7 +646,13 @@ typedef void (*LogicalDecodeCommitCB) (struct LogicalDecodingContext *ctx,
an <command>INSERT</command>, <command>UPDATE</command>,
or <command>DELETE</command>. Even if the original command modified
several rows at once the callback will be called individually for each
- row.
+ row. The <function>change_cb</function> callback may access system or
+ user catalog tables to aid in the process of outputting the row
+ modification details. In case of decoding a prepared (but yet
+ uncommitted) transaction or decoding of an uncommitted transaction, this
+ change callback might also error out due to simultaneous rollback of
+ this very same transaction. In that case, the logical decoding of this
+ aborted transaction is stopped gracefully.
<programlisting>
typedef void (*LogicalDecodeChangeCB) (struct LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
@@ -664,6 +729,39 @@ typedef bool (*LogicalDecodeFilterByOriginCB) (struct LogicalDecodingContext *ct
</para>
</sect3>
+ <sect3 id="logicaldecoding-output-plugin-filter-prepare">
+ <title>Prepare Filter Callback</title>
+
+ <para>
+ The optional <function>filter_prepare_cb</function> callback
+ is called to determine whether data that is part of the current
+ two-phase commit transaction should be considered for decode
+ at this prepare stage or as a regular one-phase transaction at
+ <command>COMMIT PREPARED</command> time later. To signal that
+ decoding should be skipped, return <literal>true</literal>;
+ <literal>false</literal> otherwise. When the callback is not
+ defined, <literal>false</literal> is assumed (i.e. nothing is
+ filtered).
+<programlisting>
+typedef bool (*LogicalDecodeFilterPrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ TransactionId xid,
+ const char *gid);
+</programlisting>
+ The <parameter>ctx</parameter> parameter has the same contents
+ as for the other callbacks. The <parameter>txn</parameter> parameter
+ contains meta information about the transaction. The <parameter>xid</parameter>
+ contains the XID because <parameter>txn</parameter> can be NULL in some cases.
+ The <parameter>gid</parameter> is the identifier that later identifies this
+ transaction for <command>COMMIT PREPARED</command> or <command>ROLLBACK PREPARED</command>.
+ </para>
+ <para>
+ The callback has to provide the same static answer for a given combination of
+ <parameter>xid</parameter> and <parameter>gid</parameter> every time it is
+ called.
+ </para>
+ </sect3>
+
<sect3 id="logicaldecoding-output-plugin-message">
<title>Generic Message Callback</title>
@@ -685,7 +783,13 @@ typedef void (*LogicalDecodeMessageCB) (struct LogicalDecodingContext *ctx,
non-transactional and the XID was not assigned yet in the transaction
which logged the message. The <parameter>lsn</parameter> has WAL
location of the message. The <parameter>transactional</parameter> says
- if the message was sent as transactional or not.
+ if the message was sent as transactional or not. Similar to the change
+ callback, in case of decoding a prepared (but yet uncommitted)
+ transaction or decoding of an uncommitted transaction, this message
+ callback might also error out due to simultaneous rollback of
+ this very same transaction. In that case, the logical decoding of this
+ aborted transaction is stopped gracefully.
+
The <parameter>prefix</parameter> is arbitrary null-terminated prefix
which can be used for identifying interesting messages for the current
plugin. And finally the <parameter>message</parameter> parameter holds
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index f21f61d..b37b62d 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -70,6 +70,9 @@ static void DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
xl_xact_parsed_commit *parsed, TransactionId xid);
static void DecodeAbort(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
xl_xact_parsed_abort *parsed, TransactionId xid);
+static void DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_prepare * parsed);
+
/* common function to decode tuples */
static void DecodeXLogTuple(char *data, Size len, ReorderBufferTupleBuf *tup);
@@ -312,17 +315,34 @@ DecodeXactOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
}
break;
case XLOG_XACT_PREPARE:
+ {
+ xl_xact_parsed_prepare parsed;
+ xl_xact_prepare *xlrec;
+ /* check that output plugin is capable of twophase decoding */
+ if (!ctx->enable_twophase)
+ {
+ ReorderBufferProcessXid(reorder, XLogRecGetXid(r), buf->origptr);
+ break;
+ }
- /*
- * Currently decoding ignores PREPARE TRANSACTION and will just
- * decode the transaction when the COMMIT PREPARED is sent or
- * throw away the transaction's contents when a ROLLBACK PREPARED
- * is received. In the future we could add code to expose prepared
- * transactions in the changestream allowing for a kind of
- * distributed 2PC.
- */
- ReorderBufferProcessXid(reorder, XLogRecGetXid(r), buf->origptr);
- break;
+ /* ok, parse it */
+ xlrec = (xl_xact_prepare *)XLogRecGetData(r);
+ ParsePrepareRecord(XLogRecGetInfo(buf->record),
+ xlrec, &parsed);
+
+ /* does output plugin want this particular transaction? */
+ if (ctx->callbacks.filter_prepare_cb &&
+ ReorderBufferPrepareNeedSkip(reorder, parsed.twophase_xid,
+ parsed.twophase_gid))
+ {
+ ReorderBufferProcessXid(reorder, parsed.twophase_xid,
+ buf->origptr);
+ break;
+ }
+
+ DecodePrepare(ctx, buf, &parsed);
+ break;
+ }
default:
elog(ERROR, "unexpected RM_XACT_ID record type: %u", info);
}
@@ -647,9 +667,69 @@ DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
buf->origptr, buf->endptr);
}
+ /*
+ * Decide if we're processing COMMIT PREPARED, or a regular COMMIT.
+ * Regular commit simply triggers a replay of transaction changes from the
+ * reorder buffer. For COMMIT PREPARED that however already happened at
+ * PREPARE time, and so we only need to notify the subscriber that the GID
+ * finally committed.
+ *
+ * For output plugins that do not support PREPARE-time decoding of
+ * two-phase transactions, we never even see the PREPARE and all two-phase
+ * transactions simply fall through to the second branch.
+ */
+ if (TransactionIdIsValid(parsed->twophase_xid) &&
+ ReorderBufferTxnIsPrepared(ctx->reorder,
+ parsed->twophase_xid, parsed->twophase_gid))
+ {
+ Assert(xid == parsed->twophase_xid);
+ /* we are processing COMMIT PREPARED */
+ ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn,
+ parsed->twophase_gid, true);
+ }
+ else
+ {
+ /* replay actions of all transaction + subtransactions in order */
+ ReorderBufferCommit(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn);
+ }
+}
+
+/*
+ * Decode PREPARE record. Similar logic as in COMMIT
+ */
+static void
+DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_prepare * parsed)
+{
+ XLogRecPtr origin_lsn = parsed->origin_lsn;
+ TimestampTz commit_time = parsed->origin_timestamp;
+ XLogRecPtr origin_id = XLogRecGetOrigin(buf->record);
+ int i;
+ TransactionId xid = parsed->twophase_xid;
+
+ if (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
+ (parsed->dbId != InvalidOid && parsed->dbId != ctx->slot->data.database) ||
+ ctx->fast_forward || FilterByOrigin(ctx, origin_id))
+ return;
+
+ /*
+ * Tell the reorderbuffer about the surviving subtransactions. We need to
+ * do this because the main transaction itself has not committed since we
+ * are in the prepare phase right now. So we need to be sure the snapshot
+ * is setup correctly for the main transaction in case all changes
+ * happened in subtransanctions
+ */
+ for (i = 0; i < parsed->nsubxacts; i++)
+ {
+ ReorderBufferCommitChild(ctx->reorder, xid, parsed->subxacts[i],
+ buf->origptr, buf->endptr);
+ }
+
/* replay actions of all transaction + subtransactions in order */
- ReorderBufferCommit(ctx->reorder, xid, buf->origptr, buf->endptr,
- commit_time, origin_id, origin_lsn);
+ ReorderBufferPrepare(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn, parsed->twophase_gid);
}
/*
@@ -661,6 +741,31 @@ DecodeAbort(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
xl_xact_parsed_abort *parsed, TransactionId xid)
{
int i;
+ XLogRecPtr origin_lsn = InvalidXLogRecPtr;
+ TimestampTz commit_time = 0;
+ XLogRecPtr origin_id = XLogRecGetOrigin(buf->record);
+
+ if (parsed->xinfo & XACT_XINFO_HAS_ORIGIN)
+ {
+ origin_lsn = parsed->origin_lsn;
+ commit_time = parsed->origin_timestamp;
+ }
+
+ /*
+ * If it's ROLLBACK PREPARED then handle it via callbacks.
+ */
+ if (TransactionIdIsValid(xid) &&
+ !SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) &&
+ parsed->dbId == ctx->slot->data.database &&
+ !FilterByOrigin(ctx, origin_id) &&
+ ReorderBufferTxnIsPrepared(ctx->reorder, xid, parsed->twophase_gid))
+ {
+
+ ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn,
+ parsed->twophase_gid, false);
+ return;
+ }
for (i = 0; i < parsed->nsubxacts; i++)
{
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 0f6af95..4e95337 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -58,6 +58,14 @@ static void shutdown_cb_wrapper(LogicalDecodingContext *ctx);
static void begin_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn);
static void commit_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
+static bool filter_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid);
+static void prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+static void commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+static void abort_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
static void change_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
Relation relation, ReorderBufferChange *change);
static void truncate_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
@@ -206,6 +214,10 @@ StartupDecodingContext(List *output_plugin_options,
ctx->reorder->apply_change = change_cb_wrapper;
ctx->reorder->apply_truncate = truncate_cb_wrapper;
ctx->reorder->commit = commit_cb_wrapper;
+ ctx->reorder->filter_prepare = filter_prepare_cb_wrapper;
+ ctx->reorder->prepare = prepare_cb_wrapper;
+ ctx->reorder->commit_prepared = commit_prepared_cb_wrapper;
+ ctx->reorder->abort_prepared = abort_prepared_cb_wrapper;
ctx->reorder->message = message_cb_wrapper;
/*
@@ -225,6 +237,19 @@ StartupDecodingContext(List *output_plugin_options,
(ctx->callbacks.stream_message_cb != NULL) ||
(ctx->callbacks.stream_truncate_cb != NULL);
+ /*
+ * To support two phase logical decoding, we require prepare/commit-prepare/abort-prepare
+ * callbacks. The filter-prepare callback is optional. We however enable two phase logical
+ * decoding when at least one of the methods is enabled so that we can easily identify
+ * missing methods.
+ *
+ * We decide it here, but only check it later in the wrappers.
+ */
+ ctx->enable_twophase = (ctx->callbacks.prepare_cb != NULL) ||
+ (ctx->callbacks.commit_prepared_cb != NULL) ||
+ (ctx->callbacks.abort_prepared_cb != NULL) ||
+ (ctx->callbacks.filter_prepare_cb != NULL);
+
/*
* streaming callbacks
*
@@ -782,6 +807,111 @@ commit_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
}
static void
+prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "prepare";
+ state.report_location = txn->final_lsn; /* beginning of commit record */
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn; /* points to the end of the record */
+
+ /* If the plugin support 2 phase commits then prepare callback is mandatory */
+ if (ctx->enable_twophase && ctx->callbacks.prepare_cb == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("Output plugin did not register prepare_cb callback")));
+
+ /* do the actual work: call callback */
+ ctx->callbacks.prepare_cb(ctx, txn, prepare_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
+commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "commit_prepared";
+ state.report_location = txn->final_lsn; /* beginning of commit record */
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn; /* points to the end of the record */
+
+ /* If the plugin support 2 phase commits then commit prepared callback is mandatory */
+ if (ctx->enable_twophase && ctx->callbacks.commit_prepared_cb == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("Output plugin did not register commit_prepared_cb callback")));
+
+ /* do the actual work: call callback */
+ ctx->callbacks.commit_prepared_cb(ctx, txn, commit_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
+abort_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "abort_prepared";
+ state.report_location = txn->final_lsn; /* beginning of commit record */
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn; /* points to the end of the record */
+
+ /* If the plugin support 2 phase commits then abort prepared callback is mandatory */
+ if (ctx->enable_twophase && ctx->callbacks.abort_prepared_cb == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("Output plugin did not register abort_prepared_cb callback")));
+
+ /* do the actual work: call callback */
+ ctx->callbacks.abort_prepared_cb(ctx, txn, abort_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
change_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
Relation relation, ReorderBufferChange *change)
{
@@ -858,6 +988,51 @@ truncate_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
error_context_stack = errcallback.previous;
}
+static bool
+filter_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+ bool ret;
+
+ /*
+ * Skip if decoding of twophase at PREPARE time is not enabled. In that
+ * case all twophase transactions are considered filtered out and will be
+ * applied as regular transactions at COMMIT PREPARED.
+ */
+ if (!ctx->enable_twophase)
+ return true;
+
+ /*
+ * The filter_prepare callback is optional. When not supplied, all
+ * prepared transactions should go through.
+ */
+ if (!ctx->callbacks.filter_prepare_cb)
+ return false;
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "filter_prepare";
+ state.report_location = InvalidXLogRecPtr;
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = false;
+
+ /* do the actual work: call callback */
+ ret = ctx->callbacks.filter_prepare_cb(ctx, txn, xid, gid);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+
+ return ret;
+}
+
bool
filter_by_origin_cb_wrapper(LogicalDecodingContext *ctx, RepOriginId origin_id)
{
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 1975d62..d96be77 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -251,7 +251,8 @@ static Size ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn
static void ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
char *change);
static void ReorderBufferRestoreCleanup(ReorderBuffer *rb, ReorderBufferTXN *txn);
-static void ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn);
+static void ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
+ bool txn_prepared);
static void ReorderBufferCleanupSerializedTXNs(const char *slotname);
static void ReorderBufferSerializedPath(char *path, ReplicationSlot *slot,
TransactionId xid, XLogSegNo segno);
@@ -413,6 +414,11 @@ ReorderBufferReturnTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
}
/* free data that's contained */
+ if (txn->gid != NULL)
+ {
+ pfree(txn->gid);
+ txn->gid = NULL;
+ }
if (txn->tuplecid_hash != NULL)
{
@@ -1401,6 +1407,59 @@ ReorderBufferIterTXNFinish(ReorderBuffer *rb,
}
/*
+ * Cleanup the leftover contents of a transaction, usually after the transaction
+ * has been COMMIT PREPARED or ROLLBACK PREPARED. This does the rest of the cleanup
+ * that was not done when the transaction was PREPARED
+ */
+static void
+ReorderBufferCleanupPreparedTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
+{
+ bool found;
+
+ /*
+ * Cleanup the base snapshot, if set.
+ */
+ if (txn->base_snapshot != NULL)
+ {
+ SnapBuildSnapDecRefcount(txn->base_snapshot);
+ dlist_delete(&txn->base_snapshot_node);
+ }
+
+ /*
+ * Cleanup the snapshot for the last streamed run.
+ */
+ if (txn->snapshot_now != NULL)
+ {
+ Assert(rbtxn_is_streamed(txn));
+ ReorderBufferFreeSnap(rb, txn->snapshot_now);
+ }
+
+ /*
+ * Remove TXN from its containing list.
+ *
+ * Note: if txn is known as subxact, we are deleting the TXN from its
+ * parent's list of known subxacts; this leaves the parent's nsubxacts
+ * count too high, but we don't care. Otherwise, we are deleting the TXN
+ * from the LSN-ordered list of toplevel TXNs.
+ */
+ dlist_delete(&txn->node);
+
+ /* now remove reference from buffer */
+ hash_search(rb->by_txn,
+ (void *) &txn->xid,
+ HASH_REMOVE,
+ &found);
+ Assert(found);
+
+ /* remove entries spilled to disk */
+ if (rbtxn_is_serialized(txn))
+ ReorderBufferRestoreCleanup(rb, txn);
+
+ /* deallocate */
+ ReorderBufferReturnTXN(rb, txn);
+}
+
+/*
* Cleanup the contents of a transaction, usually after the transaction
* committed or aborted.
*/
@@ -1502,12 +1561,14 @@ ReorderBufferCleanupTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
}
/*
- * Discard changes from a transaction (and subtransactions), after streaming
- * them. Keep the remaining info - transactions, tuplecids, invalidations and
- * snapshots.
+ * Discard changes from a transaction (and subtransactions), either after streaming or
+ * after a PREPARE.
+ * The flag txn_prepared indicates if this is called after a PREPARE.
+ * If streaming, keep the remaining info - transactions, tuplecids, invalidations and
+ * snapshots.If after a PREPARE, keep only the invalidations and snapshots.
*/
static void
-ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
+ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, bool txn_prepared)
{
dlist_mutable_iter iter;
@@ -1526,7 +1587,7 @@ ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
Assert(rbtxn_is_known_subxact(subtxn));
Assert(subtxn->nsubtxns == 0);
- ReorderBufferTruncateTXN(rb, subtxn);
+ ReorderBufferTruncateTXN(rb, subtxn, txn_prepared);
}
/* cleanup changes in the toplevel txn */
@@ -1560,9 +1621,30 @@ ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
* about the toplevel xact (we send the XID in all messages), but we never
* stream XIDs of empty subxacts.
*/
- if ((!txn->toptxn) || (txn->nentries_mem != 0))
+ if ((!txn_prepared) && ((!txn->toptxn) || (txn->nentries_mem != 0)))
txn->txn_flags |= RBTXN_IS_STREAMED;
+ if (txn_prepared)
+ {
+ /*
+ * If this is a prepared txn, cleanup the tuplecids we stored for decoding
+ * catalog snapshot access.
+ * They are always stored in the toplevel transaction.
+ */
+ dlist_foreach_modify(iter, &txn->tuplecids)
+ {
+ ReorderBufferChange *change;
+
+ change = dlist_container(ReorderBufferChange, node, iter.cur);
+
+ /* Check we're not mixing changes from different transactions. */
+ Assert(change->txn == txn);
+ Assert(change->action == REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID);
+
+ ReorderBufferReturnChange(rb, change, true);
+ }
+ }
+
/*
* Destroy the (relfilenode, ctid) hashtable, so that we don't leak any
* memory. We could also keep the hash table and update it with new ctid
@@ -1880,7 +1962,7 @@ ReorderBufferResetTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
ReorderBufferChange *specinsert)
{
/* Discard the changes that we just streamed */
- ReorderBufferTruncateTXN(rb, txn);
+ ReorderBufferTruncateTXN(rb, txn, false);
/* Free all resources allocated for toast reconstruction */
ReorderBufferToastReset(rb, txn);
@@ -1987,7 +2069,7 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
prev_lsn = change->lsn;
/* Set the current xid to detect concurrent aborts. */
- if (streaming)
+ if (streaming || rbtxn_prepared(change->txn))
{
curtxn = change->txn;
SetupCheckXidLive(curtxn->xid);
@@ -2249,7 +2331,6 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
break;
}
}
-
/*
* There's a speculative insertion remaining, just clean in up, it
* can't have been successful, otherwise we'd gotten a confirmation
@@ -2278,7 +2359,16 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
}
}
else
- rb->commit(rb, txn, commit_lsn);
+ {
+ /*
+ * Call either PREPARE (for twophase transactions) or COMMIT
+ * (for regular ones).
+ */
+ if (rbtxn_prepared(txn))
+ rb->prepare(rb, txn, commit_lsn);
+ else
+ rb->commit(rb, txn, commit_lsn);
+ }
/* this is just a sanity check against bad output plugin behaviour */
if (GetCurrentTransactionIdIfAny() != InvalidTransactionId)
@@ -2319,11 +2409,17 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
*/
if (streaming)
{
- ReorderBufferTruncateTXN(rb, txn);
+ ReorderBufferTruncateTXN(rb, txn, false);
/* Reset the CheckXidAlive */
CheckXidAlive = InvalidTransactionId;
}
+ else if (rbtxn_prepared(txn))
+ {
+ ReorderBufferTruncateTXN(rb, txn, true);
+ /* Reset the CheckXidAlive */
+ CheckXidAlive = InvalidTransactionId;
+ }
else
ReorderBufferCleanupTXN(rb, txn);
}
@@ -2352,17 +2448,18 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
/*
* The error code ERRCODE_TRANSACTION_ROLLBACK indicates a concurrent
- * abort of the (sub)transaction we are streaming. We need to do the
+ * abort of the (sub)transaction we are streaming or preparing. We need to do the
* cleanup and return gracefully on this error, see SetupCheckXidLive.
*/
if (errdata->sqlerrcode == ERRCODE_TRANSACTION_ROLLBACK)
{
/*
- * This error can only occur when we are sending the data in
- * streaming mode and the streaming is not finished yet.
+ * This error can only occur either when we are sending the data in
+ * streaming mode and the streaming is not finished yet or when we are
+ * sending the data out on a PREPARE during a twoi phase commit.
*/
- Assert(streaming);
- Assert(stream_started);
+ Assert(streaming || rbtxn_prepared(txn));
+ Assert(stream_started || rbtxn_prepared(txn));
/* Cleanup the temporary error state. */
FlushErrorState();
@@ -2370,10 +2467,19 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
errdata = NULL;
curtxn->concurrent_abort = true;
- /* Reset the TXN so that it is allowed to stream remaining data. */
- ReorderBufferResetTXN(rb, txn, snapshot_now,
- command_id, prev_lsn,
- specinsert);
+ /* If streaming, reset the TXN so that it is allowed to stream remaining data. */
+ if (streaming)
+ {
+ ReorderBufferResetTXN(rb, txn, snapshot_now,
+ command_id, prev_lsn,
+ specinsert);
+ }
+ else
+ {
+ elog(LOG, "stopping decoding of %s (%u)",
+ txn->gid[0] != '\0'? txn->gid:"", txn->xid);
+ ReorderBufferTruncateTXN(rb, txn, true);
+ }
}
else
{
@@ -2395,23 +2501,16 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
* This interface is called once a toplevel commit is read for both streamed
* as well as non-streamed transactions.
*/
-void
-ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
- XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
- TimestampTz commit_time,
- RepOriginId origin_id, XLogRecPtr origin_lsn)
+static void
+ReorderBufferCommitInternal(ReorderBufferTXN *txn,
+ ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn)
{
- ReorderBufferTXN *txn;
Snapshot snapshot_now;
CommandId command_id = FirstCommandId;
- txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
- false);
-
- /* unknown transaction, nothing to replay */
- if (txn == NULL)
- return;
-
txn->final_lsn = commit_lsn;
txn->end_lsn = end_lsn;
txn->commit_time = commit_time;
@@ -2453,6 +2552,140 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
}
/*
+ * Ask output plugin whether we want to skip this PREPARE and send
+ * this transaction as a regular commit later.
+ */
+bool
+ReorderBufferPrepareNeedSkip(ReorderBuffer *rb, TransactionId xid, const char *gid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr, false);
+
+ return rb->filter_prepare(rb, txn, xid, gid);
+}
+
+
+/*
+ * Commit a transaction.
+ *
+ * See comments for ReorderBufferCommitInternal()
+ */
+void
+ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+
+ /* unknown transaction, nothing to replay */
+ if (txn == NULL)
+ return;
+
+ ReorderBufferCommitInternal(txn, rb, xid, commit_lsn, end_lsn,
+ commit_time, origin_id, origin_lsn);
+}
+
+/*
+ * Prepare a twophase transaction. It calls ReorderBufferCommitInternal()
+ * since all prepared transactions need to be decoded at PREPARE time.
+ */
+void
+ReorderBufferPrepare(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+
+ /* unknown transaction, nothing to replay */
+ if (txn == NULL)
+ return;
+
+ txn->txn_flags |= RBTXN_PREPARE;
+ txn->gid = palloc(strlen(gid) + 1); /* trailing '\0' */
+ strcpy(txn->gid, gid);
+
+ ReorderBufferCommitInternal(txn, rb, xid, commit_lsn, end_lsn,
+ commit_time, origin_id, origin_lsn);
+}
+
+/*
+ * Check whether this transaction was sent as prepared to subscribers.
+ * Called while handling commit|abort prepared.
+ */
+bool
+ReorderBufferTxnIsPrepared(ReorderBuffer *rb, TransactionId xid,
+ const char *gid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+
+ /*
+ * Always call the prepare filter. It's the job of the prepare filter to
+ * give us the *same* response for a given xid across multiple calls
+ * (including ones on restart)
+ */
+ return !(rb->filter_prepare(rb, txn, xid, gid));
+}
+
+/*
+ * Send standalone xact event. This is used to handle COMMIT/ABORT PREPARED.
+ */
+void
+ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid, bool is_commit)
+{
+ ReorderBufferTXN *txn;
+
+ /*
+ * The transaction may or may not exist (during restarts for example).
+ * Anyways, 2PC transactions do not contain any reorderbuffers. So allow
+ * it to be created below.
+ */
+ txn = ReorderBufferTXNByXid(rb, xid, true, NULL, commit_lsn,
+ true);
+
+ txn->final_lsn = commit_lsn;
+ txn->end_lsn = end_lsn;
+ txn->commit_time = commit_time;
+ txn->origin_id = origin_id;
+ txn->origin_lsn = origin_lsn;
+ /* this txn is obviously prepared */
+ txn->txn_flags |= RBTXN_PREPARE;
+ txn->gid = palloc(strlen(gid) + 1); /* trailing '\0' */
+ strcpy(txn->gid, gid);
+
+ if (is_commit)
+ {
+ txn->txn_flags |= RBTXN_COMMIT_PREPARED;
+ rb->commit_prepared(rb, txn, commit_lsn);
+ }
+ else
+ {
+ txn->txn_flags |= RBTXN_ROLLBACK_PREPARED;
+ rb->abort_prepared(rb, txn, commit_lsn);
+ }
+
+ /* cleanup: make sure there's no cache pollution */
+ ReorderBufferExecuteInvalidations(rb, txn);
+ ReorderBufferCleanupPreparedTXN(rb, txn);
+}
+
+/*
* Abort a transaction that possibly has previous changes. Needs to be first
* called for subtransactions and then for the toplevel xid.
*
@@ -2495,7 +2728,13 @@ ReorderBufferAbort(ReorderBuffer *rb, TransactionId xid, XLogRecPtr lsn)
/* cosmetic... */
txn->final_lsn = lsn;
- /* remove potential on-disk data, and deallocate */
+ /*
+ * remove potential on-disk data, and deallocate.
+ *
+ * We remove it even for prepared transactions (GID is enough to
+ * commit/abort those later).
+ */
+
ReorderBufferCleanupTXN(rb, txn);
}
diff --git a/src/include/replication/logical.h b/src/include/replication/logical.h
index 45abc44..ee63e7b 100644
--- a/src/include/replication/logical.h
+++ b/src/include/replication/logical.h
@@ -84,6 +84,11 @@ typedef struct LogicalDecodingContext
*/
bool streaming;
+ /*
+ * Does the output plugin support two phase decoding, and is it enabled?
+ */
+ bool enable_twophase;
+
/*
* State for writing output.
*/
diff --git a/src/include/replication/output_plugin.h b/src/include/replication/output_plugin.h
index b78c796..96e269b 100644
--- a/src/include/replication/output_plugin.h
+++ b/src/include/replication/output_plugin.h
@@ -77,6 +77,39 @@ typedef void (*LogicalDecodeCommitCB) (struct LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
+ /*
+ * Called before decoding of PREPARE record to decide whether this
+ * transaction should be decoded with separate calls to prepare and
+ * commit_prepared/abort_prepared callbacks or wait till COMMIT PREPARED and
+ * sent as usual transaction.
+ */
+typedef bool (*LogicalDecodeFilterPrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ TransactionId xid,
+ const char *gid);
+
+/*
+ * Called for PREPARE record unless it was filtered by filter_prepare()
+ * callback.
+ */
+typedef void (*LogicalDecodePrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+
+/*
+ * Called for COMMIT PREPARED.
+ */
+typedef void (*LogicalDecodeCommitPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+/*
+ * Called for ROLLBACK PREPARED.
+ */
+typedef void (*LogicalDecodeAbortPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+
/*
* Called for the generic logical decoding messages.
*/
@@ -171,6 +204,10 @@ typedef struct OutputPluginCallbacks
LogicalDecodeTruncateCB truncate_cb;
LogicalDecodeCommitCB commit_cb;
LogicalDecodeMessageCB message_cb;
+ LogicalDecodeFilterPrepareCB filter_prepare_cb;
+ LogicalDecodePrepareCB prepare_cb;
+ LogicalDecodeCommitPreparedCB commit_prepared_cb;
+ LogicalDecodeAbortPreparedCB abort_prepared_cb;
LogicalDecodeFilterByOriginCB filter_by_origin_cb;
LogicalDecodeShutdownCB shutdown_cb;
/* streaming of changes */
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 1ae17d5..4d4e35d 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -10,6 +10,7 @@
#define REORDERBUFFER_H
#include "access/htup_details.h"
+#include "access/twophase.h"
#include "lib/ilist.h"
#include "storage/sinval.h"
#include "utils/hsearch.h"
@@ -162,9 +163,13 @@ typedef struct ReorderBufferChange
#define RBTXN_HAS_CATALOG_CHANGES 0x0001
#define RBTXN_IS_SUBXACT 0x0002
#define RBTXN_IS_SERIALIZED 0x0004
-#define RBTXN_IS_STREAMED 0x0008
-#define RBTXN_HAS_TOAST_INSERT 0x0010
-#define RBTXN_HAS_SPEC_INSERT 0x0020
+#define RBTXN_PREPARE 0x0008
+#define RBTXN_COMMIT_PREPARED 0x0010
+#define RBTXN_ROLLBACK_PREPARED 0x0020
+#define RBTXN_COMMIT 0x0040
+#define RBTXN_IS_STREAMED 0x0080
+#define RBTXN_HAS_TOAST_INSERT 0x0100
+#define RBTXN_HAS_SPEC_INSERT 0x0200
/* Does the transaction have catalog changes? */
#define rbtxn_has_catalog_changes(txn) \
@@ -218,6 +223,15 @@ typedef struct ReorderBufferChange
((txn)->txn_flags & RBTXN_IS_STREAMED) != 0 \
)
+/* is this txn prepared? */
+#define rbtxn_prepared(txn) (txn->txn_flags & RBTXN_PREPARE)
+/* was this prepared txn committed in the meanwhile? */
+#define rbtxn_commit_prepared(txn) (txn->txn_flags & RBTXN_COMMIT_PREPARED)
+/* was this prepared txn aborted in the meanwhile? */
+#define rbtxn_rollback_prepared(txn) (txn->txn_flags & RBTXN_ROLLBACK_PREPARED)
+/* was this txn committed in the meanwhile? */
+#define rbtxn_commit(txn) (txn->txn_flags & RBTXN_COMMIT)
+
typedef struct ReorderBufferTXN
{
/* See above */
@@ -229,6 +243,9 @@ typedef struct ReorderBufferTXN
/* Xid of top-level transaction, if known */
TransactionId toplevel_xid;
+ /* In case of 2PC we need to pass GID to output plugin */
+ char *gid;
+
/*
* LSN of the first data carrying, WAL record with knowledge about this
* xid. This is allowed to *not* be first record adorned with this xid, if
@@ -390,6 +407,39 @@ typedef void (*ReorderBufferCommitCB) (ReorderBuffer *rb,
ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
+/* abort callback signature */
+typedef void (*ReorderBufferAbortCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+
+typedef bool (*ReorderBufferFilterPrepareCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ TransactionId xid,
+ const char *gid);
+
+/* prepare callback signature */
+typedef void (*ReorderBufferPrepareCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+
+/* commit prepared callback signature */
+typedef void (*ReorderBufferCommitPreparedCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+/* abort prepared callback signature */
+typedef void (*ReorderBufferAbortPreparedCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+
+
+
+
/* message callback signature */
typedef void (*ReorderBufferMessageCB) (ReorderBuffer *rb,
ReorderBufferTXN *txn,
@@ -482,6 +532,11 @@ struct ReorderBuffer
ReorderBufferApplyChangeCB apply_change;
ReorderBufferApplyTruncateCB apply_truncate;
ReorderBufferCommitCB commit;
+ ReorderBufferAbortCB abort;
+ ReorderBufferFilterPrepareCB filter_prepare;
+ ReorderBufferPrepareCB prepare;
+ ReorderBufferCommitPreparedCB commit_prepared;
+ ReorderBufferAbortPreparedCB abort_prepared;
ReorderBufferMessageCB message;
/*
@@ -548,6 +603,11 @@ void ReorderBufferQueueMessage(ReorderBuffer *, TransactionId, Snapshot snapsho
void ReorderBufferCommit(ReorderBuffer *, TransactionId,
XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
TimestampTz commit_time, RepOriginId origin_id, XLogRecPtr origin_lsn);
+void ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid, bool is_commit);
void ReorderBufferAssignChild(ReorderBuffer *, TransactionId, TransactionId, XLogRecPtr commit_lsn);
void ReorderBufferCommitChild(ReorderBuffer *, TransactionId, TransactionId,
XLogRecPtr commit_lsn, XLogRecPtr end_lsn);
@@ -571,6 +631,15 @@ void ReorderBufferXidSetCatalogChanges(ReorderBuffer *, TransactionId xid, XLog
bool ReorderBufferXidHasCatalogChanges(ReorderBuffer *, TransactionId xid);
bool ReorderBufferXidHasBaseSnapshot(ReorderBuffer *, TransactionId xid);
+bool ReorderBufferPrepareNeedSkip(ReorderBuffer *rb, TransactionId xid,
+ const char *gid);
+bool ReorderBufferTxnIsPrepared(ReorderBuffer *rb, TransactionId xid,
+ const char *gid);
+void ReorderBufferPrepare(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid);
ReorderBufferTXN *ReorderBufferGetOldestTXN(ReorderBuffer *);
TransactionId ReorderBufferGetOldestXmin(ReorderBuffer *rb);
--
1.8.3.1
v5-0002-Tap-test-to-test-concurrent-aborts-during-2-phase.patchapplication/octet-stream; name=v5-0002-Tap-test-to-test-concurrent-aborts-during-2-phase.patchDownload
From e8fc641a1a98c611efd469e867bdd04a1947d909 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Tue, 22 Sep 2020 06:35:11 -0400
Subject: [PATCH v5] Tap test to test concurrent aborts during 2 phase commits
This test is specifically for testing concurrent abort while logical decode
is ongoing. Pass in the xid of the 2PC to the plugin as an option.
On the receipt of a valid "check-xid", the change API in the test decoding
plugin will wait for it to be aborted.
---
contrib/test_decoding/Makefile | 2 +
contrib/test_decoding/t/001_twophase.pl | 119 ++++++++++++++++++++++++++++++++
2 files changed, 121 insertions(+)
create mode 100644 contrib/test_decoding/t/001_twophase.pl
diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index f23f15b..4905a0a 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -9,6 +9,8 @@ REGRESS = ddl xact rewrite toast permissions decoding_in_xact \
ISOLATION = mxact delayed_startup ondisk_startup concurrent_ddl_dml \
oldest_xmin snapshot_transfer subxact_without_top concurrent_stream
+TAP_TESTS = 1
+
REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
ISOLATION_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
diff --git a/contrib/test_decoding/t/001_twophase.pl b/contrib/test_decoding/t/001_twophase.pl
new file mode 100644
index 0000000..d8c2c29
--- /dev/null
+++ b/contrib/test_decoding/t/001_twophase.pl
@@ -0,0 +1,119 @@
+# logical replication of 2PC test
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 2;
+use Time::HiRes qw(usleep);
+use Scalar::Util qw(looks_like_number);
+
+# Initialize node
+my $node_logical = get_new_node('logical');
+$node_logical->init(allows_streaming => 'logical');
+$node_logical->append_conf(
+ 'postgresql.conf', qq(
+ max_prepared_transactions = 10
+));
+$node_logical->start;
+
+# Create some pre-existing content on logical
+$node_logical->safe_psql('postgres', "CREATE TABLE tab (a int PRIMARY KEY)");
+$node_logical->safe_psql('postgres',
+ "INSERT INTO tab SELECT generate_series(1,10)");
+$node_logical->safe_psql('postgres',
+ "SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');");
+
+# This test is specifically for testing concurrent abort while logical decode
+# is ongoing. We will pass in the xid of the 2PC to the plugin as an option.
+# On the receipt of a valid "check-xid", the change API in the test decoding
+# plugin will wait for it to be aborted.
+#
+# We will fire off a ROLLBACK from another session when this decode
+# is waiting.
+#
+# The status of "check-xid" will change from in-progress to not-committed
+# (hence aborted) and we will stop decoding because the subsequent
+# system catalog scan will error out.
+
+$node_logical->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab VALUES (11);
+ INSERT INTO tab VALUES (12);
+ ALTER TABLE tab ADD COLUMN b INT;
+ INSERT INTO tab VALUES (13,14);
+ PREPARE TRANSACTION 'test_prepared_tab';");
+# get XID of the above two-phase transaction
+my $xid2pc = $node_logical->safe_psql('postgres', "SELECT transaction FROM pg_prepared_xacts WHERE gid = 'test_prepared_tab'");
+is(looks_like_number($xid2pc), qq(1), 'Got a valid two-phase XID');
+
+# start decoding the above by passing the "check-xid"
+my $logical_connstr = $node_logical->connstr . ' dbname=postgres';
+
+# decode now, it should include an ABORT entry because of the ROLLBACK below
+system_log("psql -d \"$logical_connstr\" -c \"SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'check-xid', '$xid2pc');\" \&");
+
+# check that decode starts waiting for this $xid2pc
+poll_output_until("waiting for $xid2pc to abort")
+ or die "no wait happened for the abort";
+
+# rollback the prepared transaction
+$node_logical->safe_psql('postgres', "ROLLBACK PREPARED 'test_prepared_tab';");
+
+# check for occurrence of the log about stopping this decoding
+poll_output_until("stopping decoding of test_prepared_tab")
+ or die "no decoding stop for the rollback";
+
+# consume any remaining changes
+$node_logical->safe_psql('postgres', "SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');");
+
+# Check that commit prepared is decoded properly on immediate restart
+$node_logical->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab VALUES (11);
+ INSERT INTO tab VALUES (12);
+ ALTER TABLE tab ADD COLUMN b INT;
+ INSERT INTO tab VALUES (13, 11);
+ PREPARE TRANSACTION 'test_prepared_tab';");
+# consume changes
+$node_logical->safe_psql('postgres', "SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');");
+$node_logical->stop('immediate');
+$node_logical->start;
+
+# commit post the restart
+$node_logical->safe_psql('postgres', "COMMIT PREPARED 'test_prepared_tab';");
+$node_logical->safe_psql('postgres', "SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');");
+
+# check inserts are visible
+my $result = $node_logical->safe_psql('postgres', "SELECT count(*) FROM tab where a IN (11,12) OR b IN (11);");
+is($result, qq(3), 'Rows inserted via 2PC are visible on restart');
+
+$node_logical->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot');");
+$node_logical->stop('fast');
+
+sub poll_output_until
+{
+ my ($expected) = @_;
+
+ $expected = 'xxxxxx' unless defined($expected); # default junk value
+
+ my $max_attempts = 180 * 10;
+ my $attempts = 0;
+
+ my $output_file = '';
+ while ($attempts < $max_attempts)
+ {
+ $output_file = slurp_file($node_logical->logfile());
+
+ if ($output_file =~ $expected)
+ {
+ return 1;
+ }
+
+ # Wait 0.1 second before retrying.
+ usleep(100_000);
+ $attempts++;
+ }
+
+ # The output result didn't change in 180 seconds. Give up
+ return 0;
+}
--
1.8.3.1
v5-0003-pgoutput-output-plugin-support-for-logical-decodi.patchapplication/octet-stream; name=v5-0003-pgoutput-output-plugin-support-for-logical-decodi.patchDownload
From fbcc9bbc36d002a9e20edd76a5eac0eca0434ed0 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Tue, 22 Sep 2020 07:12:45 -0400
Subject: [PATCH v5] pgoutput output plugin support for logical decoding of 2pc
---
src/backend/access/transam/twophase.c | 31 ++++++
src/backend/replication/logical/proto.c | 90 ++++++++++++++-
src/backend/replication/logical/worker.c | 147 ++++++++++++++++++++++++-
src/backend/replication/pgoutput/pgoutput.c | 54 ++++++++-
src/include/access/twophase.h | 1 +
src/include/replication/logicalproto.h | 37 ++++++-
src/test/subscription/t/020_twophase.pl | 163 ++++++++++++++++++++++++++++
7 files changed, 514 insertions(+), 9 deletions(-)
create mode 100644 src/test/subscription/t/020_twophase.pl
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index ef4f998..bed87d5 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -548,6 +548,37 @@ MarkAsPrepared(GlobalTransaction gxact, bool lock_held)
}
/*
+ * LookupGXact
+ * Check if the prepared transaction with the given GID is around
+ */
+bool
+LookupGXact(const char *gid)
+{
+ int i;
+
+ LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE);
+
+ for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
+ {
+ GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
+
+ /* Ignore not-yet-valid GIDs */
+ if (!gxact->valid)
+ continue;
+ if (strcmp(gxact->gid, gid) != 0)
+ continue;
+
+ LWLockRelease(TwoPhaseStateLock);
+
+ return true;
+ }
+
+ LWLockRelease(TwoPhaseStateLock);
+
+ return false;
+}
+
+/*
* LockGXact
* Locate the prepared transaction and mark it busy for COMMIT or PREPARE.
*/
diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index eb19142..291ed10 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -72,12 +72,17 @@ logicalrep_read_begin(StringInfo in, LogicalRepBeginData *begin_data)
*/
void
logicalrep_write_commit(StringInfo out, ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn)
+ XLogRecPtr commit_lsn, bool is_commit)
{
uint8 flags = 0;
pq_sendbyte(out, 'C'); /* sending COMMIT */
+ if (is_commit)
+ flags |= LOGICALREP_IS_COMMIT;
+ else
+ flags |= LOGICALREP_IS_ABORT;
+
/* send the flags field (unused for now) */
pq_sendbyte(out, flags);
@@ -88,16 +93,20 @@ logicalrep_write_commit(StringInfo out, ReorderBufferTXN *txn,
}
/*
- * Read transaction COMMIT from the stream.
+ * Read transaction COMMIT|ABORT from the stream.
*/
void
logicalrep_read_commit(StringInfo in, LogicalRepCommitData *commit_data)
{
- /* read flags (unused for now) */
+ /* read flags */
uint8 flags = pq_getmsgbyte(in);
- if (flags != 0)
- elog(ERROR, "unrecognized flags %u in commit message", flags);
+ if (!CommitFlagsAreValid(flags))
+ elog(ERROR, "unrecognized flags %u in commit|abort message",
+ flags);
+
+ /* the flag is either commit or abort */
+ commit_data->is_commit = (flags == LOGICALREP_IS_COMMIT);
/* read fields */
commit_data->commit_lsn = pq_getmsgint64(in);
@@ -106,6 +115,77 @@ logicalrep_read_commit(StringInfo in, LogicalRepCommitData *commit_data)
}
/*
+ * Write PREPARE to the output stream.
+ */
+void
+logicalrep_write_prepare(StringInfo out, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ uint8 flags = 0;
+
+ pq_sendbyte(out, 'P'); /* sending PREPARE protocol */
+
+ /*
+ * This should only ever happen for 2PC transactions. In which case we
+ * expect to have a non-empty GID.
+ */
+ Assert(rbtxn_prepared(txn));
+ Assert(strlen(txn->gid) > 0);
+
+ /*
+ * Flags are determined from the state of the transaction. We know we
+ * always get PREPARE first and then [COMMIT|ROLLBACK] PREPARED, so if
+ * it's already marked as committed then it has to be COMMIT PREPARED (and
+ * likewise for abort / ROLLBACK PREPARED).
+ */
+ if (rbtxn_commit_prepared(txn))
+ flags |= LOGICALREP_IS_COMMIT_PREPARED;
+ else if (rbtxn_rollback_prepared(txn))
+ flags |= LOGICALREP_IS_ROLLBACK_PREPARED;
+ else
+ flags |= LOGICALREP_IS_PREPARE;
+
+ /* Make sure exactly one of the expected flags is set. */
+ if (!PrepareFlagsAreValid(flags))
+ elog(ERROR, "unrecognized flags %u in prepare message", flags);
+
+ /* send the flags field */
+ pq_sendbyte(out, flags);
+
+ /* send fields */
+ pq_sendint64(out, prepare_lsn);
+ pq_sendint64(out, txn->end_lsn);
+ pq_sendint64(out, txn->commit_time);
+
+ /* send gid */
+ pq_sendstring(out, txn->gid);
+}
+
+/*
+ * Read transaction PREPARE from the stream.
+ */
+void
+logicalrep_read_prepare(StringInfo in, LogicalRepPrepareData * prepare_data)
+{
+ /* read flags */
+ uint8 flags = pq_getmsgbyte(in);
+
+ if (!PrepareFlagsAreValid(flags))
+ elog(ERROR, "unrecognized flags %u in prepare message", flags);
+
+ /* set the action (reuse the constants used for the flags) */
+ prepare_data->prepare_type = flags;
+
+ /* read fields */
+ prepare_data->prepare_lsn = pq_getmsgint64(in);
+ prepare_data->end_lsn = pq_getmsgint64(in);
+ prepare_data->preparetime = pq_getmsgint64(in);
+
+ /* read gid (copy it into a pre-allocated buffer) */
+ strcpy(prepare_data->gid, pq_getmsgstring(in));
+}
+
+/*
* Write ORIGIN to the output stream.
*/
void
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index d239d28..62c571e 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -729,7 +729,11 @@ apply_handle_commit(StringInfo s)
replorigin_session_origin_lsn = commit_data.end_lsn;
replorigin_session_origin_timestamp = commit_data.committime;
- CommitTransactionCommand();
+ if (commit_data.is_commit)
+ CommitTransactionCommand();
+ else
+ AbortCurrentTransaction();
+
pgstat_report_stat(false);
store_flush_position(commit_data.end_lsn);
@@ -749,6 +753,141 @@ apply_handle_commit(StringInfo s)
pgstat_report_activity(STATE_IDLE, NULL);
}
+static void
+apply_handle_prepare_txn(LogicalRepPrepareData * prepare_data)
+{
+ Assert(prepare_data->prepare_lsn == remote_final_lsn);
+
+ /* The synchronization worker runs in single transaction. */
+ if (IsTransactionState() && !am_tablesync_worker())
+ {
+ /* End the earlier transaction and start a new one */
+ BeginTransactionBlock();
+ CommitTransactionCommand();
+ StartTransactionCommand();
+
+ /*
+ * Update origin state so we can restart streaming from correct
+ * position in case of crash.
+ */
+ replorigin_session_origin_lsn = prepare_data->end_lsn;
+ replorigin_session_origin_timestamp = prepare_data->preparetime;
+
+ PrepareTransactionBlock(prepare_data->gid);
+ CommitTransactionCommand();
+ pgstat_report_stat(false);
+
+ store_flush_position(prepare_data->end_lsn);
+ }
+ else
+ {
+ /* Process any invalidation messages that might have accumulated. */
+ AcceptInvalidationMessages();
+ maybe_reread_subscription();
+ }
+
+ in_remote_transaction = false;
+
+ /* Process any tables that are being synchronized in parallel. */
+ process_syncing_tables(prepare_data->end_lsn);
+
+ pgstat_report_activity(STATE_IDLE, NULL);
+}
+
+static void
+apply_handle_commit_prepared_txn(LogicalRepPrepareData * prepare_data)
+{
+ /* there is no transaction when COMMIT PREPARED is called */
+ ensure_transaction();
+
+ /*
+ * Update origin state so we can restart streaming from correct position
+ * in case of crash.
+ */
+ replorigin_session_origin_lsn = prepare_data->end_lsn;
+ replorigin_session_origin_timestamp = prepare_data->preparetime;
+
+ FinishPreparedTransaction(prepare_data->gid, true);
+ CommitTransactionCommand();
+ pgstat_report_stat(false);
+
+ store_flush_position(prepare_data->end_lsn);
+ in_remote_transaction = false;
+
+ /* Process any tables that are being synchronized in parallel. */
+ process_syncing_tables(prepare_data->end_lsn);
+
+ pgstat_report_activity(STATE_IDLE, NULL);
+}
+
+static void
+apply_handle_rollback_prepared_txn(LogicalRepPrepareData * prepare_data)
+{
+ /*
+ * Update origin state so we can restart streaming from correct position
+ * in case of crash.
+ */
+ replorigin_session_origin_lsn = prepare_data->end_lsn;
+ replorigin_session_origin_timestamp = prepare_data->preparetime;
+
+ /*
+ * During logical decoding, on the apply side, it's possible that a
+ * prepared transaction got aborted while decoding. In that case, we stop
+ * the decoding and abort the transaction immediately. However the
+ * ROLLBACK prepared processing still reaches the subscriber. In that case
+ * it's ok to have a missing gid
+ */
+ if (LookupGXact(prepare_data->gid))
+ {
+ /* there is no transaction when ABORT/ROLLBACK PREPARED is called */
+ ensure_transaction();
+ FinishPreparedTransaction(prepare_data->gid, false);
+ CommitTransactionCommand();
+ }
+
+ pgstat_report_stat(false);
+
+ store_flush_position(prepare_data->end_lsn);
+ in_remote_transaction = false;
+
+ /* Process any tables that are being synchronized in parallel. */
+ process_syncing_tables(prepare_data->end_lsn);
+
+ pgstat_report_activity(STATE_IDLE, NULL);
+}
+
+/*
+ * Handle PREPARE message.
+ */
+static void
+apply_handle_prepare(StringInfo s)
+{
+ LogicalRepPrepareData prepare_data;
+
+ logicalrep_read_prepare(s, &prepare_data);
+
+ switch (prepare_data.prepare_type)
+ {
+ case LOGICALREP_IS_PREPARE:
+ apply_handle_prepare_txn(&prepare_data);
+ break;
+
+ case LOGICALREP_IS_COMMIT_PREPARED:
+ apply_handle_commit_prepared_txn(&prepare_data);
+ break;
+
+ case LOGICALREP_IS_ROLLBACK_PREPARED:
+ apply_handle_rollback_prepared_txn(&prepare_data);
+ break;
+
+ default:
+ ereport(ERROR,
+ (errcode(ERRCODE_PROTOCOL_VIOLATION),
+ errmsg("unexpected type of prepare message: %d",
+ prepare_data.prepare_type)));
+ }
+}
+
/*
* Handle ORIGIN message.
*
@@ -1909,10 +2048,14 @@ apply_dispatch(StringInfo s)
case 'B':
apply_handle_begin(s);
break;
- /* COMMIT */
+ /* COMMIT/ABORT */
case 'C':
apply_handle_commit(s);
break;
+ /* PREPARE and [COMMIT|ROLLBACK] PREPARED */
+ case 'P':
+ apply_handle_prepare(s);
+ break;
/* INSERT */
case 'I':
apply_handle_insert(s);
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index eb1f230..729b655 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -47,6 +47,12 @@ static void pgoutput_truncate(LogicalDecodingContext *ctx,
ReorderBufferChange *change);
static bool pgoutput_origin_filter(LogicalDecodingContext *ctx,
RepOriginId origin_id);
+static void pgoutput_prepare_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, XLogRecPtr prepare_lsn);
+static void pgoutput_commit_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, XLogRecPtr prepare_lsn);
+static void pgoutput_abort_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, XLogRecPtr prepare_lsn);
static void pgoutput_stream_start(struct LogicalDecodingContext *ctx,
ReorderBufferTXN *txn);
static void pgoutput_stream_stop(struct LogicalDecodingContext *ctx,
@@ -143,6 +149,10 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb)
cb->change_cb = pgoutput_change;
cb->truncate_cb = pgoutput_truncate;
cb->commit_cb = pgoutput_commit_txn;
+
+ cb->prepare_cb = pgoutput_prepare_txn;
+ cb->commit_prepared_cb = pgoutput_commit_prepared_txn;
+ cb->abort_prepared_cb = pgoutput_abort_prepared_txn;
cb->filter_by_origin_cb = pgoutput_origin_filter;
cb->shutdown_cb = pgoutput_shutdown;
@@ -373,7 +383,49 @@ pgoutput_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
OutputPluginUpdateProgress(ctx);
OutputPluginPrepareWrite(ctx, true);
- logicalrep_write_commit(ctx->out, txn, commit_lsn);
+ logicalrep_write_commit(ctx->out, txn, commit_lsn, true);
+ OutputPluginWrite(ctx, true);
+}
+
+/*
+ * PREPARE callback
+ */
+static void
+pgoutput_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ OutputPluginUpdateProgress(ctx);
+
+ OutputPluginPrepareWrite(ctx, true);
+ logicalrep_write_prepare(ctx->out, txn, prepare_lsn);
+ OutputPluginWrite(ctx, true);
+}
+
+/*
+ * COMMIT PREPARED callback
+ */
+static void
+pgoutput_commit_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ OutputPluginUpdateProgress(ctx);
+
+ OutputPluginPrepareWrite(ctx, true);
+ logicalrep_write_prepare(ctx->out, txn, prepare_lsn);
+ OutputPluginWrite(ctx, true);
+}
+
+/*
+ * PREPARE callback
+ */
+static void
+pgoutput_abort_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ OutputPluginUpdateProgress(ctx);
+
+ OutputPluginPrepareWrite(ctx, true);
+ logicalrep_write_prepare(ctx->out, txn, prepare_lsn);
OutputPluginWrite(ctx, true);
}
diff --git a/src/include/access/twophase.h b/src/include/access/twophase.h
index 2ca71c3..b2628ea 100644
--- a/src/include/access/twophase.h
+++ b/src/include/access/twophase.h
@@ -44,6 +44,7 @@ extern GlobalTransaction MarkAsPreparing(TransactionId xid, const char *gid,
extern void StartPrepare(GlobalTransaction gxact);
extern void EndPrepare(GlobalTransaction gxact);
extern bool StandbyTransactionIdIsPrepared(TransactionId xid);
+extern bool LookupGXact(const char *gid);
extern TransactionId PrescanPreparedTransactions(TransactionId **xids_p,
int *nxids_p);
diff --git a/src/include/replication/logicalproto.h b/src/include/replication/logicalproto.h
index 607a728..fb07580 100644
--- a/src/include/replication/logicalproto.h
+++ b/src/include/replication/logicalproto.h
@@ -85,20 +85,55 @@ typedef struct LogicalRepBeginData
TransactionId xid;
} LogicalRepBeginData;
+/* Commit (and abort) information */
typedef struct LogicalRepCommitData
{
+ bool is_commit;
XLogRecPtr commit_lsn;
XLogRecPtr end_lsn;
TimestampTz committime;
} LogicalRepCommitData;
+/* types of the commit protocol message */
+#define LOGICALREP_IS_COMMIT 0x01
+#define LOGICALREP_IS_ABORT 0x02
+
+/* commit message is COMMIT or ABORT, and there is nothing else */
+#define CommitFlagsAreValid(flags) \
+ ((flags == LOGICALREP_IS_COMMIT) || (flags == LOGICALREP_IS_ABORT))
+
+/* Prepare protocol information */
+typedef struct LogicalRepPrepareData
+{
+ uint8 prepare_type;
+ XLogRecPtr prepare_lsn;
+ XLogRecPtr end_lsn;
+ TimestampTz preparetime;
+ char gid[GIDSIZE];
+} LogicalRepPrepareData;
+
+/* types of the prepare protocol message */
+#define LOGICALREP_IS_PREPARE 0x01
+#define LOGICALREP_IS_COMMIT_PREPARED 0x02
+#define LOGICALREP_IS_ROLLBACK_PREPARED 0x04
+
+/* prepare can be exactly one of PREPARE, [COMMIT|ABORT] PREPARED*/
+#define PrepareFlagsAreValid(flags) \
+ ((flags == LOGICALREP_IS_PREPARE) || \
+ (flags == LOGICALREP_IS_COMMIT_PREPARED) || \
+ (flags == LOGICALREP_IS_ROLLBACK_PREPARED))
+
extern void logicalrep_write_begin(StringInfo out, ReorderBufferTXN *txn);
extern void logicalrep_read_begin(StringInfo in,
LogicalRepBeginData *begin_data);
extern void logicalrep_write_commit(StringInfo out, ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn);
+ XLogRecPtr commit_lsn, bool is_commit);
extern void logicalrep_read_commit(StringInfo in,
LogicalRepCommitData *commit_data);
+extern void logicalrep_write_prepare(StringInfo out, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+extern void logicalrep_read_prepare(StringInfo in,
+ LogicalRepPrepareData * prepare_data);
extern void logicalrep_write_origin(StringInfo out, const char *origin,
XLogRecPtr origin_lsn);
extern char *logicalrep_read_origin(StringInfo in, XLogRecPtr *origin_lsn);
diff --git a/src/test/subscription/t/020_twophase.pl b/src/test/subscription/t/020_twophase.pl
new file mode 100644
index 0000000..c7f373d
--- /dev/null
+++ b/src/test/subscription/t/020_twophase.pl
@@ -0,0 +1,163 @@
+# logical replication of 2PC test
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 12;
+
+# Initialize publisher node
+my $node_publisher = get_new_node('publisher');
+$node_publisher->init(allows_streaming => 'logical');
+$node_publisher->append_conf(
+ 'postgresql.conf', qq(
+ max_prepared_transactions = 10
+ ));
+$node_publisher->start;
+
+# Create subscriber node
+my $node_subscriber = get_new_node('subscriber');
+$node_subscriber->init(allows_streaming => 'logical');
+$node_subscriber->append_conf(
+ 'postgresql.conf', qq(max_prepared_transactions = 10));
+$node_subscriber->start;
+
+# Create some pre-existing content on publisher
+$node_publisher->safe_psql('postgres', "CREATE TABLE tab_full (a int PRIMARY KEY)");
+$node_publisher->safe_psql('postgres',
+ "INSERT INTO tab_full SELECT generate_series(1,10)");
+$node_publisher->safe_psql('postgres', "CREATE TABLE tab_full2 (x text)");
+$node_publisher->safe_psql('postgres',
+ "INSERT INTO tab_full2 VALUES ('a'), ('b'), ('b')");
+
+# Setup structure on subscriber
+$node_subscriber->safe_psql('postgres', "CREATE TABLE tab_full (a int PRIMARY KEY)");
+$node_subscriber->safe_psql('postgres', "CREATE TABLE tab_full2 (x text)");
+
+# Setup logical replication
+my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
+$node_publisher->safe_psql('postgres', "CREATE PUBLICATION tap_pub");
+$node_publisher->safe_psql('postgres',
+"ALTER PUBLICATION tap_pub ADD TABLE tab_full, tab_full2"
+);
+
+my $appname = 'tap_sub';
+$node_subscriber->safe_psql('postgres',
+"CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr application_name=$appname' PUBLICATION tap_pub"
+);
+
+# Wait for subscriber to finish initialization
+my $caughtup_query =
+"SELECT pg_current_wal_lsn() <= replay_lsn FROM pg_stat_replication WHERE application_name = '$appname';";
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# Also wait for initial table sync to finish
+my $synced_query =
+"SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT IN ('r', 's');";
+$node_subscriber->poll_query_until('postgres', $synced_query)
+ or die "Timed out while waiting for subscriber to synchronize data";
+
+# check that 2PC gets replicated to subscriber
+$node_publisher->safe_psql('postgres',
+ "BEGIN;INSERT INTO tab_full VALUES (11);PREPARE TRANSACTION 'test_prepared_tab_full';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is in prepared state on subscriber
+my $result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab_full';");
+ is($result, qq(1), 'transaction is prepared on subscriber');
+
+# check that 2PC gets committed on subscriber
+$node_publisher->safe_psql('postgres',
+ "COMMIT PREPARED 'test_prepared_tab_full';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is committed on subscriber
+$result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM tab_full where a = 11;");
+ is($result, qq(1), 'Row inserted via 2PC has committed on subscriber');
+$result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab_full';");
+ is($result, qq(0), 'transaction is committed on subscriber');
+
+# check that 2PC gets replicated to subscriber
+$node_publisher->safe_psql('postgres',
+ "BEGIN;INSERT INTO tab_full VALUES (12);PREPARE TRANSACTION 'test_prepared_tab_full';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is in prepared state on subscriber
+$result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab_full';");
+ is($result, qq(1), 'transaction is prepared on subscriber');
+
+# check that 2PC gets aborted on subscriber
+$node_publisher->safe_psql('postgres',
+ "ROLLBACK PREPARED 'test_prepared_tab_full';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is aborted on subscriber
+$result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM tab_full where a = 12;");
+ is($result, qq(0), 'Row inserted via 2PC is not present on subscriber');
+
+$result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab_full';");
+ is($result, qq(0), 'transaction is aborted on subscriber');
+
+# Check that commit prepared is decoded properly on crash restart
+$node_publisher->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab_full VALUES (12);
+ INSERT INTO tab_full VALUES (13);
+ PREPARE TRANSACTION 'test_prepared_tab';");
+$node_subscriber->stop('immediate');
+$node_publisher->stop('immediate');
+$node_publisher->start;
+$node_subscriber->start;
+
+# commit post the restart
+$node_publisher->safe_psql('postgres', "COMMIT PREPARED 'test_prepared_tab';");
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check inserts are visible
+$result = $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM tab_full where a IN (11,12);");
+is($result, qq(2), 'Rows inserted via 2PC are visible on the subscriber');
+
+# TODO add test cases involving DDL. This can be added after we add functionality
+# to replicate DDL changes to subscriber.
+
+# check all the cleanup
+$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_subscription");
+is($result, qq(0), 'check subscription was dropped on subscriber');
+
+$result = $node_publisher->safe_psql('postgres',
+ "SELECT count(*) FROM pg_replication_slots");
+is($result, qq(0), 'check replication slot was dropped on publisher');
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_subscription_rel");
+is($result, qq(0),
+ 'check subscription relation status was dropped on subscriber');
+
+$result = $node_publisher->safe_psql('postgres',
+ "SELECT count(*) FROM pg_replication_slots");
+is($result, qq(0), 'check replication slot was dropped on publisher');
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_replication_origin");
+is($result, qq(0), 'check replication origin was dropped on subscriber');
+
+$node_subscriber->stop('fast');
+$node_publisher->stop('fast');
--
1.8.3.1
On Tue, Sep 22, 2020 at 5:18 PM Ajin Cherian <itsajin@gmail.com> wrote:
On Sun, Sep 20, 2020 at 3:31 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
3.
+ /* + * If it's ROLLBACK PREPARED then handle it via callbacks. + */ + if (TransactionIdIsValid(xid) && + !SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) && + parsed->dbId == ctx->slot->data.database && + !FilterByOrigin(ctx, origin_id) && + ReorderBufferTxnIsPrepared(ctx->reorder, xid, parsed->twophase_gid)) + { + ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, buf->endptr, + commit_time, origin_id, origin_lsn, + parsed->twophase_gid, false); + return; + }I think we have already checked !SnapBuildXactNeedsSkip, parsed->dbId
== ctx->slot->data.database and !FilterByOrigin in DecodePrepare
so if those are not true then we wouldn't have prepared this
transaction i.e. ReorderBufferTxnIsPrepared will be false so why do we
need
to recheck this conditions.I didnt change this, as I am seeing cases where the Abort is getting
called for transactions that needs to be skipped. I also see that the
same check is there both in DecodePrepare and DecodeCommit.
So, while the same transactions were not getting prepared or
committed, it tries to get ROLLBACK PREPARED (as part of finish
prepared handling). The check in if ReorderBufferTxnIsPrepared() is
also not proper.
If the transaction is prepared which you can ensure via
ReorderBufferTxnIsPrepared() (considering you have a proper check in
that function), it should not require skipping the transaction in
Abort. One way it could happen is if you clean up the ReorderBufferTxn
in Prepare which you were doing in earlier version of patch which I
pointed out was wrong, if you have changed that then I don't know why
it could fail, may be someplace else during prepare the patch is
freeing it. Just check that.
I will need to relook
this logic again in a future patch.
No problem. I think you can handle the other comments and then we can
come back to this and you might want to share the exact details of the
test (may be a narrow down version of the original test) and I or
someone else might be able to help you with that.
--
With Regards,
Amit Kapila.
On Wed, Sep 23, 2020 at 2:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
No problem. I think you can handle the other comments and then we can
come back to this and you might want to share the exact details of the
test (may be a narrow down version of the original test) and I or
someone else might be able to help you with that.--
With Regards,
Amit Kapila.
I have added a new patch for supporting 2 phase commit semantics in
the streaming APIs for the logical decoding plugins. I have added 3
APIs
1. stream_prepare
2. stream_commit_prepared
3. stream_abort_prepared
I have also added the support for the new APIs in test_decoding
plugin. I have not yet added it to pgoutpout.
I have also added a fix for the error I saw while calling
ReorderBufferCleanupTXN as part of FinishPrepared handling. As a
result I have removed the function I added earlier,
ReorderBufferCleanupPreparedTXN.
Please have a look at the new changes and let me know what you think.
I will continue to look at:
1. Remove snapshots on prepare truncate.
2. Bug seen while abort of prepared transaction, the prepared flag is
lost, and not able to make out that it was a previously prepared
transaction.
regards,
Ajin Cherian
Fujitsu Australia
Attachments:
v6-0001-Support-decoding-of-two-phase-transactions.patchapplication/octet-stream; name=v6-0001-Support-decoding-of-two-phase-transactions.patchDownload
From 13352a817819b01e9089bfcb1eee1d62545c4c4c Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Mon, 28 Sep 2020 02:38:44 -0400
Subject: [PATCH v6] Support decoding of two-phase transactions
Until now two-phase transactions were decoded at COMMIT, just like
regular transaction. During replay, two-phase transactions were
translated into regular transactions on the subscriber, and the GID
was not forwarded to it.
This patch allows PREPARE-time decoding two-phase transactions (if
the output plugin supports this capability), in which case the
transactions are replayed at PREPARE and then committed later when
COMMIT PREPARED arrives.
Includes documentation changes.
---
contrib/test_decoding/expected/prepared.out | 187 ++++++++++++--
contrib/test_decoding/sql/prepared.sql | 79 +++++-
contrib/test_decoding/test_decoding.c | 166 +++++++++++++
doc/src/sgml/logicaldecoding.sgml | 110 ++++++++-
src/backend/replication/logical/decode.c | 129 +++++++++-
src/backend/replication/logical/logical.c | 175 +++++++++++++
src/backend/replication/logical/reorderbuffer.c | 312 +++++++++++++++++++++---
src/include/replication/logical.h | 5 +
src/include/replication/output_plugin.h | 37 +++
src/include/replication/reorderbuffer.h | 75 +++++-
10 files changed, 1188 insertions(+), 87 deletions(-)
diff --git a/contrib/test_decoding/expected/prepared.out b/contrib/test_decoding/expected/prepared.out
index 46e915d..fd0e8a4 100644
--- a/contrib/test_decoding/expected/prepared.out
+++ b/contrib/test_decoding/expected/prepared.out
@@ -6,19 +6,50 @@ SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_d
init
(1 row)
-CREATE TABLE test_prepared1(id int);
-CREATE TABLE test_prepared2(id int);
+CREATE TABLE test_prepared1(id integer primary key);
+CREATE TABLE test_prepared2(id integer primary key);
-- test simple successful use of a prepared xact
BEGIN;
INSERT INTO test_prepared1 VALUES (1);
PREPARE TRANSACTION 'test_prepared#1';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+----------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:1
+ PREPARE TRANSACTION 'test_prepared#1'
+(3 rows)
+
COMMIT PREPARED 'test_prepared#1';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-----------------------------------
+ COMMIT PREPARED 'test_prepared#1'
+(1 row)
+
INSERT INTO test_prepared1 VALUES (2);
-- test abort of a prepared xact
BEGIN;
INSERT INTO test_prepared1 VALUES (3);
PREPARE TRANSACTION 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+----------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:2
+ COMMIT
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:3
+ PREPARE TRANSACTION 'test_prepared#2'
+(6 rows)
+
ROLLBACK PREPARED 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-------------------------------------
+ ROLLBACK PREPARED 'test_prepared#2'
+(1 row)
+
INSERT INTO test_prepared1 VALUES (4);
-- test prepared xact containing ddl
BEGIN;
@@ -26,45 +57,149 @@ INSERT INTO test_prepared1 VALUES (5);
ALTER TABLE test_prepared1 ADD COLUMN data text;
INSERT INTO test_prepared1 VALUES (6, 'frakbar');
PREPARE TRANSACTION 'test_prepared#3';
--- test that we decode correctly while an uncommitted prepared xact
--- with ddl exists.
--- separate table because of the lock from the ALTER
--- this will come before the '5' row above, as this commits before it.
-INSERT INTO test_prepared2 VALUES (7);
-COMMIT PREPARED 'test_prepared#3';
--- make sure stuff still works
-INSERT INTO test_prepared1 VALUES (8);
-INSERT INTO test_prepared2 VALUES (9);
--- cleanup
-DROP TABLE test_prepared1;
-DROP TABLE test_prepared2;
--- show results
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT 'test_prepared_1' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'test_prepared1'::regclass;
+ relation | locktype | mode
+-----------------+----------+---------------------
+ test_prepared_1 | relation | RowExclusiveLock
+ test_prepared_1 | relation | AccessExclusiveLock
+(2 rows)
+
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
data
-------------------------------------------------------------------------
BEGIN
- table public.test_prepared1: INSERT: id[integer]:1
- COMMIT
- BEGIN
- table public.test_prepared1: INSERT: id[integer]:2
- COMMIT
- BEGIN
table public.test_prepared1: INSERT: id[integer]:4
COMMIT
BEGIN
- table public.test_prepared2: INSERT: id[integer]:7
- COMMIT
- BEGIN
table public.test_prepared1: INSERT: id[integer]:5
table public.test_prepared1: INSERT: id[integer]:6 data[text]:'frakbar'
+ PREPARE TRANSACTION 'test_prepared#3'
+(7 rows)
+
+-- Test that we decode correctly while an uncommitted prepared xact
+-- with ddl exists.
+--
+-- Use a separate table for the concurrent transaction because the lock from
+-- the ALTER will stop us inserting into the other one.
+--
+-- We should see '7' before '5' in our results since it commits first.
+--
+INSERT INTO test_prepared2 VALUES (7);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+----------------------------------------------------
+ BEGIN
+ table public.test_prepared2: INSERT: id[integer]:7
COMMIT
+(3 rows)
+
+COMMIT PREPARED 'test_prepared#3';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-----------------------------------
+ COMMIT PREPARED 'test_prepared#3'
+(1 row)
+
+-- make sure stuff still works
+INSERT INTO test_prepared1 VALUES (8);
+INSERT INTO test_prepared2 VALUES (9);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+--------------------------------------------------------------------
BEGIN
table public.test_prepared1: INSERT: id[integer]:8 data[text]:null
COMMIT
BEGIN
table public.test_prepared2: INSERT: id[integer]:9
COMMIT
-(22 rows)
+(6 rows)
+
+-- Check `CLUSTER` (as operation that hold exclusive lock) doesn't block
+-- logical decoding.
+BEGIN;
+INSERT INTO test_prepared1 VALUES (10, 'othercol');
+CLUSTER test_prepared1 USING test_prepared1_pkey;
+INSERT INTO test_prepared1 VALUES (11, 'othercol2');
+PREPARE TRANSACTION 'test_prepared_lock';
+BEGIN;
+insert into test_prepared2 values (12);
+PREPARE TRANSACTION 'test_prepared_lock2';
+COMMIT PREPARED 'test_prepared_lock2';
+SELECT 'pg_class' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'pg_class'::regclass;
+ relation | locktype | mode
+----------+----------+------
+(0 rows)
+
+-- Shouldn't timeout on 2pc decoding.
+SET statement_timeout = '1s';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+----------------------------------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:10 data[text]:'othercol'
+ table public.test_prepared1: INSERT: id[integer]:11 data[text]:'othercol2'
+ PREPARE TRANSACTION 'test_prepared_lock'
+ BEGIN
+ table public.test_prepared2: INSERT: id[integer]:12
+ PREPARE TRANSACTION 'test_prepared_lock2'
+ COMMIT PREPARED 'test_prepared_lock2'
+(8 rows)
+
+RESET statement_timeout;
+COMMIT PREPARED 'test_prepared_lock';
+-- will work normally after we commit
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+--------------------------------------
+ COMMIT PREPARED 'test_prepared_lock'
+(1 row)
+
+-- test savepoints
+BEGIN;
+SAVEPOINT test_savepoint;
+CREATE TABLE test_prepared_savepoint (a int);
+PREPARE TRANSACTION 'test_prepared_savepoint';
+COMMIT PREPARED 'test_prepared_savepoint';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-------------------------------------------
+ COMMIT PREPARED 'test_prepared_savepoint'
+(1 row)
+
+-- test that a GID containing "nodecode" gets decoded at commit prepared time
+BEGIN;
+INSERT INTO test_prepared1 VALUES (20);
+PREPARE TRANSACTION 'test_prepared_nodecode';
+-- should show nothing
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+------
+(0 rows)
+
+COMMIT PREPARED 'test_prepared_nodecode';
+-- should be decoded now
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+---------------------------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:20 data[text]:null
+ COMMIT
+(3 rows)
+
+-- cleanup
+DROP TABLE test_prepared1;
+DROP TABLE test_prepared2;
+-- show results. There should be nothing to show
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+------
+(0 rows)
SELECT pg_drop_replication_slot('regression_slot');
pg_drop_replication_slot
diff --git a/contrib/test_decoding/sql/prepared.sql b/contrib/test_decoding/sql/prepared.sql
index e726397..162fe43 100644
--- a/contrib/test_decoding/sql/prepared.sql
+++ b/contrib/test_decoding/sql/prepared.sql
@@ -2,21 +2,25 @@
SET synchronous_commit = on;
SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
-CREATE TABLE test_prepared1(id int);
-CREATE TABLE test_prepared2(id int);
+CREATE TABLE test_prepared1(id integer primary key);
+CREATE TABLE test_prepared2(id integer primary key);
-- test simple successful use of a prepared xact
BEGIN;
INSERT INTO test_prepared1 VALUES (1);
PREPARE TRANSACTION 'test_prepared#1';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
COMMIT PREPARED 'test_prepared#1';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
INSERT INTO test_prepared1 VALUES (2);
-- test abort of a prepared xact
BEGIN;
INSERT INTO test_prepared1 VALUES (3);
PREPARE TRANSACTION 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
ROLLBACK PREPARED 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
INSERT INTO test_prepared1 VALUES (4);
@@ -27,24 +31,83 @@ ALTER TABLE test_prepared1 ADD COLUMN data text;
INSERT INTO test_prepared1 VALUES (6, 'frakbar');
PREPARE TRANSACTION 'test_prepared#3';
--- test that we decode correctly while an uncommitted prepared xact
--- with ddl exists.
+SELECT 'test_prepared_1' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'test_prepared1'::regclass;
+
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
--- separate table because of the lock from the ALTER
--- this will come before the '5' row above, as this commits before it.
+-- Test that we decode correctly while an uncommitted prepared xact
+-- with ddl exists.
+--
+-- Use a separate table for the concurrent transaction because the lock from
+-- the ALTER will stop us inserting into the other one.
+--
+-- We should see '7' before '5' in our results since it commits first.
+--
INSERT INTO test_prepared2 VALUES (7);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
COMMIT PREPARED 'test_prepared#3';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
-- make sure stuff still works
INSERT INTO test_prepared1 VALUES (8);
INSERT INTO test_prepared2 VALUES (9);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- Check `CLUSTER` (as operation that hold exclusive lock) doesn't block
+-- logical decoding.
+BEGIN;
+INSERT INTO test_prepared1 VALUES (10, 'othercol');
+CLUSTER test_prepared1 USING test_prepared1_pkey;
+INSERT INTO test_prepared1 VALUES (11, 'othercol2');
+PREPARE TRANSACTION 'test_prepared_lock';
+
+BEGIN;
+insert into test_prepared2 values (12);
+PREPARE TRANSACTION 'test_prepared_lock2';
+COMMIT PREPARED 'test_prepared_lock2';
+
+SELECT 'pg_class' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'pg_class'::regclass;
+
+-- Shouldn't timeout on 2pc decoding.
+SET statement_timeout = '1s';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+RESET statement_timeout;
+
+COMMIT PREPARED 'test_prepared_lock';
+
+-- will work normally after we commit
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- test savepoints
+BEGIN;
+SAVEPOINT test_savepoint;
+CREATE TABLE test_prepared_savepoint (a int);
+PREPARE TRANSACTION 'test_prepared_savepoint';
+COMMIT PREPARED 'test_prepared_savepoint';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- test that a GID containing "nodecode" gets decoded at commit prepared time
+BEGIN;
+INSERT INTO test_prepared1 VALUES (20);
+PREPARE TRANSACTION 'test_prepared_nodecode';
+-- should show nothing
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+COMMIT PREPARED 'test_prepared_nodecode';
+-- should be decoded now
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
-- cleanup
DROP TABLE test_prepared1;
DROP TABLE test_prepared2;
--- show results
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+-- show results. There should be nothing to show
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
SELECT pg_drop_replication_slot('regression_slot');
diff --git a/contrib/test_decoding/test_decoding.c b/contrib/test_decoding/test_decoding.c
index e60ab34..1bb17a6 100644
--- a/contrib/test_decoding/test_decoding.c
+++ b/contrib/test_decoding/test_decoding.c
@@ -11,12 +11,16 @@
*-------------------------------------------------------------------------
*/
#include "postgres.h"
+#include "miscadmin.h"
+#include "access/transam.h"
#include "catalog/pg_type.h"
#include "replication/logical.h"
#include "replication/origin.h"
+#include "storage/procarray.h"
+
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -36,6 +40,7 @@ typedef struct
bool skip_empty_xacts;
bool xact_wrote_changes;
bool only_local;
+ TransactionId check_xid; /* track abort of this txid */
} TestDecodingData;
static void pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
@@ -88,6 +93,19 @@ static void pg_decode_stream_truncate(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
int nrelations, Relation relations[],
ReorderBufferChange *change);
+static bool pg_decode_filter_prepare(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid);
+static void pg_decode_prepare_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+static void pg_decode_commit_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+static void pg_decode_abort_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+
void
_PG_init(void)
@@ -116,6 +134,11 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb)
cb->stream_change_cb = pg_decode_stream_change;
cb->stream_message_cb = pg_decode_stream_message;
cb->stream_truncate_cb = pg_decode_stream_truncate;
+ cb->filter_prepare_cb = pg_decode_filter_prepare;
+ cb->prepare_cb = pg_decode_prepare_txn;
+ cb->commit_prepared_cb = pg_decode_commit_prepared_txn;
+ cb->abort_prepared_cb = pg_decode_abort_prepared_txn;
+
}
@@ -127,6 +150,7 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
ListCell *option;
TestDecodingData *data;
bool enable_streaming = false;
+ bool enable_2pc = false;
data = palloc0(sizeof(TestDecodingData));
data->context = AllocSetContextCreate(ctx->context,
@@ -136,6 +160,7 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
data->include_timestamp = false;
data->skip_empty_xacts = false;
data->only_local = false;
+ data->check_xid = InvalidTransactionId;
ctx->output_plugin_private = data;
@@ -227,6 +252,42 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
errmsg("could not parse value \"%s\" for parameter \"%s\"",
strVal(elem->arg), elem->defname)));
}
+ else if (strcmp(elem->defname, "two-phase-commit") == 0)
+ {
+ if (elem->arg == NULL)
+ continue;
+ else if (!parse_bool(strVal(elem->arg), &enable_2pc))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("could not parse value \"%s\" for parameter \"%s\"",
+ strVal(elem->arg), elem->defname)));
+ }
+ else if (strcmp(elem->defname, "check-xid") == 0)
+ {
+ if (elem->arg)
+ {
+ errno = 0;
+ data->check_xid = (TransactionId)
+ strtoul(strVal(elem->arg), NULL, 0);
+
+ if (errno == EINVAL || errno == ERANGE)
+ ereport(FATAL,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("check-xid is not a valid number: \"%s\"",
+ strVal(elem->arg))));
+ }
+ else
+ ereport(FATAL,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("check-xid needs an input value")));
+
+ if (data->check_xid <= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("Specify positive value for parameter \"%s\","
+ " you specified \"%s\"",
+ elem->defname, strVal(elem->arg))));
+ }
else
{
ereport(ERROR,
@@ -238,6 +299,7 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
}
ctx->streaming &= enable_streaming;
+ ctx->enable_twophase &= enable_2pc;
}
/* cleanup this plugin's resources */
@@ -297,6 +359,94 @@ pg_decode_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
OutputPluginWrite(ctx, true);
}
+
+/*
+ * Filter out two-phase transactions.
+ *
+ * Each plugin can implement its own filtering logic. Here
+ * we demonstrate a simple logic by checking the GID. If the
+ * GID contains the "_nodecode" substring, then we filter
+ * it out.
+ */
+static bool
+pg_decode_filter_prepare(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid)
+{
+ if (strstr(gid, "_nodecode") != NULL)
+ return true;
+
+ return false;
+}
+
+/* PREPARE callback */
+static void
+pg_decode_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ if (data->skip_empty_xacts && !data->xact_wrote_changes)
+ return;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ appendStringInfo(ctx->out, "PREPARE TRANSACTION %s",
+ quote_literal_cstr(txn->gid));
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, " %u", txn->xid);
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
+/* COMMIT PREPARED callback */
+static void
+pg_decode_commit_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ appendStringInfo(ctx->out, "COMMIT PREPARED %s",
+ quote_literal_cstr(txn->gid));
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, " %u", txn->xid);
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
+/* ABORT PREPARED callback */
+static void
+pg_decode_abort_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ appendStringInfo(ctx->out, "ROLLBACK PREPARED %s",
+ quote_literal_cstr(txn->gid));
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, " %u", txn->xid);
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
static bool
pg_decode_filter(LogicalDecodingContext *ctx,
RepOriginId origin_id)
@@ -455,6 +605,22 @@ pg_decode_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
}
data->xact_wrote_changes = true;
+ /* if check_xid is specified */
+ if (TransactionIdIsValid(data->check_xid))
+ {
+ elog(LOG, "waiting for %u to abort", data->check_xid);
+ while (TransactionIdIsInProgress(data->check_xid))
+ {
+ CHECK_FOR_INTERRUPTS();
+ pg_usleep(10000L);
+ }
+ if (!TransactionIdIsInProgress(data->check_xid) &&
+ !TransactionIdDidCommit(data->check_xid))
+ elog(LOG, "%u aborted", data->check_xid);
+
+ Assert(TransactionIdDidAbort(data->check_xid));
+ }
+
class_form = RelationGetForm(relation);
tupdesc = RelationGetDescr(relation);
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index 813a037..bd4542e 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -387,6 +387,10 @@ typedef struct OutputPluginCallbacks
LogicalDecodeTruncateCB truncate_cb;
LogicalDecodeCommitCB commit_cb;
LogicalDecodeMessageCB message_cb;
+ LogicalDecodeFilterPrepareCB filter_prepare_cb;
+ LogicalDecodePrepareCB prepare_cb;
+ LogicalDecodeCommitPreparedCB commit_prepared_cb;
+ LogicalDecodeAbortPreparedCB abort_prepared_cb;
LogicalDecodeFilterByOriginCB filter_by_origin_cb;
LogicalDecodeShutdownCB shutdown_cb;
LogicalDecodeStreamStartCB stream_start_cb;
@@ -477,7 +481,13 @@ CREATE TABLE another_catalog_table(data text) WITH (user_catalog_table = true);
never get
decoded. Successful savepoints are
folded into the transaction containing them in the order they were
- executed within that transaction.
+ executed within that transaction. A transaction that is prepared for
+ a two-phase commit using <command>PREPARE TRANSACTION</command> will
+ also be decoded if the output plugin callbacks needed for decoding
+ them are provided. It is possible that the current transaction which
+ is being decoded is aborted concurrently via a <command>ROLLBACK PREPARED</command>
+ command. In that case, the logical decoding of this transaction will
+ be aborted too.
</para>
<note>
@@ -578,6 +588,55 @@ typedef void (*LogicalDecodeCommitCB) (struct LogicalDecodingContext *ctx,
</para>
</sect3>
+ <sect3 id="logicaldecoding-output-plugin-prepare">
+ <title>Transaction Prepare Callback</title>
+
+ <para>
+ The optional <function>prepare_cb</function> callback is called whenever
+ a transaction which is prepared for two-phase commit has been
+ decoded. The <function>change_cb</function> callbacks for all modified
+ rows will have been called before this, if there have been any modified
+ rows.
+<programlisting>
+typedef void (*LogicalDecodePrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
+ <sect3 id="logicaldecoding-output-plugin-commit-prepared">
+ <title>Commit Prepared Transaction Callback</title>
+
+ <para>
+ The optional <function>commit_prepared_cb</function> callback is called whenever
+ a commit prepared transaction has been decoded. The <parameter>gid</parameter> field,
+ which is part of the <parameter>txn</parameter> parameter can be used in this
+ callback.
+<programlisting>
+typedef void (*LogicalDecodeCommitPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
+ <sect3 id="logicaldecoding-output-plugin-abort-prepared">
+ <title>Rollback Prepared Transaction Callback</title>
+
+ <para>
+ The optional <function>abort_prepared_cb</function> callback is called whenever
+ a rollback prepared transaction has been decoded. The <parameter>gid</parameter> field,
+ which is part of the <parameter>txn</parameter> parameter can be used in this
+ callback.
+<programlisting>
+typedef void (*LogicalDecodeAbortPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
<sect3 id="logicaldecoding-output-plugin-change">
<title>Change Callback</title>
@@ -587,7 +646,13 @@ typedef void (*LogicalDecodeCommitCB) (struct LogicalDecodingContext *ctx,
an <command>INSERT</command>, <command>UPDATE</command>,
or <command>DELETE</command>. Even if the original command modified
several rows at once the callback will be called individually for each
- row.
+ row. The <function>change_cb</function> callback may access system or
+ user catalog tables to aid in the process of outputting the row
+ modification details. In case of decoding a prepared (but yet
+ uncommitted) transaction or decoding of an uncommitted transaction, this
+ change callback might also error out due to simultaneous rollback of
+ this very same transaction. In that case, the logical decoding of this
+ aborted transaction is stopped gracefully.
<programlisting>
typedef void (*LogicalDecodeChangeCB) (struct LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
@@ -664,6 +729,39 @@ typedef bool (*LogicalDecodeFilterByOriginCB) (struct LogicalDecodingContext *ct
</para>
</sect3>
+ <sect3 id="logicaldecoding-output-plugin-filter-prepare">
+ <title>Prepare Filter Callback</title>
+
+ <para>
+ The optional <function>filter_prepare_cb</function> callback
+ is called to determine whether data that is part of the current
+ two-phase commit transaction should be considered for decode
+ at this prepare stage or as a regular one-phase transaction at
+ <command>COMMIT PREPARED</command> time later. To signal that
+ decoding should be skipped, return <literal>true</literal>;
+ <literal>false</literal> otherwise. When the callback is not
+ defined, <literal>false</literal> is assumed (i.e. nothing is
+ filtered).
+<programlisting>
+typedef bool (*LogicalDecodeFilterPrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ TransactionId xid,
+ const char *gid);
+</programlisting>
+ The <parameter>ctx</parameter> parameter has the same contents
+ as for the other callbacks. The <parameter>txn</parameter> parameter
+ contains meta information about the transaction. The <parameter>xid</parameter>
+ contains the XID because <parameter>txn</parameter> can be NULL in some cases.
+ The <parameter>gid</parameter> is the identifier that later identifies this
+ transaction for <command>COMMIT PREPARED</command> or <command>ROLLBACK PREPARED</command>.
+ </para>
+ <para>
+ The callback has to provide the same static answer for a given combination of
+ <parameter>xid</parameter> and <parameter>gid</parameter> every time it is
+ called.
+ </para>
+ </sect3>
+
<sect3 id="logicaldecoding-output-plugin-message">
<title>Generic Message Callback</title>
@@ -685,7 +783,13 @@ typedef void (*LogicalDecodeMessageCB) (struct LogicalDecodingContext *ctx,
non-transactional and the XID was not assigned yet in the transaction
which logged the message. The <parameter>lsn</parameter> has WAL
location of the message. The <parameter>transactional</parameter> says
- if the message was sent as transactional or not.
+ if the message was sent as transactional or not. Similar to the change
+ callback, in case of decoding a prepared (but yet uncommitted)
+ transaction or decoding of an uncommitted transaction, this message
+ callback might also error out due to simultaneous rollback of
+ this very same transaction. In that case, the logical decoding of this
+ aborted transaction is stopped gracefully.
+
The <parameter>prefix</parameter> is arbitrary null-terminated prefix
which can be used for identifying interesting messages for the current
plugin. And finally the <parameter>message</parameter> parameter holds
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index f21f61d..b37b62d 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -70,6 +70,9 @@ static void DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
xl_xact_parsed_commit *parsed, TransactionId xid);
static void DecodeAbort(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
xl_xact_parsed_abort *parsed, TransactionId xid);
+static void DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_prepare * parsed);
+
/* common function to decode tuples */
static void DecodeXLogTuple(char *data, Size len, ReorderBufferTupleBuf *tup);
@@ -312,17 +315,34 @@ DecodeXactOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
}
break;
case XLOG_XACT_PREPARE:
+ {
+ xl_xact_parsed_prepare parsed;
+ xl_xact_prepare *xlrec;
+ /* check that output plugin is capable of twophase decoding */
+ if (!ctx->enable_twophase)
+ {
+ ReorderBufferProcessXid(reorder, XLogRecGetXid(r), buf->origptr);
+ break;
+ }
- /*
- * Currently decoding ignores PREPARE TRANSACTION and will just
- * decode the transaction when the COMMIT PREPARED is sent or
- * throw away the transaction's contents when a ROLLBACK PREPARED
- * is received. In the future we could add code to expose prepared
- * transactions in the changestream allowing for a kind of
- * distributed 2PC.
- */
- ReorderBufferProcessXid(reorder, XLogRecGetXid(r), buf->origptr);
- break;
+ /* ok, parse it */
+ xlrec = (xl_xact_prepare *)XLogRecGetData(r);
+ ParsePrepareRecord(XLogRecGetInfo(buf->record),
+ xlrec, &parsed);
+
+ /* does output plugin want this particular transaction? */
+ if (ctx->callbacks.filter_prepare_cb &&
+ ReorderBufferPrepareNeedSkip(reorder, parsed.twophase_xid,
+ parsed.twophase_gid))
+ {
+ ReorderBufferProcessXid(reorder, parsed.twophase_xid,
+ buf->origptr);
+ break;
+ }
+
+ DecodePrepare(ctx, buf, &parsed);
+ break;
+ }
default:
elog(ERROR, "unexpected RM_XACT_ID record type: %u", info);
}
@@ -647,9 +667,69 @@ DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
buf->origptr, buf->endptr);
}
+ /*
+ * Decide if we're processing COMMIT PREPARED, or a regular COMMIT.
+ * Regular commit simply triggers a replay of transaction changes from the
+ * reorder buffer. For COMMIT PREPARED that however already happened at
+ * PREPARE time, and so we only need to notify the subscriber that the GID
+ * finally committed.
+ *
+ * For output plugins that do not support PREPARE-time decoding of
+ * two-phase transactions, we never even see the PREPARE and all two-phase
+ * transactions simply fall through to the second branch.
+ */
+ if (TransactionIdIsValid(parsed->twophase_xid) &&
+ ReorderBufferTxnIsPrepared(ctx->reorder,
+ parsed->twophase_xid, parsed->twophase_gid))
+ {
+ Assert(xid == parsed->twophase_xid);
+ /* we are processing COMMIT PREPARED */
+ ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn,
+ parsed->twophase_gid, true);
+ }
+ else
+ {
+ /* replay actions of all transaction + subtransactions in order */
+ ReorderBufferCommit(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn);
+ }
+}
+
+/*
+ * Decode PREPARE record. Similar logic as in COMMIT
+ */
+static void
+DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_prepare * parsed)
+{
+ XLogRecPtr origin_lsn = parsed->origin_lsn;
+ TimestampTz commit_time = parsed->origin_timestamp;
+ XLogRecPtr origin_id = XLogRecGetOrigin(buf->record);
+ int i;
+ TransactionId xid = parsed->twophase_xid;
+
+ if (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
+ (parsed->dbId != InvalidOid && parsed->dbId != ctx->slot->data.database) ||
+ ctx->fast_forward || FilterByOrigin(ctx, origin_id))
+ return;
+
+ /*
+ * Tell the reorderbuffer about the surviving subtransactions. We need to
+ * do this because the main transaction itself has not committed since we
+ * are in the prepare phase right now. So we need to be sure the snapshot
+ * is setup correctly for the main transaction in case all changes
+ * happened in subtransanctions
+ */
+ for (i = 0; i < parsed->nsubxacts; i++)
+ {
+ ReorderBufferCommitChild(ctx->reorder, xid, parsed->subxacts[i],
+ buf->origptr, buf->endptr);
+ }
+
/* replay actions of all transaction + subtransactions in order */
- ReorderBufferCommit(ctx->reorder, xid, buf->origptr, buf->endptr,
- commit_time, origin_id, origin_lsn);
+ ReorderBufferPrepare(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn, parsed->twophase_gid);
}
/*
@@ -661,6 +741,31 @@ DecodeAbort(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
xl_xact_parsed_abort *parsed, TransactionId xid)
{
int i;
+ XLogRecPtr origin_lsn = InvalidXLogRecPtr;
+ TimestampTz commit_time = 0;
+ XLogRecPtr origin_id = XLogRecGetOrigin(buf->record);
+
+ if (parsed->xinfo & XACT_XINFO_HAS_ORIGIN)
+ {
+ origin_lsn = parsed->origin_lsn;
+ commit_time = parsed->origin_timestamp;
+ }
+
+ /*
+ * If it's ROLLBACK PREPARED then handle it via callbacks.
+ */
+ if (TransactionIdIsValid(xid) &&
+ !SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) &&
+ parsed->dbId == ctx->slot->data.database &&
+ !FilterByOrigin(ctx, origin_id) &&
+ ReorderBufferTxnIsPrepared(ctx->reorder, xid, parsed->twophase_gid))
+ {
+
+ ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn,
+ parsed->twophase_gid, false);
+ return;
+ }
for (i = 0; i < parsed->nsubxacts; i++)
{
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 0f6af95..4e95337 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -58,6 +58,14 @@ static void shutdown_cb_wrapper(LogicalDecodingContext *ctx);
static void begin_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn);
static void commit_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
+static bool filter_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid);
+static void prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+static void commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+static void abort_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
static void change_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
Relation relation, ReorderBufferChange *change);
static void truncate_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
@@ -206,6 +214,10 @@ StartupDecodingContext(List *output_plugin_options,
ctx->reorder->apply_change = change_cb_wrapper;
ctx->reorder->apply_truncate = truncate_cb_wrapper;
ctx->reorder->commit = commit_cb_wrapper;
+ ctx->reorder->filter_prepare = filter_prepare_cb_wrapper;
+ ctx->reorder->prepare = prepare_cb_wrapper;
+ ctx->reorder->commit_prepared = commit_prepared_cb_wrapper;
+ ctx->reorder->abort_prepared = abort_prepared_cb_wrapper;
ctx->reorder->message = message_cb_wrapper;
/*
@@ -225,6 +237,19 @@ StartupDecodingContext(List *output_plugin_options,
(ctx->callbacks.stream_message_cb != NULL) ||
(ctx->callbacks.stream_truncate_cb != NULL);
+ /*
+ * To support two phase logical decoding, we require prepare/commit-prepare/abort-prepare
+ * callbacks. The filter-prepare callback is optional. We however enable two phase logical
+ * decoding when at least one of the methods is enabled so that we can easily identify
+ * missing methods.
+ *
+ * We decide it here, but only check it later in the wrappers.
+ */
+ ctx->enable_twophase = (ctx->callbacks.prepare_cb != NULL) ||
+ (ctx->callbacks.commit_prepared_cb != NULL) ||
+ (ctx->callbacks.abort_prepared_cb != NULL) ||
+ (ctx->callbacks.filter_prepare_cb != NULL);
+
/*
* streaming callbacks
*
@@ -782,6 +807,111 @@ commit_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
}
static void
+prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "prepare";
+ state.report_location = txn->final_lsn; /* beginning of commit record */
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn; /* points to the end of the record */
+
+ /* If the plugin support 2 phase commits then prepare callback is mandatory */
+ if (ctx->enable_twophase && ctx->callbacks.prepare_cb == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("Output plugin did not register prepare_cb callback")));
+
+ /* do the actual work: call callback */
+ ctx->callbacks.prepare_cb(ctx, txn, prepare_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
+commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "commit_prepared";
+ state.report_location = txn->final_lsn; /* beginning of commit record */
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn; /* points to the end of the record */
+
+ /* If the plugin support 2 phase commits then commit prepared callback is mandatory */
+ if (ctx->enable_twophase && ctx->callbacks.commit_prepared_cb == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("Output plugin did not register commit_prepared_cb callback")));
+
+ /* do the actual work: call callback */
+ ctx->callbacks.commit_prepared_cb(ctx, txn, commit_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
+abort_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "abort_prepared";
+ state.report_location = txn->final_lsn; /* beginning of commit record */
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn; /* points to the end of the record */
+
+ /* If the plugin support 2 phase commits then abort prepared callback is mandatory */
+ if (ctx->enable_twophase && ctx->callbacks.abort_prepared_cb == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("Output plugin did not register abort_prepared_cb callback")));
+
+ /* do the actual work: call callback */
+ ctx->callbacks.abort_prepared_cb(ctx, txn, abort_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
change_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
Relation relation, ReorderBufferChange *change)
{
@@ -858,6 +988,51 @@ truncate_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
error_context_stack = errcallback.previous;
}
+static bool
+filter_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+ bool ret;
+
+ /*
+ * Skip if decoding of twophase at PREPARE time is not enabled. In that
+ * case all twophase transactions are considered filtered out and will be
+ * applied as regular transactions at COMMIT PREPARED.
+ */
+ if (!ctx->enable_twophase)
+ return true;
+
+ /*
+ * The filter_prepare callback is optional. When not supplied, all
+ * prepared transactions should go through.
+ */
+ if (!ctx->callbacks.filter_prepare_cb)
+ return false;
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "filter_prepare";
+ state.report_location = InvalidXLogRecPtr;
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = false;
+
+ /* do the actual work: call callback */
+ ret = ctx->callbacks.filter_prepare_cb(ctx, txn, xid, gid);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+
+ return ret;
+}
+
bool
filter_by_origin_cb_wrapper(LogicalDecodingContext *ctx, RepOriginId origin_id)
{
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 1975d62..5ff920b 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -251,7 +251,8 @@ static Size ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn
static void ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
char *change);
static void ReorderBufferRestoreCleanup(ReorderBuffer *rb, ReorderBufferTXN *txn);
-static void ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn);
+static void ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
+ bool txn_prepared);
static void ReorderBufferCleanupSerializedTXNs(const char *slotname);
static void ReorderBufferSerializedPath(char *path, ReplicationSlot *slot,
TransactionId xid, XLogSegNo segno);
@@ -413,6 +414,11 @@ ReorderBufferReturnTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
}
/* free data that's contained */
+ if (txn->gid != NULL)
+ {
+ pfree(txn->gid);
+ txn->gid = NULL;
+ }
if (txn->tuplecid_hash != NULL)
{
@@ -1401,6 +1407,59 @@ ReorderBufferIterTXNFinish(ReorderBuffer *rb,
}
/*
+ * Cleanup the leftover contents of a transaction, usually after the transaction
+ * has been COMMIT PREPARED or ROLLBACK PREPARED. This does the rest of the cleanup
+ * that was not done when the transaction was PREPARED
+ */
+static void
+ReorderBufferCleanupPreparedTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
+{
+ bool found;
+
+ /*
+ * Cleanup the base snapshot, if set.
+ */
+ if (txn->base_snapshot != NULL)
+ {
+ SnapBuildSnapDecRefcount(txn->base_snapshot);
+ dlist_delete(&txn->base_snapshot_node);
+ }
+
+ /*
+ * Cleanup the snapshot for the last streamed run.
+ */
+ if (txn->snapshot_now != NULL)
+ {
+ Assert(rbtxn_is_streamed(txn));
+ ReorderBufferFreeSnap(rb, txn->snapshot_now);
+ }
+
+ /*
+ * Remove TXN from its containing list.
+ *
+ * Note: if txn is known as subxact, we are deleting the TXN from its
+ * parent's list of known subxacts; this leaves the parent's nsubxacts
+ * count too high, but we don't care. Otherwise, we are deleting the TXN
+ * from the LSN-ordered list of toplevel TXNs.
+ */
+ dlist_delete(&txn->node);
+
+ /* now remove reference from buffer */
+ hash_search(rb->by_txn,
+ (void *) &txn->xid,
+ HASH_REMOVE,
+ &found);
+ Assert(found);
+
+ /* remove entries spilled to disk */
+ if (rbtxn_is_serialized(txn))
+ ReorderBufferRestoreCleanup(rb, txn);
+
+ /* deallocate */
+ ReorderBufferReturnTXN(rb, txn);
+}
+
+/*
* Cleanup the contents of a transaction, usually after the transaction
* committed or aborted.
*/
@@ -1502,12 +1561,14 @@ ReorderBufferCleanupTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
}
/*
- * Discard changes from a transaction (and subtransactions), after streaming
- * them. Keep the remaining info - transactions, tuplecids, invalidations and
- * snapshots.
+ * Discard changes from a transaction (and subtransactions), either after streaming or
+ * after a PREPARE.
+ * The flag txn_prepared indicates if this is called after a PREPARE.
+ * If streaming, keep the remaining info - transactions, tuplecids, invalidations and
+ * snapshots.If after a PREPARE, keep only the invalidations and snapshots.
*/
static void
-ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
+ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, bool txn_prepared)
{
dlist_mutable_iter iter;
@@ -1526,7 +1587,7 @@ ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
Assert(rbtxn_is_known_subxact(subtxn));
Assert(subtxn->nsubtxns == 0);
- ReorderBufferTruncateTXN(rb, subtxn);
+ ReorderBufferTruncateTXN(rb, subtxn, txn_prepared);
}
/* cleanup changes in the toplevel txn */
@@ -1560,9 +1621,33 @@ ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
* about the toplevel xact (we send the XID in all messages), but we never
* stream XIDs of empty subxacts.
*/
- if ((!txn->toptxn) || (txn->nentries_mem != 0))
+ if ((!txn_prepared) && ((!txn->toptxn) || (txn->nentries_mem != 0)))
txn->txn_flags |= RBTXN_IS_STREAMED;
+ if (txn_prepared)
+ {
+ /*
+ * If this is a prepared txn, cleanup the tuplecids we stored for decoding
+ * catalog snapshot access.
+ * They are always stored in the toplevel transaction.
+ */
+ dlist_foreach_modify(iter, &txn->tuplecids)
+ {
+ ReorderBufferChange *change;
+
+ change = dlist_container(ReorderBufferChange, node, iter.cur);
+
+ /* Check we're not mixing changes from different transactions. */
+ Assert(change->txn == txn);
+ Assert(change->action == REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID);
+
+ /* remove the change from it's containing list */
+ dlist_delete(&change->node);
+
+ ReorderBufferReturnChange(rb, change, true);
+ }
+ }
+
/*
* Destroy the (relfilenode, ctid) hashtable, so that we don't leak any
* memory. We could also keep the hash table and update it with new ctid
@@ -1880,7 +1965,7 @@ ReorderBufferResetTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
ReorderBufferChange *specinsert)
{
/* Discard the changes that we just streamed */
- ReorderBufferTruncateTXN(rb, txn);
+ ReorderBufferTruncateTXN(rb, txn, false);
/* Free all resources allocated for toast reconstruction */
ReorderBufferToastReset(rb, txn);
@@ -1987,7 +2072,7 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
prev_lsn = change->lsn;
/* Set the current xid to detect concurrent aborts. */
- if (streaming)
+ if (streaming || rbtxn_prepared(change->txn))
{
curtxn = change->txn;
SetupCheckXidLive(curtxn->xid);
@@ -2249,7 +2334,6 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
break;
}
}
-
/*
* There's a speculative insertion remaining, just clean in up, it
* can't have been successful, otherwise we'd gotten a confirmation
@@ -2278,7 +2362,16 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
}
}
else
- rb->commit(rb, txn, commit_lsn);
+ {
+ /*
+ * Call either PREPARE (for twophase transactions) or COMMIT
+ * (for regular ones).
+ */
+ if (rbtxn_prepared(txn))
+ rb->prepare(rb, txn, commit_lsn);
+ else
+ rb->commit(rb, txn, commit_lsn);
+ }
/* this is just a sanity check against bad output plugin behaviour */
if (GetCurrentTransactionIdIfAny() != InvalidTransactionId)
@@ -2319,11 +2412,17 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
*/
if (streaming)
{
- ReorderBufferTruncateTXN(rb, txn);
+ ReorderBufferTruncateTXN(rb, txn, false);
/* Reset the CheckXidAlive */
CheckXidAlive = InvalidTransactionId;
}
+ else if (rbtxn_prepared(txn))
+ {
+ ReorderBufferTruncateTXN(rb, txn, true);
+ /* Reset the CheckXidAlive */
+ CheckXidAlive = InvalidTransactionId;
+ }
else
ReorderBufferCleanupTXN(rb, txn);
}
@@ -2352,17 +2451,18 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
/*
* The error code ERRCODE_TRANSACTION_ROLLBACK indicates a concurrent
- * abort of the (sub)transaction we are streaming. We need to do the
+ * abort of the (sub)transaction we are streaming or preparing. We need to do the
* cleanup and return gracefully on this error, see SetupCheckXidLive.
*/
if (errdata->sqlerrcode == ERRCODE_TRANSACTION_ROLLBACK)
{
/*
- * This error can only occur when we are sending the data in
- * streaming mode and the streaming is not finished yet.
+ * This error can only occur either when we are sending the data in
+ * streaming mode and the streaming is not finished yet or when we are
+ * sending the data out on a PREPARE during a twoi phase commit.
*/
- Assert(streaming);
- Assert(stream_started);
+ Assert(streaming || rbtxn_prepared(txn));
+ Assert(stream_started || rbtxn_prepared(txn));
/* Cleanup the temporary error state. */
FlushErrorState();
@@ -2370,10 +2470,19 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
errdata = NULL;
curtxn->concurrent_abort = true;
- /* Reset the TXN so that it is allowed to stream remaining data. */
- ReorderBufferResetTXN(rb, txn, snapshot_now,
- command_id, prev_lsn,
- specinsert);
+ /* If streaming, reset the TXN so that it is allowed to stream remaining data. */
+ if (streaming)
+ {
+ ReorderBufferResetTXN(rb, txn, snapshot_now,
+ command_id, prev_lsn,
+ specinsert);
+ }
+ else
+ {
+ elog(LOG, "stopping decoding of %s (%u)",
+ txn->gid[0] != '\0'? txn->gid:"", txn->xid);
+ ReorderBufferTruncateTXN(rb, txn, true);
+ }
}
else
{
@@ -2395,23 +2504,16 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
* This interface is called once a toplevel commit is read for both streamed
* as well as non-streamed transactions.
*/
-void
-ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
- XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
- TimestampTz commit_time,
- RepOriginId origin_id, XLogRecPtr origin_lsn)
+static void
+ReorderBufferCommitInternal(ReorderBufferTXN *txn,
+ ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn)
{
- ReorderBufferTXN *txn;
Snapshot snapshot_now;
CommandId command_id = FirstCommandId;
- txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
- false);
-
- /* unknown transaction, nothing to replay */
- if (txn == NULL)
- return;
-
txn->final_lsn = commit_lsn;
txn->end_lsn = end_lsn;
txn->commit_time = commit_time;
@@ -2453,6 +2555,140 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
}
/*
+ * Ask output plugin whether we want to skip this PREPARE and send
+ * this transaction as a regular commit later.
+ */
+bool
+ReorderBufferPrepareNeedSkip(ReorderBuffer *rb, TransactionId xid, const char *gid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr, false);
+
+ return rb->filter_prepare(rb, txn, xid, gid);
+}
+
+
+/*
+ * Commit a transaction.
+ *
+ * See comments for ReorderBufferCommitInternal()
+ */
+void
+ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+
+ /* unknown transaction, nothing to replay */
+ if (txn == NULL)
+ return;
+
+ ReorderBufferCommitInternal(txn, rb, xid, commit_lsn, end_lsn,
+ commit_time, origin_id, origin_lsn);
+}
+
+/*
+ * Prepare a twophase transaction. It calls ReorderBufferCommitInternal()
+ * since all prepared transactions need to be decoded at PREPARE time.
+ */
+void
+ReorderBufferPrepare(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+
+ /* unknown transaction, nothing to replay */
+ if (txn == NULL)
+ return;
+
+ txn->txn_flags |= RBTXN_PREPARE;
+ txn->gid = palloc(strlen(gid) + 1); /* trailing '\0' */
+ strcpy(txn->gid, gid);
+
+ ReorderBufferCommitInternal(txn, rb, xid, commit_lsn, end_lsn,
+ commit_time, origin_id, origin_lsn);
+}
+
+/*
+ * Check whether this transaction was sent as prepared to subscribers.
+ * Called while handling commit|abort prepared.
+ */
+bool
+ReorderBufferTxnIsPrepared(ReorderBuffer *rb, TransactionId xid,
+ const char *gid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+
+ /*
+ * Always call the prepare filter. It's the job of the prepare filter to
+ * give us the *same* response for a given xid across multiple calls
+ * (including ones on restart)
+ */
+ return !(rb->filter_prepare(rb, txn, xid, gid));
+}
+
+/*
+ * Send standalone xact event. This is used to handle COMMIT/ABORT PREPARED.
+ */
+void
+ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid, bool is_commit)
+{
+ ReorderBufferTXN *txn;
+
+ /*
+ * The transaction may or may not exist (during restarts for example).
+ * Anyways, 2PC transactions do not contain any reorderbuffers. So allow
+ * it to be created below.
+ */
+ txn = ReorderBufferTXNByXid(rb, xid, true, NULL, commit_lsn,
+ true);
+
+ txn->final_lsn = commit_lsn;
+ txn->end_lsn = end_lsn;
+ txn->commit_time = commit_time;
+ txn->origin_id = origin_id;
+ txn->origin_lsn = origin_lsn;
+ /* this txn is obviously prepared */
+ txn->txn_flags |= RBTXN_PREPARE;
+ txn->gid = palloc(strlen(gid) + 1); /* trailing '\0' */
+ strcpy(txn->gid, gid);
+
+ if (is_commit)
+ {
+ txn->txn_flags |= RBTXN_COMMIT_PREPARED;
+ rb->commit_prepared(rb, txn, commit_lsn);
+ }
+ else
+ {
+ txn->txn_flags |= RBTXN_ROLLBACK_PREPARED;
+ rb->abort_prepared(rb, txn, commit_lsn);
+ }
+
+ /* cleanup: make sure there's no cache pollution */
+ ReorderBufferExecuteInvalidations(rb, txn);
+ ReorderBufferCleanupTXN(rb, txn);
+}
+
+/*
* Abort a transaction that possibly has previous changes. Needs to be first
* called for subtransactions and then for the toplevel xid.
*
@@ -2495,7 +2731,13 @@ ReorderBufferAbort(ReorderBuffer *rb, TransactionId xid, XLogRecPtr lsn)
/* cosmetic... */
txn->final_lsn = lsn;
- /* remove potential on-disk data, and deallocate */
+ /*
+ * remove potential on-disk data, and deallocate.
+ *
+ * We remove it even for prepared transactions (GID is enough to
+ * commit/abort those later).
+ */
+
ReorderBufferCleanupTXN(rb, txn);
}
diff --git a/src/include/replication/logical.h b/src/include/replication/logical.h
index 45abc44..ee63e7b 100644
--- a/src/include/replication/logical.h
+++ b/src/include/replication/logical.h
@@ -84,6 +84,11 @@ typedef struct LogicalDecodingContext
*/
bool streaming;
+ /*
+ * Does the output plugin support two phase decoding, and is it enabled?
+ */
+ bool enable_twophase;
+
/*
* State for writing output.
*/
diff --git a/src/include/replication/output_plugin.h b/src/include/replication/output_plugin.h
index b78c796..96e269b 100644
--- a/src/include/replication/output_plugin.h
+++ b/src/include/replication/output_plugin.h
@@ -77,6 +77,39 @@ typedef void (*LogicalDecodeCommitCB) (struct LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
+ /*
+ * Called before decoding of PREPARE record to decide whether this
+ * transaction should be decoded with separate calls to prepare and
+ * commit_prepared/abort_prepared callbacks or wait till COMMIT PREPARED and
+ * sent as usual transaction.
+ */
+typedef bool (*LogicalDecodeFilterPrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ TransactionId xid,
+ const char *gid);
+
+/*
+ * Called for PREPARE record unless it was filtered by filter_prepare()
+ * callback.
+ */
+typedef void (*LogicalDecodePrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+
+/*
+ * Called for COMMIT PREPARED.
+ */
+typedef void (*LogicalDecodeCommitPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+/*
+ * Called for ROLLBACK PREPARED.
+ */
+typedef void (*LogicalDecodeAbortPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+
/*
* Called for the generic logical decoding messages.
*/
@@ -171,6 +204,10 @@ typedef struct OutputPluginCallbacks
LogicalDecodeTruncateCB truncate_cb;
LogicalDecodeCommitCB commit_cb;
LogicalDecodeMessageCB message_cb;
+ LogicalDecodeFilterPrepareCB filter_prepare_cb;
+ LogicalDecodePrepareCB prepare_cb;
+ LogicalDecodeCommitPreparedCB commit_prepared_cb;
+ LogicalDecodeAbortPreparedCB abort_prepared_cb;
LogicalDecodeFilterByOriginCB filter_by_origin_cb;
LogicalDecodeShutdownCB shutdown_cb;
/* streaming of changes */
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 1ae17d5..4d4e35d 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -10,6 +10,7 @@
#define REORDERBUFFER_H
#include "access/htup_details.h"
+#include "access/twophase.h"
#include "lib/ilist.h"
#include "storage/sinval.h"
#include "utils/hsearch.h"
@@ -162,9 +163,13 @@ typedef struct ReorderBufferChange
#define RBTXN_HAS_CATALOG_CHANGES 0x0001
#define RBTXN_IS_SUBXACT 0x0002
#define RBTXN_IS_SERIALIZED 0x0004
-#define RBTXN_IS_STREAMED 0x0008
-#define RBTXN_HAS_TOAST_INSERT 0x0010
-#define RBTXN_HAS_SPEC_INSERT 0x0020
+#define RBTXN_PREPARE 0x0008
+#define RBTXN_COMMIT_PREPARED 0x0010
+#define RBTXN_ROLLBACK_PREPARED 0x0020
+#define RBTXN_COMMIT 0x0040
+#define RBTXN_IS_STREAMED 0x0080
+#define RBTXN_HAS_TOAST_INSERT 0x0100
+#define RBTXN_HAS_SPEC_INSERT 0x0200
/* Does the transaction have catalog changes? */
#define rbtxn_has_catalog_changes(txn) \
@@ -218,6 +223,15 @@ typedef struct ReorderBufferChange
((txn)->txn_flags & RBTXN_IS_STREAMED) != 0 \
)
+/* is this txn prepared? */
+#define rbtxn_prepared(txn) (txn->txn_flags & RBTXN_PREPARE)
+/* was this prepared txn committed in the meanwhile? */
+#define rbtxn_commit_prepared(txn) (txn->txn_flags & RBTXN_COMMIT_PREPARED)
+/* was this prepared txn aborted in the meanwhile? */
+#define rbtxn_rollback_prepared(txn) (txn->txn_flags & RBTXN_ROLLBACK_PREPARED)
+/* was this txn committed in the meanwhile? */
+#define rbtxn_commit(txn) (txn->txn_flags & RBTXN_COMMIT)
+
typedef struct ReorderBufferTXN
{
/* See above */
@@ -229,6 +243,9 @@ typedef struct ReorderBufferTXN
/* Xid of top-level transaction, if known */
TransactionId toplevel_xid;
+ /* In case of 2PC we need to pass GID to output plugin */
+ char *gid;
+
/*
* LSN of the first data carrying, WAL record with knowledge about this
* xid. This is allowed to *not* be first record adorned with this xid, if
@@ -390,6 +407,39 @@ typedef void (*ReorderBufferCommitCB) (ReorderBuffer *rb,
ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
+/* abort callback signature */
+typedef void (*ReorderBufferAbortCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+
+typedef bool (*ReorderBufferFilterPrepareCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ TransactionId xid,
+ const char *gid);
+
+/* prepare callback signature */
+typedef void (*ReorderBufferPrepareCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+
+/* commit prepared callback signature */
+typedef void (*ReorderBufferCommitPreparedCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+/* abort prepared callback signature */
+typedef void (*ReorderBufferAbortPreparedCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+
+
+
+
/* message callback signature */
typedef void (*ReorderBufferMessageCB) (ReorderBuffer *rb,
ReorderBufferTXN *txn,
@@ -482,6 +532,11 @@ struct ReorderBuffer
ReorderBufferApplyChangeCB apply_change;
ReorderBufferApplyTruncateCB apply_truncate;
ReorderBufferCommitCB commit;
+ ReorderBufferAbortCB abort;
+ ReorderBufferFilterPrepareCB filter_prepare;
+ ReorderBufferPrepareCB prepare;
+ ReorderBufferCommitPreparedCB commit_prepared;
+ ReorderBufferAbortPreparedCB abort_prepared;
ReorderBufferMessageCB message;
/*
@@ -548,6 +603,11 @@ void ReorderBufferQueueMessage(ReorderBuffer *, TransactionId, Snapshot snapsho
void ReorderBufferCommit(ReorderBuffer *, TransactionId,
XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
TimestampTz commit_time, RepOriginId origin_id, XLogRecPtr origin_lsn);
+void ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid, bool is_commit);
void ReorderBufferAssignChild(ReorderBuffer *, TransactionId, TransactionId, XLogRecPtr commit_lsn);
void ReorderBufferCommitChild(ReorderBuffer *, TransactionId, TransactionId,
XLogRecPtr commit_lsn, XLogRecPtr end_lsn);
@@ -571,6 +631,15 @@ void ReorderBufferXidSetCatalogChanges(ReorderBuffer *, TransactionId xid, XLog
bool ReorderBufferXidHasCatalogChanges(ReorderBuffer *, TransactionId xid);
bool ReorderBufferXidHasBaseSnapshot(ReorderBuffer *, TransactionId xid);
+bool ReorderBufferPrepareNeedSkip(ReorderBuffer *rb, TransactionId xid,
+ const char *gid);
+bool ReorderBufferTxnIsPrepared(ReorderBuffer *rb, TransactionId xid,
+ const char *gid);
+void ReorderBufferPrepare(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid);
ReorderBufferTXN *ReorderBufferGetOldestTXN(ReorderBuffer *);
TransactionId ReorderBufferGetOldestXmin(ReorderBuffer *rb);
--
1.8.3.1
v6-0002-Tap-test-to-test-concurrent-aborts-during-2-phase.patchapplication/octet-stream; name=v6-0002-Tap-test-to-test-concurrent-aborts-during-2-phase.patchDownload
From 5fe74742178c021384591b4898174698df80fd06 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Mon, 28 Sep 2020 02:41:25 -0400
Subject: [PATCH v6] Tap test to test concurrent aborts during 2 phase commits
This test is specifically for testing concurrent abort while logical decode
is ongoing. Pass in the xid of the 2PC to the plugin as an option.
On the receipt of a valid "check-xid", the change API in the test decoding
plugin will wait for it to be aborted.
---
contrib/test_decoding/Makefile | 2 +
contrib/test_decoding/t/001_twophase.pl | 119 ++++++++++++++++++++++++++++++++
2 files changed, 121 insertions(+)
create mode 100644 contrib/test_decoding/t/001_twophase.pl
diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index f23f15b..4905a0a 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -9,6 +9,8 @@ REGRESS = ddl xact rewrite toast permissions decoding_in_xact \
ISOLATION = mxact delayed_startup ondisk_startup concurrent_ddl_dml \
oldest_xmin snapshot_transfer subxact_without_top concurrent_stream
+TAP_TESTS = 1
+
REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
ISOLATION_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
diff --git a/contrib/test_decoding/t/001_twophase.pl b/contrib/test_decoding/t/001_twophase.pl
new file mode 100644
index 0000000..d8c2c29
--- /dev/null
+++ b/contrib/test_decoding/t/001_twophase.pl
@@ -0,0 +1,119 @@
+# logical replication of 2PC test
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 2;
+use Time::HiRes qw(usleep);
+use Scalar::Util qw(looks_like_number);
+
+# Initialize node
+my $node_logical = get_new_node('logical');
+$node_logical->init(allows_streaming => 'logical');
+$node_logical->append_conf(
+ 'postgresql.conf', qq(
+ max_prepared_transactions = 10
+));
+$node_logical->start;
+
+# Create some pre-existing content on logical
+$node_logical->safe_psql('postgres', "CREATE TABLE tab (a int PRIMARY KEY)");
+$node_logical->safe_psql('postgres',
+ "INSERT INTO tab SELECT generate_series(1,10)");
+$node_logical->safe_psql('postgres',
+ "SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');");
+
+# This test is specifically for testing concurrent abort while logical decode
+# is ongoing. We will pass in the xid of the 2PC to the plugin as an option.
+# On the receipt of a valid "check-xid", the change API in the test decoding
+# plugin will wait for it to be aborted.
+#
+# We will fire off a ROLLBACK from another session when this decode
+# is waiting.
+#
+# The status of "check-xid" will change from in-progress to not-committed
+# (hence aborted) and we will stop decoding because the subsequent
+# system catalog scan will error out.
+
+$node_logical->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab VALUES (11);
+ INSERT INTO tab VALUES (12);
+ ALTER TABLE tab ADD COLUMN b INT;
+ INSERT INTO tab VALUES (13,14);
+ PREPARE TRANSACTION 'test_prepared_tab';");
+# get XID of the above two-phase transaction
+my $xid2pc = $node_logical->safe_psql('postgres', "SELECT transaction FROM pg_prepared_xacts WHERE gid = 'test_prepared_tab'");
+is(looks_like_number($xid2pc), qq(1), 'Got a valid two-phase XID');
+
+# start decoding the above by passing the "check-xid"
+my $logical_connstr = $node_logical->connstr . ' dbname=postgres';
+
+# decode now, it should include an ABORT entry because of the ROLLBACK below
+system_log("psql -d \"$logical_connstr\" -c \"SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'check-xid', '$xid2pc');\" \&");
+
+# check that decode starts waiting for this $xid2pc
+poll_output_until("waiting for $xid2pc to abort")
+ or die "no wait happened for the abort";
+
+# rollback the prepared transaction
+$node_logical->safe_psql('postgres', "ROLLBACK PREPARED 'test_prepared_tab';");
+
+# check for occurrence of the log about stopping this decoding
+poll_output_until("stopping decoding of test_prepared_tab")
+ or die "no decoding stop for the rollback";
+
+# consume any remaining changes
+$node_logical->safe_psql('postgres', "SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');");
+
+# Check that commit prepared is decoded properly on immediate restart
+$node_logical->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab VALUES (11);
+ INSERT INTO tab VALUES (12);
+ ALTER TABLE tab ADD COLUMN b INT;
+ INSERT INTO tab VALUES (13, 11);
+ PREPARE TRANSACTION 'test_prepared_tab';");
+# consume changes
+$node_logical->safe_psql('postgres', "SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');");
+$node_logical->stop('immediate');
+$node_logical->start;
+
+# commit post the restart
+$node_logical->safe_psql('postgres', "COMMIT PREPARED 'test_prepared_tab';");
+$node_logical->safe_psql('postgres', "SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');");
+
+# check inserts are visible
+my $result = $node_logical->safe_psql('postgres', "SELECT count(*) FROM tab where a IN (11,12) OR b IN (11);");
+is($result, qq(3), 'Rows inserted via 2PC are visible on restart');
+
+$node_logical->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot');");
+$node_logical->stop('fast');
+
+sub poll_output_until
+{
+ my ($expected) = @_;
+
+ $expected = 'xxxxxx' unless defined($expected); # default junk value
+
+ my $max_attempts = 180 * 10;
+ my $attempts = 0;
+
+ my $output_file = '';
+ while ($attempts < $max_attempts)
+ {
+ $output_file = slurp_file($node_logical->logfile());
+
+ if ($output_file =~ $expected)
+ {
+ return 1;
+ }
+
+ # Wait 0.1 second before retrying.
+ usleep(100_000);
+ $attempts++;
+ }
+
+ # The output result didn't change in 180 seconds. Give up
+ return 0;
+}
--
1.8.3.1
v6-0004-Support-two-phase-commits-in-streaming-mode-in-lo.patchapplication/octet-stream; name=v6-0004-Support-two-phase-commits-in-streaming-mode-in-lo.patchDownload
From 3fb6ae7c4aeae53fae8bf9304878f144e39cb6cf Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Mon, 28 Sep 2020 03:08:30 -0400
Subject: [PATCH v6] Support two phase commits in streaming mode in logical
decoding
Add APIs to the streaming APIS for PREPARE, COMMIT PREPARED and ROLLBACK PREPARED
---
contrib/test_decoding/test_decoding.c | 84 ++++++++++++++++++
doc/src/sgml/logicaldecoding.sgml | 56 +++++++++++-
src/backend/replication/logical/logical.c | 111 ++++++++++++++++++++++++
src/backend/replication/logical/reorderbuffer.c | 41 +++++++--
src/include/replication/output_plugin.h | 30 +++++++
src/include/replication/reorderbuffer.h | 21 +++++
6 files changed, 331 insertions(+), 12 deletions(-)
diff --git a/contrib/test_decoding/test_decoding.c b/contrib/test_decoding/test_decoding.c
index 1bb17a6..bb9f787 100644
--- a/contrib/test_decoding/test_decoding.c
+++ b/contrib/test_decoding/test_decoding.c
@@ -78,6 +78,15 @@ static void pg_decode_stream_stop(LogicalDecodingContext *ctx,
static void pg_decode_stream_abort(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr abort_lsn);
+static void pg_decode_stream_prepare(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+static void pg_decode_stream_commit_prepared(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+static void pg_decode_stream_abort_prepared(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
static void pg_decode_stream_commit(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
@@ -130,6 +139,9 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb)
cb->stream_start_cb = pg_decode_stream_start;
cb->stream_stop_cb = pg_decode_stream_stop;
cb->stream_abort_cb = pg_decode_stream_abort;
+ cb->stream_prepare_cb = pg_decode_stream_prepare;
+ cb->stream_commit_prepared_cb = pg_decode_stream_commit_prepared;
+ cb->stream_abort_prepared_cb = pg_decode_stream_abort_prepared;
cb->stream_commit_cb = pg_decode_stream_commit;
cb->stream_change_cb = pg_decode_stream_change;
cb->stream_message_cb = pg_decode_stream_message;
@@ -812,6 +824,78 @@ pg_decode_stream_abort(LogicalDecodingContext *ctx,
}
static void
+pg_decode_stream_prepare(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ if (data->skip_empty_xacts && !data->xact_wrote_changes)
+ return;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, "preparing streamed transaction TXN %u", txn->xid);
+ else
+ appendStringInfo(ctx->out, "preparing streamed transaction");
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
+static void
+pg_decode_stream_commit_prepared(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ if (data->skip_empty_xacts && !data->xact_wrote_changes)
+ return;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, "commit prepared streamed transaction TXN %u", txn->xid);
+ else
+ appendStringInfo(ctx->out, "commit prepared streamed transaction");
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
+static void
+pg_decode_stream_abort_prepared(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ if (data->skip_empty_xacts && !data->xact_wrote_changes)
+ return;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, "abort prepared streamed transaction TXN %u", txn->xid);
+ else
+ appendStringInfo(ctx->out, "abort prepared streamed transaction");
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
+static void
pg_decode_stream_commit(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr commit_lsn)
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index bd4542e..a8be9bf 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -396,6 +396,9 @@ typedef struct OutputPluginCallbacks
LogicalDecodeStreamStartCB stream_start_cb;
LogicalDecodeStreamStopCB stream_stop_cb;
LogicalDecodeStreamAbortCB stream_abort_cb;
+ LogicalDecodeStreamPrepareCB stream_prepare_cb;
+ LogicalDecodeStreamCommitPreparedCB stream_commit_prepared_cb;
+ LogicalDecodeStreamAbortPreparedCB stream_abort_prepared_cb;
LogicalDecodeStreamCommitCB stream_commit_cb;
LogicalDecodeStreamChangeCB stream_change_cb;
LogicalDecodeStreamMessageCB stream_message_cb;
@@ -418,7 +421,9 @@ typedef void (*LogicalOutputPluginInit) (struct OutputPluginCallbacks *cb);
in-progress transactions. The <function>stream_start_cb</function>,
<function>stream_stop_cb</function>, <function>stream_abort_cb</function>,
<function>stream_commit_cb</function> and <function>stream_change_cb</function>
- are required, while <function>stream_message_cb</function> and
+ are required, while <function>stream_message_cb</function>,
+ <function>stream_prepare_cb</function>, <function>stream_commit_prepared_cb</function>,
+ <function>stream_abort_prepared_cb</function>,
<function>stream_truncate_cb</function> are optional.
</para>
</sect2>
@@ -839,6 +844,45 @@ typedef void (*LogicalDecodeStreamAbortCB) (struct LogicalDecodingContext *ctx,
</para>
</sect3>
+ <sect3 id="logicaldecoding-output-plugin-stream-prepare">
+ <title>Stream Prepare Callback</title>
+ <para>
+ The <function>stream_prepare_cb</function> callback is called to prepare
+ a previously streamed transaction as part of a two phase commit.
+<programlisting>
+typedef void (*LogicalDecodeStreamPrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
+ <sect3 id="logicaldecoding-output-plugin-stream-commit-prepared">
+ <title>Stream Commit Prepared Callback</title>
+ <para>
+ The <function>stream_commit_prepared_cb</function> callback is called to commit prepared
+ a previously streamed transaction as part of a two phase commit.
+<programlisting>
+typedef void (*LogicalDecodeStreamCommitPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
+ <sect3 id="logicaldecoding-output-plugin-stream-abort-prepared">
+ <title>Stream Abort Prepared Callback</title>
+ <para>
+ The <function>stream_abort_prepared_cb</function> callback is called to abort prepared
+ a previously streamed transaction as part of a two phase commit.
+<programlisting>
+typedef void (*LogicalDecodeStreamAbortPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
<sect3 id="logicaldecoding-output-plugin-stream-commit">
<title>Stream Commit Callback</title>
<para>
@@ -1017,9 +1061,13 @@ OutputPluginWrite(ctx, true);
When streaming an in-progress transaction, the changes (and messages) are
streamed in blocks demarcated by <function>stream_start_cb</function>
and <function>stream_stop_cb</function> callbacks. Once all the decoded
- changes are transmitted, the transaction is committed using the
- <function>stream_commit_cb</function> callback (or possibly aborted using
- the <function>stream_abort_cb</function> callback).
+ changes are transmitted, the transaction can be committed using the
+ the <function>stream_commit_cb</function> callback
+ (or possibly aborted using the <function>stream_abort_cb</function> callback).
+ If two phase commits are supported, the transaction can be prepared using the
+ <function>stream_prepare_cb</function> callback, commit prepared using the
+ <function>stream_commit_prepared_cb</function> callback or aborted using the
+ <function>stream_abort_prepared_cb</function>
</para>
<para>
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 4e95337..47968cb 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -81,6 +81,12 @@ static void stream_stop_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr last_lsn);
static void stream_abort_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr abort_lsn);
+static void stream_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+static void stream_commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+static void stream_abort_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
static void stream_commit_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
static void stream_change_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
@@ -232,6 +238,9 @@ StartupDecodingContext(List *output_plugin_options,
ctx->streaming = (ctx->callbacks.stream_start_cb != NULL) ||
(ctx->callbacks.stream_stop_cb != NULL) ||
(ctx->callbacks.stream_abort_cb != NULL) ||
+ (ctx->callbacks.stream_prepare_cb != NULL) ||
+ (ctx->callbacks.stream_commit_prepared_cb != NULL) ||
+ (ctx->callbacks.stream_abort_prepared_cb != NULL) ||
(ctx->callbacks.stream_commit_cb != NULL) ||
(ctx->callbacks.stream_change_cb != NULL) ||
(ctx->callbacks.stream_message_cb != NULL) ||
@@ -261,6 +270,9 @@ StartupDecodingContext(List *output_plugin_options,
ctx->reorder->stream_start = stream_start_cb_wrapper;
ctx->reorder->stream_stop = stream_stop_cb_wrapper;
ctx->reorder->stream_abort = stream_abort_cb_wrapper;
+ ctx->reorder->stream_prepare = stream_prepare_cb_wrapper;
+ ctx->reorder->stream_commit_prepared = stream_commit_prepared_cb_wrapper;
+ ctx->reorder->stream_abort_prepared = stream_abort_prepared_cb_wrapper;
ctx->reorder->stream_commit = stream_commit_cb_wrapper;
ctx->reorder->stream_change = stream_change_cb_wrapper;
ctx->reorder->stream_message = stream_message_cb_wrapper;
@@ -1231,6 +1243,105 @@ stream_abort_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
}
static void
+stream_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ Assert(!ctx->fast_forward);
+
+ /* We're only supposed to call this when streaming is supported. */
+ Assert(ctx->streaming);
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "stream_prepare";
+ state.report_location = txn->final_lsn;
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn;
+
+ ctx->callbacks.stream_prepare_cb(ctx, txn, commit_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
+stream_commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ Assert(!ctx->fast_forward);
+
+ /* We're only supposed to call this when streaming is supported. */
+ Assert(ctx->streaming);
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "stream_commit_prepared";
+ state.report_location = txn->final_lsn;
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn;
+
+ ctx->callbacks.stream_commit_prepared_cb(ctx, txn, commit_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
+stream_abort_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ Assert(!ctx->fast_forward);
+
+ /* We're only supposed to call this when streaming is supported. */
+ Assert(ctx->streaming);
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "stream_abort_prepared";
+ state.report_location = txn->final_lsn;
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn;
+
+ ctx->callbacks.stream_abort_prepared_cb(ctx, txn, commit_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
stream_commit_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr commit_lsn)
{
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 5ff920b..e124c35 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -1834,9 +1834,18 @@ ReorderBufferStreamCommit(ReorderBuffer *rb, ReorderBufferTXN *txn)
ReorderBufferStreamTXN(rb, txn);
- rb->stream_commit(rb, txn, txn->final_lsn);
-
- ReorderBufferCleanupTXN(rb, txn);
+ if (rbtxn_prepared(txn))
+ {
+ rb->stream_prepare(rb, txn, txn->final_lsn);
+ ReorderBufferTruncateTXN(rb, txn, true);
+ /* Reset the CheckXidAlive */
+ CheckXidAlive = InvalidTransactionId;
+ }
+ else
+ {
+ rb->stream_commit(rb, txn, txn->final_lsn);
+ ReorderBufferCleanupTXN(rb, txn);
+ }
}
/*
@@ -2672,15 +2681,31 @@ ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
txn->gid = palloc(strlen(gid) + 1); /* trailing '\0' */
strcpy(txn->gid, gid);
- if (is_commit)
+ if (rbtxn_is_streamed(txn))
{
- txn->txn_flags |= RBTXN_COMMIT_PREPARED;
- rb->commit_prepared(rb, txn, commit_lsn);
+ if (is_commit)
+ {
+ txn->txn_flags |= RBTXN_COMMIT_PREPARED;
+ rb->stream_commit_prepared(rb, txn, commit_lsn);
+ }
+ else
+ {
+ txn->txn_flags |= RBTXN_ROLLBACK_PREPARED;
+ rb->stream_abort_prepared(rb, txn, commit_lsn);
+ }
}
else
{
- txn->txn_flags |= RBTXN_ROLLBACK_PREPARED;
- rb->abort_prepared(rb, txn, commit_lsn);
+ if (is_commit)
+ {
+ txn->txn_flags |= RBTXN_COMMIT_PREPARED;
+ rb->commit_prepared(rb, txn, commit_lsn);
+ }
+ else
+ {
+ txn->txn_flags |= RBTXN_ROLLBACK_PREPARED;
+ rb->abort_prepared(rb, txn, commit_lsn);
+ }
}
/* cleanup: make sure there's no cache pollution */
diff --git a/src/include/replication/output_plugin.h b/src/include/replication/output_plugin.h
index 96e269b..6000096 100644
--- a/src/include/replication/output_plugin.h
+++ b/src/include/replication/output_plugin.h
@@ -157,6 +157,33 @@ typedef void (*LogicalDecodeStreamAbortCB) (struct LogicalDecodingContext *ctx,
XLogRecPtr abort_lsn);
/*
+ * Called to prepare changes streamed to remote node from in-progress
+ * transaction. This is called as part of a two-phase commit and only when
+ * two-phased commits are supported
+ */
+typedef void (*LogicalDecodeStreamPrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+/*
+ * Called to commit prepared changes streamed to remote node from in-progress
+ * transaction. This is called as part of a two-phase commit and only when
+ * two-phased commits are supported
+ */
+typedef void (*LogicalDecodeStreamCommitPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+/*
+ * Called to abort/rollback prepared changes streamed to remote node from in-progress
+ * transaction. This is called as part of a two-phase commit and only when
+ * two-phased commits are supported
+ */
+typedef void (*LogicalDecodeStreamAbortPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+/*
* Called to apply changes streamed to remote node from in-progress
* transaction.
*/
@@ -214,6 +241,9 @@ typedef struct OutputPluginCallbacks
LogicalDecodeStreamStartCB stream_start_cb;
LogicalDecodeStreamStopCB stream_stop_cb;
LogicalDecodeStreamAbortCB stream_abort_cb;
+ LogicalDecodeStreamPrepareCB stream_prepare_cb;
+ LogicalDecodeStreamCommitPreparedCB stream_commit_prepared_cb;
+ LogicalDecodeStreamAbortPreparedCB stream_abort_prepared_cb;
LogicalDecodeStreamCommitCB stream_commit_cb;
LogicalDecodeStreamChangeCB stream_change_cb;
LogicalDecodeStreamMessageCB stream_message_cb;
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 4d4e35d..a4dc509 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -466,6 +466,24 @@ typedef void (*ReorderBufferStreamAbortCB) (
ReorderBufferTXN *txn,
XLogRecPtr abort_lsn);
+/* prepare streamed transaction callback signature */
+typedef void (*ReorderBufferStreamPrepareCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+/* prepare streamed transaction callback signature */
+typedef void (*ReorderBufferStreamCommitPreparedCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+/* prepare streamed transaction callback signature */
+typedef void (*ReorderBufferStreamAbortPreparedCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
/* commit streamed transaction callback signature */
typedef void (*ReorderBufferStreamCommitCB) (
ReorderBuffer *rb,
@@ -545,6 +563,9 @@ struct ReorderBuffer
ReorderBufferStreamStartCB stream_start;
ReorderBufferStreamStopCB stream_stop;
ReorderBufferStreamAbortCB stream_abort;
+ ReorderBufferStreamPrepareCB stream_prepare;
+ ReorderBufferStreamCommitPreparedCB stream_commit_prepared;
+ ReorderBufferStreamAbortPreparedCB stream_abort_prepared;
ReorderBufferStreamCommitCB stream_commit;
ReorderBufferStreamChangeCB stream_change;
ReorderBufferStreamMessageCB stream_message;
--
1.8.3.1
v6-0003-pgoutput-output-plugin-support-for-logical-decodi.patchapplication/octet-stream; name=v6-0003-pgoutput-output-plugin-support-for-logical-decodi.patchDownload
From 1f98e665c2bb04b621cbc16b386e4c1544aa180f Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Mon, 28 Sep 2020 02:48:53 -0400
Subject: [PATCH v6] pgoutput output plugin support for logical decoding of 2pc
Support decoding of two phase commit in pgoutput and on subscriber side.
---
src/backend/access/transam/twophase.c | 31 ++++++
src/backend/replication/logical/proto.c | 90 ++++++++++++++-
src/backend/replication/logical/worker.c | 147 ++++++++++++++++++++++++-
src/backend/replication/pgoutput/pgoutput.c | 54 ++++++++-
src/include/access/twophase.h | 1 +
src/include/replication/logicalproto.h | 37 ++++++-
src/test/subscription/t/020_twophase.pl | 163 ++++++++++++++++++++++++++++
7 files changed, 514 insertions(+), 9 deletions(-)
create mode 100644 src/test/subscription/t/020_twophase.pl
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 7940060..f470210 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -548,6 +548,37 @@ MarkAsPrepared(GlobalTransaction gxact, bool lock_held)
}
/*
+ * LookupGXact
+ * Check if the prepared transaction with the given GID is around
+ */
+bool
+LookupGXact(const char *gid)
+{
+ int i;
+
+ LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE);
+
+ for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
+ {
+ GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
+
+ /* Ignore not-yet-valid GIDs */
+ if (!gxact->valid)
+ continue;
+ if (strcmp(gxact->gid, gid) != 0)
+ continue;
+
+ LWLockRelease(TwoPhaseStateLock);
+
+ return true;
+ }
+
+ LWLockRelease(TwoPhaseStateLock);
+
+ return false;
+}
+
+/*
* LockGXact
* Locate the prepared transaction and mark it busy for COMMIT or PREPARE.
*/
diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index eb19142..291ed10 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -72,12 +72,17 @@ logicalrep_read_begin(StringInfo in, LogicalRepBeginData *begin_data)
*/
void
logicalrep_write_commit(StringInfo out, ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn)
+ XLogRecPtr commit_lsn, bool is_commit)
{
uint8 flags = 0;
pq_sendbyte(out, 'C'); /* sending COMMIT */
+ if (is_commit)
+ flags |= LOGICALREP_IS_COMMIT;
+ else
+ flags |= LOGICALREP_IS_ABORT;
+
/* send the flags field (unused for now) */
pq_sendbyte(out, flags);
@@ -88,16 +93,20 @@ logicalrep_write_commit(StringInfo out, ReorderBufferTXN *txn,
}
/*
- * Read transaction COMMIT from the stream.
+ * Read transaction COMMIT|ABORT from the stream.
*/
void
logicalrep_read_commit(StringInfo in, LogicalRepCommitData *commit_data)
{
- /* read flags (unused for now) */
+ /* read flags */
uint8 flags = pq_getmsgbyte(in);
- if (flags != 0)
- elog(ERROR, "unrecognized flags %u in commit message", flags);
+ if (!CommitFlagsAreValid(flags))
+ elog(ERROR, "unrecognized flags %u in commit|abort message",
+ flags);
+
+ /* the flag is either commit or abort */
+ commit_data->is_commit = (flags == LOGICALREP_IS_COMMIT);
/* read fields */
commit_data->commit_lsn = pq_getmsgint64(in);
@@ -106,6 +115,77 @@ logicalrep_read_commit(StringInfo in, LogicalRepCommitData *commit_data)
}
/*
+ * Write PREPARE to the output stream.
+ */
+void
+logicalrep_write_prepare(StringInfo out, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ uint8 flags = 0;
+
+ pq_sendbyte(out, 'P'); /* sending PREPARE protocol */
+
+ /*
+ * This should only ever happen for 2PC transactions. In which case we
+ * expect to have a non-empty GID.
+ */
+ Assert(rbtxn_prepared(txn));
+ Assert(strlen(txn->gid) > 0);
+
+ /*
+ * Flags are determined from the state of the transaction. We know we
+ * always get PREPARE first and then [COMMIT|ROLLBACK] PREPARED, so if
+ * it's already marked as committed then it has to be COMMIT PREPARED (and
+ * likewise for abort / ROLLBACK PREPARED).
+ */
+ if (rbtxn_commit_prepared(txn))
+ flags |= LOGICALREP_IS_COMMIT_PREPARED;
+ else if (rbtxn_rollback_prepared(txn))
+ flags |= LOGICALREP_IS_ROLLBACK_PREPARED;
+ else
+ flags |= LOGICALREP_IS_PREPARE;
+
+ /* Make sure exactly one of the expected flags is set. */
+ if (!PrepareFlagsAreValid(flags))
+ elog(ERROR, "unrecognized flags %u in prepare message", flags);
+
+ /* send the flags field */
+ pq_sendbyte(out, flags);
+
+ /* send fields */
+ pq_sendint64(out, prepare_lsn);
+ pq_sendint64(out, txn->end_lsn);
+ pq_sendint64(out, txn->commit_time);
+
+ /* send gid */
+ pq_sendstring(out, txn->gid);
+}
+
+/*
+ * Read transaction PREPARE from the stream.
+ */
+void
+logicalrep_read_prepare(StringInfo in, LogicalRepPrepareData * prepare_data)
+{
+ /* read flags */
+ uint8 flags = pq_getmsgbyte(in);
+
+ if (!PrepareFlagsAreValid(flags))
+ elog(ERROR, "unrecognized flags %u in prepare message", flags);
+
+ /* set the action (reuse the constants used for the flags) */
+ prepare_data->prepare_type = flags;
+
+ /* read fields */
+ prepare_data->prepare_lsn = pq_getmsgint64(in);
+ prepare_data->end_lsn = pq_getmsgint64(in);
+ prepare_data->preparetime = pq_getmsgint64(in);
+
+ /* read gid (copy it into a pre-allocated buffer) */
+ strcpy(prepare_data->gid, pq_getmsgstring(in));
+}
+
+/*
* Write ORIGIN to the output stream.
*/
void
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 9c6fdee..a08da85 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -729,7 +729,11 @@ apply_handle_commit(StringInfo s)
replorigin_session_origin_lsn = commit_data.end_lsn;
replorigin_session_origin_timestamp = commit_data.committime;
- CommitTransactionCommand();
+ if (commit_data.is_commit)
+ CommitTransactionCommand();
+ else
+ AbortCurrentTransaction();
+
pgstat_report_stat(false);
store_flush_position(commit_data.end_lsn);
@@ -749,6 +753,141 @@ apply_handle_commit(StringInfo s)
pgstat_report_activity(STATE_IDLE, NULL);
}
+static void
+apply_handle_prepare_txn(LogicalRepPrepareData * prepare_data)
+{
+ Assert(prepare_data->prepare_lsn == remote_final_lsn);
+
+ /* The synchronization worker runs in single transaction. */
+ if (IsTransactionState() && !am_tablesync_worker())
+ {
+ /* End the earlier transaction and start a new one */
+ BeginTransactionBlock();
+ CommitTransactionCommand();
+ StartTransactionCommand();
+
+ /*
+ * Update origin state so we can restart streaming from correct
+ * position in case of crash.
+ */
+ replorigin_session_origin_lsn = prepare_data->end_lsn;
+ replorigin_session_origin_timestamp = prepare_data->preparetime;
+
+ PrepareTransactionBlock(prepare_data->gid);
+ CommitTransactionCommand();
+ pgstat_report_stat(false);
+
+ store_flush_position(prepare_data->end_lsn);
+ }
+ else
+ {
+ /* Process any invalidation messages that might have accumulated. */
+ AcceptInvalidationMessages();
+ maybe_reread_subscription();
+ }
+
+ in_remote_transaction = false;
+
+ /* Process any tables that are being synchronized in parallel. */
+ process_syncing_tables(prepare_data->end_lsn);
+
+ pgstat_report_activity(STATE_IDLE, NULL);
+}
+
+static void
+apply_handle_commit_prepared_txn(LogicalRepPrepareData * prepare_data)
+{
+ /* there is no transaction when COMMIT PREPARED is called */
+ ensure_transaction();
+
+ /*
+ * Update origin state so we can restart streaming from correct position
+ * in case of crash.
+ */
+ replorigin_session_origin_lsn = prepare_data->end_lsn;
+ replorigin_session_origin_timestamp = prepare_data->preparetime;
+
+ FinishPreparedTransaction(prepare_data->gid, true);
+ CommitTransactionCommand();
+ pgstat_report_stat(false);
+
+ store_flush_position(prepare_data->end_lsn);
+ in_remote_transaction = false;
+
+ /* Process any tables that are being synchronized in parallel. */
+ process_syncing_tables(prepare_data->end_lsn);
+
+ pgstat_report_activity(STATE_IDLE, NULL);
+}
+
+static void
+apply_handle_rollback_prepared_txn(LogicalRepPrepareData * prepare_data)
+{
+ /*
+ * Update origin state so we can restart streaming from correct position
+ * in case of crash.
+ */
+ replorigin_session_origin_lsn = prepare_data->end_lsn;
+ replorigin_session_origin_timestamp = prepare_data->preparetime;
+
+ /*
+ * During logical decoding, on the apply side, it's possible that a
+ * prepared transaction got aborted while decoding. In that case, we stop
+ * the decoding and abort the transaction immediately. However the
+ * ROLLBACK prepared processing still reaches the subscriber. In that case
+ * it's ok to have a missing gid
+ */
+ if (LookupGXact(prepare_data->gid))
+ {
+ /* there is no transaction when ABORT/ROLLBACK PREPARED is called */
+ ensure_transaction();
+ FinishPreparedTransaction(prepare_data->gid, false);
+ CommitTransactionCommand();
+ }
+
+ pgstat_report_stat(false);
+
+ store_flush_position(prepare_data->end_lsn);
+ in_remote_transaction = false;
+
+ /* Process any tables that are being synchronized in parallel. */
+ process_syncing_tables(prepare_data->end_lsn);
+
+ pgstat_report_activity(STATE_IDLE, NULL);
+}
+
+/*
+ * Handle PREPARE message.
+ */
+static void
+apply_handle_prepare(StringInfo s)
+{
+ LogicalRepPrepareData prepare_data;
+
+ logicalrep_read_prepare(s, &prepare_data);
+
+ switch (prepare_data.prepare_type)
+ {
+ case LOGICALREP_IS_PREPARE:
+ apply_handle_prepare_txn(&prepare_data);
+ break;
+
+ case LOGICALREP_IS_COMMIT_PREPARED:
+ apply_handle_commit_prepared_txn(&prepare_data);
+ break;
+
+ case LOGICALREP_IS_ROLLBACK_PREPARED:
+ apply_handle_rollback_prepared_txn(&prepare_data);
+ break;
+
+ default:
+ ereport(ERROR,
+ (errcode(ERRCODE_PROTOCOL_VIOLATION),
+ errmsg("unexpected type of prepare message: %d",
+ prepare_data.prepare_type)));
+ }
+}
+
/*
* Handle ORIGIN message.
*
@@ -1909,10 +2048,14 @@ apply_dispatch(StringInfo s)
case 'B':
apply_handle_begin(s);
break;
- /* COMMIT */
+ /* COMMIT/ABORT */
case 'C':
apply_handle_commit(s);
break;
+ /* PREPARE and [COMMIT|ROLLBACK] PREPARED */
+ case 'P':
+ apply_handle_prepare(s);
+ break;
/* INSERT */
case 'I':
apply_handle_insert(s);
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 9c997ae..174af7f 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -47,6 +47,12 @@ static void pgoutput_truncate(LogicalDecodingContext *ctx,
ReorderBufferChange *change);
static bool pgoutput_origin_filter(LogicalDecodingContext *ctx,
RepOriginId origin_id);
+static void pgoutput_prepare_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, XLogRecPtr prepare_lsn);
+static void pgoutput_commit_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, XLogRecPtr prepare_lsn);
+static void pgoutput_abort_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, XLogRecPtr prepare_lsn);
static void pgoutput_stream_start(struct LogicalDecodingContext *ctx,
ReorderBufferTXN *txn);
static void pgoutput_stream_stop(struct LogicalDecodingContext *ctx,
@@ -143,6 +149,10 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb)
cb->change_cb = pgoutput_change;
cb->truncate_cb = pgoutput_truncate;
cb->commit_cb = pgoutput_commit_txn;
+
+ cb->prepare_cb = pgoutput_prepare_txn;
+ cb->commit_prepared_cb = pgoutput_commit_prepared_txn;
+ cb->abort_prepared_cb = pgoutput_abort_prepared_txn;
cb->filter_by_origin_cb = pgoutput_origin_filter;
cb->shutdown_cb = pgoutput_shutdown;
@@ -373,7 +383,49 @@ pgoutput_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
OutputPluginUpdateProgress(ctx);
OutputPluginPrepareWrite(ctx, true);
- logicalrep_write_commit(ctx->out, txn, commit_lsn);
+ logicalrep_write_commit(ctx->out, txn, commit_lsn, true);
+ OutputPluginWrite(ctx, true);
+}
+
+/*
+ * PREPARE callback
+ */
+static void
+pgoutput_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ OutputPluginUpdateProgress(ctx);
+
+ OutputPluginPrepareWrite(ctx, true);
+ logicalrep_write_prepare(ctx->out, txn, prepare_lsn);
+ OutputPluginWrite(ctx, true);
+}
+
+/*
+ * COMMIT PREPARED callback
+ */
+static void
+pgoutput_commit_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ OutputPluginUpdateProgress(ctx);
+
+ OutputPluginPrepareWrite(ctx, true);
+ logicalrep_write_prepare(ctx->out, txn, prepare_lsn);
+ OutputPluginWrite(ctx, true);
+}
+
+/*
+ * PREPARE callback
+ */
+static void
+pgoutput_abort_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ OutputPluginUpdateProgress(ctx);
+
+ OutputPluginPrepareWrite(ctx, true);
+ logicalrep_write_prepare(ctx->out, txn, prepare_lsn);
OutputPluginWrite(ctx, true);
}
diff --git a/src/include/access/twophase.h b/src/include/access/twophase.h
index 2ca71c3..b2628ea 100644
--- a/src/include/access/twophase.h
+++ b/src/include/access/twophase.h
@@ -44,6 +44,7 @@ extern GlobalTransaction MarkAsPreparing(TransactionId xid, const char *gid,
extern void StartPrepare(GlobalTransaction gxact);
extern void EndPrepare(GlobalTransaction gxact);
extern bool StandbyTransactionIdIsPrepared(TransactionId xid);
+extern bool LookupGXact(const char *gid);
extern TransactionId PrescanPreparedTransactions(TransactionId **xids_p,
int *nxids_p);
diff --git a/src/include/replication/logicalproto.h b/src/include/replication/logicalproto.h
index 0c2cda2..33d719c 100644
--- a/src/include/replication/logicalproto.h
+++ b/src/include/replication/logicalproto.h
@@ -87,20 +87,55 @@ typedef struct LogicalRepBeginData
TransactionId xid;
} LogicalRepBeginData;
+/* Commit (and abort) information */
typedef struct LogicalRepCommitData
{
+ bool is_commit;
XLogRecPtr commit_lsn;
XLogRecPtr end_lsn;
TimestampTz committime;
} LogicalRepCommitData;
+/* types of the commit protocol message */
+#define LOGICALREP_IS_COMMIT 0x01
+#define LOGICALREP_IS_ABORT 0x02
+
+/* commit message is COMMIT or ABORT, and there is nothing else */
+#define CommitFlagsAreValid(flags) \
+ ((flags == LOGICALREP_IS_COMMIT) || (flags == LOGICALREP_IS_ABORT))
+
+/* Prepare protocol information */
+typedef struct LogicalRepPrepareData
+{
+ uint8 prepare_type;
+ XLogRecPtr prepare_lsn;
+ XLogRecPtr end_lsn;
+ TimestampTz preparetime;
+ char gid[GIDSIZE];
+} LogicalRepPrepareData;
+
+/* types of the prepare protocol message */
+#define LOGICALREP_IS_PREPARE 0x01
+#define LOGICALREP_IS_COMMIT_PREPARED 0x02
+#define LOGICALREP_IS_ROLLBACK_PREPARED 0x04
+
+/* prepare can be exactly one of PREPARE, [COMMIT|ABORT] PREPARED*/
+#define PrepareFlagsAreValid(flags) \
+ ((flags == LOGICALREP_IS_PREPARE) || \
+ (flags == LOGICALREP_IS_COMMIT_PREPARED) || \
+ (flags == LOGICALREP_IS_ROLLBACK_PREPARED))
+
extern void logicalrep_write_begin(StringInfo out, ReorderBufferTXN *txn);
extern void logicalrep_read_begin(StringInfo in,
LogicalRepBeginData *begin_data);
extern void logicalrep_write_commit(StringInfo out, ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn);
+ XLogRecPtr commit_lsn, bool is_commit);
extern void logicalrep_read_commit(StringInfo in,
LogicalRepCommitData *commit_data);
+extern void logicalrep_write_prepare(StringInfo out, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+extern void logicalrep_read_prepare(StringInfo in,
+ LogicalRepPrepareData * prepare_data);
extern void logicalrep_write_origin(StringInfo out, const char *origin,
XLogRecPtr origin_lsn);
extern char *logicalrep_read_origin(StringInfo in, XLogRecPtr *origin_lsn);
diff --git a/src/test/subscription/t/020_twophase.pl b/src/test/subscription/t/020_twophase.pl
new file mode 100644
index 0000000..c7f373d
--- /dev/null
+++ b/src/test/subscription/t/020_twophase.pl
@@ -0,0 +1,163 @@
+# logical replication of 2PC test
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 12;
+
+# Initialize publisher node
+my $node_publisher = get_new_node('publisher');
+$node_publisher->init(allows_streaming => 'logical');
+$node_publisher->append_conf(
+ 'postgresql.conf', qq(
+ max_prepared_transactions = 10
+ ));
+$node_publisher->start;
+
+# Create subscriber node
+my $node_subscriber = get_new_node('subscriber');
+$node_subscriber->init(allows_streaming => 'logical');
+$node_subscriber->append_conf(
+ 'postgresql.conf', qq(max_prepared_transactions = 10));
+$node_subscriber->start;
+
+# Create some pre-existing content on publisher
+$node_publisher->safe_psql('postgres', "CREATE TABLE tab_full (a int PRIMARY KEY)");
+$node_publisher->safe_psql('postgres',
+ "INSERT INTO tab_full SELECT generate_series(1,10)");
+$node_publisher->safe_psql('postgres', "CREATE TABLE tab_full2 (x text)");
+$node_publisher->safe_psql('postgres',
+ "INSERT INTO tab_full2 VALUES ('a'), ('b'), ('b')");
+
+# Setup structure on subscriber
+$node_subscriber->safe_psql('postgres', "CREATE TABLE tab_full (a int PRIMARY KEY)");
+$node_subscriber->safe_psql('postgres', "CREATE TABLE tab_full2 (x text)");
+
+# Setup logical replication
+my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
+$node_publisher->safe_psql('postgres', "CREATE PUBLICATION tap_pub");
+$node_publisher->safe_psql('postgres',
+"ALTER PUBLICATION tap_pub ADD TABLE tab_full, tab_full2"
+);
+
+my $appname = 'tap_sub';
+$node_subscriber->safe_psql('postgres',
+"CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr application_name=$appname' PUBLICATION tap_pub"
+);
+
+# Wait for subscriber to finish initialization
+my $caughtup_query =
+"SELECT pg_current_wal_lsn() <= replay_lsn FROM pg_stat_replication WHERE application_name = '$appname';";
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# Also wait for initial table sync to finish
+my $synced_query =
+"SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT IN ('r', 's');";
+$node_subscriber->poll_query_until('postgres', $synced_query)
+ or die "Timed out while waiting for subscriber to synchronize data";
+
+# check that 2PC gets replicated to subscriber
+$node_publisher->safe_psql('postgres',
+ "BEGIN;INSERT INTO tab_full VALUES (11);PREPARE TRANSACTION 'test_prepared_tab_full';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is in prepared state on subscriber
+my $result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab_full';");
+ is($result, qq(1), 'transaction is prepared on subscriber');
+
+# check that 2PC gets committed on subscriber
+$node_publisher->safe_psql('postgres',
+ "COMMIT PREPARED 'test_prepared_tab_full';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is committed on subscriber
+$result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM tab_full where a = 11;");
+ is($result, qq(1), 'Row inserted via 2PC has committed on subscriber');
+$result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab_full';");
+ is($result, qq(0), 'transaction is committed on subscriber');
+
+# check that 2PC gets replicated to subscriber
+$node_publisher->safe_psql('postgres',
+ "BEGIN;INSERT INTO tab_full VALUES (12);PREPARE TRANSACTION 'test_prepared_tab_full';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is in prepared state on subscriber
+$result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab_full';");
+ is($result, qq(1), 'transaction is prepared on subscriber');
+
+# check that 2PC gets aborted on subscriber
+$node_publisher->safe_psql('postgres',
+ "ROLLBACK PREPARED 'test_prepared_tab_full';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is aborted on subscriber
+$result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM tab_full where a = 12;");
+ is($result, qq(0), 'Row inserted via 2PC is not present on subscriber');
+
+$result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab_full';");
+ is($result, qq(0), 'transaction is aborted on subscriber');
+
+# Check that commit prepared is decoded properly on crash restart
+$node_publisher->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab_full VALUES (12);
+ INSERT INTO tab_full VALUES (13);
+ PREPARE TRANSACTION 'test_prepared_tab';");
+$node_subscriber->stop('immediate');
+$node_publisher->stop('immediate');
+$node_publisher->start;
+$node_subscriber->start;
+
+# commit post the restart
+$node_publisher->safe_psql('postgres', "COMMIT PREPARED 'test_prepared_tab';");
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check inserts are visible
+$result = $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM tab_full where a IN (11,12);");
+is($result, qq(2), 'Rows inserted via 2PC are visible on the subscriber');
+
+# TODO add test cases involving DDL. This can be added after we add functionality
+# to replicate DDL changes to subscriber.
+
+# check all the cleanup
+$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_subscription");
+is($result, qq(0), 'check subscription was dropped on subscriber');
+
+$result = $node_publisher->safe_psql('postgres',
+ "SELECT count(*) FROM pg_replication_slots");
+is($result, qq(0), 'check replication slot was dropped on publisher');
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_subscription_rel");
+is($result, qq(0),
+ 'check subscription relation status was dropped on subscriber');
+
+$result = $node_publisher->safe_psql('postgres',
+ "SELECT count(*) FROM pg_replication_slots");
+is($result, qq(0), 'check replication slot was dropped on publisher');
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_replication_origin");
+is($result, qq(0), 'check replication origin was dropped on subscriber');
+
+$node_subscriber->stop('fast');
+$node_publisher->stop('fast');
--
1.8.3.1
On Mon, Sep 28, 2020 at 1:13 PM Ajin Cherian <itsajin@gmail.com> wrote:
On Wed, Sep 23, 2020 at 2:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
I have added a new patch for supporting 2 phase commit semantics in
the streaming APIs for the logical decoding plugins. I have added 3
APIs
1. stream_prepare
2. stream_commit_prepared
3. stream_abort_preparedI have also added the support for the new APIs in test_decoding
plugin. I have not yet added it to pgoutpout.I have also added a fix for the error I saw while calling
ReorderBufferCleanupTXN as part of FinishPrepared handling. As a
result I have removed the function I added earlier,
ReorderBufferCleanupPreparedTXN.
Can you explain what was the problem and how you fixed it?
Please have a look at the new changes and let me know what you think.
I will continue to look at:
1. Remove snapshots on prepare truncate.
2. Bug seen while abort of prepared transaction, the prepared flag is
lost, and not able to make out that it was a previously prepared
transaction.
And the support of new APIs in pgoutput, right?
--
With Regards,
Amit Kapila.
On Mon, Sep 28, 2020 at 6:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Sep 28, 2020 at 1:13 PM Ajin Cherian <itsajin@gmail.com> wrote:
On Wed, Sep 23, 2020 at 2:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
I have added a new patch for supporting 2 phase commit semantics in
the streaming APIs for the logical decoding plugins. I have added 3
APIs
1. stream_prepare
2. stream_commit_prepared
3. stream_abort_preparedI have also added the support for the new APIs in test_decoding
plugin. I have not yet added it to pgoutpout.I have also added a fix for the error I saw while calling
ReorderBufferCleanupTXN as part of FinishPrepared handling. As a
result I have removed the function I added earlier,
ReorderBufferCleanupPreparedTXN.Can you explain what was the problem and how you fixed it?
When I added the changes for cleaning up tuplecids in
ReorderBufferTruncateTXN, I was not deleting it from the list
(dlist_delete), only calling ReorderBufferReturnChange to free
memory. This logic was copied from ReorderBufferCleanupTXN, there the
lists were all cleaned up in the end, so was not present in each list
cleanup logic.
Please have a look at the new changes and let me know what you think.
I will continue to look at:
1. Remove snapshots on prepare truncate.
2. Bug seen while abort of prepared transaction, the prepared flag is
lost, and not able to make out that it was a previously prepared
transaction.And the support of new APIs in pgoutput, right?
Yes, that also.
regards,
Ajin Cherian
Fujitsu Australia
On Wed, Sep 23, 2020 at 2:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
If the transaction is prepared which you can ensure via
ReorderBufferTxnIsPrepared() (considering you have a proper check in
that function), it should not require skipping the transaction in
Abort. One way it could happen is if you clean up the ReorderBufferTxn
in Prepare which you were doing in earlier version of patch which I
pointed out was wrong, if you have changed that then I don't know why
it could fail, may be someplace else during prepare the patch is
freeing it. Just check that.
I had a look at this problem. The problem happens when decoding is
done after a prepare but before the corresponding rollback
prepared/commit prepared.
For eg:
Begin;
<change 1>
<change 2>
PREPARE TRANSACTION '<prepare#1>';
SELECT data FROM pg_logical_slot_get_changes(...);
:
:
ROLLBACK PREPARED '<prepare#1>';
SELECT data FROM pg_logical_slot_get_changes(...);
Since the prepare is consumed in the first call to
pg_logical_slot_get_changes, subsequently when it is encountered in
the second call, it is skipped (as already decoded) in DecodePrepare
and the txn->flags are not set to
reflect the fact that it was prepared. The same behaviour is seen when
it is commit prepared after the original prepare was consumed.
Initially I was thinking about the following approach to fix it in DecodePrepare
Approach 1:
1. Break the big Skip check in DecodePrepare into 2 parts.
Return if the following conditions are true:
If (parsed->dbId != InvalidOid && parsed->dbId !=
ctx->slot->data.database) ||
ctx->fast_forward || FilterByOrigin(ctx, origin_id))
2. Check If this condition is true:
SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr)
Then this means we are skipping because this has already
been decoded, then instead of returning, call a new function
ReorderBufferMarkPrepare() which will only update the flags in the txn
to indicate that the transaction is prepared
Then later in DecodeAbort or DecodeCommit, we can confirm
that the transaction has been Prepared by checking if the flag is set
and call ReorderBufferFinishPrepared appropriately.
But then, thinking about this some more, I thought of a second approach.
Approach 2:
If the only purpose of all this was to differentiate between
Abort vs Rollback Prepared and Commit vs Commit Prepared, then we dont
need this. We already know the exact operation
in DecodeXactOp and can differentiate there. We only
overloaded DecodeAbort and DecodeCommit for convenience, we can always
call these functions with an extra flag to denote that we are either
commit or aborting a
previously prepared transaction and call
ReorderBufferFinishPrepared accordingly.
Let me know your thoughts on these two approaches or any other
suggestions on this.
regards,
Ajin Cherian
Fujitsu Australia
On Tue, Sep 29, 2020 at 5:08 PM Ajin Cherian <itsajin@gmail.com> wrote:
On Wed, Sep 23, 2020 at 2:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
If the transaction is prepared which you can ensure via
ReorderBufferTxnIsPrepared() (considering you have a proper check in
that function), it should not require skipping the transaction in
Abort. One way it could happen is if you clean up the ReorderBufferTxn
in Prepare which you were doing in earlier version of patch which I
pointed out was wrong, if you have changed that then I don't know why
it could fail, may be someplace else during prepare the patch is
freeing it. Just check that.I had a look at this problem. The problem happens when decoding is
done after a prepare but before the corresponding rollback
prepared/commit prepared.
For eg:Begin;
<change 1>
<change 2>
PREPARE TRANSACTION '<prepare#1>';
SELECT data FROM pg_logical_slot_get_changes(...);
:
:
ROLLBACK PREPARED '<prepare#1>';
SELECT data FROM pg_logical_slot_get_changes(...);Since the prepare is consumed in the first call to
pg_logical_slot_get_changes, subsequently when it is encountered in
the second call, it is skipped (as already decoded) in DecodePrepare
and the txn->flags are not set to
reflect the fact that it was prepared. The same behaviour is seen when
it is commit prepared after the original prepare was consumed.
Initially I was thinking about the following approach to fix it in DecodePrepare
Approach 1:
1. Break the big Skip check in DecodePrepare into 2 parts.
Return if the following conditions are true:
If (parsed->dbId != InvalidOid && parsed->dbId !=
ctx->slot->data.database) ||
ctx->fast_forward || FilterByOrigin(ctx, origin_id))2. Check If this condition is true:
SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr)Then this means we are skipping because this has already
been decoded, then instead of returning, call a new function
ReorderBufferMarkPrepare() which will only update the flags in the txn
to indicate that the transaction is prepared
Then later in DecodeAbort or DecodeCommit, we can confirm
that the transaction has been Prepared by checking if the flag is set
and call ReorderBufferFinishPrepared appropriately.But then, thinking about this some more, I thought of a second approach.
Approach 2:
If the only purpose of all this was to differentiate between
Abort vs Rollback Prepared and Commit vs Commit Prepared, then we dont
need this. We already know the exact operation
in DecodeXactOp and can differentiate there. We only
overloaded DecodeAbort and DecodeCommit for convenience, we can always
call these functions with an extra flag to denote that we are either
commit or aborting a
previously prepared transaction and call
ReorderBufferFinishPrepared accordingly.
The second approach sounds better but you can see if there is not much
you want to reuse from DecodeCommit/DecodeAbort then you can even
write new functions DecodeCommitPrepared/DecodeAbortPrepared. OTOH, if
there is a common code among them then passing the flag would be a
better way.
--
With Regards,
Amit Kapila.
On Mon, Sep 28, 2020 at 1:13 PM Ajin Cherian <itsajin@gmail.com> wrote:
On Wed, Sep 23, 2020 at 2:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
No problem. I think you can handle the other comments and then we can
come back to this and you might want to share the exact details of the
test (may be a narrow down version of the original test) and I or
someone else might be able to help you with that.--
With Regards,
Amit Kapila.I have added a new patch for supporting 2 phase commit semantics in
the streaming APIs for the logical decoding plugins. I have added 3
APIs
1. stream_prepare
2. stream_commit_prepared
3. stream_abort_preparedI have also added the support for the new APIs in test_decoding
plugin. I have not yet added it to pgoutpout.I have also added a fix for the error I saw while calling
ReorderBufferCleanupTXN as part of FinishPrepared handling. As a
result I have removed the function I added earlier,
ReorderBufferCleanupPreparedTXN.
Please have a look at the new changes and let me know what you think.I will continue to look at:
1. Remove snapshots on prepare truncate.
2. Bug seen while abort of prepared transaction, the prepared flag is
lost, and not able to make out that it was a previously prepared
transaction.
I have started looking into you latest patches, as of now I have a
few comments.
v6-0001
@@ -1987,7 +2072,7 @@ ReorderBufferProcessTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn,
prev_lsn = change->lsn;
/* Set the current xid to detect concurrent aborts. */
- if (streaming)
+ if (streaming || rbtxn_prepared(change->txn))
{
curtxn = change->txn;
SetupCheckXidLive(curtxn->xid);
@@ -2249,7 +2334,6 @@ ReorderBufferProcessTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn,
break;
}
}
-
For streaming transaction we need to check the xid everytime because
there could concurrent a subtransaction abort, but
for two-phase we don't need to call SetupCheckXidLive everytime,
because we are sure that transaction is going to be
the same throughout the processing.
Apart from this I have also noticed a couple of cosmetic changes
+ {
+ xl_xact_parsed_prepare parsed;
+ xl_xact_prepare *xlrec;
+ /* check that output plugin is capable of twophase decoding */
+ if (!ctx->enable_twophase)
+ {
+ ReorderBufferProcessXid(reorder, XLogRecGetXid(r), buf->origptr);
+ break;
+ }
One blank line after variable declations
- /* remove potential on-disk data, and deallocate */
+ /*
+ * remove potential on-disk data, and deallocate.
+ *
+ * We remove it even for prepared transactions (GID is enough to
+ * commit/abort those later).
+ */
+
ReorderBufferCleanupTXN(rb, txn);
Comment not aligned properly
v6-0003
+LookupGXact(const char *gid)
+{
+ int i;
+
+ LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE);
+
+ for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
+ {
+ GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
I think we should take LW_SHARED lowck here no?
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Tue, Sep 29, 2020 at 8:04 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
I have started looking into you latest patches, as of now I have a
few comments.v6-0001
@@ -1987,7 +2072,7 @@ ReorderBufferProcessTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn,
prev_lsn = change->lsn;/* Set the current xid to detect concurrent aborts. */ - if (streaming) + if (streaming || rbtxn_prepared(change->txn)) { curtxn = change->txn; SetupCheckXidLive(curtxn->xid); @@ -2249,7 +2334,6 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, break; } } -For streaming transaction we need to check the xid everytime because
there could concurrent a subtransaction abort, but
for two-phase we don't need to call SetupCheckXidLive everytime,
because we are sure that transaction is going to be
the same throughout the processing.
While decoding transactions at 'prepare' time there could be multiple
sub-transactions like in the case below. Won't that be impacted if we
follow your suggestion here?
postgres=# Begin;
BEGIN
postgres=*# insert into t1 values(1,'aaa');
INSERT 0 1
postgres=*# savepoint s1;
SAVEPOINT
postgres=*# insert into t1 values(2,'aaa');
INSERT 0 1
postgres=*# savepoint s2;
SAVEPOINT
postgres=*# insert into t1 values(3,'aaa');
INSERT 0 1
postgres=*# Prepare Transaction 'foo';
PREPARE TRANSACTION
--
With Regards,
Amit Kapila.
On Wed, Sep 30, 2020 at 2:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Sep 29, 2020 at 8:04 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
I have started looking into you latest patches, as of now I have a
few comments.v6-0001
@@ -1987,7 +2072,7 @@ ReorderBufferProcessTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn,
prev_lsn = change->lsn;/* Set the current xid to detect concurrent aborts. */ - if (streaming) + if (streaming || rbtxn_prepared(change->txn)) { curtxn = change->txn; SetupCheckXidLive(curtxn->xid); @@ -2249,7 +2334,6 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, break; } } -For streaming transaction we need to check the xid everytime because
there could concurrent a subtransaction abort, but
for two-phase we don't need to call SetupCheckXidLive everytime,
because we are sure that transaction is going to be
the same throughout the processing.While decoding transactions at 'prepare' time there could be multiple
sub-transactions like in the case below. Won't that be impacted if we
follow your suggestion here?postgres=# Begin;
BEGIN
postgres=*# insert into t1 values(1,'aaa');
INSERT 0 1
postgres=*# savepoint s1;
SAVEPOINT
postgres=*# insert into t1 values(2,'aaa');
INSERT 0 1
postgres=*# savepoint s2;
SAVEPOINT
postgres=*# insert into t1 values(3,'aaa');
INSERT 0 1
postgres=*# Prepare Transaction 'foo';
PREPARE TRANSACTION
But once we prepare the transaction, we can not rollback individual
subtransaction. We can only rollback the main transaction so instead
of setting individual subxact as CheckXidLive, we can just set the
main XID so no need to check on every command. Just set it before
start processing.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Wed, Sep 30, 2020 at 2:46 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Wed, Sep 30, 2020 at 2:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Sep 29, 2020 at 8:04 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
I have started looking into you latest patches, as of now I have a
few comments.v6-0001
@@ -1987,7 +2072,7 @@ ReorderBufferProcessTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn,
prev_lsn = change->lsn;/* Set the current xid to detect concurrent aborts. */ - if (streaming) + if (streaming || rbtxn_prepared(change->txn)) { curtxn = change->txn; SetupCheckXidLive(curtxn->xid); @@ -2249,7 +2334,6 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, break; } } -For streaming transaction we need to check the xid everytime because
there could concurrent a subtransaction abort, but
for two-phase we don't need to call SetupCheckXidLive everytime,
because we are sure that transaction is going to be
the same throughout the processing.While decoding transactions at 'prepare' time there could be multiple
sub-transactions like in the case below. Won't that be impacted if we
follow your suggestion here?postgres=# Begin;
BEGIN
postgres=*# insert into t1 values(1,'aaa');
INSERT 0 1
postgres=*# savepoint s1;
SAVEPOINT
postgres=*# insert into t1 values(2,'aaa');
INSERT 0 1
postgres=*# savepoint s2;
SAVEPOINT
postgres=*# insert into t1 values(3,'aaa');
INSERT 0 1
postgres=*# Prepare Transaction 'foo';
PREPARE TRANSACTIONBut once we prepare the transaction, we can not rollback individual
subtransaction.
Sure but Rollback can come before prepare like in the case below which
will appear as concurrent abort (assume there is some DDL which
changes the table before the Rollback statement) because it has
already been done by the backend and that need to be caught by this
mechanism only.
Begin;
insert into t1 values(1,'aaa');
savepoint s1;
insert into t1 values(2,'aaa');
savepoint s2;
insert into t1 values(3,'aaa');
Rollback to savepoint s2;
insert into t1 values(4,'aaa');
Prepare Transaction 'foo';
--
With Regards,
Amit Kapila.
On Wed, Sep 30, 2020 at 3:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Sep 30, 2020 at 2:46 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Wed, Sep 30, 2020 at 2:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Sep 29, 2020 at 8:04 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
I have started looking into you latest patches, as of now I have a
few comments.v6-0001
@@ -1987,7 +2072,7 @@ ReorderBufferProcessTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn,
prev_lsn = change->lsn;/* Set the current xid to detect concurrent aborts. */ - if (streaming) + if (streaming || rbtxn_prepared(change->txn)) { curtxn = change->txn; SetupCheckXidLive(curtxn->xid); @@ -2249,7 +2334,6 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, break; } } -For streaming transaction we need to check the xid everytime because
there could concurrent a subtransaction abort, but
for two-phase we don't need to call SetupCheckXidLive everytime,
because we are sure that transaction is going to be
the same throughout the processing.While decoding transactions at 'prepare' time there could be multiple
sub-transactions like in the case below. Won't that be impacted if we
follow your suggestion here?postgres=# Begin;
BEGIN
postgres=*# insert into t1 values(1,'aaa');
INSERT 0 1
postgres=*# savepoint s1;
SAVEPOINT
postgres=*# insert into t1 values(2,'aaa');
INSERT 0 1
postgres=*# savepoint s2;
SAVEPOINT
postgres=*# insert into t1 values(3,'aaa');
INSERT 0 1
postgres=*# Prepare Transaction 'foo';
PREPARE TRANSACTIONBut once we prepare the transaction, we can not rollback individual
subtransaction.Sure but Rollback can come before prepare like in the case below which
will appear as concurrent abort (assume there is some DDL which
changes the table before the Rollback statement) because it has
already been done by the backend and that need to be caught by this
mechanism only.Begin;
insert into t1 values(1,'aaa');
savepoint s1;
insert into t1 values(2,'aaa');
savepoint s2;
insert into t1 values(3,'aaa');
Rollback to savepoint s2;
insert into t1 values(4,'aaa');
Prepare Transaction 'foo';
If we are streaming on the prepare that means we must have decoded
that rollback WAL which means we should have removed the
ReorderBufferTXN for those subxact.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Wed, Sep 30, 2020 at 3:12 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Wed, Sep 30, 2020 at 3:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Sep 30, 2020 at 2:46 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Wed, Sep 30, 2020 at 2:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Sep 29, 2020 at 8:04 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
I have started looking into you latest patches, as of now I have a
few comments.v6-0001
@@ -1987,7 +2072,7 @@ ReorderBufferProcessTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn,
prev_lsn = change->lsn;/* Set the current xid to detect concurrent aborts. */ - if (streaming) + if (streaming || rbtxn_prepared(change->txn)) { curtxn = change->txn; SetupCheckXidLive(curtxn->xid); @@ -2249,7 +2334,6 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, break; } } -For streaming transaction we need to check the xid everytime because
there could concurrent a subtransaction abort, but
for two-phase we don't need to call SetupCheckXidLive everytime,
because we are sure that transaction is going to be
the same throughout the processing.While decoding transactions at 'prepare' time there could be multiple
sub-transactions like in the case below. Won't that be impacted if we
follow your suggestion here?postgres=# Begin;
BEGIN
postgres=*# insert into t1 values(1,'aaa');
INSERT 0 1
postgres=*# savepoint s1;
SAVEPOINT
postgres=*# insert into t1 values(2,'aaa');
INSERT 0 1
postgres=*# savepoint s2;
SAVEPOINT
postgres=*# insert into t1 values(3,'aaa');
INSERT 0 1
postgres=*# Prepare Transaction 'foo';
PREPARE TRANSACTIONBut once we prepare the transaction, we can not rollback individual
subtransaction.Sure but Rollback can come before prepare like in the case below which
will appear as concurrent abort (assume there is some DDL which
changes the table before the Rollback statement) because it has
already been done by the backend and that need to be caught by this
mechanism only.Begin;
insert into t1 values(1,'aaa');
savepoint s1;
insert into t1 values(2,'aaa');
savepoint s2;
insert into t1 values(3,'aaa');
Rollback to savepoint s2;
insert into t1 values(4,'aaa');
Prepare Transaction 'foo';If we are streaming on the prepare that means we must have decoded
that rollback WAL which means we should have removed the
ReorderBufferTXN for those subxact.
Okay, valid point. We can avoid setting it for each sub-transaction in
that case but OTOH even if we allow to set it there shouldn't be any
bug.
--
With Regards,
Amit Kapila.
On Wed, Sep 30, 2020 at 3:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Sep 30, 2020 at 3:12 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Wed, Sep 30, 2020 at 3:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Sep 30, 2020 at 2:46 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Wed, Sep 30, 2020 at 2:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Sep 29, 2020 at 8:04 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
I have started looking into you latest patches, as of now I have a
few comments.v6-0001
@@ -1987,7 +2072,7 @@ ReorderBufferProcessTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn,
prev_lsn = change->lsn;/* Set the current xid to detect concurrent aborts. */ - if (streaming) + if (streaming || rbtxn_prepared(change->txn)) { curtxn = change->txn; SetupCheckXidLive(curtxn->xid); @@ -2249,7 +2334,6 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, break; } } -For streaming transaction we need to check the xid everytime because
there could concurrent a subtransaction abort, but
for two-phase we don't need to call SetupCheckXidLive everytime,
because we are sure that transaction is going to be
the same throughout the processing.While decoding transactions at 'prepare' time there could be multiple
sub-transactions like in the case below. Won't that be impacted if we
follow your suggestion here?postgres=# Begin;
BEGIN
postgres=*# insert into t1 values(1,'aaa');
INSERT 0 1
postgres=*# savepoint s1;
SAVEPOINT
postgres=*# insert into t1 values(2,'aaa');
INSERT 0 1
postgres=*# savepoint s2;
SAVEPOINT
postgres=*# insert into t1 values(3,'aaa');
INSERT 0 1
postgres=*# Prepare Transaction 'foo';
PREPARE TRANSACTIONBut once we prepare the transaction, we can not rollback individual
subtransaction.Sure but Rollback can come before prepare like in the case below which
will appear as concurrent abort (assume there is some DDL which
changes the table before the Rollback statement) because it has
already been done by the backend and that need to be caught by this
mechanism only.Begin;
insert into t1 values(1,'aaa');
savepoint s1;
insert into t1 values(2,'aaa');
savepoint s2;
insert into t1 values(3,'aaa');
Rollback to savepoint s2;
insert into t1 values(4,'aaa');
Prepare Transaction 'foo';If we are streaming on the prepare that means we must have decoded
that rollback WAL which means we should have removed the
ReorderBufferTXN for those subxact.Okay, valid point. We can avoid setting it for each sub-transaction in
that case but OTOH even if we allow to set it there shouldn't be any
bug.
Right, there will not be any bug, just an optimization.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Hello Ajin.
I have done some review of the v6 patches.
I had some difficulty replying my review comments to the OSS list, so
I am putting them as an attachment here.
Kind Regards,
Peter Smith
Fujitsu Australia
Attachments:
OSS-List-v6-review-comments-20201006.txttext/plain; charset=US-ASCII; name=OSS-List-v6-review-comments-20201006.txtDownload
Hello Ajin.
I have gone through the v6 patch changes and have a list of review
comments below.
Apologies for the length of this email - I know that many of the
following comments are trivial, but I figured I should either just
ignore everything cosmetic, or list everything regardless. I chose the
latter.
There may be some duplication where the same review comment is written
for multiple files and/or where the same file is in your multiple
patches.
Kind Regards.
Peter Smith
Fujitsu Australia
[BEGIN]
==========
Patch V6-0001, File: contrib/test_decoding/expected/prepared.out (so
prepared.sql also)
==========
COMMENT
Line 30 - The INSERT INTO test_prepared1 VALUES (2); is kind of
strange because it is not really part of the prior test nor the
following test. Maybe it would be better to have a comment describing
the purpose of this isolated INSERT and to also consume the data from
the slot so it does not get jumbled with the data of the following
(abort) test.
;
COMMENT
Line 53 - Same comment for this test INSERT INTO test_prepared1 VALUES
(4); It kind of has nothing really to do with either the prior (abort)
test nor the following (ddl) test.
;
COMMENT
Line 60 - Seems to check which locks are held for the test_prepared_1
table while the transaction is in progress. Maybe it would be better
to have more comments describing what is expected here and why.
;
COMMENT
Line 88 - There is a comment in the test saying "-- We should see '7'
before '5' in our results since it commits first." but I did not see
any test code that actually verifies that happens.
;
QUESTION
Line 120 - I did not really understand the SQL checking the pg_class.
I expected this would be checking table 'test_prepared1' instead. Can
you explain it?
SELECT 'pg_class' AS relation, locktype, mode
FROM pg_locks
WHERE locktype = 'relation'
AND relation = 'pg_class'::regclass;
relation | locktype | mode
----------+----------+------
(0 rows)
;
QUESTION
Line 139 - SET statement_timeout = '1s'; is 1 seconds short enough
here for this test, or might it be that these statements would be
completed in less than one seconds anyhow?
;
QUESTION
Line 163 - How is this testing a SAVEPOINT? Or is it only to check
that the SAVEPOINT command is not part of the replicated changes?
;
COMMENT
Line 175 - Missing underscore in comment. Code requires also underscore:
"nodecode" --> "_nodecode"
==========
Patch V6-0001, File: contrib/test_decoding/test_decoding.c
==========
COMMENT
Line 43
@@ -36,6 +40,7 @@ typedef struct
bool skip_empty_xacts;
bool xact_wrote_changes;
bool only_local;
+ TransactionId check_xid; /* track abort of this txid */
} TestDecodingData;
The "check_xid" seems a meaningless name. Check what?
IIUC maybe should be something like "check_xid_aborted"
;
COMMENT
Line 105
@ -88,6 +93,19 @@ static void
pg_decode_stream_truncate(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
int nrelations, Relation relations[],
ReorderBufferChange *change);
+static bool pg_decode_filter_prepare(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
Remove extra blank line after these functions
;
COMMENT
Line 149
@@ -116,6 +134,11 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb)
cb->stream_change_cb = pg_decode_stream_change;
cb->stream_message_cb = pg_decode_stream_message;
cb->stream_truncate_cb = pg_decode_stream_truncate;
+ cb->filter_prepare_cb = pg_decode_filter_prepare;
+ cb->prepare_cb = pg_decode_prepare_txn;
+ cb->commit_prepared_cb = pg_decode_commit_prepared_txn;
+ cb->abort_prepared_cb = pg_decode_abort_prepared_txn;
+
}
There is a confusing mix of terminology where sometimes things are
referred as ROLLBACK/rollback and other times apparently the same
operation is referred as ABORT/abort. I do not know the root cause of
this mixture. IIUC maybe the internal functions and protocol generally
use the term "abort", whereas the SQL syntax is "ROLLBACK"... but
where those two terms collide in the middle it gets quite confusing.
At least I thought the names of the "callbacks" which get exposed to
the user (e.g. in the help) might be better if they would match the
SQL.
"abort_prepared_cb" --> "rollback_prepared_db"
There are similar review comments like this below where the
alternating terms caused me some confusion.
~
Also, Remove the extra blank line before the end of the function.
;
COMMENT
Line 267
@ -227,6 +252,42 @@ pg_decode_startup(LogicalDecodingContext *ctx,
OutputPluginOptions *opt,
errmsg("could not parse value \"%s\" for parameter \"%s\"",
strVal(elem->arg), elem->defname)));
}
+ else if (strcmp(elem->defname, "two-phase-commit") == 0)
+ {
+ if (elem->arg == NULL)
+ continue;
IMO the "check-xid" code might be better rearranged so the NULL check
is first instead of if/else.
e.g.
if (elem->arg == NULL)
ereport(FATAL,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("check-xid needs an input value")));
~
Also, is it really supposed to be FATAL instead or ERROR. That is not
the same as the other surrounding code.
;
COMMENT
Line 296
if (data->check_xid <= 0)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("Specify positive value for parameter \"%s\","
" you specified \"%s\"",
elem->defname, strVal(elem->arg))));
The code checking for <= 0 seems over-complicated. Because conversion
was using strtoul() I fail to see how this can ever be < 0. Wouldn't
it be easier to simply test the result of the strtoul() function?
BEFORE: if (errno == EINVAL || errno == ERANGE)
AFTER: if (data->check_xid == 0)
~
Also, should this be FATAL? Everything else similar is ERROR.
;
COMMENT
(general)
I don't recall seeing any of these decoding options (e.g.
"two-phase-commit", "check-xid") documented anywhere.
So how can a user even know these options exist so they can use them?
Perhaps options should be described on this page?
https://www.postgresql.org/docs/13/functions-admin.html#FUNCTIONS-REPLICATION
;
COMMENT
(general)
"check-xid" is a meaningless option name. Maybe something like
"checked-xid-aborted" is more useful?
Suggest changing the member, the option, and the error messages to
match some better name.
;
COMMENT
Line 314
@@ -238,6 +299,7 @@ pg_decode_startup(LogicalDecodingContext *ctx,
OutputPluginOptions *opt,
}
ctx->streaming &= enable_streaming;
+ ctx->enable_twophase &= enable_2pc;
}
The "ctx->enable_twophase" is inconsistent naming with the
"ctx->streaming" member.
"enable_twophase" --> "twophase"
;
COMMENT
Line 374
@@ -297,6 +359,94 @@ pg_decode_commit_txn(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
OutputPluginWrite(ctx, true);
}
+
+/*
+ * Filter out two-phase transactions.
+ *
+ * Each plugin can implement its own filtering logic. Here
+ * we demonstrate a simple logic by checking the GID. If the
+ * GID contains the "_nodecode" substring, then we filter
+ * it out.
+ */
+static bool
+pg_decode_filter_prepare(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
Remove the extra preceding blank line.
~
I did not find anything in the help about "_nodecode". Should it be
there or is this deliberately not documented feature?
;
QUESTION
Line 440
+pg_decode_abort_prepared_txn(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
Is this a wrong comment
"ABORT PREPARED" --> "ROLLBACK PREPARED" ??
;
COMMENT
Line 620
@@ -455,6 +605,22 @@ pg_decode_change(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
}
data->xact_wrote_changes = true;
+ /* if check_xid is specified */
+ if (TransactionIdIsValid(data->check_xid))
+ {
+ elog(LOG, "waiting for %u to abort", data->check_xid);
+ while (TransactionIdIsInProgress(dat
The check_xid seems a meaningless name, and the comment "/* if
check_xid is specified */" was not helpful either.
IIUC purpose of this is to check that the nominated xid always is rolled back.
So the appropriate name may be more like "check-xid-aborted".
;
==========
Patch V6-0001, File: doc/src/sgml/logicaldecoding.sgml
==========
COMMENT/QUESTION
Section 48.6.1
@ -387,6 +387,10 @@ typedef struct OutputPluginCallbacks
LogicalDecodeTruncateCB truncate_cb;
LogicalDecodeCommitCB commit_cb;
LogicalDecodeMessageCB message_cb;
+ LogicalDecodeFilterPrepareCB filter_prepare_cb;
Confused by the mixing of terminologies "abort" and "rollback".
Why is it LogicalDecodeAbortPreparedCB instead of
LogicalDecodeRollbackPreparedCB?
Why is it abort_prepared_cb instead of rollback_prepared_cb;?
I thought everything the user sees should be ROLLBACK/rollback (like
the SQL) regardless of what the internal functions might be called.
;
COMMENT
Section 48.6.1
The begin_cb, change_cb and commit_cb callbacks are required, while
startup_cb, filter_by_origin_cb, truncate_cb, and shutdown_cb are
optional. If truncate_cb is not set but a TRUNCATE is to be decoded,
the action will be ignored.
The 1st paragraph beneath the typedef does not mention the newly added
callbacks to say if they are required or optional.
;
COMMENT
Section 48.6.4.5
Section 48.6.4.6
Section 48.6.4.7
@@ -578,6 +588,55 @@ typedef void (*LogicalDecodeCommitCB) (struct
LogicalDecodingContext *ctx,
</para>
</sect3>
+ <sect3 id="logicaldecoding-output-plugin-prepare">
+ <sect3 id="logicaldecoding-output-plugin-commit-prepared">
+ <sect3 id="logicaldecoding-output-plugin-abort-prepared">
+<programlisting>
The wording and titles are a bit backwards compared to the others.
e.g. previously was "Transaction Begin" (not "Begin Transaction") and
"Transaction End" (not "End Transaction").
So for consistently following the existing IMO should change these new
titles (and wording) to:
- "Commit Prepared Transaction Callback" --> "Transaction Commit
Prepared Callback"
- "Rollback Prepared Transaction Callback" --> "Transaction Rollback
Prepared Callback"
- "whenever a commit prepared transaction has been decoded" -->
"whenever a transaction commit prepared has been decoded"
- "whenever a rollback prepared transaction has been decoded." -->
"whenever a transaction rollback prepared has been decoded."
;
==========
Patch V6-0001, File: src/backend/replication/logical/decode.c
==========
COMMENT
Line 74
@@ -70,6 +70,9 @@ static void DecodeCommit(LogicalDecodingContext
*ctx, XLogRecordBuffer *buf,
xl_xact_parsed_commit *parsed, TransactionId xid);
static void DecodeAbort(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
xl_xact_parsed_abort *parsed, TransactionId xid);
+static void DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_prepare * parsed);
The 2nd line of DecodePrepare is misaligned by one space.
;
COMMENT
Line 321
@@ -312,17 +315,34 @@ DecodeXactOp(LogicalDecodingContext *ctx,
XLogRecordBuffer *buf)
}
break;
case XLOG_XACT_PREPARE:
+ {
+ xl_xact_parsed_prepare parsed;
+ xl_xact_prepare *xlrec;
+ /* check that output plugin is capable of twophase decoding */
"twophase" --> "two-phase"
~
Also, add a blank line after the declarations.
;
==========
Patch V6-0001, File: src/backend/replication/logical/logical.c
==========
COMMENT
Line 249
@@ -225,6 +237,19 @@ StartupDecodingContext(List *output_plugin_options,
(ctx->callbacks.stream_message_cb != NULL) ||
(ctx->callbacks.stream_truncate_cb != NULL);
+ /*
+ * To support two phase logical decoding, we require
prepare/commit-prepare/abort-prepare
+ * callbacks. The filter-prepare callback is optional. We however
enable two phase logical
+ * decoding when at least one of the methods is enabled so that we
can easily identify
+ * missing methods.
The terminology is generally well known as "two-phase" (with the
hyphen) https://en.wikipedia.org/wiki/Two-phase_commit_protocol so
let's be consistent for all the patch code comments. Please search the
code and correct this in all places, even where I might have missed to
identify it.
"two phase" --> "two-phase"
;
COMMENT
Line 822
@@ -782,6 +807,111 @@ commit_cb_wrapper(ReorderBuffer *cache,
ReorderBufferTXN *txn,
}
static void
+prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
"support 2 phase" --> "supports two-phase" in the comment
;
COMMENT
Line 844
Code condition seems strange and/or broken.
if (ctx->enable_twophase && ctx->callbacks.prepare_cb == NULL)
Because if the flag is null then this condition is skipped.
But then if the callback was also NULL then attempting to call it to
"do the actual work" will give NPE.
~
Also, I wonder should this check be the first thing in this function?
Because if it fails does it even make sense that all the errcallback
code was set up?
E.g errcallback.arg potentially is left pointing to a stack variable
on a stack that no longer exists.
;
COMMENT
Line 857
+commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
"support 2 phase" --> "supports two-phase" in the comment
~
Also, Same potential trouble with the condition:
if (ctx->enable_twophase && ctx->callbacks.commit_prepared_cb == NULL)
Same as previously asked. Should this check be first thing in this function?
;
COMMENT
Line 892
+abort_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
"support 2 phase" --> "supports two-phase" in the comment
~
Same potential trouble with the condition:
if (ctx->enable_twophase && ctx->callbacks.abort_prepared_cb == NULL)
Same as previously asked. Should this check be the first thing in this function?
;
COMMENT
Line 1013
@@ -858,6 +988,51 @@ truncate_cb_wrapper(ReorderBuffer *cache,
ReorderBufferTXN *txn,
error_context_stack = errcallback.previous;
}
+static bool
+filter_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid)
Fix wording in comment:
"twophase" --> "two-phase transactions"
"twophase transactions" --> "two-phase transactions"
==========
Patch V6-0001, File: src/backend/replication/logical/reorderbuffer.c
==========
COMMENT
Line 255
@@ -251,7 +251,8 @@ static Size
ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn
static void ReorderBufferRestoreChange(ReorderBuffer *rb,
ReorderBufferTXN *txn,
char *change);
static void ReorderBufferRestoreCleanup(ReorderBuffer *rb,
ReorderBufferTXN *txn);
-static void ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn);
+static void ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
+ bool txn_prepared);
The alignment is inconsistent. One more space needed before "bool txn_prepared"
;
COMMENT
Line 417
@@ -413,6 +414,11 @@ ReorderBufferReturnTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn)
}
/* free data that's contained */
+ if (txn->gid != NULL)
+ {
+ pfree(txn->gid);
+ txn->gid = NULL;
+ }
Should add the blank link before this new code, as it was before.
;
COMMENT
Line 1564
@ -1502,12 +1561,14 @@ ReorderBufferCleanupTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn)
}
/*
- * Discard changes from a transaction (and subtransactions), after streaming
- * them. Keep the remaining info - transactions, tuplecids, invalidations and
- * snapshots.
+ * Discard changes from a transaction (and subtransactions), either
after streaming or
+ * after a PREPARE.
typo "snapshots.If" -> "snapshots. If"
;
COMMENT/QUESTION
Line 1590
@@ -1526,7 +1587,7 @@ ReorderBufferTruncateTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn)
Assert(rbtxn_is_known_subxact(subtxn));
Assert(subtxn->nsubtxns == 0);
- ReorderBufferTruncateTXN(rb, subtxn);
+ ReorderBufferTruncateTXN(rb, subtxn, txn_prepared);
}
There are some code paths here I did not understand how they match the comments.
Because this function is recursive it seems that it may be called
where the 2nd parameter txn is a sub-transaction.
But then this seems at odds with some of the other code comments of
this function which are processing the txn without ever testing is it
really toplevel or not:
e.g. Line 1593 "/* cleanup changes in the toplevel txn */"
e.g. Line 1632 "They are always stored in the toplevel transaction."
;
COMMENT
Line 1644
@@ -1560,9 +1621,33 @@ ReorderBufferTruncateTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn)
* about the toplevel xact (we send the XID in all messages), but we never
* stream XIDs of empty subxacts.
*/
- if ((!txn->toptxn) || (txn->nentries_mem != 0))
+ if ((!txn_prepared) && ((!txn->toptxn) || (txn->nentries_mem != 0)))
txn->txn_flags |= RBTXN_IS_STREAMED;
+ if (txn_prepared)
/* remove the change from it's containing list */
typo "it's" --> "its"
;
QUESTION
Line 1977
@@ -1880,7 +1965,7 @@ ReorderBufferResetTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn,
ReorderBufferChange *specinsert)
{
/* Discard the changes that we just streamed */
- ReorderBufferTruncateTXN(rb, txn);
+ ReorderBufferTruncateTXN(rb, txn, false);
How do you know the 3rd parameter - i.e. txn_prepared - should be
hardwired false here?
e.g. I thought that maybe rbtxn_prepared(txn) can be true here.
;
COMMENT
Line 2345
@@ -2249,7 +2334,6 @@ ReorderBufferProcessTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn,
break;
}
}
-
/*
Looks like accidental blank line deletion. This should be put back how it was
;
COMMENT/QUESTION
Line 2374
@@ -2278,7 +2362,16 @@ ReorderBufferProcessTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn,
}
}
else
- rb->commit(rb, txn, commit_lsn);
+ {
+ /*
+ * Call either PREPARE (for twophase transactions) or COMMIT
+ * (for regular ones).
"twophase" --> "two-phase"
~
Also, I was confused by the apparent assumption of exclusiveness of
streaming and 2PC...
e.g. what if streaming AND 2PC then it won't do rb->prepare()
;
QUESTION
Line 2424
@@ -2319,11 +2412,17 @@ ReorderBufferProcessTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn,
*/
if (streaming)
{
- ReorderBufferTruncateTXN(rb, txn);
+ ReorderBufferTruncateTXN(rb, txn, false);
/* Reset the CheckXidAlive */
CheckXidAlive = InvalidTransactionId;
}
+ else if (rbtxn_prepared(txn))
I was confused by the exclusiveness of streaming/2PC.
e.g. what if streaming AND 2PC at same time - how can you pass false
as 3rd param to ReorderBufferTruncateTXN?
;
COMMENT
Line 2463
@@ -2352,17 +2451,18 @@ ReorderBufferProcessTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn,
/*
* The error code ERRCODE_TRANSACTION_ROLLBACK indicates a concurrent
- * abort of the (sub)transaction we are streaming. We need to do the
+ * abort of the (sub)transaction we are streaming or preparing. We
need to do the
* cleanup and return gracefully on this error, see SetupCheckXidLive.
*/
"twoi phase" --> "two-phase"
;
QUESTIONS
Line 2482
@@ -2370,10 +2470,19 @@ ReorderBufferProcessTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn,
errdata = NULL;
curtxn->concurrent_abort = true;
- /* Reset the TXN so that it is allowed to stream remaining data. */
- ReorderBufferResetTXN(rb, txn, snapshot_now,
- command_id, prev_lsn,
- specinsert);
+ /* If streaming, reset the TXN so that it is allowed to stream
remaining data. */
+ if (streaming)
Re: /* If streaming, reset the TXN so that it is allowed to stream
remaining data. */
I was confused by the exclusiveness of streaming/2PC.
Is it not possible for streaming flags and rbtxn_prepared(txn) true at
the same time?
~
elog(LOG, "stopping decoding of %s (%u)",
txn->gid[0] != '\0'? txn->gid:"", txn->xid);
Is this a safe operation, or do you also need to test txn->gid is not NULL?
;
COMMENT
Line 2606
+ReorderBufferPrepare(ReorderBuffer *rb, TransactionId xid,
"twophase" --> "two-phase"
;
QUESTION
Line 2655
+ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
"This is used to handle COMMIT/ABORT PREPARED"
Should that say "COMMIT/ROLLBACK PREPARED"?
;
COMMENT
Line 2668
"Anyways, 2PC transactions" --> "Anyway, two-phase transactions"
;
COMMENT
Line 2765
@@ -2495,7 +2731,13 @@ ReorderBufferAbort(ReorderBuffer *rb,
TransactionId xid, XLogRecPtr lsn)
/* cosmetic... */
txn->final_lsn = lsn;
- /* remove potential on-disk data, and deallocate */
+ /*
+ * remove potential on-disk data, and deallocate.
+ *
Remove the blank between the comment and code.
==========
Patch V6-0001, File: src/include/replication/logical.h
==========
COMMENT
Line 89
"two phase" -> "two-phase"
;
COMMENT
Line 89
For consistency with the previous member naming really the new member
should just be called "twophase" rather than "enable_twophase"
;
==========
Patch V6-0001, File: src/include/replication/output_plugin.h
==========
QUESTION
Line 106
As previously asked, why is the callback function/typedef referred as
AbortPrepared instead of RollbackPrepared?
It does not match the SQL and the function comment, and seems only to
add some unnecessary confusion.
;
==========
Patch V6-0001, File: src/include/replication/reorderbuffer.h
==========
QUESTION
Line 116
@@ -162,9 +163,13 @@ typedef struct ReorderBufferChange
#define RBTXN_HAS_CATALOG_CHANGES 0x0001
#define RBTXN_IS_SUBXACT 0x0002
#define RBTXN_IS_SERIALIZED 0x0004
-#define RBTXN_IS_STREAMED 0x0008
-#define RBTXN_HAS_TOAST_INSERT 0x0010
-#define RBTXN_HAS_SPEC_INSERT 0x0020
+#define RBTXN_PREPARE 0x0008
+#define RBTXN_COMMIT_PREPARED 0x0010
+#define RBTXN_ROLLBACK_PREPARED 0x0020
+#define RBTXN_COMMIT 0x0040
+#define RBTXN_IS_STREAMED 0x0080
+#define RBTXN_HAS_TOAST_INSERT 0x0100
+#define RBTXN_HAS_SPEC_INSERT 0x0200
I was wondering why when adding new flags, some of the existing flag
masks were also altered.
I am assuming this is ok because they are never persisted but are only
used in the protocol (??)
;
COMMENT
Line 226
@@ -218,6 +223,15 @@ typedef struct ReorderBufferChange
((txn)->txn_flags & RBTXN_IS_STREAMED) != 0 \
)
+/* is this txn prepared? */
+#define rbtxn_prepared(txn) (txn->txn_flags & RBTXN_PREPARE)
+/* was this prepared txn committed in the meanwhile? */
+#define rbtxn_commit_prepared(txn) (txn->txn_flags & RBTXN_COMMIT_PREPARED)
+/* was this prepared txn aborted in the meanwhile? */
+#define rbtxn_rollback_prepared(txn) (txn->txn_flags & RBTXN_ROLLBACK_PREPARED)
+/* was this txn committed in the meanwhile? */
+#define rbtxn_commit(txn) (txn->txn_flags & RBTXN_COMMIT)
+
Probably all the "txn->txn_flags" here might be more safely written
with parentheses in the macro like "(txn)->txn_flags".
~
Also, Start all comments with capital. And what is the meaning "in the
meanwhile?"
;
COMMENT
Line 410
@@ -390,6 +407,39 @@ typedef void (*ReorderBufferCommitCB) (ReorderBuffer *rb,
ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
The format is inconsistent with all other callback signatures here,
where the 1st arg was on the same line as the typedef.
;
COMMENT
Line 440-442
Excessive blank lines following this change?
;
COMMENT
Line 638
@@ -571,6 +631,15 @@ void
ReorderBufferXidSetCatalogChanges(ReorderBuffer *, TransactionId xid,
XLog
bool ReorderBufferXidHasCatalogChanges(ReorderBuffer *, TransactionId xid);
bool ReorderBufferXidHasBaseSnapshot(ReorderBuffer *, TransactionId xid);
+bool ReorderBufferPrepareNeedSkip(ReorderBuffer *rb, TransactionId xid,
+ const char *gid);
+bool ReorderBufferTxnIsPrepared(ReorderBuffer *rb, TransactionId xid,
+ const char *gid);
+void ReorderBufferPrepare(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid);
Not aligned consistently with other function prototypes.
;
==========
Patch V6-0003, File: src/backend/access/transam/twophase.c
==========
COMMENT
Line 551
@@ -548,6 +548,37 @@ MarkAsPrepared(GlobalTransaction gxact, bool lock_held)
}
/*
+ * LookupGXact
+ * Check if the prepared transaction with the given GID is around
+ */
+bool
+LookupGXact(const char *gid)
There is potential to refactor/simplify this code:
e.g.
bool
LookupGXact(const char *gid)
{
int i;
bool found = false;
LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE);
for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
{
GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
/* Ignore not-yet-valid GIDs */
if (gxact->valid && strcmp(gxact->gid, gid) == 0)
{
found = true;
break;
}
}
LWLockRelease(TwoPhaseStateLock);
return found;
}
;
==========
Patch V6-0003, File: src/backend/replication/logical/proto.c
==========
COMMENT
Line 86
@@ -72,12 +72,17 @@ logicalrep_read_begin(StringInfo in,
LogicalRepBeginData *begin_data)
*/
void
logicalrep_write_commit(StringInfo out, ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn)
Since now the flags are used the code comment is wrong.
"/* send the flags field (unused for now) */"
;
COMMENT
Line 129
@ -106,6 +115,77 @@ logicalrep_read_commit(StringInfo in,
LogicalRepCommitData *commit_data)
}
/*
+ * Write PREPARE to the output stream.
+ */
+void
+logicalrep_write_prepare(StringInfo out, ReorderBufferTXN *txn,
"2PC transactions" --> "two-phase commit transactions"
;
COMMENT
Line 133
Assert(strlen(txn->gid) > 0);
Shouldn't that assertion also check txn->gid is not NULL (to prevent
NPE in case gid was NULL)
;
COMMENT
Line 177
+logicalrep_read_prepare(StringInfo in, LogicalRepPrepareData * prepare_data)
prepare_data->prepare_type = flags;
This code may be OK but it does seem a bit of an abuse of the flags.
e.g. Are they flags or are the really enum values?
e.g. And if they are effectively enums (it appears they are) then
seemed inconsistent that |= was used when they were previously
assigned.
;
==========
Patch V6-0003, File: src/backend/replication/logical/worker.c
==========
COMMENT
Line 757
@@ -749,6 +753,141 @@ apply_handle_commit(StringInfo s)
pgstat_report_activity(STATE_IDLE, NULL);
}
+static void
+apply_handle_prepare_txn(LogicalRepPrepareData * prepare_data)
+{
+ Assert(prepare_data->prepare_lsn == remote_final_lsn);
Missing function comment to say this is called from apply_handle_prepare.
;
COMMENT
Line 798
+apply_handle_commit_prepared_txn(LogicalRepPrepareData * prepare_data)
Missing function comment to say this is called from apply_handle_prepare.
;
COMMENT
Line 824
+apply_handle_rollback_prepared_txn(LogicalRepPrepareData * prepare_data)
Missing function comment to say this is called from apply_handle_prepare.
==========
Patch V6-0003, File: src/backend/replication/pgoutput/pgoutput.c
==========
COMMENT
Line 50
@@ -47,6 +47,12 @@ static void pgoutput_truncate(LogicalDecodingContext *ctx,
ReorderBufferChange *change);
static bool pgoutput_origin_filter(LogicalDecodingContext *ctx,
RepOriginId origin_id);
+static void pgoutput_prepare_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, XLogRecPtr prepare_lsn);
The parameter indentation (2nd lines) does not match everything else
in this context.
;
COMMENT
Line 152
@@ -143,6 +149,10 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb)
cb->change_cb = pgoutput_change;
cb->truncate_cb = pgoutput_truncate;
cb->commit_cb = pgoutput_commit_txn;
+
+ cb->prepare_cb = pgoutput_prepare_txn;
+ cb->commit_prepared_cb = pgoutput_commit_prepared_txn;
+ cb->abort_prepared_cb = pgoutput_abort_prepared_txn;
Remove the unnecessary blank line.
;
QUESTION
Line 386
@@ -373,7 +383,49 @@ pgoutput_commit_txn(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
OutputPluginUpdateProgress(ctx);
OutputPluginPrepareWrite(ctx, true);
- logicalrep_write_commit(ctx->out, txn, commit_lsn);
+ logicalrep_write_commit(ctx->out, txn, commit_lsn, true);
Is the is_commit parameter of logicalrep_write_commit ever passed as false?
If yes, where?
If no, the what is the point of it?
;
COMMENT
Line 408
+pgoutput_commit_prepared_txn(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
Since all this function is identical to pg_output_prepare it might be
better to either
1. just leave this as a wrapper to delegate to that function
2. remove this one entirely and assign the callback to the common
pgoutput_prepare_txn
;
COMMENT
Line 419
+pgoutput_abort_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
Since all this function is identical to pg_output_prepare if might be
better to either
1. just leave this as a wrapper to delegate to that function
2. remove this one entirely and assign the callback to the common
pgoutput_prepare_tx
;
COMMENT
Line 419
+pgoutput_abort_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
Shouldn't this comment say be "ROLLBACK PREPARED"?
;
==========
Patch V6-0003, File: src/include/replication/logicalproto.h
==========
QUESTION
Line 101
@@ -87,20 +87,55 @@ typedef struct LogicalRepBeginData
TransactionId xid;
} LogicalRepBeginData;
+/* Commit (and abort) information */
#define LOGICALREP_IS_ABORT 0x02
Is there a good reason why this is not called:
#define LOGICALREP_IS_ROLLBACK 0x02
;
COMMENT
Line 105
((flags == LOGICALREP_IS_COMMIT) || (flags == LOGICALREP_IS_ABORT))
Macros would be safer if flags are in parentheses
(((flags) == LOGICALREP_IS_COMMIT) || ((flags) == LOGICALREP_IS_ABORT))
;
COMMENT
Line 115
Unexpected whitespace for the typedef
"} LogicalRepPrepareData;"
;
COMMENT
Line 122
/* prepare can be exactly one of PREPARE, [COMMIT|ABORT] PREPARED*/
#define PrepareFlagsAreValid(flags) \
((flags == LOGICALREP_IS_PREPARE) || \
(flags == LOGICALREP_IS_COMMIT_PREPARED) || \
(flags == LOGICALREP_IS_ROLLBACK_PREPARED))
There is confusing mixture in macros and comments of ABORT and ROLLBACK terms
"[COMMIT|ABORT] PREPARED" --> "[COMMIT|ROLLBACK] PREPARED"
~
Also, it would be safer if flags are in parentheses
(((flags) == LOGICALREP_IS_PREPARE) || \
((flags) == LOGICALREP_IS_COMMIT_PREPARED) || \
((flags) == LOGICALREP_IS_ROLLBACK_PREPARED))
;
==========
Patch V6-0003, File: src/test/subscription/t/020_twophase.pl
==========
COMMENT
Line 131 - # check inserts are visible
Isn't this supposed to be checking for rows 12 and 13, instead of 11 and 12?
;
==========
Patch V6-0004, File: contrib/test_decoding/test_decoding.c
==========
COMMENT
Line 81
@@ -78,6 +78,15 @@ static void
pg_decode_stream_stop(LogicalDecodingContext *ctx,
static void pg_decode_stream_abort(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr abort_lsn);
+static void pg_decode_stream_prepare(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+static
All these functions have a 3rd parameter called commit_lsn. Even
though the functions are not commit related. It seems like a cut/paste
error.
;
COMMENT
Line 142
@@ -130,6 +139,9 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb)
cb->stream_start_cb = pg_decode_stream_start;
cb->stream_stop_cb = pg_decode_stream_stop;
cb->stream_abort_cb = pg_decode_stream_abort;
+ cb->stream_prepare_cb = pg_decode_stream_prepare;
+ cb->stream_commit_prepared_cb = pg_decode_stream_commit_prepared;
+ cb->stream_abort_prepared_cb = pg_decode_stream_abort_prepared;
cb->stream_commit_cb = pg_decode_stream_commit;
Can the "cb->stream_abort_prepared_cb" be changed to
"cb->stream_rollback_prepared_cb"?
;
COMMENT
Line 827
@@ -812,6 +824,78 @@ pg_decode_stream_abort(LogicalDecodingContext *ctx,
}
static void
+pg_decode_stream_prepare(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_pr
The commit_lsn (3rd parameter) is unused and seems like a cut/paste name error.
;
COMMENT
Line 875
+pg_decode_stream_abort_prepared(LogicalDecodingContext *ctx,
The commit_lsn (3rd parameter) is unused and seems like a cut/paste name error.
;
==========
Patch V6-0004, File: doc/src/sgml/logicaldecoding.sgml
==========
COMMENT
48.6.1
@@ -396,6 +396,9 @@ typedef struct OutputPluginCallbacks
LogicalDecodeStreamStartCB stream_start_cb;
LogicalDecodeStreamStopCB stream_stop_cb;
LogicalDecodeStreamAbortCB stream_abort_cb;
+ LogicalDecodeStreamPrepareCB stream_prepare_cb;
+ LogicalDecodeStreamCommitPreparedCB stream_commit_prepared_cb;
+ LogicalDecodeStreamAbortPreparedCB stream_abort_prepared_cb;
Same question from previous review comments - why using the
terminology "abort" instead of "rollback"
;
COMMENT
48.6.1
@@ -418,7 +421,9 @@ typedef void (*LogicalOutputPluginInit) (struct
OutputPluginCallbacks *cb);
in-progress transactions. The <function>stream_start_cb</function>,
<function>stream_stop_cb</function>, <function>stream_abort_cb</function>,
<function>stream_commit_cb</function> and <function>stream_change_cb</function>
- are required, while <function>stream_message_cb</function> and
+ are required, while <function>stream_message_cb</function>,
+ <function>stream_prepare_cb</function>,
<function>stream_commit_prepared_cb</function>,
+ <function>stream_abort_prepared_cb</function>,
Missing "and".
... "stream_abort_prepared_cb, stream_truncate_cb are optional." -->
"stream_abort_prepared_cb, and stream_truncate_cb are optional."
;
COMMENT
Section 48.6.4.16
Section 48.6.4.17
Section 48.6.4.18
@@ -839,6 +844,45 @@ typedef void (*LogicalDecodeStreamAbortCB)
(struct LogicalDecodingContext *ctx,
</para>
</sect3>
+ <sect3 id="logicaldecoding-output-plugin-stream-prepare">
+ <title>Stream Prepare Callback</title>
+ <para>
+ The <function>stream_prepare_cb</function> callback is called to prepare
+ a previously streamed transaction as part of a two phase commit.
+<programlisting>
+typedef void (*LogicalDecodeStreamPrepareCB) (struct
LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
+ <sect3 id="logicaldecoding-output-plugin-stream-commit-prepared">
+ <title>Stream Commit Prepared Callback</title>
+ <para>
+ The <function>stream_commit_prepared_cb</function> callback is
called to commit prepared
+ a previously streamed transaction as part of a two phase commit.
+<programlisting>
+typedef void (*LogicalDecodeStreamCommitPreparedCB) (struct
LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
+ <sect3 id="logicaldecoding-output-plugin-stream-abort-prepared">
+ <title>Stream Abort Prepared Callback</title>
+ <para>
+ The <function>stream_abort_prepared_cb</function> callback is called
to abort prepared
+ a previously streamed transaction as part of a two phase commit.
+<programlisting>
+typedef void (*LogicalDecodeStreamAbortPreparedCB) (struct
LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+</programlisting>
+ </para>
+ </sect3>
1. Everywhere it says "two phase" commit should be consistently
replaced to say "two-phase" commit (with the hyphen)
2. Search for "abort_lsn" parameter. It seems to be overused
(cut/paste error) even when the API is unrelated to abort
3. 48.6.4.17 and 48.6.4.18
Is this wording ok? Is the word "prepared" even necessary here?
- "... called to commit prepared a previously streamed transaction ..."
- "... called to abort prepared a previously streamed transaction ..."
;
COMMENT
Section 48.9
@@ -1017,9 +1061,13 @@ OutputPluginWrite(ctx, true);
When streaming an in-progress transaction, the changes (and messages) are
streamed in blocks demarcated by <function>stream_start_cb</function>
and <function>stream_stop_cb</function> callbacks. Once all the decoded
- changes are transmitted, the transaction is committed using the
- <function>stream_commit_cb</function> callback (or possibly aborted using
- the <function>stream_abort_cb</function> callback).
+ changes are transmitted, the transaction can be committed using the
+ the <function>stream_commit_cb</function> callback
"two phase" --> "two-phase"
~
Also, Missing period on end of sentence.
"or aborted using the stream_abort_prepared_cb" --> "or aborted using
the stream_abort_prepared_cb."
;
==========
Patch V6-0004, File: src/backend/replication/logical/logical.c
==========
COMMENT
Line 84
@@ -81,6 +81,12 @@ static void stream_stop_cb_wrapper(ReorderBuffer
*cache, ReorderBufferTXN *txn,
XLogRecPtr last_lsn);
static void stream_abort_cb_wrapper(ReorderBuffer *cache,
ReorderBufferTXN *txn,
XLogRecPtr abort_lsn);
+static void stream_prepare_cb_wrapper(ReorderBuffer *cache,
ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+static void stream_commit_prepared_cb_wrapper(ReorderBuffer *cache,
ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+static void stream_abort_prepared_cb_wrapper(ReorderBuffer *cache,
ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
The 3rd parameter is always "commit_lsn" even for API unrelated to
commit, so seems like cut/paste error.
;
COMMENT
Line 1246
@@ -1231,6 +1243,105 @@ stream_abort_cb_wrapper(ReorderBuffer *cache,
ReorderBufferTXN *txn,
}
static void
+stream_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
Misnamed parameter "commit_lsn" ?
~
Also, Line 1272
There seem to be some missing integrity checking to make sure the
callback is not NULL.
A null callback will give NPE when wrapper attempts to call it
;
COMMENT
Line 1305
+static void
+stream_commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
There seem to be some missing integrity checking to make sure the
callback is not NULL.
A null callback will give NPE when wrapper attempts to call it.
;
COMMENT
Line 1312
+static void
+stream_abort_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
Misnamed parameter "commit_lsn" ?
~
Also, Line 1338
There seem to be some missing integrity checking to make sure the
callback is not NULL.
A null callback will give NPE when wrapper attempts to call it.
==========
Patch V6-0004, File: src/backend/replication/logical/reorderbuffer.c
==========
COMMENT
Line 2684
@@ -2672,15 +2681,31 @@ ReorderBufferFinishPrepared(ReorderBuffer *rb,
TransactionId xid,
txn->gid = palloc(strlen(gid) + 1); /* trailing '\0' */
strcpy(txn->gid, gid);
- if (is_commit)
+ if (rbtxn_is_streamed(txn))
{
- txn->txn_flags |= RBTXN_COMMIT_PREPARED;
- rb->commit_prepared(rb, txn, commit_lsn);
+ if (is_commit)
+ {
+ txn->txn_flags |= RBTXN_COMMIT_PREPARED;
The setting/checking of the flags could be refactored if you wanted to
write less code:
e.g.
if (is_commit)
txn->txn_flags |= RBTXN_COMMIT_PREPARED;
else
txn->txn_flags |= RBTXN_ROLLBACK_PREPARED;
if (rbtxn_is_streamed(txn) && rbtxn_commit_prepared(txn))
rb->stream_commit_prepared(rb, txn, commit_lsn);
else if (rbtxn_is_streamed(txn) && rbtxn_rollback_prepared(txn))
rb->stream_abort_prepared(rb, txn, commit_lsn);
else if (rbtxn_commit_prepared(txn))
rb->commit_prepared(rb, txn, commit_lsn);
else if (rbtxn_rollback_prepared(txn))
rb->abort_prepared(rb, txn, commit_lsn);
;
==========
Patch V6-0004, File: src/include/replication/output_plugin.h
==========
COMMENT
Line 171
@@ -157,6 +157,33 @@ typedef void (*LogicalDecodeStreamAbortCB)
(struct LogicalDecodingContext *ctx,
XLogRecPtr abort_lsn);
/*
+ * Called to prepare changes streamed to remote node from in-progress
+ * transaction. This is called as part of a two-phase commit and only when
+ * two-phased commits are supported
+ */
1. Missing period all these comments.
2. Is the part that says "and only where two-phased commits are
supported" necessary to say? Is seems redundant since comments already
says called as part of a two-phase commit.
;
==========
Patch V6-0004, File: src/include/replication/reorderbuffer.h
==========
COMMENT
Line 467
@@ -466,6 +466,24 @@ typedef void (*ReorderBufferStreamAbortCB) (
ReorderBufferTXN *txn,
XLogRecPtr abort_lsn);
+/* prepare streamed transaction callback signature */
+typedef void (*ReorderBufferStreamPrepareCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+/* prepare streamed transaction callback signature */
+typedef void (*ReorderBufferStreamCommitPreparedCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+/* prepare streamed transaction callback signature */
+typedef void (*ReorderBufferStreamAbortPreparedCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
Cut/paste error - repeated same comment 3 times?
[END]Hello Ajin.
I have gone through the v6 patch changes and have a list of review
comments below.
Apologies for the length of this email - I know that many of the
following comments are trivial, but I figured I should either just
ignore everything cosmetic, or list everything regardless. I chose the
latter.
There may be some duplication where the same review comment is written
for multiple files and/or where the same file is in your multiple
patches.
Kind Regards.
Peter Smith
Fujitsu Australia
[BEGIN]
==========
Patch V6-0001, File: contrib/test_decoding/expected/prepared.out (so
prepared.sql also)
==========
COMMENT
Line 30 - The INSERT INTO test_prepared1 VALUES (2); is kind of
strange because it is not really part of the prior test nor the
following test. Maybe it would be better to have a comment describing
the purpose of this isolated INSERT and to also consume the data from
the slot so it does not get jumbled with the data of the following
(abort) test.
;
COMMENT
Line 53 - Same comment for this test INSERT INTO test_prepared1 VALUES
(4); It kind of has nothing really to do with either the prior (abort)
test nor the following (ddl) test.
;
COMMENT
Line 60 - Seems to check which locks are held for the test_prepared_1
table while the transaction is in progress. Maybe it would be better
to have more comments describing what is expected here and why.
;
COMMENT
Line 88 - There is a comment in the test saying "-- We should see '7'
before '5' in our results since it commits first." but I did not see
any test code that actually verifies that happens.
;
QUESTION
Line 120 - I did not really understand the SQL checking the pg_class.
I expected this would be checking table 'test_prepared1' instead. Can
you explain it?
SELECT 'pg_class' AS relation, locktype, mode
FROM pg_locks
WHERE locktype = 'relation'
AND relation = 'pg_class'::regclass;
relation | locktype | mode
----------+----------+------
(0 rows)
;
QUESTION
Line 139 - SET statement_timeout = '1s'; is 1 seconds short enough
here for this test, or might it be that these statements would be
completed in less than one seconds anyhow?
;
QUESTION
Line 163 - How is this testing a SAVEPOINT? Or is it only to check
that the SAVEPOINT command is not part of the replicated changes?
;
COMMENT
Line 175 - Missing underscore in comment. Code requires also underscore:
"nodecode" --> "_nodecode"
==========
Patch V6-0001, File: contrib/test_decoding/test_decoding.c
==========
COMMENT
Line 43
@@ -36,6 +40,7 @@ typedef struct
bool skip_empty_xacts;
bool xact_wrote_changes;
bool only_local;
+ TransactionId check_xid; /* track abort of this txid */
} TestDecodingData;
The "check_xid" seems a meaningless name. Check what?
IIUC maybe should be something like "check_xid_aborted"
;
COMMENT
Line 105
@ -88,6 +93,19 @@ static void
pg_decode_stream_truncate(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
int nrelations, Relation relations[],
ReorderBufferChange *change);
+static bool pg_decode_filter_prepare(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
Remove extra blank line after these functions
;
COMMENT
Line 149
@@ -116,6 +134,11 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb)
cb->stream_change_cb = pg_decode_stream_change;
cb->stream_message_cb = pg_decode_stream_message;
cb->stream_truncate_cb = pg_decode_stream_truncate;
+ cb->filter_prepare_cb = pg_decode_filter_prepare;
+ cb->prepare_cb = pg_decode_prepare_txn;
+ cb->commit_prepared_cb = pg_decode_commit_prepared_txn;
+ cb->abort_prepared_cb = pg_decode_abort_prepared_txn;
+
}
There is a confusing mix of terminology where sometimes things are
referred as ROLLBACK/rollback and other times apparently the same
operation is referred as ABORT/abort. I do not know the root cause of
this mixture. IIUC maybe the internal functions and protocol generally
use the term "abort", whereas the SQL syntax is "ROLLBACK"... but
where those two terms collide in the middle it gets quite confusing.
At least I thought the names of the "callbacks" which get exposed to
the user (e.g. in the help) might be better if they would match the
SQL.
"abort_prepared_cb" --> "rollback_prepared_db"
There are similar review comments like this below where the
alternating terms caused me some confusion.
~
Also, Remove the extra blank line before the end of the function.
;
COMMENT
Line 267
@ -227,6 +252,42 @@ pg_decode_startup(LogicalDecodingContext *ctx,
OutputPluginOptions *opt,
errmsg("could not parse value \"%s\" for parameter \"%s\"",
strVal(elem->arg), elem->defname)));
}
+ else if (strcmp(elem->defname, "two-phase-commit") == 0)
+ {
+ if (elem->arg == NULL)
+ continue;
IMO the "check-xid" code might be better rearranged so the NULL check
is first instead of if/else.
e.g.
if (elem->arg == NULL)
ereport(FATAL,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("check-xid needs an input value")));
~
Also, is it really supposed to be FATAL instead or ERROR. That is not
the same as the other surrounding code.
;
COMMENT
Line 296
if (data->check_xid <= 0)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("Specify positive value for parameter \"%s\","
" you specified \"%s\"",
elem->defname, strVal(elem->arg))));
The code checking for <= 0 seems over-complicated. Because conversion
was using strtoul() I fail to see how this can ever be < 0. Wouldn't
it be easier to simply test the result of the strtoul() function?
BEFORE: if (errno == EINVAL || errno == ERANGE)
AFTER: if (data->check_xid == 0)
~
Also, should this be FATAL? Everything else similar is ERROR.
;
COMMENT
(general)
I don't recall seeing any of these decoding options (e.g.
"two-phase-commit", "check-xid") documented anywhere.
So how can a user even know these options exist so they can use them?
Perhaps options should be described on this page?
https://www.postgresql.org/docs/13/functions-admin.html#FUNCTIONS-REPLICATION
;
COMMENT
(general)
"check-xid" is a meaningless option name. Maybe something like
"checked-xid-aborted" is more useful?
Suggest changing the member, the option, and the error messages to
match some better name.
;
COMMENT
Line 314
@@ -238,6 +299,7 @@ pg_decode_startup(LogicalDecodingContext *ctx,
OutputPluginOptions *opt,
}
ctx->streaming &= enable_streaming;
+ ctx->enable_twophase &= enable_2pc;
}
The "ctx->enable_twophase" is inconsistent naming with the
"ctx->streaming" member.
"enable_twophase" --> "twophase"
;
COMMENT
Line 374
@@ -297,6 +359,94 @@ pg_decode_commit_txn(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
OutputPluginWrite(ctx, true);
}
+
+/*
+ * Filter out two-phase transactions.
+ *
+ * Each plugin can implement its own filtering logic. Here
+ * we demonstrate a simple logic by checking the GID. If the
+ * GID contains the "_nodecode" substring, then we filter
+ * it out.
+ */
+static bool
+pg_decode_filter_prepare(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
Remove the extra preceding blank line.
~
I did not find anything in the help about "_nodecode". Should it be
there or is this deliberately not documented feature?
;
QUESTION
Line 440
+pg_decode_abort_prepared_txn(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
Is this a wrong comment
"ABORT PREPARED" --> "ROLLBACK PREPARED" ??
;
COMMENT
Line 620
@@ -455,6 +605,22 @@ pg_decode_change(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
}
data->xact_wrote_changes = true;
+ /* if check_xid is specified */
+ if (TransactionIdIsValid(data->check_xid))
+ {
+ elog(LOG, "waiting for %u to abort", data->check_xid);
+ while (TransactionIdIsInProgress(dat
The check_xid seems a meaningless name, and the comment "/* if
check_xid is specified */" was not helpful either.
IIUC purpose of this is to check that the nominated xid always is rolled back.
So the appropriate name may be more like "check-xid-aborted".
;
==========
Patch V6-0001, File: doc/src/sgml/logicaldecoding.sgml
==========
COMMENT/QUESTION
Section 48.6.1
@ -387,6 +387,10 @@ typedef struct OutputPluginCallbacks
LogicalDecodeTruncateCB truncate_cb;
LogicalDecodeCommitCB commit_cb;
LogicalDecodeMessageCB message_cb;
+ LogicalDecodeFilterPrepareCB filter_prepare_cb;
Confused by the mixing of terminologies "abort" and "rollback".
Why is it LogicalDecodeAbortPreparedCB instead of
LogicalDecodeRollbackPreparedCB?
Why is it abort_prepared_cb instead of rollback_prepared_cb;?
I thought everything the user sees should be ROLLBACK/rollback (like
the SQL) regardless of what the internal functions might be called.
;
COMMENT
Section 48.6.1
The begin_cb, change_cb and commit_cb callbacks are required, while
startup_cb, filter_by_origin_cb, truncate_cb, and shutdown_cb are
optional. If truncate_cb is not set but a TRUNCATE is to be decoded,
the action will be ignored.
The 1st paragraph beneath the typedef does not mention the newly added
callbacks to say if they are required or optional.
;
COMMENT
Section 48.6.4.5
Section 48.6.4.6
Section 48.6.4.7
@@ -578,6 +588,55 @@ typedef void (*LogicalDecodeCommitCB) (struct
LogicalDecodingContext *ctx,
</para>
</sect3>
+ <sect3 id="logicaldecoding-output-plugin-prepare">
+ <sect3 id="logicaldecoding-output-plugin-commit-prepared">
+ <sect3 id="logicaldecoding-output-plugin-abort-prepared">
+<programlisting>
The wording and titles are a bit backwards compared to the others.
e.g. previously was "Transaction Begin" (not "Begin Transaction") and
"Transaction End" (not "End Transaction").
So for consistently following the existing IMO should change these new
titles (and wording) to:
- "Commit Prepared Transaction Callback" --> "Transaction Commit
Prepared Callback"
- "Rollback Prepared Transaction Callback" --> "Transaction Rollback
Prepared Callback"
- "whenever a commit prepared transaction has been decoded" -->
"whenever a transaction commit prepared has been decoded"
- "whenever a rollback prepared transaction has been decoded." -->
"whenever a transaction rollback prepared has been decoded."
;
==========
Patch V6-0001, File: src/backend/replication/logical/decode.c
==========
COMMENT
Line 74
@@ -70,6 +70,9 @@ static void DecodeCommit(LogicalDecodingContext
*ctx, XLogRecordBuffer *buf,
xl_xact_parsed_commit *parsed, TransactionId xid);
static void DecodeAbort(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
xl_xact_parsed_abort *parsed, TransactionId xid);
+static void DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_prepare * parsed);
The 2nd line of DecodePrepare is misaligned by one space.
;
COMMENT
Line 321
@@ -312,17 +315,34 @@ DecodeXactOp(LogicalDecodingContext *ctx,
XLogRecordBuffer *buf)
}
break;
case XLOG_XACT_PREPARE:
+ {
+ xl_xact_parsed_prepare parsed;
+ xl_xact_prepare *xlrec;
+ /* check that output plugin is capable of twophase decoding */
"twophase" --> "two-phase"
~
Also, add a blank line after the declarations.
;
==========
Patch V6-0001, File: src/backend/replication/logical/logical.c
==========
COMMENT
Line 249
@@ -225,6 +237,19 @@ StartupDecodingContext(List *output_plugin_options,
(ctx->callbacks.stream_message_cb != NULL) ||
(ctx->callbacks.stream_truncate_cb != NULL);
+ /*
+ * To support two phase logical decoding, we require
prepare/commit-prepare/abort-prepare
+ * callbacks. The filter-prepare callback is optional. We however
enable two phase logical
+ * decoding when at least one of the methods is enabled so that we
can easily identify
+ * missing methods.
The terminology is generally well known as "two-phase" (with the
hyphen) https://en.wikipedia.org/wiki/Two-phase_commit_protocol so
let's be consistent for all the patch code comments. Please search the
code and correct this in all places, even where I might have missed to
identify it.
"two phase" --> "two-phase"
;
COMMENT
Line 822
@@ -782,6 +807,111 @@ commit_cb_wrapper(ReorderBuffer *cache,
ReorderBufferTXN *txn,
}
static void
+prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
"support 2 phase" --> "supports two-phase" in the comment
;
COMMENT
Line 844
Code condition seems strange and/or broken.
if (ctx->enable_twophase && ctx->callbacks.prepare_cb == NULL)
Because if the flag is null then this condition is skipped.
But then if the callback was also NULL then attempting to call it to
"do the actual work" will give NPE.
~
Also, I wonder should this check be the first thing in this function?
Because if it fails does it even make sense that all the errcallback
code was set up?
E.g errcallback.arg potentially is left pointing to a stack variable
on a stack that no longer exists.
;
COMMENT
Line 857
+commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
"support 2 phase" --> "supports two-phase" in the comment
~
Also, Same potential trouble with the condition:
if (ctx->enable_twophase && ctx->callbacks.commit_prepared_cb == NULL)
Same as previously asked. Should this check be first thing in this function?
;
COMMENT
Line 892
+abort_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
"support 2 phase" --> "supports two-phase" in the comment
~
Same potential trouble with the condition:
if (ctx->enable_twophase && ctx->callbacks.abort_prepared_cb == NULL)
Same as previously asked. Should this check be the first thing in this function?
;
COMMENT
Line 1013
@@ -858,6 +988,51 @@ truncate_cb_wrapper(ReorderBuffer *cache,
ReorderBufferTXN *txn,
error_context_stack = errcallback.previous;
}
+static bool
+filter_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid)
Fix wording in comment:
"twophase" --> "two-phase transactions"
"twophase transactions" --> "two-phase transactions"
==========
Patch V6-0001, File: src/backend/replication/logical/reorderbuffer.c
==========
COMMENT
Line 255
@@ -251,7 +251,8 @@ static Size
ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn
static void ReorderBufferRestoreChange(ReorderBuffer *rb,
ReorderBufferTXN *txn,
char *change);
static void ReorderBufferRestoreCleanup(ReorderBuffer *rb,
ReorderBufferTXN *txn);
-static void ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn);
+static void ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
+ bool txn_prepared);
The alignment is inconsistent. One more space needed before "bool txn_prepared"
;
COMMENT
Line 417
@@ -413,6 +414,11 @@ ReorderBufferReturnTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn)
}
/* free data that's contained */
+ if (txn->gid != NULL)
+ {
+ pfree(txn->gid);
+ txn->gid = NULL;
+ }
Should add the blank link before this new code, as it was before.
;
COMMENT
Line 1564
@ -1502,12 +1561,14 @@ ReorderBufferCleanupTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn)
}
/*
- * Discard changes from a transaction (and subtransactions), after streaming
- * them. Keep the remaining info - transactions, tuplecids, invalidations and
- * snapshots.
+ * Discard changes from a transaction (and subtransactions), either
after streaming or
+ * after a PREPARE.
typo "snapshots.If" -> "snapshots. If"
;
COMMENT/QUESTION
Line 1590
@@ -1526,7 +1587,7 @@ ReorderBufferTruncateTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn)
Assert(rbtxn_is_known_subxact(subtxn));
Assert(subtxn->nsubtxns == 0);
- ReorderBufferTruncateTXN(rb, subtxn);
+ ReorderBufferTruncateTXN(rb, subtxn, txn_prepared);
}
There are some code paths here I did not understand how they match the comments.
Because this function is recursive it seems that it may be called
where the 2nd parameter txn is a sub-transaction.
But then this seems at odds with some of the other code comments of
this function which are processing the txn without ever testing is it
really toplevel or not:
e.g. Line 1593 "/* cleanup changes in the toplevel txn */"
e.g. Line 1632 "They are always stored in the toplevel transaction."
;
COMMENT
Line 1644
@@ -1560,9 +1621,33 @@ ReorderBufferTruncateTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn)
* about the toplevel xact (we send the XID in all messages), but we never
* stream XIDs of empty subxacts.
*/
- if ((!txn->toptxn) || (txn->nentries_mem != 0))
+ if ((!txn_prepared) && ((!txn->toptxn) || (txn->nentries_mem != 0)))
txn->txn_flags |= RBTXN_IS_STREAMED;
+ if (txn_prepared)
/* remove the change from it's containing list */
typo "it's" --> "its"
;
QUESTION
Line 1977
@@ -1880,7 +1965,7 @@ ReorderBufferResetTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn,
ReorderBufferChange *specinsert)
{
/* Discard the changes that we just streamed */
- ReorderBufferTruncateTXN(rb, txn);
+ ReorderBufferTruncateTXN(rb, txn, false);
How do you know the 3rd parameter - i.e. txn_prepared - should be
hardwired false here?
e.g. I thought that maybe rbtxn_prepared(txn) can be true here.
;
COMMENT
Line 2345
@@ -2249,7 +2334,6 @@ ReorderBufferProcessTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn,
break;
}
}
-
/*
Looks like accidental blank line deletion. This should be put back how it was
;
COMMENT/QUESTION
Line 2374
@@ -2278,7 +2362,16 @@ ReorderBufferProcessTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn,
}
}
else
- rb->commit(rb, txn, commit_lsn);
+ {
+ /*
+ * Call either PREPARE (for twophase transactions) or COMMIT
+ * (for regular ones).
"twophase" --> "two-phase"
~
Also, I was confused by the apparent assumption of exclusiveness of
streaming and 2PC...
e.g. what if streaming AND 2PC then it won't do rb->prepare()
;
QUESTION
Line 2424
@@ -2319,11 +2412,17 @@ ReorderBufferProcessTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn,
*/
if (streaming)
{
- ReorderBufferTruncateTXN(rb, txn);
+ ReorderBufferTruncateTXN(rb, txn, false);
/* Reset the CheckXidAlive */
CheckXidAlive = InvalidTransactionId;
}
+ else if (rbtxn_prepared(txn))
I was confused by the exclusiveness of streaming/2PC.
e.g. what if streaming AND 2PC at same time - how can you pass false
as 3rd param to ReorderBufferTruncateTXN?
;
COMMENT
Line 2463
@@ -2352,17 +2451,18 @@ ReorderBufferProcessTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn,
/*
* The error code ERRCODE_TRANSACTION_ROLLBACK indicates a concurrent
- * abort of the (sub)transaction we are streaming. We need to do the
+ * abort of the (sub)transaction we are streaming or preparing. We
need to do the
* cleanup and return gracefully on this error, see SetupCheckXidLive.
*/
"twoi phase" --> "two-phase"
;
QUESTIONS
Line 2482
@@ -2370,10 +2470,19 @@ ReorderBufferProcessTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn,
errdata = NULL;
curtxn->concurrent_abort = true;
- /* Reset the TXN so that it is allowed to stream remaining data. */
- ReorderBufferResetTXN(rb, txn, snapshot_now,
- command_id, prev_lsn,
- specinsert);
+ /* If streaming, reset the TXN so that it is allowed to stream
remaining data. */
+ if (streaming)
Re: /* If streaming, reset the TXN so that it is allowed to stream
remaining data. */
I was confused by the exclusiveness of streaming/2PC.
Is it not possible for streaming flags and rbtxn_prepared(txn) true at
the same time?
~
elog(LOG, "stopping decoding of %s (%u)",
txn->gid[0] != '\0'? txn->gid:"", txn->xid);
Is this a safe operation, or do you also need to test txn->gid is not NULL?
;
COMMENT
Line 2606
+ReorderBufferPrepare(ReorderBuffer *rb, TransactionId xid,
"twophase" --> "two-phase"
;
QUESTION
Line 2655
+ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
"This is used to handle COMMIT/ABORT PREPARED"
Should that say "COMMIT/ROLLBACK PREPARED"?
;
COMMENT
Line 2668
"Anyways, 2PC transactions" --> "Anyway, two-phase transactions"
;
COMMENT
Line 2765
@@ -2495,7 +2731,13 @@ ReorderBufferAbort(ReorderBuffer *rb,
TransactionId xid, XLogRecPtr lsn)
/* cosmetic... */
txn->final_lsn = lsn;
- /* remove potential on-disk data, and deallocate */
+ /*
+ * remove potential on-disk data, and deallocate.
+ *
Remove the blank between the comment and code.
==========
Patch V6-0001, File: src/include/replication/logical.h
==========
COMMENT
Line 89
"two phase" -> "two-phase"
;
COMMENT
Line 89
For consistency with the previous member naming really the new member
should just be called "twophase" rather than "enable_twophase"
;
==========
Patch V6-0001, File: src/include/replication/output_plugin.h
==========
QUESTION
Line 106
As previously asked, why is the callback function/typedef referred as
AbortPrepared instead of RollbackPrepared?
It does not match the SQL and the function comment, and seems only to
add some unnecessary confusion.
;
==========
Patch V6-0001, File: src/include/replication/reorderbuffer.h
==========
QUESTION
Line 116
@@ -162,9 +163,13 @@ typedef struct ReorderBufferChange
#define RBTXN_HAS_CATALOG_CHANGES 0x0001
#define RBTXN_IS_SUBXACT 0x0002
#define RBTXN_IS_SERIALIZED 0x0004
-#define RBTXN_IS_STREAMED 0x0008
-#define RBTXN_HAS_TOAST_INSERT 0x0010
-#define RBTXN_HAS_SPEC_INSERT 0x0020
+#define RBTXN_PREPARE 0x0008
+#define RBTXN_COMMIT_PREPARED 0x0010
+#define RBTXN_ROLLBACK_PREPARED 0x0020
+#define RBTXN_COMMIT 0x0040
+#define RBTXN_IS_STREAMED 0x0080
+#define RBTXN_HAS_TOAST_INSERT 0x0100
+#define RBTXN_HAS_SPEC_INSERT 0x0200
I was wondering why when adding new flags, some of the existing flag
masks were also altered.
I am assuming this is ok because they are never persisted but are only
used in the protocol (??)
;
COMMENT
Line 226
@@ -218,6 +223,15 @@ typedef struct ReorderBufferChange
((txn)->txn_flags & RBTXN_IS_STREAMED) != 0 \
)
+/* is this txn prepared? */
+#define rbtxn_prepared(txn) (txn->txn_flags & RBTXN_PREPARE)
+/* was this prepared txn committed in the meanwhile? */
+#define rbtxn_commit_prepared(txn) (txn->txn_flags & RBTXN_COMMIT_PREPARED)
+/* was this prepared txn aborted in the meanwhile? */
+#define rbtxn_rollback_prepared(txn) (txn->txn_flags & RBTXN_ROLLBACK_PREPARED)
+/* was this txn committed in the meanwhile? */
+#define rbtxn_commit(txn) (txn->txn_flags & RBTXN_COMMIT)
+
Probably all the "txn->txn_flags" here might be more safely written
with parentheses in the macro like "(txn)->txn_flags".
~
Also, Start all comments with capital. And what is the meaning "in the
meanwhile?"
;
COMMENT
Line 410
@@ -390,6 +407,39 @@ typedef void (*ReorderBufferCommitCB) (ReorderBuffer *rb,
ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
The format is inconsistent with all other callback signatures here,
where the 1st arg was on the same line as the typedef.
;
COMMENT
Line 440-442
Excessive blank lines following this change?
;
COMMENT
Line 638
@@ -571,6 +631,15 @@ void
ReorderBufferXidSetCatalogChanges(ReorderBuffer *, TransactionId xid,
XLog
bool ReorderBufferXidHasCatalogChanges(ReorderBuffer *, TransactionId xid);
bool ReorderBufferXidHasBaseSnapshot(ReorderBuffer *, TransactionId xid);
+bool ReorderBufferPrepareNeedSkip(ReorderBuffer *rb, TransactionId xid,
+ const char *gid);
+bool ReorderBufferTxnIsPrepared(ReorderBuffer *rb, TransactionId xid,
+ const char *gid);
+void ReorderBufferPrepare(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid);
Not aligned consistently with other function prototypes.
;
==========
Patch V6-0003, File: src/backend/access/transam/twophase.c
==========
COMMENT
Line 551
@@ -548,6 +548,37 @@ MarkAsPrepared(GlobalTransaction gxact, bool lock_held)
}
/*
+ * LookupGXact
+ * Check if the prepared transaction with the given GID is around
+ */
+bool
+LookupGXact(const char *gid)
There is potential to refactor/simplify this code:
e.g.
bool
LookupGXact(const char *gid)
{
int i;
bool found = false;
LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE);
for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
{
GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
/* Ignore not-yet-valid GIDs */
if (gxact->valid && strcmp(gxact->gid, gid) == 0)
{
found = true;
break;
}
}
LWLockRelease(TwoPhaseStateLock);
return found;
}
;
==========
Patch V6-0003, File: src/backend/replication/logical/proto.c
==========
COMMENT
Line 86
@@ -72,12 +72,17 @@ logicalrep_read_begin(StringInfo in,
LogicalRepBeginData *begin_data)
*/
void
logicalrep_write_commit(StringInfo out, ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn)
Since now the flags are used the code comment is wrong.
"/* send the flags field (unused for now) */"
;
COMMENT
Line 129
@ -106,6 +115,77 @@ logicalrep_read_commit(StringInfo in,
LogicalRepCommitData *commit_data)
}
/*
+ * Write PREPARE to the output stream.
+ */
+void
+logicalrep_write_prepare(StringInfo out, ReorderBufferTXN *txn,
"2PC transactions" --> "two-phase commit transactions"
;
COMMENT
Line 133
Assert(strlen(txn->gid) > 0);
Shouldn't that assertion also check txn->gid is not NULL (to prevent
NPE in case gid was NULL)
;
COMMENT
Line 177
+logicalrep_read_prepare(StringInfo in, LogicalRepPrepareData * prepare_data)
prepare_data->prepare_type = flags;
This code may be OK but it does seem a bit of an abuse of the flags.
e.g. Are they flags or are the really enum values?
e.g. And if they are effectively enums (it appears they are) then
seemed inconsistent that |= was used when they were previously
assigned.
;
==========
Patch V6-0003, File: src/backend/replication/logical/worker.c
==========
COMMENT
Line 757
@@ -749,6 +753,141 @@ apply_handle_commit(StringInfo s)
pgstat_report_activity(STATE_IDLE, NULL);
}
+static void
+apply_handle_prepare_txn(LogicalRepPrepareData * prepare_data)
+{
+ Assert(prepare_data->prepare_lsn == remote_final_lsn);
Missing function comment to say this is called from apply_handle_prepare.
;
COMMENT
Line 798
+apply_handle_commit_prepared_txn(LogicalRepPrepareData * prepare_data)
Missing function comment to say this is called from apply_handle_prepare.
;
COMMENT
Line 824
+apply_handle_rollback_prepared_txn(LogicalRepPrepareData * prepare_data)
Missing function comment to say this is called from apply_handle_prepare.
==========
Patch V6-0003, File: src/backend/replication/pgoutput/pgoutput.c
==========
COMMENT
Line 50
@@ -47,6 +47,12 @@ static void pgoutput_truncate(LogicalDecodingContext *ctx,
ReorderBufferChange *change);
static bool pgoutput_origin_filter(LogicalDecodingContext *ctx,
RepOriginId origin_id);
+static void pgoutput_prepare_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, XLogRecPtr prepare_lsn);
The parameter indentation (2nd lines) does not match everything else
in this context.
;
COMMENT
Line 152
@@ -143,6 +149,10 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb)
cb->change_cb = pgoutput_change;
cb->truncate_cb = pgoutput_truncate;
cb->commit_cb = pgoutput_commit_txn;
+
+ cb->prepare_cb = pgoutput_prepare_txn;
+ cb->commit_prepared_cb = pgoutput_commit_prepared_txn;
+ cb->abort_prepared_cb = pgoutput_abort_prepared_txn;
Remove the unnecessary blank line.
;
QUESTION
Line 386
@@ -373,7 +383,49 @@ pgoutput_commit_txn(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
OutputPluginUpdateProgress(ctx);
OutputPluginPrepareWrite(ctx, true);
- logicalrep_write_commit(ctx->out, txn, commit_lsn);
+ logicalrep_write_commit(ctx->out, txn, commit_lsn, true);
Is the is_commit parameter of logicalrep_write_commit ever passed as false?
If yes, where?
If no, the what is the point of it?
;
COMMENT
Line 408
+pgoutput_commit_prepared_txn(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
Since all this function is identical to pg_output_prepare it might be
better to either
1. just leave this as a wrapper to delegate to that function
2. remove this one entirely and assign the callback to the common
pgoutput_prepare_txn
;
COMMENT
Line 419
+pgoutput_abort_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
Since all this function is identical to pg_output_prepare if might be
better to either
1. just leave this as a wrapper to delegate to that function
2. remove this one entirely and assign the callback to the common
pgoutput_prepare_tx
;
COMMENT
Line 419
+pgoutput_abort_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
Shouldn't this comment say be "ROLLBACK PREPARED"?
;
==========
Patch V6-0003, File: src/include/replication/logicalproto.h
==========
QUESTION
Line 101
@@ -87,20 +87,55 @@ typedef struct LogicalRepBeginData
TransactionId xid;
} LogicalRepBeginData;
+/* Commit (and abort) information */
#define LOGICALREP_IS_ABORT 0x02
Is there a good reason why this is not called:
#define LOGICALREP_IS_ROLLBACK 0x02
;
COMMENT
Line 105
((flags == LOGICALREP_IS_COMMIT) || (flags == LOGICALREP_IS_ABORT))
Macros would be safer if flags are in parentheses
(((flags) == LOGICALREP_IS_COMMIT) || ((flags) == LOGICALREP_IS_ABORT))
;
COMMENT
Line 115
Unexpected whitespace for the typedef
"} LogicalRepPrepareData;"
;
COMMENT
Line 122
/* prepare can be exactly one of PREPARE, [COMMIT|ABORT] PREPARED*/
#define PrepareFlagsAreValid(flags) \
((flags == LOGICALREP_IS_PREPARE) || \
(flags == LOGICALREP_IS_COMMIT_PREPARED) || \
(flags == LOGICALREP_IS_ROLLBACK_PREPARED))
There is confusing mixture in macros and comments of ABORT and ROLLBACK terms
"[COMMIT|ABORT] PREPARED" --> "[COMMIT|ROLLBACK] PREPARED"
~
Also, it would be safer if flags are in parentheses
(((flags) == LOGICALREP_IS_PREPARE) || \
((flags) == LOGICALREP_IS_COMMIT_PREPARED) || \
((flags) == LOGICALREP_IS_ROLLBACK_PREPARED))
;
==========
Patch V6-0003, File: src/test/subscription/t/020_twophase.pl
==========
COMMENT
Line 131 - # check inserts are visible
Isn't this supposed to be checking for rows 12 and 13, instead of 11 and 12?
;
==========
Patch V6-0004, File: contrib/test_decoding/test_decoding.c
==========
COMMENT
Line 81
@@ -78,6 +78,15 @@ static void
pg_decode_stream_stop(LogicalDecodingContext *ctx,
static void pg_decode_stream_abort(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr abort_lsn);
+static void pg_decode_stream_prepare(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+static
All these functions have a 3rd parameter called commit_lsn. Even
though the functions are not commit related. It seems like a cut/paste
error.
;
COMMENT
Line 142
@@ -130,6 +139,9 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb)
cb->stream_start_cb = pg_decode_stream_start;
cb->stream_stop_cb = pg_decode_stream_stop;
cb->stream_abort_cb = pg_decode_stream_abort;
+ cb->stream_prepare_cb = pg_decode_stream_prepare;
+ cb->stream_commit_prepared_cb = pg_decode_stream_commit_prepared;
+ cb->stream_abort_prepared_cb = pg_decode_stream_abort_prepared;
cb->stream_commit_cb = pg_decode_stream_commit;
Can the "cb->stream_abort_prepared_cb" be changed to
"cb->stream_rollback_prepared_cb"?
;
COMMENT
Line 827
@@ -812,6 +824,78 @@ pg_decode_stream_abort(LogicalDecodingContext *ctx,
}
static void
+pg_decode_stream_prepare(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_pr
The commit_lsn (3rd parameter) is unused and seems like a cut/paste name error.
;
COMMENT
Line 875
+pg_decode_stream_abort_prepared(LogicalDecodingContext *ctx,
The commit_lsn (3rd parameter) is unused and seems like a cut/paste name error.
;
==========
Patch V6-0004, File: doc/src/sgml/logicaldecoding.sgml
==========
COMMENT
48.6.1
@@ -396,6 +396,9 @@ typedef struct OutputPluginCallbacks
LogicalDecodeStreamStartCB stream_start_cb;
LogicalDecodeStreamStopCB stream_stop_cb;
LogicalDecodeStreamAbortCB stream_abort_cb;
+ LogicalDecodeStreamPrepareCB stream_prepare_cb;
+ LogicalDecodeStreamCommitPreparedCB stream_commit_prepared_cb;
+ LogicalDecodeStreamAbortPreparedCB stream_abort_prepared_cb;
Same question from previous review comments - why using the
terminology "abort" instead of "rollback"
;
COMMENT
48.6.1
@@ -418,7 +421,9 @@ typedef void (*LogicalOutputPluginInit) (struct
OutputPluginCallbacks *cb);
in-progress transactions. The <function>stream_start_cb</function>,
<function>stream_stop_cb</function>, <function>stream_abort_cb</function>,
<function>stream_commit_cb</function> and <function>stream_change_cb</function>
- are required, while <function>stream_message_cb</function> and
+ are required, while <function>stream_message_cb</function>,
+ <function>stream_prepare_cb</function>,
<function>stream_commit_prepared_cb</function>,
+ <function>stream_abort_prepared_cb</function>,
Missing "and".
... "stream_abort_prepared_cb, stream_truncate_cb are optional." -->
"stream_abort_prepared_cb, and stream_truncate_cb are optional."
;
COMMENT
Section 48.6.4.16
Section 48.6.4.17
Section 48.6.4.18
@@ -839,6 +844,45 @@ typedef void (*LogicalDecodeStreamAbortCB)
(struct LogicalDecodingContext *ctx,
</para>
</sect3>
+ <sect3 id="logicaldecoding-output-plugin-stream-prepare">
+ <title>Stream Prepare Callback</title>
+ <para>
+ The <function>stream_prepare_cb</function> callback is called to prepare
+ a previously streamed transaction as part of a two phase commit.
+<programlisting>
+typedef void (*LogicalDecodeStreamPrepareCB) (struct
LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
+ <sect3 id="logicaldecoding-output-plugin-stream-commit-prepared">
+ <title>Stream Commit Prepared Callback</title>
+ <para>
+ The <function>stream_commit_prepared_cb</function> callback is
called to commit prepared
+ a previously streamed transaction as part of a two phase commit.
+<programlisting>
+typedef void (*LogicalDecodeStreamCommitPreparedCB) (struct
LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
+ <sect3 id="logicaldecoding-output-plugin-stream-abort-prepared">
+ <title>Stream Abort Prepared Callback</title>
+ <para>
+ The <function>stream_abort_prepared_cb</function> callback is called
to abort prepared
+ a previously streamed transaction as part of a two phase commit.
+<programlisting>
+typedef void (*LogicalDecodeStreamAbortPreparedCB) (struct
LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+</programlisting>
+ </para>
+ </sect3>
1. Everywhere it says "two phase" commit should be consistently
replaced to say "two-phase" commit (with the hyphen)
2. Search for "abort_lsn" parameter. It seems to be overused
(cut/paste error) even when the API is unrelated to abort
3. 48.6.4.17 and 48.6.4.18
Is this wording ok? Is the word "prepared" even necessary here?
- "... called to commit prepared a previously streamed transaction ..."
- "... called to abort prepared a previously streamed transaction ..."
;
COMMENT
Section 48.9
@@ -1017,9 +1061,13 @@ OutputPluginWrite(ctx, true);
When streaming an in-progress transaction, the changes (and messages) are
streamed in blocks demarcated by <function>stream_start_cb</function>
and <function>stream_stop_cb</function> callbacks. Once all the decoded
- changes are transmitted, the transaction is committed using the
- <function>stream_commit_cb</function> callback (or possibly aborted using
- the <function>stream_abort_cb</function> callback).
+ changes are transmitted, the transaction can be committed using the
+ the <function>stream_commit_cb</function> callback
"two phase" --> "two-phase"
~
Also, Missing period on end of sentence.
"or aborted using the stream_abort_prepared_cb" --> "or aborted using
the stream_abort_prepared_cb."
;
==========
Patch V6-0004, File: src/backend/replication/logical/logical.c
==========
COMMENT
Line 84
@@ -81,6 +81,12 @@ static void stream_stop_cb_wrapper(ReorderBuffer
*cache, ReorderBufferTXN *txn,
XLogRecPtr last_lsn);
static void stream_abort_cb_wrapper(ReorderBuffer *cache,
ReorderBufferTXN *txn,
XLogRecPtr abort_lsn);
+static void stream_prepare_cb_wrapper(ReorderBuffer *cache,
ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+static void stream_commit_prepared_cb_wrapper(ReorderBuffer *cache,
ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+static void stream_abort_prepared_cb_wrapper(ReorderBuffer *cache,
ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
The 3rd parameter is always "commit_lsn" even for API unrelated to
commit, so seems like cut/paste error.
;
COMMENT
Line 1246
@@ -1231,6 +1243,105 @@ stream_abort_cb_wrapper(ReorderBuffer *cache,
ReorderBufferTXN *txn,
}
static void
+stream_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
Misnamed parameter "commit_lsn" ?
~
Also, Line 1272
There seem to be some missing integrity checking to make sure the
callback is not NULL.
A null callback will give NPE when wrapper attempts to call it
;
COMMENT
Line 1305
+static void
+stream_commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
There seem to be some missing integrity checking to make sure the
callback is not NULL.
A null callback will give NPE when wrapper attempts to call it.
;
COMMENT
Line 1312
+static void
+stream_abort_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
Misnamed parameter "commit_lsn" ?
~
Also, Line 1338
There seem to be some missing integrity checking to make sure the
callback is not NULL.
A null callback will give NPE when wrapper attempts to call it.
==========
Patch V6-0004, File: src/backend/replication/logical/reorderbuffer.c
==========
COMMENT
Line 2684
@@ -2672,15 +2681,31 @@ ReorderBufferFinishPrepared(ReorderBuffer *rb,
TransactionId xid,
txn->gid = palloc(strlen(gid) + 1); /* trailing '\0' */
strcpy(txn->gid, gid);
- if (is_commit)
+ if (rbtxn_is_streamed(txn))
{
- txn->txn_flags |= RBTXN_COMMIT_PREPARED;
- rb->commit_prepared(rb, txn, commit_lsn);
+ if (is_commit)
+ {
+ txn->txn_flags |= RBTXN_COMMIT_PREPARED;
The setting/checking of the flags could be refactored if you wanted to
write less code:
e.g.
if (is_commit)
txn->txn_flags |= RBTXN_COMMIT_PREPARED;
else
txn->txn_flags |= RBTXN_ROLLBACK_PREPARED;
if (rbtxn_is_streamed(txn) && rbtxn_commit_prepared(txn))
rb->stream_commit_prepared(rb, txn, commit_lsn);
else if (rbtxn_is_streamed(txn) && rbtxn_rollback_prepared(txn))
rb->stream_abort_prepared(rb, txn, commit_lsn);
else if (rbtxn_commit_prepared(txn))
rb->commit_prepared(rb, txn, commit_lsn);
else if (rbtxn_rollback_prepared(txn))
rb->abort_prepared(rb, txn, commit_lsn);
;
==========
Patch V6-0004, File: src/include/replication/output_plugin.h
==========
COMMENT
Line 171
@@ -157,6 +157,33 @@ typedef void (*LogicalDecodeStreamAbortCB)
(struct LogicalDecodingContext *ctx,
XLogRecPtr abort_lsn);
/*
+ * Called to prepare changes streamed to remote node from in-progress
+ * transaction. This is called as part of a two-phase commit and only when
+ * two-phased commits are supported
+ */
1. Missing period all these comments.
2. Is the part that says "and only where two-phased commits are
supported" necessary to say? Is seems redundant since comments already
says called as part of a two-phase commit.
;
==========
Patch V6-0004, File: src/include/replication/reorderbuffer.h
==========
COMMENT
Line 467
@@ -466,6 +466,24 @@ typedef void (*ReorderBufferStreamAbortCB) (
ReorderBufferTXN *txn,
XLogRecPtr abort_lsn);
+/* prepare streamed transaction callback signature */
+typedef void (*ReorderBufferStreamPrepareCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+/* prepare streamed transaction callback signature */
+typedef void (*ReorderBufferStreamCommitPreparedCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+/* prepare streamed transaction callback signature */
+typedef void (*ReorderBufferStreamAbortPreparedCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
Cut/paste error - repeated same comment 3 times?
[END]
On Tue, Oct 6, 2020 at 10:23 AM Peter.B.Smith@fujitsu.com
<Peter.B.Smith@fujitsu.com> wrote:
[BEGIN]
==========
Patch V6-0001, File: contrib/test_decoding/expected/prepared.out (so
prepared.sql also)
==========COMMENT
Line 30 - The INSERT INTO test_prepared1 VALUES (2); is kind of
strange because it is not really part of the prior test nor the
following test. Maybe it would be better to have a comment describing
the purpose of this isolated INSERT and to also consume the data from
the slot so it does not get jumbled with the data of the following
(abort) test.;
COMMENT
Line 53 - Same comment for this test INSERT INTO test_prepared1 VALUES
(4); It kind of has nothing really to do with either the prior (abort)
test nor the following (ddl) test.;
COMMENT
Line 60 - Seems to check which locks are held for the test_prepared_1
table while the transaction is in progress. Maybe it would be better
to have more comments describing what is expected here and why.;
COMMENT
Line 88 - There is a comment in the test saying "-- We should see '7'
before '5' in our results since it commits first." but I did not see
any test code that actually verifies that happens.;
All the above comments are genuine and I think it is mostly because
the author has blindly modified the existing tests without completely
understanding the intent of the test. I suggest we write a completely
new regression file (decode_prepared.sql) for these and just copy
whatever is required from prepared.sql. Once we do that we might also
want to rename existing prepared.sql to decode_commit_prepared.sql or
something like that. I think modifying the existing test appears to be
quite ugly and also it is changing the intent of the existing tests.
QUESTION
Line 120 - I did not really understand the SQL checking the pg_class.
I expected this would be checking table 'test_prepared1' instead. Can
you explain it?
SELECT 'pg_class' AS relation, locktype, mode
FROM pg_locks
WHERE locktype = 'relation'
AND relation = 'pg_class'::regclass;
relation | locktype | mode
----------+----------+------
(0 rows);
Yes, I also think your expectation is correct and this should be on
'test_prepared_1'.
QUESTION
Line 139 - SET statement_timeout = '1s'; is 1 seconds short enough
here for this test, or might it be that these statements would be
completed in less than one seconds anyhow?;
Good question. I think we have to mention the reason why logical
decoding is not blocked while it needs to acquire a shared lock on the
table and the previous commands already held an exclusive lock on the
table. I am not sure if I am missing something but like you, it is not
clear to me as well what this test intends to do, so surely more
commentary is required.
QUESTION
Line 163 - How is this testing a SAVEPOINT? Or is it only to check
that the SAVEPOINT command is not part of the replicated changes?;
It is more of testing that subtransactions will not create a problem
while decoding.
COMMENT
Line 175 - Missing underscore in comment. Code requires also underscore:
"nodecode" --> "_nodecode"
makes sense.
==========
Patch V6-0001, File: contrib/test_decoding/test_decoding.c
==========COMMENT Line 43 @@ -36,6 +40,7 @@ typedef struct bool skip_empty_xacts; bool xact_wrote_changes; bool only_local; + TransactionId check_xid; /* track abort of this txid */ } TestDecodingData;The "check_xid" seems a meaningless name. Check what?
IIUC maybe should be something like "check_xid_aborted";
COMMENT Line 105 @ -88,6 +93,19 @@ static void pg_decode_stream_truncate(LogicalDecodingContext *ctx, ReorderBufferTXN *txn, int nrelations, Relation relations[], ReorderBufferChange *change); +static bool pg_decode_filter_prepare(LogicalDecodingContext *ctx, + ReorderBufferTXN *txn,Remove extra blank line after these functions
;
The above two sounds reasonable suggestions.
COMMENT Line 149 @@ -116,6 +134,11 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb) cb->stream_change_cb = pg_decode_stream_change; cb->stream_message_cb = pg_decode_stream_message; cb->stream_truncate_cb = pg_decode_stream_truncate; + cb->filter_prepare_cb = pg_decode_filter_prepare; + cb->prepare_cb = pg_decode_prepare_txn; + cb->commit_prepared_cb = pg_decode_commit_prepared_txn; + cb->abort_prepared_cb = pg_decode_abort_prepared_txn; + }There is a confusing mix of terminology where sometimes things are
referred as ROLLBACK/rollback and other times apparently the same
operation is referred as ABORT/abort. I do not know the root cause of
this mixture. IIUC maybe the internal functions and protocol generally
use the term "abort", whereas the SQL syntax is "ROLLBACK"... but
where those two terms collide in the middle it gets quite confusing.At least I thought the names of the "callbacks" which get exposed to
the user (e.g. in the help) might be better if they would match the
SQL.
"abort_prepared_cb" --> "rollback_prepared_db"
This suggestion sounds reasonable. I think it is to entertain the case
where due to error we need to rollback the transaction. I think it is
better if use 'rollback' terminology in the exposed functions. We
already have a function with the name stream_abort_cb in the code
which we also might want to rename but that is a separate thing and we
can deal it with a separate patch.
There are similar review comments like this below where the
alternating terms caused me some confusion.~
Also, Remove the extra blank line before the end of the function.
;
COMMENT Line 267 @ -227,6 +252,42 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt, errmsg("could not parse value \"%s\" for parameter \"%s\"", strVal(elem->arg), elem->defname))); } + else if (strcmp(elem->defname, "two-phase-commit") == 0) + { + if (elem->arg == NULL) + continue;IMO the "check-xid" code might be better rearranged so the NULL check
is first instead of if/else.
e.g.
if (elem->arg == NULL)
ereport(FATAL,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("check-xid needs an input value")));
~Also, is it really supposed to be FATAL instead or ERROR. That is not
the same as the other surrounding code.;
+1.
COMMENT
Line 296
if (data->check_xid <= 0)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("Specify positive value for parameter \"%s\","
" you specified \"%s\"",
elem->defname, strVal(elem->arg))));The code checking for <= 0 seems over-complicated. Because conversion
was using strtoul() I fail to see how this can ever be < 0. Wouldn't
it be easier to simply test the result of the strtoul() function?BEFORE: if (errno == EINVAL || errno == ERANGE)
AFTER: if (data->check_xid == 0)
Better to use TransactionIdIsValid(data->check_xid) here.
~
Also, should this be FATAL? Everything else similar is ERROR.
;
It should be an error.
COMMENT
(general)
I don't recall seeing any of these decoding options (e.g.
"two-phase-commit", "check-xid") documented anywhere.
So how can a user even know these options exist so they can use them?
Perhaps options should be described on this page?
https://www.postgresql.org/docs/13/functions-admin.html#FUNCTIONS-REPLICATION;
I think we should do what we are doing for other options, if they are
not documented then why to document this one separately. I guess we
can make a case to document all the existing options and write a
separate patch for that.
COMMENT
(general)
"check-xid" is a meaningless option name. Maybe something like
"checked-xid-aborted" is more useful?
Suggest changing the member, the option, and the error messages to
match some better name.;
COMMENT
Line 314
@@ -238,6 +299,7 @@ pg_decode_startup(LogicalDecodingContext *ctx,
OutputPluginOptions *opt,
}ctx->streaming &= enable_streaming;
+ ctx->enable_twophase &= enable_2pc;
}The "ctx->enable_twophase" is inconsistent naming with the
"ctx->streaming" member.
"enable_twophase" --> "twophase";
+1.
COMMENT
Line 374
@@ -297,6 +359,94 @@ pg_decode_commit_txn(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
OutputPluginWrite(ctx, true);
}+ +/* + * Filter out two-phase transactions. + * + * Each plugin can implement its own filtering logic. Here + * we demonstrate a simple logic by checking the GID. If the + * GID contains the "_nodecode" substring, then we filter + * it out. + */ +static bool +pg_decode_filter_prepare(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,Remove the extra preceding blank line.
~
I did not find anything in the help about "_nodecode". Should it be
there or is this deliberately not documented feature?;
I guess we can document it along with filter_prepare API, if not
already documented.
QUESTION
Line 440
+pg_decode_abort_prepared_txn(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,Is this a wrong comment
"ABORT PREPARED" --> "ROLLBACK PREPARED" ??;
COMMENT
Line 620
@@ -455,6 +605,22 @@ pg_decode_change(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
}
data->xact_wrote_changes = true;+ /* if check_xid is specified */ + if (TransactionIdIsValid(data->check_xid)) + { + elog(LOG, "waiting for %u to abort", data->check_xid); + while (TransactionIdIsInProgress(datThe check_xid seems a meaningless name, and the comment "/* if
check_xid is specified */" was not helpful either.
IIUC purpose of this is to check that the nominated xid always is rolled back.
So the appropriate name may be more like "check-xid-aborted".;
Yeah, this part deserves better comments.
--
With Regards,
Amit Kapila.
Import Notes
Reply to msg id not found: ME1PR01MB12527821009E374578C91943A80D0@ME1PR01MB1252.ausprd01.prod.outlook.com
On Tue, Oct 6, 2020 at 10:23 AM Peter.B.Smith@fujitsu.com
<Peter.B.Smith@fujitsu.com> wrote:
==========
Patch V6-0001, File: doc/src/sgml/logicaldecoding.sgml
==========COMMENT/QUESTION Section 48.6.1 @ -387,6 +387,10 @@ typedef struct OutputPluginCallbacks LogicalDecodeTruncateCB truncate_cb; LogicalDecodeCommitCB commit_cb; LogicalDecodeMessageCB message_cb; + LogicalDecodeFilterPrepareCB filter_prepare_cb;Confused by the mixing of terminologies "abort" and "rollback".
Why is it LogicalDecodeAbortPreparedCB instead of
LogicalDecodeRollbackPreparedCB?
Why is it abort_prepared_cb instead of rollback_prepared_cb;?I thought everything the user sees should be ROLLBACK/rollback (like
the SQL) regardless of what the internal functions might be called.;
Fair enough.
COMMENT
Section 48.6.1
The begin_cb, change_cb and commit_cb callbacks are required, while
startup_cb, filter_by_origin_cb, truncate_cb, and shutdown_cb are
optional. If truncate_cb is not set but a TRUNCATE is to be decoded,
the action will be ignored.The 1st paragraph beneath the typedef does not mention the newly added
callbacks to say if they are required or optional.;
Yeah, in code comments it was mentioned but is missed here, see the
comment "To support two phase logical decoding, we require
prepare/commit-prepare/abort-prepare callbacks. The filter-prepare
callback is optional.". I think instead of directly editing the above
paragraph we can write a new one similar to what we have done for
streaming of large in-progress transactions (Refer <para> An output
plugin may also define functions to support streaming of large,
in-progress transactions.).
COMMENT
Section 48.6.4.5
Section 48.6.4.6
Section 48.6.4.7
@@ -578,6 +588,55 @@ typedef void (*LogicalDecodeCommitCB) (struct
LogicalDecodingContext *ctx,
</para>
</sect3>+ <sect3 id="logicaldecoding-output-plugin-prepare"> + <sect3 id="logicaldecoding-output-plugin-commit-prepared"> + <sect3 id="logicaldecoding-output-plugin-abort-prepared"> +<programlisting>The wording and titles are a bit backwards compared to the others.
e.g. previously was "Transaction Begin" (not "Begin Transaction") and
"Transaction End" (not "End Transaction").So for consistently following the existing IMO should change these new
titles (and wording) to:
- "Commit Prepared Transaction Callback" --> "Transaction Commit
Prepared Callback"
- "Rollback Prepared Transaction Callback" --> "Transaction Rollback
Prepared Callback"
makes sense.
- "whenever a commit prepared transaction has been decoded" -->
"whenever a transaction commit prepared has been decoded"
- "whenever a rollback prepared transaction has been decoded." -->
"whenever a transaction rollback prepared has been decoded.";
I don't find above suggestions much better than current wording. How
about below instead?
"whenever we decode a transaction which is prepared for two-phase
commit is committed"
"whenever we decode a transaction which is prepared for two-phase
commit is rolled back"
Also, related to this:
+ <sect3 id="logicaldecoding-output-plugin-commit-prepared">
+ <title>Commit Prepared Transaction Callback</title>
+
+ <para>
+ The optional <function>commit_prepared_cb</function> callback
is called whenever
+ a commit prepared transaction has been decoded. The
<parameter>gid</parameter> field,
+ which is part of the <parameter>txn</parameter> parameter can
be used in this
+ callback.
+<programlisting>
+typedef void (*LogicalDecodeCommitPreparedCB) (struct
LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
+ <sect3 id="logicaldecoding-output-plugin-abort-prepared">
+ <title>Rollback Prepared Transaction Callback</title>
+
+ <para>
+ The optional <function>abort_prepared_cb</function> callback is
called whenever
+ a rollback prepared transaction has been decoded. The
<parameter>gid</parameter> field,
+ which is part of the <parameter>txn</parameter> parameter can
be used in this
+ callback.
+<programlisting>
Both the above are not optional as per code and I think code is
correct. I think the documentation is wrong here.
==========
Patch V6-0001, File: src/backend/replication/logical/decode.c
==========COMMENT Line 74 @@ -70,6 +70,9 @@ static void DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf, xl_xact_parsed_commit *parsed, TransactionId xid); static void DecodeAbort(LogicalDecodingContext *ctx, XLogRecordBuffer *buf, xl_xact_parsed_abort *parsed, TransactionId xid); +static void DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf, + xl_xact_parsed_prepare * parsed);The 2nd line of DecodePrepare is misaligned by one space.
;
Yeah, probably pgindent is the answer. Ajin, can you please run
pgindent on all the patches?
COMMENT Line 321 @@ -312,17 +315,34 @@ DecodeXactOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf) } break; case XLOG_XACT_PREPARE: + { + xl_xact_parsed_prepare parsed; + xl_xact_prepare *xlrec; + /* check that output plugin is capable of twophase decoding */"twophase" --> "two-phase"
~
Also, add a blank line after the declarations.
;
==========
Patch V6-0001, File: src/backend/replication/logical/logical.c
==========COMMENT
Line 249
@@ -225,6 +237,19 @@ StartupDecodingContext(List *output_plugin_options,
(ctx->callbacks.stream_message_cb != NULL) ||
(ctx->callbacks.stream_truncate_cb != NULL);+ /* + * To support two phase logical decoding, we require prepare/commit-prepare/abort-prepare + * callbacks. The filter-prepare callback is optional. We however enable two phase logical + * decoding when at least one of the methods is enabled so that we can easily identify + * missing methods.The terminology is generally well known as "two-phase" (with the
hyphen) https://en.wikipedia.org/wiki/Two-phase_commit_protocol so
let's be consistent for all the patch code comments. Please search the
code and correct this in all places, even where I might have missed to
identify it."two phase" --> "two-phase"
;
COMMENT
Line 822
@@ -782,6 +807,111 @@ commit_cb_wrapper(ReorderBuffer *cache,
ReorderBufferTXN *txn,
}static void +prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn, + XLogRecPtr prepare_lsn)"support 2 phase" --> "supports two-phase" in the comment
;
COMMENT
Line 844
Code condition seems strange and/or broken.
if (ctx->enable_twophase && ctx->callbacks.prepare_cb == NULL)
Because if the flag is null then this condition is skipped.
But then if the callback was also NULL then attempting to call it to
"do the actual work" will give NPE.~
Also, I wonder should this check be the first thing in this function?
Because if it fails does it even make sense that all the errcallback
code was set up?> E.g errcallback.arg potentially is left pointing to a stack variable
on a stack that no longer exists.;
Right, I think we should have an Assert(ctx->enable_twophase) in the
beginning and then have the check (ctx->callbacks.prepare_cb == NULL)
t its current place. Refer any of the streaming APIs (for ex.
stream_stop_cb_wrapper).
COMMENT
Line 857
+commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,"support 2 phase" --> "supports two-phase" in the comment
~
Also, Same potential trouble with the condition:
if (ctx->enable_twophase && ctx->callbacks.commit_prepared_cb == NULL)
Same as previously asked. Should this check be first thing in this function?;
Yeah, so the same solution as mentioned above can be used.
COMMENT
Line 892
+abort_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,"support 2 phase" --> "supports two-phase" in the comment
~
Same potential trouble with the condition:
if (ctx->enable_twophase && ctx->callbacks.abort_prepared_cb == NULL)
Same as previously asked. Should this check be the first thing in this function?;
Again the same solution can be used.
COMMENT
Line 1013
@@ -858,6 +988,51 @@ truncate_cb_wrapper(ReorderBuffer *cache,
ReorderBufferTXN *txn,
error_context_stack = errcallback.previous;
}+static bool +filter_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn, + TransactionId xid, const char *gid)Fix wording in comment:
"twophase" --> "two-phase transactions"
"twophase transactions" --> "two-phase transactions"==========
Patch V6-0001, File: src/backend/replication/logical/reorderbuffer.c
==========COMMENT Line 255 @@ -251,7 +251,8 @@ static Size ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn static void ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn, char *change); static void ReorderBufferRestoreCleanup(ReorderBuffer *rb, ReorderBufferTXN *txn); -static void ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn); +static void ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, + bool txn_prepared);The alignment is inconsistent. One more space needed before "bool txn_prepared"
;
COMMENT
Line 417
@@ -413,6 +414,11 @@ ReorderBufferReturnTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn)
}/* free data that's contained */ + if (txn->gid != NULL) + { + pfree(txn->gid); + txn->gid = NULL; + }Should add the blank link before this new code, as it was before.
;
COMMENT
Line 1564
@ -1502,12 +1561,14 @@ ReorderBufferCleanupTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn)
}/* - * Discard changes from a transaction (and subtransactions), after streaming - * them. Keep the remaining info - transactions, tuplecids, invalidations and - * snapshots. + * Discard changes from a transaction (and subtransactions), either after streaming or + * after a PREPARE.typo "snapshots.If" -> "snapshots. If"
;
COMMENT/QUESTION
Line 1590
@@ -1526,7 +1587,7 @@ ReorderBufferTruncateTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn)
Assert(rbtxn_is_known_subxact(subtxn));
Assert(subtxn->nsubtxns == 0);- ReorderBufferTruncateTXN(rb, subtxn); + ReorderBufferTruncateTXN(rb, subtxn, txn_prepared); }There are some code paths here I did not understand how they match the comments.
Because this function is recursive it seems that it may be called
where the 2nd parameter txn is a sub-transaction.But then this seems at odds with some of the other code comments of
this function which are processing the txn without ever testing is it
really toplevel or not:e.g. Line 1593 "/* cleanup changes in the toplevel txn */"
I think this comment is wrong but this is not the fault of this patch.
e.g. Line 1632 "They are always stored in the toplevel transaction."
;
This seems to be correct and we probably need an Assert that the
transaction is a top-level transaction.
COMMENT Line 1644 @@ -1560,9 +1621,33 @@ ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn) * about the toplevel xact (we send the XID in all messages), but we never * stream XIDs of empty subxacts. */ - if ((!txn->toptxn) || (txn->nentries_mem != 0)) + if ((!txn_prepared) && ((!txn->toptxn) || (txn->nentries_mem != 0))) txn->txn_flags |= RBTXN_IS_STREAMED;+ if (txn_prepared)
/* remove the change from it's containing list */
typo "it's" --> "its";
QUESTION Line 1977 @@ -1880,7 +1965,7 @@ ReorderBufferResetTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, ReorderBufferChange *specinsert) { /* Discard the changes that we just streamed */ - ReorderBufferTruncateTXN(rb, txn); + ReorderBufferTruncateTXN(rb, txn, false);How do you know the 3rd parameter - i.e. txn_prepared - should be
hardwired false here?
e.g. I thought that maybe rbtxn_prepared(txn) can be true here.;
COMMENT
Line 2345
@@ -2249,7 +2334,6 @@ ReorderBufferProcessTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn,
break;
}
}
-
/*Looks like accidental blank line deletion. This should be put back how it was
;
COMMENT/QUESTION Line 2374 @@ -2278,7 +2362,16 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, } } else - rb->commit(rb, txn, commit_lsn); + { + /* + * Call either PREPARE (for twophase transactions) or COMMIT + * (for regular ones)."twophase" --> "two-phase"
~
Also, I was confused by the apparent assumption of exclusiveness of
streaming and 2PC...
e.g. what if streaming AND 2PC then it won't do rb->prepare();
QUESTION Line 2424 @@ -2319,11 +2412,17 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, */ if (streaming) { - ReorderBufferTruncateTXN(rb, txn); + ReorderBufferTruncateTXN(rb, txn, false);/* Reset the CheckXidAlive */
CheckXidAlive = InvalidTransactionId;
}
+ else if (rbtxn_prepared(txn))I was confused by the exclusiveness of streaming/2PC.
e.g. what if streaming AND 2PC at same time - how can you pass false
as 3rd param to ReorderBufferTruncateTXN?;
Yeah, this and another handling wherever it is assumed that both can't
be true together is wrong.
COMMENT
Line 2463
@@ -2352,17 +2451,18 @@ ReorderBufferProcessTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn,/* * The error code ERRCODE_TRANSACTION_ROLLBACK indicates a concurrent - * abort of the (sub)transaction we are streaming. We need to do the + * abort of the (sub)transaction we are streaming or preparing. We need to do the * cleanup and return gracefully on this error, see SetupCheckXidLive. */"twoi phase" --> "two-phase"
;
QUESTIONS
Line 2482
@@ -2370,10 +2470,19 @@ ReorderBufferProcessTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn,
errdata = NULL;
curtxn->concurrent_abort = true;- /* Reset the TXN so that it is allowed to stream remaining data. */ - ReorderBufferResetTXN(rb, txn, snapshot_now, - command_id, prev_lsn, - specinsert); + /* If streaming, reset the TXN so that it is allowed to stream remaining data. */ + if (streaming)Re: /* If streaming, reset the TXN so that it is allowed to stream
remaining data. */
I was confused by the exclusiveness of streaming/2PC.
Is it not possible for streaming flags and rbtxn_prepared(txn) true at
the same time?
Yeah, I think it is not correct to assume that both can't be true at
the same time. But when prepared is true irrespective of whether
streaming is true or not we can use ReorderBufferTruncateTXN() API
instead of Reset API.
~
elog(LOG, "stopping decoding of %s (%u)",
txn->gid[0] != '\0'? txn->gid:"", txn->xid);Is this a safe operation, or do you also need to test txn->gid is not NULL?
;
I think if 'prepared' is true then we can assume it to be non-NULL,
otherwise, not.
I am responding to your email in phases so that we can have a
discussion on specific points if required and I am slightly afraid
that the email might not bounce as it happened in your case when you
sent such a long email.
--
With Regards,
Amit Kapila.
Import Notes
Reply to msg id not found: ME1PR01MB12527821009E374578C91943A80D0@ME1PR01MB1252.ausprd01.prod.outlook.com
On Wed, Oct 7, 2020 at 1:24 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
There is a confusing mix of terminology where sometimes things are
referred as ROLLBACK/rollback and other times apparently the same
operation is referred as ABORT/abort. I do not know the root cause of
this mixture. IIUC maybe the internal functions and protocol generally
use the term "abort", whereas the SQL syntax is "ROLLBACK"... but
where those two terms collide in the middle it gets quite confusing.At least I thought the names of the "callbacks" which get exposed to
the user (e.g. in the help) might be better if they would match the
SQL.
"abort_prepared_cb" --> "rollback_prepared_db"This suggestion sounds reasonable. I think it is to entertain the case
where due to error we need to rollback the transaction. I think it is
better if use 'rollback' terminology in the exposed functions. We
already have a function with the name stream_abort_cb in the code
which we also might want to rename but that is a separate thing and we
can deal it with a separate patch.
So, for an ordinary transaction, rollback implies an explicit user
action, but an abort could either be an explicit user action (ABORT;
or ROLLBACK;) or an error. I agree that calling that case "abort"
rather than "rollback" is better. However, the situation is a bit
different for a prepared transaction: no error can prevent such a
transaction from being committed. That is the whole point of being
able to prepare transactions. So it is not unreasonable to think of
use "rollback" rather than "abort" for prepared transactions, but I
think it would be wrong in other cases. On the other hand, using
"abort" for all the cases also doesn't seem bad to me. It's true that
there is no ABORT PREPARED command at the SQL level, but I don't think
that is very important. I don't feel wrong saying that ROLLBACK
PREPARED causes a transaction abort.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Oct 8, 2020 at 6:14 AM Robert Haas <robertmhaas@gmail.com> wrote:
So, for an ordinary transaction, rollback implies an explicit user
action, but an abort could either be an explicit user action (ABORT;
or ROLLBACK;) or an error. I agree that calling that case "abort"
rather than "rollback" is better. However, the situation is a bit
different for a prepared transaction: no error can prevent such a
transaction from being committed. That is the whole point of being
able to prepare transactions. So it is not unreasonable to think of
use "rollback" rather than "abort" for prepared transactions, but I
think it would be wrong in other cases. On the other hand, using
"abort" for all the cases also doesn't seem bad to me. It's true that
there is no ABORT PREPARED command at the SQL level, but I don't think
that is very important. I don't feel wrong saying that ROLLBACK
PREPARED causes a transaction abort.
So, as I understand you don't object to renaming the callback APIs for
ROLLBACK PREPARED transactions to "rollback_prepared_cb" but keeping
the "stream_abort" as such. This was what I was planning on doing.
I was just writing this up, so wanted to confirm.
regards,
Ajin Cherian
Fujitsu Australia
On Tue, Oct 6, 2020 at 10:23 AM Peter.B.Smith@fujitsu.com
<Peter.B.Smith@fujitsu.com> wrote:
==========
Patch V6-0001, File: src/include/replication/reorderbuffer.h
==========QUESTION Line 116 @@ -162,9 +163,13 @@ typedef struct ReorderBufferChange #define RBTXN_HAS_CATALOG_CHANGES 0x0001 #define RBTXN_IS_SUBXACT 0x0002 #define RBTXN_IS_SERIALIZED 0x0004 -#define RBTXN_IS_STREAMED 0x0008 -#define RBTXN_HAS_TOAST_INSERT 0x0010 -#define RBTXN_HAS_SPEC_INSERT 0x0020 +#define RBTXN_PREPARE 0x0008 +#define RBTXN_COMMIT_PREPARED 0x0010 +#define RBTXN_ROLLBACK_PREPARED 0x0020 +#define RBTXN_COMMIT 0x0040 +#define RBTXN_IS_STREAMED 0x0080 +#define RBTXN_HAS_TOAST_INSERT 0x0100 +#define RBTXN_HAS_SPEC_INSERT 0x0200I was wondering why when adding new flags, some of the existing flag
masks were also altered.
I am assuming this is ok because they are never persisted but are only
used in the protocol (??);
This is bad even though there is no direct problem. I don't think we
need to change the existing ones, we can add the new ones at the end
with the number starting where the last one ends.
COMMENT
Line 133Assert(strlen(txn->gid) > 0);
Shouldn't that assertion also check txn->gid is not NULL (to prevent
NPE in case gid was NULL);
I think that would be better and a stronger Assertion than the current one.
COMMENT
Line 177
+logicalrep_read_prepare(StringInfo in, LogicalRepPrepareData * prepare_data)prepare_data->prepare_type = flags;
This code may be OK but it does seem a bit of an abuse of the flags.e.g. Are they flags or are the really enum values?
e.g. And if they are effectively enums (it appears they are) then
seemed inconsistent that |= was used when they were previously
assigned.;
I don't understand this point. As far as I can see at the time of
write (logicalrep_write_prepare()), the patch has used |=, and at the
time of reading (logicalrep_read_prepare()) it has used assignment
which seems correct from the code perspective. Do you have a better
proposal?
COMMENT
Line 408
+pgoutput_commit_prepared_txn(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,Since all this function is identical to pg_output_prepare it might be
better to either
1. just leave this as a wrapper to delegate to that function
2. remove this one entirely and assign the callback to the common
pgoutput_prepare_txn;
I think this is because as of now the patch uses the same function and
protocol message to send both Prepare and Commit/Rollback Prepare
which I am not sure is the right thing. I suggest keeping that code as
it is for now. Let's first try to figure out if it is a good idea to
overload the same protocol message and use flags to distinguish the
actual message. Also, I don't know whether prepare_lsn is required
during commit time?
COMMENT
Line 419
+pgoutput_abort_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,Since all this function is identical to pg_output_prepare if might be
better to either
1. just leave this as a wrapper to delegate to that function
2. remove this one entirely and assign the callback to the common
pgoutput_prepare_tx;
Due to reasons mentioned for the previous comment, let's keep this
also as it is for now.
--
With Regards,
Amit Kapila.
Import Notes
Reply to msg id not found: ME1PR01MB12527821009E374578C91943A80D0@ME1PR01MB1252.ausprd01.prod.outlook.com
On Thu, Oct 8, 2020 at 5:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
COMMENT
Line 177
+logicalrep_read_prepare(StringInfo in, LogicalRepPrepareData * prepare_data)prepare_data->prepare_type = flags;
This code may be OK but it does seem a bit of an abuse of the flags.e.g. Are they flags or are the really enum values?
e.g. And if they are effectively enums (it appears they are) then
seemed inconsistent that |= was used when they were previously
assigned.;
I don't understand this point. As far as I can see at the time of
write (logicalrep_write_prepare()), the patch has used |=, and at the
time of reading (logicalrep_read_prepare()) it has used assignment
which seems correct from the code perspective. Do you have a better
proposal?
OK. I will explain my thinking when I wrote that review comment.
I agree all is "correct" from a code perspective.
But IMO using bit arithmetic implies that different combinations are
also possible, whereas in current code they are not.
So code is kind of having a bet each-way - sometimes treating "flags"
as bit flags and sometimes as enums.
e.g. If these flags are not really bit flags at all then the
logicalrep_write_prepare() code might just as well be written as
below:
BEFORE
if (rbtxn_commit_prepared(txn))
flags |= LOGICALREP_IS_COMMIT_PREPARED;
else if (rbtxn_rollback_prepared(txn))
flags |= LOGICALREP_IS_ROLLBACK_PREPARED;
else
flags |= LOGICALREP_IS_PREPARE;
/* Make sure exactly one of the expected flags is set. */
if (!PrepareFlagsAreValid(flags))
elog(ERROR, "unrecognized flags %u in prepare message", flags);
AFTER
if (rbtxn_commit_prepared(txn))
flags = LOGICALREP_IS_COMMIT_PREPARED;
else if (rbtxn_rollback_prepared(txn))
flags = LOGICALREP_IS_ROLLBACK_PREPARED;
else
flags = LOGICALREP_IS_PREPARE;
~
OTOH, if you really do want to anticipate having future flag bit
combinations then maybe the PrepareFlagsAreValid() macro ought to to
be tweaked accordingly, and the logicalrep_read_prepare() code maybe
should look more like below:
BEFORE
/* set the action (reuse the constants used for the flags) */
prepare_data->prepare_type = flags;
AFTER
/* set the action (reuse the constants used for the flags) */
prepare_data->prepare_type =
flags & LOGICALREP_IS_COMMIT_PREPARED ? LOGICALREP_IS_COMMIT_PREPARED :
flags & LOGICALREP_IS_ROLLBACK_PREPARED ? LOGICALREP_IS_ROLLBACK_PREPARED :
LOGICALREP_IS_PREPARE;
Kind Regards.
Peter Smith
Fujitsu Australia
On Fri, Oct 9, 2020 at 5:45 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Thu, Oct 8, 2020 at 5:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
COMMENT
Line 177
+logicalrep_read_prepare(StringInfo in, LogicalRepPrepareData * prepare_data)prepare_data->prepare_type = flags;
This code may be OK but it does seem a bit of an abuse of the flags.e.g. Are they flags or are the really enum values?
e.g. And if they are effectively enums (it appears they are) then
seemed inconsistent that |= was used when they were previously
assigned.;
I don't understand this point. As far as I can see at the time of
write (logicalrep_write_prepare()), the patch has used |=, and at the
time of reading (logicalrep_read_prepare()) it has used assignment
which seems correct from the code perspective. Do you have a better
proposal?OK. I will explain my thinking when I wrote that review comment.
I agree all is "correct" from a code perspective.
But IMO using bit arithmetic implies that different combinations are
also possible, whereas in current code they are not.
So code is kind of having a bet each-way - sometimes treating "flags"
as bit flags and sometimes as enums.e.g. If these flags are not really bit flags at all then the
logicalrep_write_prepare() code might just as well be written as
below:BEFORE
if (rbtxn_commit_prepared(txn))
flags |= LOGICALREP_IS_COMMIT_PREPARED;
else if (rbtxn_rollback_prepared(txn))
flags |= LOGICALREP_IS_ROLLBACK_PREPARED;
else
flags |= LOGICALREP_IS_PREPARE;/* Make sure exactly one of the expected flags is set. */
if (!PrepareFlagsAreValid(flags))
elog(ERROR, "unrecognized flags %u in prepare message", flags);AFTER
if (rbtxn_commit_prepared(txn))
flags = LOGICALREP_IS_COMMIT_PREPARED;
else if (rbtxn_rollback_prepared(txn))
flags = LOGICALREP_IS_ROLLBACK_PREPARED;
else
flags = LOGICALREP_IS_PREPARE;~
OTOH, if you really do want to anticipate having future flag bit
combinations
I don't anticipate more combinations rather I am not yet sure whether
we want to distinguish these operations with flags or have separate
messages for each of these operations. I think for now we can go with
your proposal above.
--
With Regards,
Amit Kapila.
On Wed, Oct 7, 2020 at 4:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
All the above comments are genuine and I think it is mostly because
the author has blindly modified the existing tests without completely
understanding the intent of the test. I suggest we write a completely
new regression file (decode_prepared.sql) for these and just copy
whatever is required from prepared.sql. Once we do that we might also
want to rename existing prepared.sql to decode_commit_prepared.sql or
something like that. I think modifying the existing test appears to be
quite ugly and also it is changing the intent of the existing tests.
Updated this. Kept the original prepared.sql untouched and added a new
regression file called two_phase.sql
which is specific to test cases with the new flag two-phase-commit.
QUESTION
Line 120 - I did not really understand the SQL checking the pg_class.
I expected this would be checking table 'test_prepared1' instead. Can
you explain it?
SELECT 'pg_class' AS relation, locktype, mode
FROM pg_locks
WHERE locktype = 'relation'
AND relation = 'pg_class'::regclass;
relation | locktype | mode
----------+----------+------
(0 rows);
Yes, I also think your expectation is correct and this should be on
'test_prepared_1'.
Updated
QUESTION
Line 139 - SET statement_timeout = '1s'; is 1 seconds short enough
here for this test, or might it be that these statements would be
completed in less than one seconds anyhow?;
Good question. I think we have to mention the reason why logical
decoding is not blocked while it needs to acquire a shared lock on the
table and the previous commands already held an exclusive lock on the
table. I am not sure if I am missing something but like you, it is not
clear to me as well what this test intends to do, so surely more
commentary is required.
Updated.
QUESTION
Line 163 - How is this testing a SAVEPOINT? Or is it only to check
that the SAVEPOINT command is not part of the replicated changes?;
It is more of testing that subtransactions will not create a problem
while decoding.
Updated with a testcase that actually does a rollback to a savepoint
COMMENT
Line 175 - Missing underscore in comment. Code requires also underscore:
"nodecode" --> "_nodecode"makes sense.
Updated.
==========
Patch V6-0001, File: contrib/test_decoding/test_decoding.c
==========COMMENT Line 43 @@ -36,6 +40,7 @@ typedef struct bool skip_empty_xacts; bool xact_wrote_changes; bool only_local; + TransactionId check_xid; /* track abort of this txid */ } TestDecodingData;The "check_xid" seems a meaningless name. Check what?
IIUC maybe should be something like "check_xid_aborted"
Updated.
;
COMMENT Line 105 @ -88,6 +93,19 @@ static void pg_decode_stream_truncate(LogicalDecodingContext *ctx, ReorderBufferTXN *txn, int nrelations, Relation relations[], ReorderBufferChange *change); +static bool pg_decode_filter_prepare(LogicalDecodingContext *ctx, + ReorderBufferTXN *txn,Remove extra blank line after these functions
;
The above two sounds reasonable suggestions.
Updated.
COMMENT Line 149 @@ -116,6 +134,11 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb) cb->stream_change_cb = pg_decode_stream_change; cb->stream_message_cb = pg_decode_stream_message; cb->stream_truncate_cb = pg_decode_stream_truncate; + cb->filter_prepare_cb = pg_decode_filter_prepare; + cb->prepare_cb = pg_decode_prepare_txn; + cb->commit_prepared_cb = pg_decode_commit_prepared_txn; + cb->abort_prepared_cb = pg_decode_abort_prepared_txn; + }There is a confusing mix of terminology where sometimes things are
referred as ROLLBACK/rollback and other times apparently the same
operation is referred as ABORT/abort. I do not know the root cause of
this mixture. IIUC maybe the internal functions and protocol generally
use the term "abort", whereas the SQL syntax is "ROLLBACK"... but
where those two terms collide in the middle it gets quite confusing.At least I thought the names of the "callbacks" which get exposed to
the user (e.g. in the help) might be better if they would match the
SQL.
"abort_prepared_cb" --> "rollback_prepared_db"This suggestion sounds reasonable. I think it is to entertain the case
where due to error we need to rollback the transaction. I think it is
better if use 'rollback' terminology in the exposed functions. We
already have a function with the name stream_abort_cb in the code
which we also might want to rename but that is a separate thing and we
can deal it with a separate patch.
Changed the call back names from abort_prepared to rollback_prepapred
and stream_abort_prepared to stream_rollback_prepared.
There are similar review comments like this below where the
alternating terms caused me some confusion.~
Also, Remove the extra blank line before the end of the function.
;
COMMENT Line 267 @ -227,6 +252,42 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt, errmsg("could not parse value \"%s\" for parameter \"%s\"", strVal(elem->arg), elem->defname))); } + else if (strcmp(elem->defname, "two-phase-commit") == 0) + { + if (elem->arg == NULL) + continue;IMO the "check-xid" code might be better rearranged so the NULL check
is first instead of if/else.
e.g.
if (elem->arg == NULL)
ereport(FATAL,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("check-xid needs an input value")));
~Also, is it really supposed to be FATAL instead or ERROR. That is not
the same as the other surrounding code.;
+1.
Updated.
COMMENT
Line 296
if (data->check_xid <= 0)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("Specify positive value for parameter \"%s\","
" you specified \"%s\"",
elem->defname, strVal(elem->arg))));The code checking for <= 0 seems over-complicated. Because conversion
was using strtoul() I fail to see how this can ever be < 0. Wouldn't
it be easier to simply test the result of the strtoul() function?BEFORE: if (errno == EINVAL || errno == ERANGE)
AFTER: if (data->check_xid == 0)Better to use TransactionIdIsValid(data->check_xid) here.
Updated.
~
Also, should this be FATAL? Everything else similar is ERROR.
;
It should be an error.
Updated
COMMENT
(general)
I don't recall seeing any of these decoding options (e.g.
"two-phase-commit", "check-xid") documented anywhere.
So how can a user even know these options exist so they can use them?
Perhaps options should be described on this page?
https://www.postgresql.org/docs/13/functions-admin.html#FUNCTIONS-REPLICATION;
I think we should do what we are doing for other options, if they are
not documented then why to document this one separately. I guess we
can make a case to document all the existing options and write a
separate patch for that.
I didnt see any of the test_decoding options updated in the
documentation as these seem specific for the test_decoder used in
testing.
https://www.postgresql.org/docs/13/test-decoding.html
COMMENT
(general)
"check-xid" is a meaningless option name. Maybe something like
"checked-xid-aborted" is more useful?
Suggest changing the member, the option, and the error messages to
match some better name.
Updated.
;
COMMENT
Line 314
@@ -238,6 +299,7 @@ pg_decode_startup(LogicalDecodingContext *ctx,
OutputPluginOptions *opt,
}ctx->streaming &= enable_streaming;
+ ctx->enable_twophase &= enable_2pc;
}The "ctx->enable_twophase" is inconsistent naming with the
"ctx->streaming" member.
"enable_twophase" --> "twophase";
+1.
Updated
COMMENT
Line 374
@@ -297,6 +359,94 @@ pg_decode_commit_txn(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
OutputPluginWrite(ctx, true);
}+ +/* + * Filter out two-phase transactions. + * + * Each plugin can implement its own filtering logic. Here + * we demonstrate a simple logic by checking the GID. If the + * GID contains the "_nodecode" substring, then we filter + * it out. + */ +static bool +pg_decode_filter_prepare(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,Remove the extra preceding blank line.
Updated.
~
I did not find anything in the help about "_nodecode". Should it be
there or is this deliberately not documented feature?;
I guess we can document it along with filter_prepare API, if not
already documented.
Again , this seems to be specific to test_decoder and an example of a
way to create a filter_prepare.
QUESTION
Line 440
+pg_decode_abort_prepared_txn(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,Is this a wrong comment
"ABORT PREPARED" --> "ROLLBACK PREPARED" ??;
COMMENT
Line 620
@@ -455,6 +605,22 @@ pg_decode_change(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
}
data->xact_wrote_changes = true;+ /* if check_xid is specified */ + if (TransactionIdIsValid(data->check_xid)) + { + elog(LOG, "waiting for %u to abort", data->check_xid); + while (TransactionIdIsInProgress(datThe check_xid seems a meaningless name, and the comment "/* if
check_xid is specified */" was not helpful either.
IIUC purpose of this is to check that the nominated xid always is rolled back.
So the appropriate name may be more like "check-xid-aborted".;
Yeah, this part deserves better comments.
Updated.
Other than these first batch of review comments from Peter Smith, I've
also updated new functions in decode.c for DecodeCommitPrepared
and DecodeAbortPrepared as agreed in a previous review comment by
Amit and Dilip.
I've also incorporated Dilip's comment on acquiring SHARED lock rather
than EXCLUSIVE lock while looking for transaction matching Gid.
Since Peter's comments are many, I'll be sending patch updates in
parts addressing his comments.
regards,
Ajin Cherian
Fujitsu Australia
Attachments:
v7-0003-pgoutput-support-for-logical-decoding-of-2pc.patchapplication/octet-stream; name=v7-0003-pgoutput-support-for-logical-decoding-of-2pc.patchDownload
From 8ac357f16a4e0306e7252a1deac4488a0b1d6286 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Fri, 9 Oct 2020 02:02:08 -0400
Subject: [PATCH v7] pgoutput support for logical decoding of 2pc
Support decoding of two phase commit in pgoutput and on subscriber side.
---
src/backend/access/transam/twophase.c | 31 ++++++
src/backend/replication/logical/proto.c | 90 ++++++++++++++-
src/backend/replication/logical/worker.c | 147 ++++++++++++++++++++++++-
src/backend/replication/pgoutput/pgoutput.c | 54 ++++++++-
src/include/access/twophase.h | 1 +
src/include/replication/logicalproto.h | 37 ++++++-
src/test/subscription/t/020_twophase.pl | 163 ++++++++++++++++++++++++++++
7 files changed, 514 insertions(+), 9 deletions(-)
create mode 100644 src/test/subscription/t/020_twophase.pl
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 7940060..1200bf0 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -548,6 +548,37 @@ MarkAsPrepared(GlobalTransaction gxact, bool lock_held)
}
/*
+ * LookupGXact
+ * Check if the prepared transaction with the given GID is around
+ */
+bool
+LookupGXact(const char *gid)
+{
+ int i;
+
+ LWLockAcquire(TwoPhaseStateLock, LW_SHARED);
+
+ for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
+ {
+ GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
+
+ /* Ignore not-yet-valid GIDs */
+ if (!gxact->valid)
+ continue;
+ if (strcmp(gxact->gid, gid) != 0)
+ continue;
+
+ LWLockRelease(TwoPhaseStateLock);
+
+ return true;
+ }
+
+ LWLockRelease(TwoPhaseStateLock);
+
+ return false;
+}
+
+/*
* LockGXact
* Locate the prepared transaction and mark it busy for COMMIT or PREPARE.
*/
diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index eb19142..291ed10 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -72,12 +72,17 @@ logicalrep_read_begin(StringInfo in, LogicalRepBeginData *begin_data)
*/
void
logicalrep_write_commit(StringInfo out, ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn)
+ XLogRecPtr commit_lsn, bool is_commit)
{
uint8 flags = 0;
pq_sendbyte(out, 'C'); /* sending COMMIT */
+ if (is_commit)
+ flags |= LOGICALREP_IS_COMMIT;
+ else
+ flags |= LOGICALREP_IS_ABORT;
+
/* send the flags field (unused for now) */
pq_sendbyte(out, flags);
@@ -88,16 +93,20 @@ logicalrep_write_commit(StringInfo out, ReorderBufferTXN *txn,
}
/*
- * Read transaction COMMIT from the stream.
+ * Read transaction COMMIT|ABORT from the stream.
*/
void
logicalrep_read_commit(StringInfo in, LogicalRepCommitData *commit_data)
{
- /* read flags (unused for now) */
+ /* read flags */
uint8 flags = pq_getmsgbyte(in);
- if (flags != 0)
- elog(ERROR, "unrecognized flags %u in commit message", flags);
+ if (!CommitFlagsAreValid(flags))
+ elog(ERROR, "unrecognized flags %u in commit|abort message",
+ flags);
+
+ /* the flag is either commit or abort */
+ commit_data->is_commit = (flags == LOGICALREP_IS_COMMIT);
/* read fields */
commit_data->commit_lsn = pq_getmsgint64(in);
@@ -106,6 +115,77 @@ logicalrep_read_commit(StringInfo in, LogicalRepCommitData *commit_data)
}
/*
+ * Write PREPARE to the output stream.
+ */
+void
+logicalrep_write_prepare(StringInfo out, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ uint8 flags = 0;
+
+ pq_sendbyte(out, 'P'); /* sending PREPARE protocol */
+
+ /*
+ * This should only ever happen for 2PC transactions. In which case we
+ * expect to have a non-empty GID.
+ */
+ Assert(rbtxn_prepared(txn));
+ Assert(strlen(txn->gid) > 0);
+
+ /*
+ * Flags are determined from the state of the transaction. We know we
+ * always get PREPARE first and then [COMMIT|ROLLBACK] PREPARED, so if
+ * it's already marked as committed then it has to be COMMIT PREPARED (and
+ * likewise for abort / ROLLBACK PREPARED).
+ */
+ if (rbtxn_commit_prepared(txn))
+ flags |= LOGICALREP_IS_COMMIT_PREPARED;
+ else if (rbtxn_rollback_prepared(txn))
+ flags |= LOGICALREP_IS_ROLLBACK_PREPARED;
+ else
+ flags |= LOGICALREP_IS_PREPARE;
+
+ /* Make sure exactly one of the expected flags is set. */
+ if (!PrepareFlagsAreValid(flags))
+ elog(ERROR, "unrecognized flags %u in prepare message", flags);
+
+ /* send the flags field */
+ pq_sendbyte(out, flags);
+
+ /* send fields */
+ pq_sendint64(out, prepare_lsn);
+ pq_sendint64(out, txn->end_lsn);
+ pq_sendint64(out, txn->commit_time);
+
+ /* send gid */
+ pq_sendstring(out, txn->gid);
+}
+
+/*
+ * Read transaction PREPARE from the stream.
+ */
+void
+logicalrep_read_prepare(StringInfo in, LogicalRepPrepareData * prepare_data)
+{
+ /* read flags */
+ uint8 flags = pq_getmsgbyte(in);
+
+ if (!PrepareFlagsAreValid(flags))
+ elog(ERROR, "unrecognized flags %u in prepare message", flags);
+
+ /* set the action (reuse the constants used for the flags) */
+ prepare_data->prepare_type = flags;
+
+ /* read fields */
+ prepare_data->prepare_lsn = pq_getmsgint64(in);
+ prepare_data->end_lsn = pq_getmsgint64(in);
+ prepare_data->preparetime = pq_getmsgint64(in);
+
+ /* read gid (copy it into a pre-allocated buffer) */
+ strcpy(prepare_data->gid, pq_getmsgstring(in));
+}
+
+/*
* Write ORIGIN to the output stream.
*/
void
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 9c6fdee..a08da85 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -729,7 +729,11 @@ apply_handle_commit(StringInfo s)
replorigin_session_origin_lsn = commit_data.end_lsn;
replorigin_session_origin_timestamp = commit_data.committime;
- CommitTransactionCommand();
+ if (commit_data.is_commit)
+ CommitTransactionCommand();
+ else
+ AbortCurrentTransaction();
+
pgstat_report_stat(false);
store_flush_position(commit_data.end_lsn);
@@ -749,6 +753,141 @@ apply_handle_commit(StringInfo s)
pgstat_report_activity(STATE_IDLE, NULL);
}
+static void
+apply_handle_prepare_txn(LogicalRepPrepareData * prepare_data)
+{
+ Assert(prepare_data->prepare_lsn == remote_final_lsn);
+
+ /* The synchronization worker runs in single transaction. */
+ if (IsTransactionState() && !am_tablesync_worker())
+ {
+ /* End the earlier transaction and start a new one */
+ BeginTransactionBlock();
+ CommitTransactionCommand();
+ StartTransactionCommand();
+
+ /*
+ * Update origin state so we can restart streaming from correct
+ * position in case of crash.
+ */
+ replorigin_session_origin_lsn = prepare_data->end_lsn;
+ replorigin_session_origin_timestamp = prepare_data->preparetime;
+
+ PrepareTransactionBlock(prepare_data->gid);
+ CommitTransactionCommand();
+ pgstat_report_stat(false);
+
+ store_flush_position(prepare_data->end_lsn);
+ }
+ else
+ {
+ /* Process any invalidation messages that might have accumulated. */
+ AcceptInvalidationMessages();
+ maybe_reread_subscription();
+ }
+
+ in_remote_transaction = false;
+
+ /* Process any tables that are being synchronized in parallel. */
+ process_syncing_tables(prepare_data->end_lsn);
+
+ pgstat_report_activity(STATE_IDLE, NULL);
+}
+
+static void
+apply_handle_commit_prepared_txn(LogicalRepPrepareData * prepare_data)
+{
+ /* there is no transaction when COMMIT PREPARED is called */
+ ensure_transaction();
+
+ /*
+ * Update origin state so we can restart streaming from correct position
+ * in case of crash.
+ */
+ replorigin_session_origin_lsn = prepare_data->end_lsn;
+ replorigin_session_origin_timestamp = prepare_data->preparetime;
+
+ FinishPreparedTransaction(prepare_data->gid, true);
+ CommitTransactionCommand();
+ pgstat_report_stat(false);
+
+ store_flush_position(prepare_data->end_lsn);
+ in_remote_transaction = false;
+
+ /* Process any tables that are being synchronized in parallel. */
+ process_syncing_tables(prepare_data->end_lsn);
+
+ pgstat_report_activity(STATE_IDLE, NULL);
+}
+
+static void
+apply_handle_rollback_prepared_txn(LogicalRepPrepareData * prepare_data)
+{
+ /*
+ * Update origin state so we can restart streaming from correct position
+ * in case of crash.
+ */
+ replorigin_session_origin_lsn = prepare_data->end_lsn;
+ replorigin_session_origin_timestamp = prepare_data->preparetime;
+
+ /*
+ * During logical decoding, on the apply side, it's possible that a
+ * prepared transaction got aborted while decoding. In that case, we stop
+ * the decoding and abort the transaction immediately. However the
+ * ROLLBACK prepared processing still reaches the subscriber. In that case
+ * it's ok to have a missing gid
+ */
+ if (LookupGXact(prepare_data->gid))
+ {
+ /* there is no transaction when ABORT/ROLLBACK PREPARED is called */
+ ensure_transaction();
+ FinishPreparedTransaction(prepare_data->gid, false);
+ CommitTransactionCommand();
+ }
+
+ pgstat_report_stat(false);
+
+ store_flush_position(prepare_data->end_lsn);
+ in_remote_transaction = false;
+
+ /* Process any tables that are being synchronized in parallel. */
+ process_syncing_tables(prepare_data->end_lsn);
+
+ pgstat_report_activity(STATE_IDLE, NULL);
+}
+
+/*
+ * Handle PREPARE message.
+ */
+static void
+apply_handle_prepare(StringInfo s)
+{
+ LogicalRepPrepareData prepare_data;
+
+ logicalrep_read_prepare(s, &prepare_data);
+
+ switch (prepare_data.prepare_type)
+ {
+ case LOGICALREP_IS_PREPARE:
+ apply_handle_prepare_txn(&prepare_data);
+ break;
+
+ case LOGICALREP_IS_COMMIT_PREPARED:
+ apply_handle_commit_prepared_txn(&prepare_data);
+ break;
+
+ case LOGICALREP_IS_ROLLBACK_PREPARED:
+ apply_handle_rollback_prepared_txn(&prepare_data);
+ break;
+
+ default:
+ ereport(ERROR,
+ (errcode(ERRCODE_PROTOCOL_VIOLATION),
+ errmsg("unexpected type of prepare message: %d",
+ prepare_data.prepare_type)));
+ }
+}
+
/*
* Handle ORIGIN message.
*
@@ -1909,10 +2048,14 @@ apply_dispatch(StringInfo s)
case 'B':
apply_handle_begin(s);
break;
- /* COMMIT */
+ /* COMMIT/ABORT */
case 'C':
apply_handle_commit(s);
break;
+ /* PREPARE and [COMMIT|ROLLBACK] PREPARED */
+ case 'P':
+ apply_handle_prepare(s);
+ break;
/* INSERT */
case 'I':
apply_handle_insert(s);
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 9c997ae..a078c50 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -47,6 +47,12 @@ static void pgoutput_truncate(LogicalDecodingContext *ctx,
ReorderBufferChange *change);
static bool pgoutput_origin_filter(LogicalDecodingContext *ctx,
RepOriginId origin_id);
+static void pgoutput_prepare_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, XLogRecPtr prepare_lsn);
+static void pgoutput_commit_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, XLogRecPtr prepare_lsn);
+static void pgoutput_rollback_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, XLogRecPtr prepare_lsn);
static void pgoutput_stream_start(struct LogicalDecodingContext *ctx,
ReorderBufferTXN *txn);
static void pgoutput_stream_stop(struct LogicalDecodingContext *ctx,
@@ -143,6 +149,10 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb)
cb->change_cb = pgoutput_change;
cb->truncate_cb = pgoutput_truncate;
cb->commit_cb = pgoutput_commit_txn;
+
+ cb->prepare_cb = pgoutput_prepare_txn;
+ cb->commit_prepared_cb = pgoutput_commit_prepared_txn;
+ cb->rollback_prepared_cb = pgoutput_rollback_prepared_txn;
cb->filter_by_origin_cb = pgoutput_origin_filter;
cb->shutdown_cb = pgoutput_shutdown;
@@ -373,7 +383,49 @@ pgoutput_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
OutputPluginUpdateProgress(ctx);
OutputPluginPrepareWrite(ctx, true);
- logicalrep_write_commit(ctx->out, txn, commit_lsn);
+ logicalrep_write_commit(ctx->out, txn, commit_lsn, true);
+ OutputPluginWrite(ctx, true);
+}
+
+/*
+ * PREPARE callback
+ */
+static void
+pgoutput_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ OutputPluginUpdateProgress(ctx);
+
+ OutputPluginPrepareWrite(ctx, true);
+ logicalrep_write_prepare(ctx->out, txn, prepare_lsn);
+ OutputPluginWrite(ctx, true);
+}
+
+/*
+ * COMMIT PREPARED callback
+ */
+static void
+pgoutput_commit_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ OutputPluginUpdateProgress(ctx);
+
+ OutputPluginPrepareWrite(ctx, true);
+ logicalrep_write_prepare(ctx->out, txn, prepare_lsn);
+ OutputPluginWrite(ctx, true);
+}
+
+/*
+ * PREPARE callback
+ */
+static void
+pgoutput_rollback_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ OutputPluginUpdateProgress(ctx);
+
+ OutputPluginPrepareWrite(ctx, true);
+ logicalrep_write_prepare(ctx->out, txn, prepare_lsn);
OutputPluginWrite(ctx, true);
}
diff --git a/src/include/access/twophase.h b/src/include/access/twophase.h
index 2ca71c3..b2628ea 100644
--- a/src/include/access/twophase.h
+++ b/src/include/access/twophase.h
@@ -44,6 +44,7 @@ extern GlobalTransaction MarkAsPreparing(TransactionId xid, const char *gid,
extern void StartPrepare(GlobalTransaction gxact);
extern void EndPrepare(GlobalTransaction gxact);
extern bool StandbyTransactionIdIsPrepared(TransactionId xid);
+extern bool LookupGXact(const char *gid);
extern TransactionId PrescanPreparedTransactions(TransactionId **xids_p,
int *nxids_p);
diff --git a/src/include/replication/logicalproto.h b/src/include/replication/logicalproto.h
index 0c2cda2..33d719c 100644
--- a/src/include/replication/logicalproto.h
+++ b/src/include/replication/logicalproto.h
@@ -87,20 +87,55 @@ typedef struct LogicalRepBeginData
TransactionId xid;
} LogicalRepBeginData;
+/* Commit (and abort) information */
typedef struct LogicalRepCommitData
{
+ bool is_commit;
XLogRecPtr commit_lsn;
XLogRecPtr end_lsn;
TimestampTz committime;
} LogicalRepCommitData;
+/* types of the commit protocol message */
+#define LOGICALREP_IS_COMMIT 0x01
+#define LOGICALREP_IS_ABORT 0x02
+
+/* commit message is COMMIT or ABORT, and there is nothing else */
+#define CommitFlagsAreValid(flags) \
+ ((flags == LOGICALREP_IS_COMMIT) || (flags == LOGICALREP_IS_ABORT))
+
+/* Prepare protocol information */
+typedef struct LogicalRepPrepareData
+{
+ uint8 prepare_type;
+ XLogRecPtr prepare_lsn;
+ XLogRecPtr end_lsn;
+ TimestampTz preparetime;
+ char gid[GIDSIZE];
+} LogicalRepPrepareData;
+
+/* types of the prepare protocol message */
+#define LOGICALREP_IS_PREPARE 0x01
+#define LOGICALREP_IS_COMMIT_PREPARED 0x02
+#define LOGICALREP_IS_ROLLBACK_PREPARED 0x04
+
+/* prepare can be exactly one of PREPARE, [COMMIT|ABORT] PREPARED*/
+#define PrepareFlagsAreValid(flags) \
+ ((flags == LOGICALREP_IS_PREPARE) || \
+ (flags == LOGICALREP_IS_COMMIT_PREPARED) || \
+ (flags == LOGICALREP_IS_ROLLBACK_PREPARED))
+
extern void logicalrep_write_begin(StringInfo out, ReorderBufferTXN *txn);
extern void logicalrep_read_begin(StringInfo in,
LogicalRepBeginData *begin_data);
extern void logicalrep_write_commit(StringInfo out, ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn);
+ XLogRecPtr commit_lsn, bool is_commit);
extern void logicalrep_read_commit(StringInfo in,
LogicalRepCommitData *commit_data);
+extern void logicalrep_write_prepare(StringInfo out, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+extern void logicalrep_read_prepare(StringInfo in,
+ LogicalRepPrepareData * prepare_data);
extern void logicalrep_write_origin(StringInfo out, const char *origin,
XLogRecPtr origin_lsn);
extern char *logicalrep_read_origin(StringInfo in, XLogRecPtr *origin_lsn);
diff --git a/src/test/subscription/t/020_twophase.pl b/src/test/subscription/t/020_twophase.pl
new file mode 100644
index 0000000..c7f373d
--- /dev/null
+++ b/src/test/subscription/t/020_twophase.pl
@@ -0,0 +1,163 @@
+# logical replication of 2PC test
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 12;
+
+# Initialize publisher node
+my $node_publisher = get_new_node('publisher');
+$node_publisher->init(allows_streaming => 'logical');
+$node_publisher->append_conf(
+ 'postgresql.conf', qq(
+ max_prepared_transactions = 10
+ ));
+$node_publisher->start;
+
+# Create subscriber node
+my $node_subscriber = get_new_node('subscriber');
+$node_subscriber->init(allows_streaming => 'logical');
+$node_subscriber->append_conf(
+ 'postgresql.conf', qq(max_prepared_transactions = 10));
+$node_subscriber->start;
+
+# Create some pre-existing content on publisher
+$node_publisher->safe_psql('postgres', "CREATE TABLE tab_full (a int PRIMARY KEY)");
+$node_publisher->safe_psql('postgres',
+ "INSERT INTO tab_full SELECT generate_series(1,10)");
+$node_publisher->safe_psql('postgres', "CREATE TABLE tab_full2 (x text)");
+$node_publisher->safe_psql('postgres',
+ "INSERT INTO tab_full2 VALUES ('a'), ('b'), ('b')");
+
+# Setup structure on subscriber
+$node_subscriber->safe_psql('postgres', "CREATE TABLE tab_full (a int PRIMARY KEY)");
+$node_subscriber->safe_psql('postgres', "CREATE TABLE tab_full2 (x text)");
+
+# Setup logical replication
+my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
+$node_publisher->safe_psql('postgres', "CREATE PUBLICATION tap_pub");
+$node_publisher->safe_psql('postgres',
+"ALTER PUBLICATION tap_pub ADD TABLE tab_full, tab_full2"
+);
+
+my $appname = 'tap_sub';
+$node_subscriber->safe_psql('postgres',
+"CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr application_name=$appname' PUBLICATION tap_pub"
+);
+
+# Wait for subscriber to finish initialization
+my $caughtup_query =
+"SELECT pg_current_wal_lsn() <= replay_lsn FROM pg_stat_replication WHERE application_name = '$appname';";
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# Also wait for initial table sync to finish
+my $synced_query =
+"SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT IN ('r', 's');";
+$node_subscriber->poll_query_until('postgres', $synced_query)
+ or die "Timed out while waiting for subscriber to synchronize data";
+
+# check that 2PC gets replicated to subscriber
+$node_publisher->safe_psql('postgres',
+ "BEGIN;INSERT INTO tab_full VALUES (11);PREPARE TRANSACTION 'test_prepared_tab_full';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is in prepared state on subscriber
+my $result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab_full';");
+ is($result, qq(1), 'transaction is prepared on subscriber');
+
+# check that 2PC gets committed on subscriber
+$node_publisher->safe_psql('postgres',
+ "COMMIT PREPARED 'test_prepared_tab_full';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is committed on subscriber
+$result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM tab_full where a = 11;");
+ is($result, qq(1), 'Row inserted via 2PC has committed on subscriber');
+$result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab_full';");
+ is($result, qq(0), 'transaction is committed on subscriber');
+
+# check that 2PC gets replicated to subscriber
+$node_publisher->safe_psql('postgres',
+ "BEGIN;INSERT INTO tab_full VALUES (12);PREPARE TRANSACTION 'test_prepared_tab_full';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is in prepared state on subscriber
+$result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab_full';");
+ is($result, qq(1), 'transaction is prepared on subscriber');
+
+# check that 2PC gets aborted on subscriber
+$node_publisher->safe_psql('postgres',
+ "ROLLBACK PREPARED 'test_prepared_tab_full';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is aborted on subscriber
+$result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM tab_full where a = 12;");
+ is($result, qq(0), 'Row inserted via 2PC is not present on subscriber');
+
+$result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab_full';");
+ is($result, qq(0), 'transaction is aborted on subscriber');
+
+# Check that commit prepared is decoded properly on crash restart
+$node_publisher->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab_full VALUES (12);
+ INSERT INTO tab_full VALUES (13);
+ PREPARE TRANSACTION 'test_prepared_tab';");
+$node_subscriber->stop('immediate');
+$node_publisher->stop('immediate');
+$node_publisher->start;
+$node_subscriber->start;
+
+# commit post the restart
+$node_publisher->safe_psql('postgres', "COMMIT PREPARED 'test_prepared_tab';");
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check inserts are visible
+$result = $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM tab_full where a IN (11,12);");
+is($result, qq(2), 'Rows inserted via 2PC are visible on the subscriber');
+
+# TODO add test cases involving DDL. This can be added after we add functionality
+# to replicate DDL changes to subscriber.
+
+# check all the cleanup
+$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_subscription");
+is($result, qq(0), 'check subscription was dropped on subscriber');
+
+$result = $node_publisher->safe_psql('postgres',
+ "SELECT count(*) FROM pg_replication_slots");
+is($result, qq(0), 'check replication slot was dropped on publisher');
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_subscription_rel");
+is($result, qq(0),
+ 'check subscription relation status was dropped on subscriber');
+
+$result = $node_publisher->safe_psql('postgres',
+ "SELECT count(*) FROM pg_replication_slots");
+is($result, qq(0), 'check replication slot was dropped on publisher');
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_replication_origin");
+is($result, qq(0), 'check replication origin was dropped on subscriber');
+
+$node_subscriber->stop('fast');
+$node_publisher->stop('fast');
--
1.8.3.1
v7-0001-Support-decoding-of-two-phase-transactions.patchapplication/octet-stream; name=v7-0001-Support-decoding-of-two-phase-transactions.patchDownload
From 0091729a100f47494c3138ea4daf8add73e20f53 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Fri, 9 Oct 2020 01:45:06 -0400
Subject: [PATCH v7] Support decoding of two-phase transactions
Until now two-phase transactions were decoded at COMMIT, just like
regular transaction. During replay, two-phase transactions were
translated into regular transactions on the subscriber, and the GID
was not forwarded to it.
This patch allows PREPARE-time decoding two-phase transactions (if
the output plugin supports this capability), in which case the
transactions are replayed at PREPARE and then committed later when
COMMIT PREPARED arrives.
Includes documentation changes.
---
contrib/test_decoding/Makefile | 2 +-
contrib/test_decoding/expected/two_phase.out | 219 ++++++++++++++++++++
contrib/test_decoding/sql/two_phase.sql | 117 +++++++++++
contrib/test_decoding/test_decoding.c | 159 +++++++++++++++
doc/src/sgml/logicaldecoding.sgml | 110 +++++++++-
src/backend/replication/logical/decode.c | 250 +++++++++++++++++++++--
src/backend/replication/logical/logical.c | 175 ++++++++++++++++
src/backend/replication/logical/reorderbuffer.c | 259 ++++++++++++++++++++----
src/include/replication/logical.h | 5 +
src/include/replication/output_plugin.h | 37 ++++
src/include/replication/reorderbuffer.h | 66 ++++++
11 files changed, 1348 insertions(+), 51 deletions(-)
create mode 100644 contrib/test_decoding/expected/two_phase.out
create mode 100644 contrib/test_decoding/sql/two_phase.sql
diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index f23f15b..36b3f9a 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -4,7 +4,7 @@ MODULES = test_decoding
PGFILEDESC = "test_decoding - example of a logical decoding output plugin"
REGRESS = ddl xact rewrite toast permissions decoding_in_xact \
- decoding_into_rel binary prepared replorigin time messages \
+ decoding_into_rel binary prepared replorigin time two_phase messages \
spill slot truncate stream
ISOLATION = mxact delayed_startup ondisk_startup concurrent_ddl_dml \
oldest_xmin snapshot_transfer subxact_without_top concurrent_stream
diff --git a/contrib/test_decoding/expected/two_phase.out b/contrib/test_decoding/expected/two_phase.out
new file mode 100644
index 0000000..3ac01a4
--- /dev/null
+++ b/contrib/test_decoding/expected/two_phase.out
@@ -0,0 +1,219 @@
+-- Test two-phased transactions, when two-phase-commit is enabled, transactions are
+-- decoded at PREPARE time rather than at COMMIT PREPARED time.
+SET synchronous_commit = on;
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+ ?column?
+----------
+ init
+(1 row)
+
+CREATE TABLE test_prepared1(id integer primary key);
+CREATE TABLE test_prepared2(id integer primary key);
+-- Test that commands in a two phase xact are only decoded at PREPARE.
+-- Decoding after COMMIT PREPARED should only have the COMMIT PREPARED command and not the
+-- rest of the commands in the transaction.
+BEGIN;
+INSERT INTO test_prepared1 VALUES (1);
+INSERT INTO test_prepared1 VALUES (2);
+-- should show nothing
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+------
+(0 rows)
+
+PREPARE TRANSACTION 'test_prepared#1';
+--
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+----------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:1
+ table public.test_prepared1: INSERT: id[integer]:2
+ PREPARE TRANSACTION 'test_prepared#1'
+(4 rows)
+
+COMMIT PREPARED 'test_prepared#1';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-----------------------------------
+ COMMIT PREPARED 'test_prepared#1'
+(1 row)
+
+-- test abort of a prepared xact
+BEGIN;
+INSERT INTO test_prepared1 VALUES (3);
+PREPARE TRANSACTION 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+----------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:3
+ PREPARE TRANSACTION 'test_prepared#2'
+(3 rows)
+
+ROLLBACK PREPARED 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-------------------------------------
+ ROLLBACK PREPARED 'test_prepared#2'
+(1 row)
+
+-- test prepared xact containing ddl
+BEGIN;
+ALTER TABLE test_prepared1 ADD COLUMN data text;
+INSERT INTO test_prepared1 VALUES (4, 'frakbar');
+PREPARE TRANSACTION 'test_prepared#3';
+-- confirm that exclusive lock from the ALTER command is held on test_prepared1 table
+SELECT 'test_prepared_1' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'test_prepared1'::regclass;
+ relation | locktype | mode
+-----------------+----------+---------------------
+ test_prepared_1 | relation | RowExclusiveLock
+ test_prepared_1 | relation | AccessExclusiveLock
+(2 rows)
+
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-------------------------------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:4 data[text]:'frakbar'
+ PREPARE TRANSACTION 'test_prepared#3'
+(3 rows)
+
+-- Test that we decode correctly while an uncommitted prepared xact
+-- with ddl exists.
+--
+-- Use a separate table for the concurrent transaction because the lock from
+-- the ALTER will stop us inserting into the other one.
+--
+INSERT INTO test_prepared2 VALUES (5);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+----------------------------------------------------
+ BEGIN
+ table public.test_prepared2: INSERT: id[integer]:5
+ COMMIT
+(3 rows)
+
+COMMIT PREPARED 'test_prepared#3';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-----------------------------------
+ COMMIT PREPARED 'test_prepared#3'
+(1 row)
+
+-- make sure stuff still works
+INSERT INTO test_prepared1 VALUES (6);
+INSERT INTO test_prepared2 VALUES (7);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+--------------------------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:6 data[text]:null
+ COMMIT
+ BEGIN
+ table public.test_prepared2: INSERT: id[integer]:7
+ COMMIT
+(6 rows)
+
+-- Check `CLUSTER` (as operation that hold exclusive lock) doesn't block
+-- logical decoding.
+BEGIN;
+INSERT INTO test_prepared1 VALUES (8, 'othercol');
+CLUSTER test_prepared1 USING test_prepared1_pkey;
+INSERT INTO test_prepared1 VALUES (9, 'othercol2');
+PREPARE TRANSACTION 'test_prepared_lock';
+SELECT 'test_prepared1' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'test_prepared1'::regclass;
+ relation | locktype | mode
+----------------+----------+---------------------
+ test_prepared1 | relation | RowExclusiveLock
+ test_prepared1 | relation | ShareLock
+ test_prepared1 | relation | AccessExclusiveLock
+(3 rows)
+
+-- The above CLUSTER command shouldn't cause a timeout on 2pc decoding. The call should return
+-- within a second.
+SET statement_timeout = '1s';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+---------------------------------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:8 data[text]:'othercol'
+ table public.test_prepared1: INSERT: id[integer]:9 data[text]:'othercol2'
+ PREPARE TRANSACTION 'test_prepared_lock'
+(4 rows)
+
+RESET statement_timeout;
+COMMIT PREPARED 'test_prepared_lock';
+-- consume the commit
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+--------------------------------------
+ COMMIT PREPARED 'test_prepared_lock'
+(1 row)
+
+-- test savepoints and sub-xacts as a result
+BEGIN;
+CREATE TABLE test_prepared_savepoint (a int);
+INSERT INTO test_prepared_savepoint VALUES (1);
+SAVEPOINT test_savepoint;
+INSERT INTO test_prepared_savepoint VALUES (2);
+ROLLBACK TO SAVEPOINT test_savepoint;
+PREPARE TRANSACTION 'test_prepared_savepoint';
+-- should show only 1, not 2
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+------------------------------------------------------------
+ BEGIN
+ table public.test_prepared_savepoint: INSERT: a[integer]:1
+ PREPARE TRANSACTION 'test_prepared_savepoint'
+(3 rows)
+
+COMMIT PREPARED 'test_prepared_savepoint';
+-- consume the commit
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-------------------------------------------
+ COMMIT PREPARED 'test_prepared_savepoint'
+(1 row)
+
+-- test that a GID containing "_nodecode" gets decoded at commit prepared time
+BEGIN;
+INSERT INTO test_prepared1 VALUES (20);
+PREPARE TRANSACTION 'test_prepared_nodecode';
+-- should show nothing
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+------
+(0 rows)
+
+COMMIT PREPARED 'test_prepared_nodecode';
+-- should be decoded now
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+---------------------------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:20 data[text]:null
+ COMMIT
+(3 rows)
+
+-- cleanup
+DROP TABLE test_prepared1;
+DROP TABLE test_prepared2;
+-- show results. There should be nothing to show
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+------
+(0 rows)
+
+SELECT pg_drop_replication_slot('regression_slot');
+ pg_drop_replication_slot
+--------------------------
+
+(1 row)
+
diff --git a/contrib/test_decoding/sql/two_phase.sql b/contrib/test_decoding/sql/two_phase.sql
new file mode 100644
index 0000000..e3e2690
--- /dev/null
+++ b/contrib/test_decoding/sql/two_phase.sql
@@ -0,0 +1,117 @@
+-- Test two-phased transactions, when two-phase-commit is enabled, transactions are
+-- decoded at PREPARE time rather than at COMMIT PREPARED time.
+SET synchronous_commit = on;
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+
+CREATE TABLE test_prepared1(id integer primary key);
+CREATE TABLE test_prepared2(id integer primary key);
+
+-- Test that commands in a two phase xact are only decoded at PREPARE.
+-- Decoding after COMMIT PREPARED should only have the COMMIT PREPARED command and not the
+-- rest of the commands in the transaction.
+BEGIN;
+INSERT INTO test_prepared1 VALUES (1);
+INSERT INTO test_prepared1 VALUES (2);
+-- should show nothing
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+PREPARE TRANSACTION 'test_prepared#1';
+--
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+COMMIT PREPARED 'test_prepared#1';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- test abort of a prepared xact
+BEGIN;
+INSERT INTO test_prepared1 VALUES (3);
+PREPARE TRANSACTION 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ROLLBACK PREPARED 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+
+-- test prepared xact containing ddl
+BEGIN;
+ALTER TABLE test_prepared1 ADD COLUMN data text;
+INSERT INTO test_prepared1 VALUES (4, 'frakbar');
+PREPARE TRANSACTION 'test_prepared#3';
+-- confirm that exclusive lock from the ALTER command is held on test_prepared1 table
+SELECT 'test_prepared_1' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'test_prepared1'::regclass;
+
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- Test that we decode correctly while an uncommitted prepared xact
+-- with ddl exists.
+--
+-- Use a separate table for the concurrent transaction because the lock from
+-- the ALTER will stop us inserting into the other one.
+--
+INSERT INTO test_prepared2 VALUES (5);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+COMMIT PREPARED 'test_prepared#3';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- make sure stuff still works
+INSERT INTO test_prepared1 VALUES (6);
+INSERT INTO test_prepared2 VALUES (7);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- Check `CLUSTER` (as operation that hold exclusive lock) doesn't block
+-- logical decoding.
+BEGIN;
+INSERT INTO test_prepared1 VALUES (8, 'othercol');
+CLUSTER test_prepared1 USING test_prepared1_pkey;
+INSERT INTO test_prepared1 VALUES (9, 'othercol2');
+PREPARE TRANSACTION 'test_prepared_lock';
+
+SELECT 'test_prepared1' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'test_prepared1'::regclass;
+
+-- The above CLUSTER command shouldn't cause a timeout on 2pc decoding. The call should return
+-- within a second.
+SET statement_timeout = '1s';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+RESET statement_timeout;
+
+COMMIT PREPARED 'test_prepared_lock';
+
+-- consume the commit
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- test savepoints and sub-xacts as a result
+BEGIN;
+CREATE TABLE test_prepared_savepoint (a int);
+INSERT INTO test_prepared_savepoint VALUES (1);
+SAVEPOINT test_savepoint;
+INSERT INTO test_prepared_savepoint VALUES (2);
+ROLLBACK TO SAVEPOINT test_savepoint;
+PREPARE TRANSACTION 'test_prepared_savepoint';
+-- should show only 1, not 2
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+COMMIT PREPARED 'test_prepared_savepoint';
+-- consume the commit
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- test that a GID containing "_nodecode" gets decoded at commit prepared time
+BEGIN;
+INSERT INTO test_prepared1 VALUES (20);
+PREPARE TRANSACTION 'test_prepared_nodecode';
+-- should show nothing
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+COMMIT PREPARED 'test_prepared_nodecode';
+-- should be decoded now
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- cleanup
+DROP TABLE test_prepared1;
+DROP TABLE test_prepared2;
+
+-- show results. There should be nothing to show
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+SELECT pg_drop_replication_slot('regression_slot');
diff --git a/contrib/test_decoding/test_decoding.c b/contrib/test_decoding/test_decoding.c
index e60ab34..6b8e502 100644
--- a/contrib/test_decoding/test_decoding.c
+++ b/contrib/test_decoding/test_decoding.c
@@ -11,12 +11,16 @@
*-------------------------------------------------------------------------
*/
#include "postgres.h"
+#include "miscadmin.h"
+#include "access/transam.h"
#include "catalog/pg_type.h"
#include "replication/logical.h"
#include "replication/origin.h"
+#include "storage/procarray.h"
+
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -36,6 +40,7 @@ typedef struct
bool skip_empty_xacts;
bool xact_wrote_changes;
bool only_local;
+ TransactionId check_xid_aborted; /* track abort of this txid */
} TestDecodingData;
static void pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
@@ -88,6 +93,18 @@ static void pg_decode_stream_truncate(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
int nrelations, Relation relations[],
ReorderBufferChange *change);
+static bool pg_decode_filter_prepare(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid);
+static void pg_decode_prepare_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+static void pg_decode_commit_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+static void pg_decode_rollback_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
void
_PG_init(void)
@@ -116,6 +133,10 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb)
cb->stream_change_cb = pg_decode_stream_change;
cb->stream_message_cb = pg_decode_stream_message;
cb->stream_truncate_cb = pg_decode_stream_truncate;
+ cb->filter_prepare_cb = pg_decode_filter_prepare;
+ cb->prepare_cb = pg_decode_prepare_txn;
+ cb->commit_prepared_cb = pg_decode_commit_prepared_txn;
+ cb->rollback_prepared_cb = pg_decode_rollback_prepared_txn;
}
@@ -127,6 +148,7 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
ListCell *option;
TestDecodingData *data;
bool enable_streaming = false;
+ bool enable_2pc = false;
data = palloc0(sizeof(TestDecodingData));
data->context = AllocSetContextCreate(ctx->context,
@@ -136,6 +158,7 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
data->include_timestamp = false;
data->skip_empty_xacts = false;
data->only_local = false;
+ data->check_xid_aborted = InvalidTransactionId;
ctx->output_plugin_private = data;
@@ -227,6 +250,35 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
errmsg("could not parse value \"%s\" for parameter \"%s\"",
strVal(elem->arg), elem->defname)));
}
+ else if (strcmp(elem->defname, "two-phase-commit") == 0)
+ {
+ if (elem->arg == NULL)
+ continue;
+ else if (!parse_bool(strVal(elem->arg), &enable_2pc))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("could not parse value \"%s\" for parameter \"%s\"",
+ strVal(elem->arg), elem->defname)));
+ }
+ else if (strcmp(elem->defname, "check-xid-aborted") == 0)
+ {
+ if (elem->arg == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("check-xid-aborted needs an input value")));
+ else
+ {
+ errno = 0;
+ data->check_xid_aborted = (TransactionId)
+ strtoul(strVal(elem->arg), NULL, 0);
+
+ if (!TransactionIdIsValid(data->check_xid_aborted))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("check-xid-aborted is not a valid xid: \"%s\"",
+ strVal(elem->arg))));
+ }
+ }
else
{
ereport(ERROR,
@@ -238,6 +290,7 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
}
ctx->streaming &= enable_streaming;
+ ctx->twophase &= enable_2pc;
}
/* cleanup this plugin's resources */
@@ -297,6 +350,93 @@ pg_decode_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
OutputPluginWrite(ctx, true);
}
+/*
+ * Filter out two-phase transactions.
+ *
+ * Each plugin can implement its own filtering logic. Here
+ * we demonstrate a simple logic by checking the GID. If the
+ * GID contains the "_nodecode" substring, then we filter
+ * it out.
+ */
+static bool
+pg_decode_filter_prepare(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid)
+{
+ if (strstr(gid, "_nodecode") != NULL)
+ return true;
+
+ return false;
+}
+
+/* PREPARE callback */
+static void
+pg_decode_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ if (data->skip_empty_xacts && !data->xact_wrote_changes)
+ return;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ appendStringInfo(ctx->out, "PREPARE TRANSACTION %s",
+ quote_literal_cstr(txn->gid));
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, " %u", txn->xid);
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
+/* COMMIT PREPARED callback */
+static void
+pg_decode_commit_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ appendStringInfo(ctx->out, "COMMIT PREPARED %s",
+ quote_literal_cstr(txn->gid));
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, " %u", txn->xid);
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
+/* ABORT PREPARED callback */
+static void
+pg_decode_rollback_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ appendStringInfo(ctx->out, "ROLLBACK PREPARED %s",
+ quote_literal_cstr(txn->gid));
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, " %u", txn->xid);
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
static bool
pg_decode_filter(LogicalDecodingContext *ctx,
RepOriginId origin_id)
@@ -455,6 +595,25 @@ pg_decode_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
}
data->xact_wrote_changes = true;
+ /* if check_xid_aborted is a valid xid, then it was passed in
+ * as an option to check if the transaction having this xid would be aborted.
+ * This is to test concurrent aborts.
+ */
+ if (TransactionIdIsValid(data->check_xid_aborted))
+ {
+ elog(LOG, "waiting for %u to abort", data->check_xid_aborted);
+ while (TransactionIdIsInProgress(data->check_xid_aborted))
+ {
+ CHECK_FOR_INTERRUPTS();
+ pg_usleep(10000L);
+ }
+ if (!TransactionIdIsInProgress(data->check_xid_aborted) &&
+ !TransactionIdDidCommit(data->check_xid_aborted))
+ elog(LOG, "%u aborted", data->check_xid_aborted);
+
+ Assert(TransactionIdDidAbort(data->check_xid_aborted));
+ }
+
class_form = RelationGetForm(relation);
tupdesc = RelationGetDescr(relation);
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index 813a037..f24470a 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -387,6 +387,10 @@ typedef struct OutputPluginCallbacks
LogicalDecodeTruncateCB truncate_cb;
LogicalDecodeCommitCB commit_cb;
LogicalDecodeMessageCB message_cb;
+ LogicalDecodeFilterPrepareCB filter_prepare_cb;
+ LogicalDecodePrepareCB prepare_cb;
+ LogicalDecodeCommitPreparedCB commit_prepared_cb;
+ LogicalDecodeRollbackPreparedCB rollback_prepared_cb;
LogicalDecodeFilterByOriginCB filter_by_origin_cb;
LogicalDecodeShutdownCB shutdown_cb;
LogicalDecodeStreamStartCB stream_start_cb;
@@ -477,7 +481,13 @@ CREATE TABLE another_catalog_table(data text) WITH (user_catalog_table = true);
never get
decoded. Successful savepoints are
folded into the transaction containing them in the order they were
- executed within that transaction.
+ executed within that transaction. A transaction that is prepared for
+ a two-phase commit using <command>PREPARE TRANSACTION</command> will
+ also be decoded if the output plugin callbacks needed for decoding
+ them are provided. It is possible that the current transaction which
+ is being decoded is aborted concurrently via a <command>ROLLBACK PREPARED</command>
+ command. In that case, the logical decoding of this transaction will
+ be aborted too.
</para>
<note>
@@ -578,6 +588,55 @@ typedef void (*LogicalDecodeCommitCB) (struct LogicalDecodingContext *ctx,
</para>
</sect3>
+ <sect3 id="logicaldecoding-output-plugin-prepare">
+ <title>Transaction Prepare Callback</title>
+
+ <para>
+ The optional <function>prepare_cb</function> callback is called whenever
+ a transaction which is prepared for two-phase commit has been
+ decoded. The <function>change_cb</function> callbacks for all modified
+ rows will have been called before this, if there have been any modified
+ rows.
+<programlisting>
+typedef void (*LogicalDecodePrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
+ <sect3 id="logicaldecoding-output-plugin-commit-prepared">
+ <title>Commit Prepared Transaction Callback</title>
+
+ <para>
+ The optional <function>commit_prepared_cb</function> callback is called whenever
+ a commit prepared transaction has been decoded. The <parameter>gid</parameter> field,
+ which is part of the <parameter>txn</parameter> parameter can be used in this
+ callback.
+<programlisting>
+typedef void (*LogicalDecodeCommitPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
+ <sect3 id="logicaldecoding-output-plugin-rollback-prepared">
+ <title>Rollback Prepared Transaction Callback</title>
+
+ <para>
+ The optional <function>rollback_prepared_cb</function> callback is called whenever
+ a rollback prepared transaction has been decoded. The <parameter>gid</parameter> field,
+ which is part of the <parameter>txn</parameter> parameter can be used in this
+ callback.
+<programlisting>
+typedef void (*LogicalDecodeRollbackPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
<sect3 id="logicaldecoding-output-plugin-change">
<title>Change Callback</title>
@@ -587,7 +646,13 @@ typedef void (*LogicalDecodeCommitCB) (struct LogicalDecodingContext *ctx,
an <command>INSERT</command>, <command>UPDATE</command>,
or <command>DELETE</command>. Even if the original command modified
several rows at once the callback will be called individually for each
- row.
+ row. The <function>change_cb</function> callback may access system or
+ user catalog tables to aid in the process of outputting the row
+ modification details. In case of decoding a prepared (but yet
+ uncommitted) transaction or decoding of an uncommitted transaction, this
+ change callback might also error out due to simultaneous rollback of
+ this very same transaction. In that case, the logical decoding of this
+ aborted transaction is stopped gracefully.
<programlisting>
typedef void (*LogicalDecodeChangeCB) (struct LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
@@ -664,6 +729,39 @@ typedef bool (*LogicalDecodeFilterByOriginCB) (struct LogicalDecodingContext *ct
</para>
</sect3>
+ <sect3 id="logicaldecoding-output-plugin-filter-prepare">
+ <title>Prepare Filter Callback</title>
+
+ <para>
+ The optional <function>filter_prepare_cb</function> callback
+ is called to determine whether data that is part of the current
+ two-phase commit transaction should be considered for decode
+ at this prepare stage or as a regular one-phase transaction at
+ <command>COMMIT PREPARED</command> time later. To signal that
+ decoding should be skipped, return <literal>true</literal>;
+ <literal>false</literal> otherwise. When the callback is not
+ defined, <literal>false</literal> is assumed (i.e. nothing is
+ filtered).
+<programlisting>
+typedef bool (*LogicalDecodeFilterPrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ TransactionId xid,
+ const char *gid);
+</programlisting>
+ The <parameter>ctx</parameter> parameter has the same contents
+ as for the other callbacks. The <parameter>txn</parameter> parameter
+ contains meta information about the transaction. The <parameter>xid</parameter>
+ contains the XID because <parameter>txn</parameter> can be NULL in some cases.
+ The <parameter>gid</parameter> is the identifier that later identifies this
+ transaction for <command>COMMIT PREPARED</command> or <command>ROLLBACK PREPARED</command>.
+ </para>
+ <para>
+ The callback has to provide the same static answer for a given combination of
+ <parameter>xid</parameter> and <parameter>gid</parameter> every time it is
+ called.
+ </para>
+ </sect3>
+
<sect3 id="logicaldecoding-output-plugin-message">
<title>Generic Message Callback</title>
@@ -685,7 +783,13 @@ typedef void (*LogicalDecodeMessageCB) (struct LogicalDecodingContext *ctx,
non-transactional and the XID was not assigned yet in the transaction
which logged the message. The <parameter>lsn</parameter> has WAL
location of the message. The <parameter>transactional</parameter> says
- if the message was sent as transactional or not.
+ if the message was sent as transactional or not. Similar to the change
+ callback, in case of decoding a prepared (but yet uncommitted)
+ transaction or decoding of an uncommitted transaction, this message
+ callback might also error out due to simultaneous rollback of
+ this very same transaction. In that case, the logical decoding of this
+ aborted transaction is stopped gracefully.
+
The <parameter>prefix</parameter> is arbitrary null-terminated prefix
which can be used for identifying interesting messages for the current
plugin. And finally the <parameter>message</parameter> parameter holds
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 3f84ee9..19fd92a 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -68,8 +68,15 @@ static void DecodeSpecConfirm(LogicalDecodingContext *ctx, XLogRecordBuffer *buf
static void DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
xl_xact_parsed_commit *parsed, TransactionId xid);
+static void DecodeCommitPrepared(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_commit *parsed, TransactionId xid);
static void DecodeAbort(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
xl_xact_parsed_abort *parsed, TransactionId xid);
+static void DecodeAbortPrepared(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_abort *parsed, TransactionId xid);
+static void DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_prepare * parsed);
+
/* common function to decode tuples */
static void DecodeXLogTuple(char *data, Size len, ReorderBufferTupleBuf *tup);
@@ -239,7 +246,6 @@ DecodeXactOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
switch (info)
{
case XLOG_XACT_COMMIT:
- case XLOG_XACT_COMMIT_PREPARED:
{
xl_xact_commit *xlrec;
xl_xact_parsed_commit parsed;
@@ -256,8 +262,24 @@ DecodeXactOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
DecodeCommit(ctx, buf, &parsed, xid);
break;
}
+ case XLOG_XACT_COMMIT_PREPARED:
+ {
+ xl_xact_commit *xlrec;
+ xl_xact_parsed_commit parsed;
+ TransactionId xid;
+
+ xlrec = (xl_xact_commit *) XLogRecGetData(r);
+ ParseCommitRecord(XLogRecGetInfo(buf->record), xlrec, &parsed);
+
+ if (!TransactionIdIsValid(parsed.twophase_xid))
+ xid = XLogRecGetXid(r);
+ else
+ xid = parsed.twophase_xid;
+
+ DecodeCommitPrepared(ctx, buf, &parsed, xid);
+ break;
+ }
case XLOG_XACT_ABORT:
- case XLOG_XACT_ABORT_PREPARED:
{
xl_xact_abort *xlrec;
xl_xact_parsed_abort parsed;
@@ -274,6 +296,23 @@ DecodeXactOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
DecodeAbort(ctx, buf, &parsed, xid);
break;
}
+ case XLOG_XACT_ABORT_PREPARED:
+ {
+ xl_xact_abort *xlrec;
+ xl_xact_parsed_abort parsed;
+ TransactionId xid;
+
+ xlrec = (xl_xact_abort *) XLogRecGetData(r);
+ ParseAbortRecord(XLogRecGetInfo(buf->record), xlrec, &parsed);
+
+ if (!TransactionIdIsValid(parsed.twophase_xid))
+ xid = XLogRecGetXid(r);
+ else
+ xid = parsed.twophase_xid;
+
+ DecodeAbortPrepared(ctx, buf, &parsed, xid);
+ break;
+ }
case XLOG_XACT_ASSIGNMENT:
/*
@@ -312,17 +351,35 @@ DecodeXactOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
}
break;
case XLOG_XACT_PREPARE:
+ {
+ xl_xact_parsed_prepare parsed;
+ xl_xact_prepare *xlrec;
- /*
- * Currently decoding ignores PREPARE TRANSACTION and will just
- * decode the transaction when the COMMIT PREPARED is sent or
- * throw away the transaction's contents when a ROLLBACK PREPARED
- * is received. In the future we could add code to expose prepared
- * transactions in the changestream allowing for a kind of
- * distributed 2PC.
- */
- ReorderBufferProcessXid(reorder, XLogRecGetXid(r), buf->origptr);
- break;
+ /* check that output plugin is capable of twophase decoding */
+ if (!ctx->twophase)
+ {
+ ReorderBufferProcessXid(reorder, XLogRecGetXid(r), buf->origptr);
+ break;
+ }
+
+ /* ok, parse it */
+ xlrec = (xl_xact_prepare *)XLogRecGetData(r);
+ ParsePrepareRecord(XLogRecGetInfo(buf->record),
+ xlrec, &parsed);
+
+ /* does output plugin want this particular transaction? */
+ if (ctx->callbacks.filter_prepare_cb &&
+ ReorderBufferPrepareNeedSkip(reorder, parsed.twophase_xid,
+ parsed.twophase_gid))
+ {
+ ReorderBufferProcessXid(reorder, parsed.twophase_xid,
+ buf->origptr);
+ break;
+ }
+
+ DecodePrepare(ctx, buf, &parsed);
+ break;
+ }
default:
elog(ERROR, "unexpected RM_XACT_ID record type: %u", info);
}
@@ -659,6 +716,131 @@ DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
}
/*
+ * Consolidated commit record handling between the different form of commit
+ * records.
+ */
+static void
+DecodeCommitPrepared(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_commit *parsed, TransactionId xid)
+{
+ XLogRecPtr origin_lsn = InvalidXLogRecPtr;
+ TimestampTz commit_time = parsed->xact_time;
+ RepOriginId origin_id = XLogRecGetOrigin(buf->record);
+ int i;
+
+ if (parsed->xinfo & XACT_XINFO_HAS_ORIGIN)
+ {
+ origin_lsn = parsed->origin_lsn;
+ commit_time = parsed->origin_timestamp;
+ }
+
+ SnapBuildCommitTxn(ctx->snapshot_builder, buf->origptr, xid,
+ parsed->nsubxacts, parsed->subxacts);
+
+ /* ----
+ * Check whether we are interested in this specific transaction, and tell
+ * the reorderbuffer to forget the content of the (sub-)transactions
+ * if not.
+ *
+ * There can be several reasons we might not be interested in this
+ * transaction:
+ * 1) We might not be interested in decoding transactions up to this
+ * LSN. This can happen because we previously decoded it and now just
+ * are restarting or if we haven't assembled a consistent snapshot yet.
+ * 2) The transaction happened in another database.
+ * 3) The output plugin is not interested in the origin.
+ * 4) We are doing fast-forwarding
+ *
+ * We can't just use ReorderBufferAbort() here, because we need to execute
+ * the transaction's invalidations. This currently won't be needed if
+ * we're just skipping over the transaction because currently we only do
+ * so during startup, to get to the first transaction the client needs. As
+ * we have reset the catalog caches before starting to read WAL, and we
+ * haven't yet touched any catalogs, there can't be anything to invalidate.
+ * But if we're "forgetting" this commit because it's it happened in
+ * another database, the invalidations might be important, because they
+ * could be for shared catalogs and we might have loaded data into the
+ * relevant syscaches.
+ * ---
+ */
+ if (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
+ (parsed->dbId != InvalidOid && parsed->dbId != ctx->slot->data.database) ||
+ ctx->fast_forward || FilterByOrigin(ctx, origin_id))
+ {
+ for (i = 0; i < parsed->nsubxacts; i++)
+ {
+ ReorderBufferForget(ctx->reorder, parsed->subxacts[i], buf->origptr);
+ }
+ ReorderBufferForget(ctx->reorder, xid, buf->origptr);
+
+ return;
+ }
+
+ /* tell the reorderbuffer about the surviving subtransactions */
+ for (i = 0; i < parsed->nsubxacts; i++)
+ {
+ ReorderBufferCommitChild(ctx->reorder, xid, parsed->subxacts[i],
+ buf->origptr, buf->endptr);
+ }
+
+ /*
+ * For COMMIT PREPARED, the changes have already been replayed at
+ * PREPARE time, so we only need to notify the subscriber that the GID
+ * finally committed.
+ * If filter check present and this needs to be skipped, do a regular commit.
+ */
+ if (ctx->callbacks.filter_prepare_cb &&
+ ReorderBufferPrepareNeedSkip(ctx->reorder, xid, parsed->twophase_gid))
+ {
+ ReorderBufferCommit(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn);
+ }
+ else
+ {
+ ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn,
+ parsed->twophase_gid, true);
+ }
+
+}
+
+/*
+ * Decode PREPARE record. Similar logic as in COMMIT
+ */
+static void
+DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_prepare * parsed)
+{
+ XLogRecPtr origin_lsn = parsed->origin_lsn;
+ TimestampTz commit_time = parsed->origin_timestamp;
+ XLogRecPtr origin_id = XLogRecGetOrigin(buf->record);
+ int i;
+ TransactionId xid = parsed->twophase_xid;
+
+ if (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
+ (parsed->dbId != InvalidOid && parsed->dbId != ctx->slot->data.database) ||
+ ctx->fast_forward || FilterByOrigin(ctx, origin_id))
+ return;
+
+ /*
+ * Tell the reorderbuffer about the surviving subtransactions. We need to
+ * do this because the main transaction itself has not committed since we
+ * are in the prepare phase right now. So we need to be sure the snapshot
+ * is setup correctly for the main transaction in case all changes
+ * happened in subtransanctions
+ */
+ for (i = 0; i < parsed->nsubxacts; i++)
+ {
+ ReorderBufferCommitChild(ctx->reorder, xid, parsed->subxacts[i],
+ buf->origptr, buf->endptr);
+ }
+
+ /* replay actions of all transaction + subtransactions in order */
+ ReorderBufferPrepare(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn, parsed->twophase_gid);
+}
+
+/*
* Get the data from the various forms of abort records and pass it on to
* snapbuild.c and reorderbuffer.c
*/
@@ -681,6 +863,50 @@ DecodeAbort(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
}
/*
+ * Get the data from the various forms of abort records and pass it on to
+ * snapbuild.c and reorderbuffer.c
+ */
+static void
+DecodeAbortPrepared(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_abort *parsed, TransactionId xid)
+{
+ int i;
+ XLogRecPtr origin_lsn = InvalidXLogRecPtr;
+ TimestampTz commit_time = 0;
+ XLogRecPtr origin_id = XLogRecGetOrigin(buf->record);
+
+ if (parsed->xinfo & XACT_XINFO_HAS_ORIGIN)
+ {
+ origin_lsn = parsed->origin_lsn;
+ commit_time = parsed->origin_timestamp;
+ }
+
+ /*
+ * If it passes through the filters handle the ROLLBACK via callbacks
+ */
+ if(!FilterByOrigin(ctx, origin_id) &&
+ !SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) &&
+ !ReorderBufferPrepareNeedSkip(ctx->reorder, xid, parsed->twophase_gid))
+ {
+ Assert(TransactionIdIsValid(xid));
+ Assert(parsed->dbId == ctx->slot->data.database);
+
+ ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn,
+ parsed->twophase_gid, false);
+ return;
+ }
+
+ for (i = 0; i < parsed->nsubxacts; i++)
+ {
+ ReorderBufferAbort(ctx->reorder, parsed->subxacts[i],
+ buf->record->EndRecPtr);
+ }
+
+ ReorderBufferAbort(ctx->reorder, xid, buf->record->EndRecPtr);
+}
+
+/*
* Parse XLOG_HEAP_INSERT (not MULTI_INSERT!) records into tuplebufs.
*
* Deletes can contain the new tuple.
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 8675832..8aeb648 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -59,6 +59,14 @@ static void shutdown_cb_wrapper(LogicalDecodingContext *ctx);
static void begin_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn);
static void commit_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
+static bool filter_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid);
+static void prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+static void commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+static void rollback_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
static void change_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
Relation relation, ReorderBufferChange *change);
static void truncate_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
@@ -207,6 +215,10 @@ StartupDecodingContext(List *output_plugin_options,
ctx->reorder->apply_change = change_cb_wrapper;
ctx->reorder->apply_truncate = truncate_cb_wrapper;
ctx->reorder->commit = commit_cb_wrapper;
+ ctx->reorder->filter_prepare = filter_prepare_cb_wrapper;
+ ctx->reorder->prepare = prepare_cb_wrapper;
+ ctx->reorder->commit_prepared = commit_prepared_cb_wrapper;
+ ctx->reorder->rollback_prepared = rollback_prepared_cb_wrapper;
ctx->reorder->message = message_cb_wrapper;
/*
@@ -227,6 +239,19 @@ StartupDecodingContext(List *output_plugin_options,
(ctx->callbacks.stream_truncate_cb != NULL);
/*
+ * To support two phase logical decoding, we require prepare/commit-prepare/abort-prepare
+ * callbacks. The filter-prepare callback is optional. We however enable two phase logical
+ * decoding when at least one of the methods is enabled so that we can easily identify
+ * missing methods.
+ *
+ * We decide it here, but only check it later in the wrappers.
+ */
+ ctx->twophase = (ctx->callbacks.prepare_cb != NULL) ||
+ (ctx->callbacks.commit_prepared_cb != NULL) ||
+ (ctx->callbacks.rollback_prepared_cb != NULL) ||
+ (ctx->callbacks.filter_prepare_cb != NULL);
+
+ /*
* streaming callbacks
*
* stream_message and stream_truncate callbacks are optional, so we do not
@@ -783,6 +808,111 @@ commit_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
}
static void
+prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "prepare";
+ state.report_location = txn->final_lsn; /* beginning of commit record */
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn; /* points to the end of the record */
+
+ /* If the plugin support 2 phase commits then prepare callback is mandatory */
+ if (ctx->twophase && ctx->callbacks.prepare_cb == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("Output plugin did not register prepare_cb callback")));
+
+ /* do the actual work: call callback */
+ ctx->callbacks.prepare_cb(ctx, txn, prepare_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
+commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "commit_prepared";
+ state.report_location = txn->final_lsn; /* beginning of commit record */
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn; /* points to the end of the record */
+
+ /* If the plugin support 2 phase commits then commit prepared callback is mandatory */
+ if (ctx->twophase && ctx->callbacks.commit_prepared_cb == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("Output plugin did not register commit_prepared_cb callback")));
+
+ /* do the actual work: call callback */
+ ctx->callbacks.commit_prepared_cb(ctx, txn, commit_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
+rollback_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "rollback_prepared";
+ state.report_location = txn->final_lsn; /* beginning of commit record */
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn; /* points to the end of the record */
+
+ /* If the plugin support 2 phase commits then abort prepared callback is mandatory */
+ if (ctx->twophase && ctx->callbacks.rollback_prepared_cb == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("Output plugin did not register rollback_prepared_cb callback")));
+
+ /* do the actual work: call callback */
+ ctx->callbacks.rollback_prepared_cb(ctx, txn, abort_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
change_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
Relation relation, ReorderBufferChange *change)
{
@@ -859,6 +989,51 @@ truncate_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
error_context_stack = errcallback.previous;
}
+static bool
+filter_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+ bool ret;
+
+ /*
+ * Skip if decoding of twophase at PREPARE time is not enabled. In that
+ * case all twophase transactions are considered filtered out and will be
+ * applied as regular transactions at COMMIT PREPARED.
+ */
+ if (!ctx->twophase)
+ return true;
+
+ /*
+ * The filter_prepare callback is optional. When not supplied, all
+ * prepared transactions should go through.
+ */
+ if (!ctx->callbacks.filter_prepare_cb)
+ return false;
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "filter_prepare";
+ state.report_location = InvalidXLogRecPtr;
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = false;
+
+ /* do the actual work: call callback */
+ ret = ctx->callbacks.filter_prepare_cb(ctx, txn, xid, gid);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+
+ return ret;
+}
+
bool
filter_by_origin_cb_wrapper(LogicalDecodingContext *ctx, RepOriginId origin_id)
{
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 4cb27f2..4da840c 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -251,7 +251,8 @@ static Size ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn
static void ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
char *change);
static void ReorderBufferRestoreCleanup(ReorderBuffer *rb, ReorderBufferTXN *txn);
-static void ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn);
+static void ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
+ bool txn_prepared);
static void ReorderBufferCleanupSerializedTXNs(const char *slotname);
static void ReorderBufferSerializedPath(char *path, ReplicationSlot *slot,
TransactionId xid, XLogSegNo segno);
@@ -417,6 +418,11 @@ ReorderBufferReturnTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
}
/* free data that's contained */
+ if (txn->gid != NULL)
+ {
+ pfree(txn->gid);
+ txn->gid = NULL;
+ }
if (txn->tuplecid_hash != NULL)
{
@@ -1506,12 +1512,14 @@ ReorderBufferCleanupTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
}
/*
- * Discard changes from a transaction (and subtransactions), after streaming
- * them. Keep the remaining info - transactions, tuplecids, invalidations and
- * snapshots.
+ * Discard changes from a transaction (and subtransactions), either after streaming or
+ * after a PREPARE.
+ * The flag txn_prepared indicates if this is called after a PREPARE.
+ * If streaming, keep the remaining info - transactions, tuplecids, invalidations and
+ * snapshots.If after a PREPARE, keep only the invalidations and snapshots.
*/
static void
-ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
+ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, bool txn_prepared)
{
dlist_mutable_iter iter;
@@ -1530,7 +1538,7 @@ ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
Assert(rbtxn_is_known_subxact(subtxn));
Assert(subtxn->nsubtxns == 0);
- ReorderBufferTruncateTXN(rb, subtxn);
+ ReorderBufferTruncateTXN(rb, subtxn, txn_prepared);
}
/* cleanup changes in the txn */
@@ -1564,9 +1572,33 @@ ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
* about the toplevel xact (we send the XID in all messages), but we never
* stream XIDs of empty subxacts.
*/
- if ((!txn->toptxn) || (txn->nentries_mem != 0))
+ if ((!txn_prepared) && ((!txn->toptxn) || (txn->nentries_mem != 0)))
txn->txn_flags |= RBTXN_IS_STREAMED;
+ if (txn_prepared)
+ {
+ /*
+ * If this is a prepared txn, cleanup the tuplecids we stored for decoding
+ * catalog snapshot access.
+ * They are always stored in the toplevel transaction.
+ */
+ dlist_foreach_modify(iter, &txn->tuplecids)
+ {
+ ReorderBufferChange *change;
+
+ change = dlist_container(ReorderBufferChange, node, iter.cur);
+
+ /* Check we're not mixing changes from different transactions. */
+ Assert(change->txn == txn);
+ Assert(change->action == REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID);
+
+ /* remove the change from it's containing list */
+ dlist_delete(&change->node);
+
+ ReorderBufferReturnChange(rb, change, true);
+ }
+ }
+
/*
* Destroy the (relfilenode, ctid) hashtable, so that we don't leak any
* memory. We could also keep the hash table and update it with new ctid
@@ -1891,7 +1923,7 @@ ReorderBufferResetTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
ReorderBufferChange *specinsert)
{
/* Discard the changes that we just streamed */
- ReorderBufferTruncateTXN(rb, txn);
+ ReorderBufferTruncateTXN(rb, txn, false);
/* Free all resources allocated for toast reconstruction */
ReorderBufferToastReset(rb, txn);
@@ -1998,7 +2030,7 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
prev_lsn = change->lsn;
/* Set the current xid to detect concurrent aborts. */
- if (streaming)
+ if (streaming || rbtxn_prepared(change->txn))
{
curtxn = change->txn;
SetupCheckXidLive(curtxn->xid);
@@ -2260,7 +2292,6 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
break;
}
}
-
/*
* There's a speculative insertion remaining, just clean in up, it
* can't have been successful, otherwise we'd gotten a confirmation
@@ -2289,7 +2320,16 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
}
}
else
- rb->commit(rb, txn, commit_lsn);
+ {
+ /*
+ * Call either PREPARE (for twophase transactions) or COMMIT
+ * (for regular ones).
+ */
+ if (rbtxn_prepared(txn))
+ rb->prepare(rb, txn, commit_lsn);
+ else
+ rb->commit(rb, txn, commit_lsn);
+ }
/* this is just a sanity check against bad output plugin behaviour */
if (GetCurrentTransactionIdIfAny() != InvalidTransactionId)
@@ -2330,11 +2370,17 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
*/
if (streaming)
{
- ReorderBufferTruncateTXN(rb, txn);
+ ReorderBufferTruncateTXN(rb, txn, false);
/* Reset the CheckXidAlive */
CheckXidAlive = InvalidTransactionId;
}
+ else if (rbtxn_prepared(txn))
+ {
+ ReorderBufferTruncateTXN(rb, txn, true);
+ /* Reset the CheckXidAlive */
+ CheckXidAlive = InvalidTransactionId;
+ }
else
ReorderBufferCleanupTXN(rb, txn);
}
@@ -2363,17 +2409,18 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
/*
* The error code ERRCODE_TRANSACTION_ROLLBACK indicates a concurrent
- * abort of the (sub)transaction we are streaming. We need to do the
+ * abort of the (sub)transaction we are streaming or preparing. We need to do the
* cleanup and return gracefully on this error, see SetupCheckXidLive.
*/
if (errdata->sqlerrcode == ERRCODE_TRANSACTION_ROLLBACK)
{
/*
- * This error can only occur when we are sending the data in
- * streaming mode and the streaming is not finished yet.
+ * This error can only occur either when we are sending the data in
+ * streaming mode and the streaming is not finished yet or when we are
+ * sending the data out on a PREPARE during a twoi phase commit.
*/
- Assert(streaming);
- Assert(stream_started);
+ Assert(streaming || rbtxn_prepared(txn));
+ Assert(stream_started || rbtxn_prepared(txn));
/* Cleanup the temporary error state. */
FlushErrorState();
@@ -2381,10 +2428,19 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
errdata = NULL;
curtxn->concurrent_abort = true;
- /* Reset the TXN so that it is allowed to stream remaining data. */
- ReorderBufferResetTXN(rb, txn, snapshot_now,
- command_id, prev_lsn,
- specinsert);
+ /* If streaming, reset the TXN so that it is allowed to stream remaining data. */
+ if (streaming)
+ {
+ ReorderBufferResetTXN(rb, txn, snapshot_now,
+ command_id, prev_lsn,
+ specinsert);
+ }
+ else
+ {
+ elog(LOG, "stopping decoding of %s (%u)",
+ txn->gid[0] != '\0'? txn->gid:"", txn->xid);
+ ReorderBufferTruncateTXN(rb, txn, true);
+ }
}
else
{
@@ -2406,23 +2462,16 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
* This interface is called once a toplevel commit is read for both streamed
* as well as non-streamed transactions.
*/
-void
-ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
- XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
- TimestampTz commit_time,
- RepOriginId origin_id, XLogRecPtr origin_lsn)
+static void
+ReorderBufferCommitInternal(ReorderBufferTXN *txn,
+ ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn)
{
- ReorderBufferTXN *txn;
Snapshot snapshot_now;
CommandId command_id = FirstCommandId;
- txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
- false);
-
- /* unknown transaction, nothing to replay */
- if (txn == NULL)
- return;
-
txn->final_lsn = commit_lsn;
txn->end_lsn = end_lsn;
txn->commit_time = commit_time;
@@ -2464,6 +2513,140 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
}
/*
+ * Ask output plugin whether we want to skip this PREPARE and send
+ * this transaction as a regular commit later.
+ */
+bool
+ReorderBufferPrepareNeedSkip(ReorderBuffer *rb, TransactionId xid, const char *gid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr, false);
+
+ return rb->filter_prepare(rb, txn, xid, gid);
+}
+
+
+/*
+ * Commit a transaction.
+ *
+ * See comments for ReorderBufferCommitInternal()
+ */
+void
+ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+
+ /* unknown transaction, nothing to replay */
+ if (txn == NULL)
+ return;
+
+ ReorderBufferCommitInternal(txn, rb, xid, commit_lsn, end_lsn,
+ commit_time, origin_id, origin_lsn);
+}
+
+/*
+ * Prepare a twophase transaction. It calls ReorderBufferCommitInternal()
+ * since all prepared transactions need to be decoded at PREPARE time.
+ */
+void
+ReorderBufferPrepare(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+
+ /* unknown transaction, nothing to replay */
+ if (txn == NULL)
+ return;
+
+ txn->txn_flags |= RBTXN_PREPARE;
+ txn->gid = palloc(strlen(gid) + 1); /* trailing '\0' */
+ strcpy(txn->gid, gid);
+
+ ReorderBufferCommitInternal(txn, rb, xid, commit_lsn, end_lsn,
+ commit_time, origin_id, origin_lsn);
+}
+
+/*
+ * Check whether this transaction was sent as prepared to subscribers.
+ * Called while handling commit|abort prepared.
+ */
+bool
+ReorderBufferTxnIsPrepared(ReorderBuffer *rb, TransactionId xid,
+ const char *gid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+
+ /*
+ * Always call the prepare filter. It's the job of the prepare filter to
+ * give us the *same* response for a given xid across multiple calls
+ * (including ones on restart)
+ */
+ return !(rb->filter_prepare(rb, txn, xid, gid));
+}
+
+/*
+ * Send standalone xact event. This is used to handle COMMIT/ABORT PREPARED.
+ */
+void
+ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid, bool is_commit)
+{
+ ReorderBufferTXN *txn;
+
+ /*
+ * The transaction may or may not exist (during restarts for example).
+ * Anyways, 2PC transactions do not contain any reorderbuffers. So allow
+ * it to be created below.
+ */
+ txn = ReorderBufferTXNByXid(rb, xid, true, NULL, commit_lsn,
+ true);
+
+ txn->final_lsn = commit_lsn;
+ txn->end_lsn = end_lsn;
+ txn->commit_time = commit_time;
+ txn->origin_id = origin_id;
+ txn->origin_lsn = origin_lsn;
+ /* this txn is obviously prepared */
+ txn->txn_flags |= RBTXN_PREPARE;
+ txn->gid = palloc(strlen(gid) + 1); /* trailing '\0' */
+ strcpy(txn->gid, gid);
+
+ if (is_commit)
+ {
+ txn->txn_flags |= RBTXN_COMMIT_PREPARED;
+ rb->commit_prepared(rb, txn, commit_lsn);
+ }
+ else
+ {
+ txn->txn_flags |= RBTXN_ROLLBACK_PREPARED;
+ rb->rollback_prepared(rb, txn, commit_lsn);
+ }
+
+ /* cleanup: make sure there's no cache pollution */
+ ReorderBufferExecuteInvalidations(rb, txn);
+ ReorderBufferCleanupTXN(rb, txn);
+}
+
+/*
* Abort a transaction that possibly has previous changes. Needs to be first
* called for subtransactions and then for the toplevel xid.
*
@@ -2506,7 +2689,13 @@ ReorderBufferAbort(ReorderBuffer *rb, TransactionId xid, XLogRecPtr lsn)
/* cosmetic... */
txn->final_lsn = lsn;
- /* remove potential on-disk data, and deallocate */
+ /*
+ * remove potential on-disk data, and deallocate.
+ *
+ * We remove it even for prepared transactions (GID is enough to
+ * commit/abort those later).
+ */
+
ReorderBufferCleanupTXN(rb, txn);
}
diff --git a/src/include/replication/logical.h b/src/include/replication/logical.h
index 40bab7e..b4592da 100644
--- a/src/include/replication/logical.h
+++ b/src/include/replication/logical.h
@@ -84,6 +84,11 @@ typedef struct LogicalDecodingContext
*/
bool streaming;
+ /*
+ * Does the output plugin support two phase decoding, and is it enabled?
+ */
+ bool twophase;
+
/*
* State for writing output.
*/
diff --git a/src/include/replication/output_plugin.h b/src/include/replication/output_plugin.h
index b78c796..96acd01 100644
--- a/src/include/replication/output_plugin.h
+++ b/src/include/replication/output_plugin.h
@@ -77,6 +77,39 @@ typedef void (*LogicalDecodeCommitCB) (struct LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
+ /*
+ * Called before decoding of PREPARE record to decide whether this
+ * transaction should be decoded with separate calls to prepare and
+ * commit_prepared/rollback_prepared callbacks or wait till COMMIT PREPARED and
+ * sent as usual transaction.
+ */
+typedef bool (*LogicalDecodeFilterPrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ TransactionId xid,
+ const char *gid);
+
+/*
+ * Called for PREPARE record unless it was filtered by filter_prepare()
+ * callback.
+ */
+typedef void (*LogicalDecodePrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+
+/*
+ * Called for COMMIT PREPARED.
+ */
+typedef void (*LogicalDecodeCommitPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+/*
+ * Called for ROLLBACK PREPARED.
+ */
+typedef void (*LogicalDecodeRollbackPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+
/*
* Called for the generic logical decoding messages.
*/
@@ -171,6 +204,10 @@ typedef struct OutputPluginCallbacks
LogicalDecodeTruncateCB truncate_cb;
LogicalDecodeCommitCB commit_cb;
LogicalDecodeMessageCB message_cb;
+ LogicalDecodeFilterPrepareCB filter_prepare_cb;
+ LogicalDecodePrepareCB prepare_cb;
+ LogicalDecodeCommitPreparedCB commit_prepared_cb;
+ LogicalDecodeRollbackPreparedCB rollback_prepared_cb;
LogicalDecodeFilterByOriginCB filter_by_origin_cb;
LogicalDecodeShutdownCB shutdown_cb;
/* streaming of changes */
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 0cc3aeb..e060107 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -10,6 +10,7 @@
#define REORDERBUFFER_H
#include "access/htup_details.h"
+#include "access/twophase.h"
#include "lib/ilist.h"
#include "storage/sinval.h"
#include "utils/hsearch.h"
@@ -166,6 +167,9 @@ typedef struct ReorderBufferChange
#define RBTXN_IS_STREAMED 0x0010
#define RBTXN_HAS_TOAST_INSERT 0x0020
#define RBTXN_HAS_SPEC_INSERT 0x0040
+#define RBTXN_PREPARE 0x0080
+#define RBTXN_COMMIT_PREPARED 0x0100
+#define RBTXN_ROLLBACK_PREPARED 0x0200
/* Does the transaction have catalog changes? */
#define rbtxn_has_catalog_changes(txn) \
@@ -225,6 +229,13 @@ typedef struct ReorderBufferChange
((txn)->txn_flags & RBTXN_IS_STREAMED) != 0 \
)
+/* is this txn prepared? */
+#define rbtxn_prepared(txn) (txn->txn_flags & RBTXN_PREPARE)
+/* was this prepared txn committed in the meanwhile? */
+#define rbtxn_commit_prepared(txn) (txn->txn_flags & RBTXN_COMMIT_PREPARED)
+/* was this prepared txn aborted in the meanwhile? */
+#define rbtxn_rollback_prepared(txn) (txn->txn_flags & RBTXN_ROLLBACK_PREPARED)
+
typedef struct ReorderBufferTXN
{
/* See above */
@@ -236,6 +247,9 @@ typedef struct ReorderBufferTXN
/* Xid of top-level transaction, if known */
TransactionId toplevel_xid;
+ /* In case of 2PC we need to pass GID to output plugin */
+ char *gid;
+
/*
* LSN of the first data carrying, WAL record with knowledge about this
* xid. This is allowed to *not* be first record adorned with this xid, if
@@ -397,6 +411,39 @@ typedef void (*ReorderBufferCommitCB) (ReorderBuffer *rb,
ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
+/* abort callback signature */
+typedef void (*ReorderBufferAbortCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+
+typedef bool (*ReorderBufferFilterPrepareCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ TransactionId xid,
+ const char *gid);
+
+/* prepare callback signature */
+typedef void (*ReorderBufferPrepareCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+
+/* commit prepared callback signature */
+typedef void (*ReorderBufferCommitPreparedCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+/* abort prepared callback signature */
+typedef void (*ReorderBufferRollbackPreparedCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+
+
+
+
/* message callback signature */
typedef void (*ReorderBufferMessageCB) (ReorderBuffer *rb,
ReorderBufferTXN *txn,
@@ -489,6 +536,11 @@ struct ReorderBuffer
ReorderBufferApplyChangeCB apply_change;
ReorderBufferApplyTruncateCB apply_truncate;
ReorderBufferCommitCB commit;
+ ReorderBufferAbortCB abort;
+ ReorderBufferFilterPrepareCB filter_prepare;
+ ReorderBufferPrepareCB prepare;
+ ReorderBufferCommitPreparedCB commit_prepared;
+ ReorderBufferRollbackPreparedCB rollback_prepared;
ReorderBufferMessageCB message;
/*
@@ -566,6 +618,11 @@ void ReorderBufferQueueMessage(ReorderBuffer *, TransactionId, Snapshot snapsho
void ReorderBufferCommit(ReorderBuffer *, TransactionId,
XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
TimestampTz commit_time, RepOriginId origin_id, XLogRecPtr origin_lsn);
+void ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid, bool is_commit);
void ReorderBufferAssignChild(ReorderBuffer *, TransactionId, TransactionId, XLogRecPtr commit_lsn);
void ReorderBufferCommitChild(ReorderBuffer *, TransactionId, TransactionId,
XLogRecPtr commit_lsn, XLogRecPtr end_lsn);
@@ -589,6 +646,15 @@ void ReorderBufferXidSetCatalogChanges(ReorderBuffer *, TransactionId xid, XLog
bool ReorderBufferXidHasCatalogChanges(ReorderBuffer *, TransactionId xid);
bool ReorderBufferXidHasBaseSnapshot(ReorderBuffer *, TransactionId xid);
+bool ReorderBufferPrepareNeedSkip(ReorderBuffer *rb, TransactionId xid,
+ const char *gid);
+bool ReorderBufferTxnIsPrepared(ReorderBuffer *rb, TransactionId xid,
+ const char *gid);
+void ReorderBufferPrepare(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid);
ReorderBufferTXN *ReorderBufferGetOldestTXN(ReorderBuffer *);
TransactionId ReorderBufferGetOldestXmin(ReorderBuffer *rb);
--
1.8.3.1
v7-0004-Support-two-phase-commits-in-streaming-mode-of-lo.patchapplication/octet-stream; name=v7-0004-Support-two-phase-commits-in-streaming-mode-of-lo.patchDownload
From 16ac52c5e944a8122c589c322a1bcaac035bc4e3 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Fri, 9 Oct 2020 02:38:46 -0400
Subject: [PATCH v7] Support two phase commits in streaming mode of logical
decoding
Add streaming APIS for PREPARE, COMMIT PREPARED and ROLLBACK PREPARED
---
contrib/test_decoding/test_decoding.c | 84 ++++++++++++++++++
doc/src/sgml/logicaldecoding.sgml | 56 +++++++++++-
src/backend/replication/logical/logical.c | 111 ++++++++++++++++++++++++
src/backend/replication/logical/reorderbuffer.c | 41 +++++++--
src/include/replication/output_plugin.h | 30 +++++++
src/include/replication/reorderbuffer.h | 21 +++++
6 files changed, 331 insertions(+), 12 deletions(-)
diff --git a/contrib/test_decoding/test_decoding.c b/contrib/test_decoding/test_decoding.c
index 6b8e502..4508917 100644
--- a/contrib/test_decoding/test_decoding.c
+++ b/contrib/test_decoding/test_decoding.c
@@ -78,6 +78,15 @@ static void pg_decode_stream_stop(LogicalDecodingContext *ctx,
static void pg_decode_stream_abort(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr abort_lsn);
+static void pg_decode_stream_prepare(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+static void pg_decode_stream_commit_prepared(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+static void pg_decode_stream_rollback_prepared(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
static void pg_decode_stream_commit(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
@@ -129,6 +138,9 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb)
cb->stream_start_cb = pg_decode_stream_start;
cb->stream_stop_cb = pg_decode_stream_stop;
cb->stream_abort_cb = pg_decode_stream_abort;
+ cb->stream_prepare_cb = pg_decode_stream_prepare;
+ cb->stream_commit_prepared_cb = pg_decode_stream_commit_prepared;
+ cb->stream_rollback_prepared_cb = pg_decode_stream_rollback_prepared;
cb->stream_commit_cb = pg_decode_stream_commit;
cb->stream_change_cb = pg_decode_stream_change;
cb->stream_message_cb = pg_decode_stream_message;
@@ -805,6 +817,78 @@ pg_decode_stream_abort(LogicalDecodingContext *ctx,
}
static void
+pg_decode_stream_prepare(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ if (data->skip_empty_xacts && !data->xact_wrote_changes)
+ return;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, "preparing streamed transaction TXN %u", txn->xid);
+ else
+ appendStringInfo(ctx->out, "preparing streamed transaction");
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
+static void
+pg_decode_stream_commit_prepared(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ if (data->skip_empty_xacts && !data->xact_wrote_changes)
+ return;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, "commit prepared streamed transaction TXN %u", txn->xid);
+ else
+ appendStringInfo(ctx->out, "commit prepared streamed transaction");
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
+static void
+pg_decode_stream_rollback_prepared(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ if (data->skip_empty_xacts && !data->xact_wrote_changes)
+ return;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, "abort prepared streamed transaction TXN %u", txn->xid);
+ else
+ appendStringInfo(ctx->out, "abort prepared streamed transaction");
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
+static void
pg_decode_stream_commit(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr commit_lsn)
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index f24470a..3769896 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -396,6 +396,9 @@ typedef struct OutputPluginCallbacks
LogicalDecodeStreamStartCB stream_start_cb;
LogicalDecodeStreamStopCB stream_stop_cb;
LogicalDecodeStreamAbortCB stream_abort_cb;
+ LogicalDecodeStreamPrepareCB stream_prepare_cb;
+ LogicalDecodeStreamCommitPreparedCB stream_commit_prepared_cb;
+ LogicalDecodeStreamRollbackPreparedCB stream_rollback_prepared_cb;
LogicalDecodeStreamCommitCB stream_commit_cb;
LogicalDecodeStreamChangeCB stream_change_cb;
LogicalDecodeStreamMessageCB stream_message_cb;
@@ -418,7 +421,9 @@ typedef void (*LogicalOutputPluginInit) (struct OutputPluginCallbacks *cb);
in-progress transactions. The <function>stream_start_cb</function>,
<function>stream_stop_cb</function>, <function>stream_abort_cb</function>,
<function>stream_commit_cb</function> and <function>stream_change_cb</function>
- are required, while <function>stream_message_cb</function> and
+ are required, while <function>stream_message_cb</function>,
+ <function>stream_prepare_cb</function>, <function>stream_commit_prepared_cb</function>,
+ <function>stream_rollback_prepared_cb</function>,
<function>stream_truncate_cb</function> are optional.
</para>
</sect2>
@@ -839,6 +844,45 @@ typedef void (*LogicalDecodeStreamAbortCB) (struct LogicalDecodingContext *ctx,
</para>
</sect3>
+ <sect3 id="logicaldecoding-output-plugin-stream-prepare">
+ <title>Stream Prepare Callback</title>
+ <para>
+ The <function>stream_prepare_cb</function> callback is called to prepare
+ a previously streamed transaction as part of a two phase commit.
+<programlisting>
+typedef void (*LogicalDecodeStreamPrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
+ <sect3 id="logicaldecoding-output-plugin-stream-commit-prepared">
+ <title>Stream Commit Prepared Callback</title>
+ <para>
+ The <function>stream_commit_prepared_cb</function> callback is called to commit prepared
+ a previously streamed transaction as part of a two phase commit.
+<programlisting>
+typedef void (*LogicalDecodeStreamCommitPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
+ <sect3 id="logicaldecoding-output-plugin-stream-abort-prepared">
+ <title>Stream Abort Prepared Callback</title>
+ <para>
+ The <function>stream_rollback_prepared_cb</function> callback is called to abort prepared
+ a previously streamed transaction as part of a two phase commit.
+<programlisting>
+typedef void (*LogicalDecodeStreamAbortPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
<sect3 id="logicaldecoding-output-plugin-stream-commit">
<title>Stream Commit Callback</title>
<para>
@@ -1017,9 +1061,13 @@ OutputPluginWrite(ctx, true);
When streaming an in-progress transaction, the changes (and messages) are
streamed in blocks demarcated by <function>stream_start_cb</function>
and <function>stream_stop_cb</function> callbacks. Once all the decoded
- changes are transmitted, the transaction is committed using the
- <function>stream_commit_cb</function> callback (or possibly aborted using
- the <function>stream_abort_cb</function> callback).
+ changes are transmitted, the transaction can be committed using the
+ the <function>stream_commit_cb</function> callback
+ (or possibly aborted using the <function>stream_abort_cb</function> callback).
+ If two phase commits are supported, the transaction can be prepared using the
+ <function>stream_prepare_cb</function> callback, commit prepared using the
+ <function>stream_commit_prepared_cb</function> callback or aborted using the
+ <function>stream_rollback_prepared_cb</function>
</para>
<para>
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 8aeb648..6b01e2a 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -82,6 +82,12 @@ static void stream_stop_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr last_lsn);
static void stream_abort_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr abort_lsn);
+static void stream_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+static void stream_commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+static void stream_rollback_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
static void stream_commit_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
static void stream_change_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
@@ -233,6 +239,9 @@ StartupDecodingContext(List *output_plugin_options,
ctx->streaming = (ctx->callbacks.stream_start_cb != NULL) ||
(ctx->callbacks.stream_stop_cb != NULL) ||
(ctx->callbacks.stream_abort_cb != NULL) ||
+ (ctx->callbacks.stream_prepare_cb != NULL) ||
+ (ctx->callbacks.stream_commit_prepared_cb != NULL) ||
+ (ctx->callbacks.stream_rollback_prepared_cb != NULL) ||
(ctx->callbacks.stream_commit_cb != NULL) ||
(ctx->callbacks.stream_change_cb != NULL) ||
(ctx->callbacks.stream_message_cb != NULL) ||
@@ -262,6 +271,9 @@ StartupDecodingContext(List *output_plugin_options,
ctx->reorder->stream_start = stream_start_cb_wrapper;
ctx->reorder->stream_stop = stream_stop_cb_wrapper;
ctx->reorder->stream_abort = stream_abort_cb_wrapper;
+ ctx->reorder->stream_prepare = stream_prepare_cb_wrapper;
+ ctx->reorder->stream_commit_prepared = stream_commit_prepared_cb_wrapper;
+ ctx->reorder->stream_rollback_prepared = stream_rollback_prepared_cb_wrapper;
ctx->reorder->stream_commit = stream_commit_cb_wrapper;
ctx->reorder->stream_change = stream_change_cb_wrapper;
ctx->reorder->stream_message = stream_message_cb_wrapper;
@@ -1232,6 +1244,105 @@ stream_abort_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
}
static void
+stream_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ Assert(!ctx->fast_forward);
+
+ /* We're only supposed to call this when streaming is supported. */
+ Assert(ctx->streaming);
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "stream_prepare";
+ state.report_location = txn->final_lsn;
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn;
+
+ ctx->callbacks.stream_prepare_cb(ctx, txn, commit_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
+stream_commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ Assert(!ctx->fast_forward);
+
+ /* We're only supposed to call this when streaming is supported. */
+ Assert(ctx->streaming);
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "stream_commit_prepared";
+ state.report_location = txn->final_lsn;
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn;
+
+ ctx->callbacks.stream_commit_prepared_cb(ctx, txn, commit_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
+stream_rollback_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ Assert(!ctx->fast_forward);
+
+ /* We're only supposed to call this when streaming is supported. */
+ Assert(ctx->streaming);
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "stream_rollback_prepared";
+ state.report_location = txn->final_lsn;
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn;
+
+ ctx->callbacks.stream_rollback_prepared_cb(ctx, txn, commit_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
stream_commit_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr commit_lsn)
{
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 4da840c..a472d08 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -1792,9 +1792,18 @@ ReorderBufferStreamCommit(ReorderBuffer *rb, ReorderBufferTXN *txn)
ReorderBufferStreamTXN(rb, txn);
- rb->stream_commit(rb, txn, txn->final_lsn);
-
- ReorderBufferCleanupTXN(rb, txn);
+ if (rbtxn_prepared(txn))
+ {
+ rb->stream_prepare(rb, txn, txn->final_lsn);
+ ReorderBufferTruncateTXN(rb, txn, true);
+ /* Reset the CheckXidAlive */
+ CheckXidAlive = InvalidTransactionId;
+ }
+ else
+ {
+ rb->stream_commit(rb, txn, txn->final_lsn);
+ ReorderBufferCleanupTXN(rb, txn);
+ }
}
/*
@@ -2630,15 +2639,31 @@ ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
txn->gid = palloc(strlen(gid) + 1); /* trailing '\0' */
strcpy(txn->gid, gid);
- if (is_commit)
+ if (rbtxn_is_streamed(txn))
{
- txn->txn_flags |= RBTXN_COMMIT_PREPARED;
- rb->commit_prepared(rb, txn, commit_lsn);
+ if (is_commit)
+ {
+ txn->txn_flags |= RBTXN_COMMIT_PREPARED;
+ rb->stream_commit_prepared(rb, txn, commit_lsn);
+ }
+ else
+ {
+ txn->txn_flags |= RBTXN_ROLLBACK_PREPARED;
+ rb->stream_rollback_prepared(rb, txn, commit_lsn);
+ }
}
else
{
- txn->txn_flags |= RBTXN_ROLLBACK_PREPARED;
- rb->rollback_prepared(rb, txn, commit_lsn);
+ if (is_commit)
+ {
+ txn->txn_flags |= RBTXN_COMMIT_PREPARED;
+ rb->commit_prepared(rb, txn, commit_lsn);
+ }
+ else
+ {
+ txn->txn_flags |= RBTXN_ROLLBACK_PREPARED;
+ rb->rollback_prepared(rb, txn, commit_lsn);
+ }
}
/* cleanup: make sure there's no cache pollution */
diff --git a/src/include/replication/output_plugin.h b/src/include/replication/output_plugin.h
index 96acd01..29d3ffd 100644
--- a/src/include/replication/output_plugin.h
+++ b/src/include/replication/output_plugin.h
@@ -157,6 +157,33 @@ typedef void (*LogicalDecodeStreamAbortCB) (struct LogicalDecodingContext *ctx,
XLogRecPtr abort_lsn);
/*
+ * Called to prepare changes streamed to remote node from in-progress
+ * transaction. This is called as part of a two-phase commit and only when
+ * two-phased commits are supported
+ */
+typedef void (*LogicalDecodeStreamPrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+/*
+ * Called to commit prepared changes streamed to remote node from in-progress
+ * transaction. This is called as part of a two-phase commit and only when
+ * two-phased commits are supported
+ */
+typedef void (*LogicalDecodeStreamCommitPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+/*
+ * Called to abort/rollback prepared changes streamed to remote node from in-progress
+ * transaction. This is called as part of a two-phase commit and only when
+ * two-phased commits are supported
+ */
+typedef void (*LogicalDecodeStreamRollbackPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+/*
* Called to apply changes streamed to remote node from in-progress
* transaction.
*/
@@ -214,6 +241,9 @@ typedef struct OutputPluginCallbacks
LogicalDecodeStreamStartCB stream_start_cb;
LogicalDecodeStreamStopCB stream_stop_cb;
LogicalDecodeStreamAbortCB stream_abort_cb;
+ LogicalDecodeStreamPrepareCB stream_prepare_cb;
+ LogicalDecodeStreamCommitPreparedCB stream_commit_prepared_cb;
+ LogicalDecodeStreamRollbackPreparedCB stream_rollback_prepared_cb;
LogicalDecodeStreamCommitCB stream_commit_cb;
LogicalDecodeStreamChangeCB stream_change_cb;
LogicalDecodeStreamMessageCB stream_message_cb;
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index e060107..94303a3 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -470,6 +470,24 @@ typedef void (*ReorderBufferStreamAbortCB) (
ReorderBufferTXN *txn,
XLogRecPtr abort_lsn);
+/* prepare streamed transaction callback signature */
+typedef void (*ReorderBufferStreamPrepareCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+/* prepare streamed transaction callback signature */
+typedef void (*ReorderBufferStreamCommitPreparedCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+/* prepare streamed transaction callback signature */
+typedef void (*ReorderBufferStreamRollbackPreparedCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
/* commit streamed transaction callback signature */
typedef void (*ReorderBufferStreamCommitCB) (
ReorderBuffer *rb,
@@ -549,6 +567,9 @@ struct ReorderBuffer
ReorderBufferStreamStartCB stream_start;
ReorderBufferStreamStopCB stream_stop;
ReorderBufferStreamAbortCB stream_abort;
+ ReorderBufferStreamPrepareCB stream_prepare;
+ ReorderBufferStreamCommitPreparedCB stream_commit_prepared;
+ ReorderBufferStreamRollbackPreparedCB stream_rollback_prepared;
ReorderBufferStreamCommitCB stream_commit;
ReorderBufferStreamChangeCB stream_change;
ReorderBufferStreamMessageCB stream_message;
--
1.8.3.1
v7-0002-Tap-test-to-test-concurrent-aborts-during-2-phase.patchapplication/octet-stream; name=v7-0002-Tap-test-to-test-concurrent-aborts-during-2-phase.patchDownload
From aeee11f7fc41fbc20d2afd4a49f60e7a89c42457 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Fri, 9 Oct 2020 01:46:45 -0400
Subject: [PATCH v7] Tap test to test concurrent aborts during 2 phase commits
This test is specifically for testing concurrent abort while logical decode
is ongoing. Pass in the xid of the 2PC to the plugin as an option.
On the receipt of a valid "check-xid", the change API in the test decoding
plugin will wait for it to be aborted.
---
contrib/test_decoding/Makefile | 2 +
contrib/test_decoding/t/001_twophase.pl | 119 ++++++++++++++++++++++++++++++++
2 files changed, 121 insertions(+)
create mode 100644 contrib/test_decoding/t/001_twophase.pl
diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index 36b3f9a..3fb7dac 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -9,6 +9,8 @@ REGRESS = ddl xact rewrite toast permissions decoding_in_xact \
ISOLATION = mxact delayed_startup ondisk_startup concurrent_ddl_dml \
oldest_xmin snapshot_transfer subxact_without_top concurrent_stream
+TAP_TESTS = 1
+
REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
ISOLATION_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
diff --git a/contrib/test_decoding/t/001_twophase.pl b/contrib/test_decoding/t/001_twophase.pl
new file mode 100644
index 0000000..51c6a35
--- /dev/null
+++ b/contrib/test_decoding/t/001_twophase.pl
@@ -0,0 +1,119 @@
+# logical replication of 2PC test
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 2;
+use Time::HiRes qw(usleep);
+use Scalar::Util qw(looks_like_number);
+
+# Initialize node
+my $node_logical = get_new_node('logical');
+$node_logical->init(allows_streaming => 'logical');
+$node_logical->append_conf(
+ 'postgresql.conf', qq(
+ max_prepared_transactions = 10
+));
+$node_logical->start;
+
+# Create some pre-existing content on logical
+$node_logical->safe_psql('postgres', "CREATE TABLE tab (a int PRIMARY KEY)");
+$node_logical->safe_psql('postgres',
+ "INSERT INTO tab SELECT generate_series(1,10)");
+$node_logical->safe_psql('postgres',
+ "SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');");
+
+# This test is specifically for testing concurrent abort while logical decode
+# is ongoing. We will pass in the xid of the 2PC to the plugin as an option.
+# On the receipt of a valid "check-xid-aborted", the change API in the test decoding
+# plugin will wait for it to be aborted.
+#
+# We will fire off a ROLLBACK from another session when this decode
+# is waiting.
+#
+# The status of "check-xid-aborted" will change from in-progress to not-committed
+# (hence aborted) and we will stop decoding because the subsequent
+# system catalog scan will error out.
+
+$node_logical->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab VALUES (11);
+ INSERT INTO tab VALUES (12);
+ ALTER TABLE tab ADD COLUMN b INT;
+ INSERT INTO tab VALUES (13,14);
+ PREPARE TRANSACTION 'test_prepared_tab';");
+# get XID of the above two-phase transaction
+my $xid2pc = $node_logical->safe_psql('postgres', "SELECT transaction FROM pg_prepared_xacts WHERE gid = 'test_prepared_tab'");
+is(looks_like_number($xid2pc), qq(1), 'Got a valid two-phase XID');
+
+# start decoding the above by passing the "check-xid-aborted"
+my $logical_connstr = $node_logical->connstr . ' dbname=postgres';
+
+# decode now, it should include an ABORT entry because of the ROLLBACK below
+system_log("psql -d \"$logical_connstr\" -c \"SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'check-xid-aborted', '$xid2pc');\" \&");
+
+# check that decode starts waiting for this $xid2pc
+poll_output_until("waiting for $xid2pc to abort")
+ or die "no wait happened for the abort";
+
+# rollback the prepared transaction
+$node_logical->safe_psql('postgres', "ROLLBACK PREPARED 'test_prepared_tab';");
+
+# check for occurrence of the log about stopping this decoding
+poll_output_until("stopping decoding of test_prepared_tab")
+ or die "no decoding stop for the rollback";
+
+# consume any remaining changes
+$node_logical->safe_psql('postgres', "SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');");
+
+# Check that commit prepared is decoded properly on immediate restart
+$node_logical->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab VALUES (11);
+ INSERT INTO tab VALUES (12);
+ ALTER TABLE tab ADD COLUMN b INT;
+ INSERT INTO tab VALUES (13, 11);
+ PREPARE TRANSACTION 'test_prepared_tab';");
+# consume changes
+$node_logical->safe_psql('postgres', "SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');");
+$node_logical->stop('immediate');
+$node_logical->start;
+
+# commit post the restart
+$node_logical->safe_psql('postgres', "COMMIT PREPARED 'test_prepared_tab';");
+$node_logical->safe_psql('postgres', "SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');");
+
+# check inserts are visible
+my $result = $node_logical->safe_psql('postgres', "SELECT count(*) FROM tab where a IN (11,12) OR b IN (11);");
+is($result, qq(3), 'Rows inserted via 2PC are visible on restart');
+
+$node_logical->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot');");
+$node_logical->stop('fast');
+
+sub poll_output_until
+{
+ my ($expected) = @_;
+
+ $expected = 'xxxxxx' unless defined($expected); # default junk value
+
+ my $max_attempts = 180 * 10;
+ my $attempts = 0;
+
+ my $output_file = '';
+ while ($attempts < $max_attempts)
+ {
+ $output_file = slurp_file($node_logical->logfile());
+
+ if ($output_file =~ $expected)
+ {
+ return 1;
+ }
+
+ # Wait 0.1 second before retrying.
+ usleep(100_000);
+ $attempts++;
+ }
+
+ # The output result didn't change in 180 seconds. Give up
+ return 0;
+}
--
1.8.3.1
On Wed, Oct 7, 2020 at 9:36 AM Peter Smith <smithpb2250@gmail.com> wrote:
==========
Patch V6-0001, File: doc/src/sgml/logicaldecoding.sgml
==========COMMENT/QUESTION Section 48.6.1 @ -387,6 +387,10 @@ typedef struct OutputPluginCallbacks LogicalDecodeTruncateCB truncate_cb; LogicalDecodeCommitCB commit_cb; LogicalDecodeMessageCB message_cb; + LogicalDecodeFilterPrepareCB filter_prepare_cb;Confused by the mixing of terminologies "abort" and "rollback".
Why is it LogicalDecodeAbortPreparedCB instead of
LogicalDecodeRollbackPreparedCB?
Why is it abort_prepared_cb instead of rollback_prepared_cb;?I thought everything the user sees should be ROLLBACK/rollback (like
the SQL) regardless of what the internal functions might be called.;
Modified.
COMMENT
Section 48.6.1
The begin_cb, change_cb and commit_cb callbacks are required, while
startup_cb, filter_by_origin_cb, truncate_cb, and shutdown_cb are
optional. If truncate_cb is not set but a TRUNCATE is to be decoded,
the action will be ignored.The 1st paragraph beneath the typedef does not mention the newly added
callbacks to say if they are required or optional.
Added a new para for this.
;
COMMENT
Section 48.6.4.5
Section 48.6.4.6
Section 48.6.4.7
@@ -578,6 +588,55 @@ typedef void (*LogicalDecodeCommitCB) (struct
LogicalDecodingContext *ctx,
</para>
</sect3>+ <sect3 id="logicaldecoding-output-plugin-prepare"> + <sect3 id="logicaldecoding-output-plugin-commit-prepared"> + <sect3 id="logicaldecoding-output-plugin-abort-prepared"> +<programlisting>The wording and titles are a bit backwards compared to the others.
e.g. previously was "Transaction Begin" (not "Begin Transaction") and
"Transaction End" (not "End Transaction").So for consistently following the existing IMO should change these new
titles (and wording) to:
- "Commit Prepared Transaction Callback" --> "Transaction Commit
Prepared Callback"
- "Rollback Prepared Transaction Callback" --> "Transaction Rollback
Prepared Callback"
- "whenever a commit prepared transaction has been decoded" -->
"whenever a transaction commit prepared has been decoded"
- "whenever a rollback prepared transaction has been decoded." -->
"whenever a transaction rollback prepared has been decoded.";
Updated to this
==========
Patch V6-0001, File: src/backend/replication/logical/decode.c
==========COMMENT Line 74 @@ -70,6 +70,9 @@ static void DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf, xl_xact_parsed_commit *parsed, TransactionId xid); static void DecodeAbort(LogicalDecodingContext *ctx, XLogRecordBuffer *buf, xl_xact_parsed_abort *parsed, TransactionId xid); +static void DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf, + xl_xact_parsed_prepare * parsed);The 2nd line of DecodePrepare is misaligned by one space.
;
COMMENT Line 321 @@ -312,17 +315,34 @@ DecodeXactOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf) } break; case XLOG_XACT_PREPARE: + { + xl_xact_parsed_prepare parsed; + xl_xact_prepare *xlrec; + /* check that output plugin is capable of twophase decoding */"twophase" --> "two-phase"
~
Also, add a blank line after the declarations.
;
==========
Patch V6-0001, File: src/backend/replication/logical/logical.c
==========COMMENT
Line 249
@@ -225,6 +237,19 @@ StartupDecodingContext(List *output_plugin_options,
(ctx->callbacks.stream_message_cb != NULL) ||
(ctx->callbacks.stream_truncate_cb != NULL);+ /* + * To support two phase logical decoding, we require prepare/commit-prepare/abort-prepare + * callbacks. The filter-prepare callback is optional. We however enable two phase logical + * decoding when at least one of the methods is enabled so that we can easily identify + * missing methods.The terminology is generally well known as "two-phase" (with the
hyphen) https://en.wikipedia.org/wiki/Two-phase_commit_protocol so
let's be consistent for all the patch code comments. Please search the
code and correct this in all places, even where I might have missed to
identify it."two phase" --> "two-phase"
;
COMMENT
Line 822
@@ -782,6 +807,111 @@ commit_cb_wrapper(ReorderBuffer *cache,
ReorderBufferTXN *txn,
}static void +prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn, + XLogRecPtr prepare_lsn)"support 2 phase" --> "supports two-phase" in the comment
;
COMMENT
Line 844
Code condition seems strange and/or broken.
if (ctx->enable_twophase && ctx->callbacks.prepare_cb == NULL)
Because if the flag is null then this condition is skipped.
But then if the callback was also NULL then attempting to call it to
"do the actual work" will give NPE.~
Also, I wonder should this check be the first thing in this function?
Because if it fails does it even make sense that all the errcallback
code was set up?
E.g errcallback.arg potentially is left pointing to a stack variable
on a stack that no longer exists.
Updated accordingly.
;
COMMENT
Line 857
+commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,"support 2 phase" --> "supports two-phase" in the comment
~
Also, Same potential trouble with the condition:
if (ctx->enable_twophase && ctx->callbacks.commit_prepared_cb == NULL)
Same as previously asked. Should this check be first thing in this function?;
COMMENT
Line 892
+abort_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,"support 2 phase" --> "supports two-phase" in the comment
~
Same potential trouble with the condition:
if (ctx->enable_twophase && ctx->callbacks.abort_prepared_cb == NULL)
Same as previously asked. Should this check be the first thing in this function?;
COMMENT
Line 1013
@@ -858,6 +988,51 @@ truncate_cb_wrapper(ReorderBuffer *cache,
ReorderBufferTXN *txn,
error_context_stack = errcallback.previous;
}+static bool +filter_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn, + TransactionId xid, const char *gid)Fix wording in comment:
"twophase" --> "two-phase transactions"
"twophase transactions" --> "two-phase transactions"
Updated accordingly.
==========
Patch V6-0001, File: src/backend/replication/logical/reorderbuffer.c
==========COMMENT Line 255 @@ -251,7 +251,8 @@ static Size ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn static void ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn, char *change); static void ReorderBufferRestoreCleanup(ReorderBuffer *rb, ReorderBufferTXN *txn); -static void ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn); +static void ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, + bool txn_prepared);The alignment is inconsistent. One more space needed before "bool txn_prepared"
;
COMMENT
Line 417
@@ -413,6 +414,11 @@ ReorderBufferReturnTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn)
}/* free data that's contained */ + if (txn->gid != NULL) + { + pfree(txn->gid); + txn->gid = NULL; + }Should add the blank link before this new code, as it was before.
;
COMMENT
Line 1564
@ -1502,12 +1561,14 @@ ReorderBufferCleanupTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn)
}/* - * Discard changes from a transaction (and subtransactions), after streaming - * them. Keep the remaining info - transactions, tuplecids, invalidations and - * snapshots. + * Discard changes from a transaction (and subtransactions), either after streaming or + * after a PREPARE.typo "snapshots.If" -> "snapshots. If"
;
Updated Accordingly.
COMMENT/QUESTION
Line 1590
@@ -1526,7 +1587,7 @@ ReorderBufferTruncateTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn)
Assert(rbtxn_is_known_subxact(subtxn));
Assert(subtxn->nsubtxns == 0);- ReorderBufferTruncateTXN(rb, subtxn); + ReorderBufferTruncateTXN(rb, subtxn, txn_prepared); }There are some code paths here I did not understand how they match the comments.
Because this function is recursive it seems that it may be called
where the 2nd parameter txn is a sub-transaction.But then this seems at odds with some of the other code comments of
this function which are processing the txn without ever testing is it
really toplevel or not:e.g. Line 1593 "/* cleanup changes in the toplevel txn */"
e.g. Line 1632 "They are always stored in the toplevel transaction.";
I see that another commit in between has updated this now.
COMMENT Line 1644 @@ -1560,9 +1621,33 @@ ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn) * about the toplevel xact (we send the XID in all messages), but we never * stream XIDs of empty subxacts. */ - if ((!txn->toptxn) || (txn->nentries_mem != 0)) + if ((!txn_prepared) && ((!txn->toptxn) || (txn->nentries_mem != 0))) txn->txn_flags |= RBTXN_IS_STREAMED;+ if (txn_prepared)
/* remove the change from it's containing list */
typo "it's" --> "its"
Updated.
;
QUESTION Line 1977 @@ -1880,7 +1965,7 @@ ReorderBufferResetTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, ReorderBufferChange *specinsert) { /* Discard the changes that we just streamed */ - ReorderBufferTruncateTXN(rb, txn); + ReorderBufferTruncateTXN(rb, txn, false);How do you know the 3rd parameter - i.e. txn_prepared - should be
hardwired false here?
e.g. I thought that maybe rbtxn_prepared(txn) can be true here.;
This particular function is only called when streaming and not when
handling a prepared transaction.
COMMENT
Line 2345
@@ -2249,7 +2334,6 @@ ReorderBufferProcessTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn,
break;
}
}
-
/*Looks like accidental blank line deletion. This should be put back how it was
;
COMMENT/QUESTION Line 2374 @@ -2278,7 +2362,16 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, } } else - rb->commit(rb, txn, commit_lsn); + { + /* + * Call either PREPARE (for twophase transactions) or COMMIT + * (for regular ones)."twophase" --> "two-phase"
~
Updated.
Also, I was confused by the apparent assumption of exclusiveness of
streaming and 2PC...
e.g. what if streaming AND 2PC then it won't do rb->prepare();
QUESTION Line 2424 @@ -2319,11 +2412,17 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, */ if (streaming) { - ReorderBufferTruncateTXN(rb, txn); + ReorderBufferTruncateTXN(rb, txn, false);/* Reset the CheckXidAlive */
CheckXidAlive = InvalidTransactionId;
}
+ else if (rbtxn_prepared(txn))I was confused by the exclusiveness of streaming/2PC.
e.g. what if streaming AND 2PC at same time - how can you pass false
as 3rd param to ReorderBufferTruncateTXN?
ReorderBufferProcessTXN can only be called when streaming individual
commands and not for streaming a prepare or a commit, Streaming of
prepare and commit would be handled as part of
ReorderBufferStreamCommit.
;
COMMENT
Line 2463
@@ -2352,17 +2451,18 @@ ReorderBufferProcessTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn,/* * The error code ERRCODE_TRANSACTION_ROLLBACK indicates a concurrent - * abort of the (sub)transaction we are streaming. We need to do the + * abort of the (sub)transaction we are streaming or preparing. We need to do the * cleanup and return gracefully on this error, see SetupCheckXidLive. */"twoi phase" --> "two-phase"
;
QUESTIONS
Line 2482
@@ -2370,10 +2470,19 @@ ReorderBufferProcessTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn,
errdata = NULL;
curtxn->concurrent_abort = true;- /* Reset the TXN so that it is allowed to stream remaining data. */ - ReorderBufferResetTXN(rb, txn, snapshot_now, - command_id, prev_lsn, - specinsert); + /* If streaming, reset the TXN so that it is allowed to stream remaining data. */ + if (streaming)Re: /* If streaming, reset the TXN so that it is allowed to stream
remaining data. */
I was confused by the exclusiveness of streaming/2PC.
Is it not possible for streaming flags and rbtxn_prepared(txn) true at
the same time?
Same as above.
~
elog(LOG, "stopping decoding of %s (%u)",
txn->gid[0] != '\0'? txn->gid:"", txn->xid);Is this a safe operation, or do you also need to test txn->gid is not NULL?
Since this is in code where it is not streaming and therefore
rbtxn_prepared(txn), so gid has to be NOT NULL.
;
COMMENT
Line 2606
+ReorderBufferPrepare(ReorderBuffer *rb, TransactionId xid,"twophase" --> "two-phase"
;
QUESTION
Line 2655
+ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,"This is used to handle COMMIT/ABORT PREPARED"
Should that say "COMMIT/ROLLBACK PREPARED"?;
COMMENT
Line 2668"Anyways, 2PC transactions" --> "Anyway, two-phase transactions"
;
COMMENT
Line 2765
@@ -2495,7 +2731,13 @@ ReorderBufferAbort(ReorderBuffer *rb,
TransactionId xid, XLogRecPtr lsn)
/* cosmetic... */
txn->final_lsn = lsn;- /* remove potential on-disk data, and deallocate */ + /* + * remove potential on-disk data, and deallocate. + *Remove the blank between the comment and code.
==========
Patch V6-0001, File: src/include/replication/logical.h
==========COMMENT
Line 89"two phase" -> "two-phase"
;
COMMENT
Line 89For consistency with the previous member naming really the new member
should just be called "twophase" rather than "enable_twophase";
Updated accordingly.
==========
Patch V6-0001, File: src/include/replication/output_plugin.h
==========QUESTION
Line 106As previously asked, why is the callback function/typedef referred as
AbortPrepared instead of RollbackPrepared?
It does not match the SQL and the function comment, and seems only to
add some unnecessary confusion.;
==========
Patch V6-0001, File: src/include/replication/reorderbuffer.h
==========QUESTION Line 116 @@ -162,9 +163,13 @@ typedef struct ReorderBufferChange #define RBTXN_HAS_CATALOG_CHANGES 0x0001 #define RBTXN_IS_SUBXACT 0x0002 #define RBTXN_IS_SERIALIZED 0x0004 -#define RBTXN_IS_STREAMED 0x0008 -#define RBTXN_HAS_TOAST_INSERT 0x0010 -#define RBTXN_HAS_SPEC_INSERT 0x0020 +#define RBTXN_PREPARE 0x0008 +#define RBTXN_COMMIT_PREPARED 0x0010 +#define RBTXN_ROLLBACK_PREPARED 0x0020 +#define RBTXN_COMMIT 0x0040 +#define RBTXN_IS_STREAMED 0x0080 +#define RBTXN_HAS_TOAST_INSERT 0x0100 +#define RBTXN_HAS_SPEC_INSERT 0x0200I was wondering why when adding new flags, some of the existing flag
masks were also altered.
I am assuming this is ok because they are never persisted but are only
used in the protocol (??);
COMMENT
Line 226
@@ -218,6 +223,15 @@ typedef struct ReorderBufferChange
((txn)->txn_flags & RBTXN_IS_STREAMED) != 0 \
)+/* is this txn prepared? */ +#define rbtxn_prepared(txn) (txn->txn_flags & RBTXN_PREPARE) +/* was this prepared txn committed in the meanwhile? */ +#define rbtxn_commit_prepared(txn) (txn->txn_flags & RBTXN_COMMIT_PREPARED) +/* was this prepared txn aborted in the meanwhile? */ +#define rbtxn_rollback_prepared(txn) (txn->txn_flags & RBTXN_ROLLBACK_PREPARED) +/* was this txn committed in the meanwhile? */ +#define rbtxn_commit(txn) (txn->txn_flags & RBTXN_COMMIT) +Probably all the "txn->txn_flags" here might be more safely written
with parentheses in the macro like "(txn)->txn_flags".~
Also, Start all comments with capital. And what is the meaning "in the
meanwhile?";
COMMENT
Line 410
@@ -390,6 +407,39 @@ typedef void (*ReorderBufferCommitCB) (ReorderBuffer *rb,
ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);The format is inconsistent with all other callback signatures here,
where the 1st arg was on the same line as the typedef.;
COMMENT
Line 440-442Excessive blank lines following this change?
;
COMMENT
Line 638
@@ -571,6 +631,15 @@ void
ReorderBufferXidSetCatalogChanges(ReorderBuffer *, TransactionId xid,
XLog
bool ReorderBufferXidHasCatalogChanges(ReorderBuffer *, TransactionId xid);
bool ReorderBufferXidHasBaseSnapshot(ReorderBuffer *, TransactionId xid);+bool ReorderBufferPrepareNeedSkip(ReorderBuffer *rb, TransactionId xid, + const char *gid); +bool ReorderBufferTxnIsPrepared(ReorderBuffer *rb, TransactionId xid, + const char *gid); +void ReorderBufferPrepare(ReorderBuffer *rb, TransactionId xid, + XLogRecPtr commit_lsn, XLogRecPtr end_lsn, + TimestampTz commit_time, + RepOriginId origin_id, XLogRecPtr origin_lsn, + char *gid);Not aligned consistently with other function prototypes.
;
Updated
==========
Patch V6-0003, File: src/backend/access/transam/twophase.c
==========COMMENT
Line 551
@@ -548,6 +548,37 @@ MarkAsPrepared(GlobalTransaction gxact, bool lock_held)
}/* + * LookupGXact + * Check if the prepared transaction with the given GID is around + */ +bool +LookupGXact(const char *gid)There is potential to refactor/simplify this code:
e.g.bool
LookupGXact(const char *gid)
{
int i;
bool found = false;LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE);
for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
{
GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
/* Ignore not-yet-valid GIDs */
if (gxact->valid && strcmp(gxact->gid, gid) == 0)
{
found = true;
break;
}
}
LWLockRelease(TwoPhaseStateLock);
return found;
};
Updated accordingly.
==========
Patch V6-0003, File: src/backend/replication/logical/proto.c
==========COMMENT
Line 86
@@ -72,12 +72,17 @@ logicalrep_read_begin(StringInfo in,
LogicalRepBeginData *begin_data)
*/
void
logicalrep_write_commit(StringInfo out, ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn)Since now the flags are used the code comment is wrong.
"/* send the flags field (unused for now) */";
COMMENT
Line 129
@ -106,6 +115,77 @@ logicalrep_read_commit(StringInfo in,
LogicalRepCommitData *commit_data)
}/* + * Write PREPARE to the output stream. + */ +void +logicalrep_write_prepare(StringInfo out, ReorderBufferTXN *txn,"2PC transactions" --> "two-phase commit transactions"
;
Updated
COMMENT
Line 133Assert(strlen(txn->gid) > 0);
Shouldn't that assertion also check txn->gid is not NULL (to prevent
NPE in case gid was NULL)
In this case txn->gid has to be non NULL.
;
COMMENT
Line 177
+logicalrep_read_prepare(StringInfo in, LogicalRepPrepareData * prepare_data)prepare_data->prepare_type = flags;
This code may be OK but it does seem a bit of an abuse of the flags.e.g. Are they flags or are the really enum values?
e.g. And if they are effectively enums (it appears they are) then
seemed inconsistent that |= was used when they were previously
assigned.;
I have not updated this as according to Amit this might require
refactoring again.
==========
Patch V6-0003, File: src/backend/replication/logical/worker.c
==========COMMENT
Line 757
@@ -749,6 +753,141 @@ apply_handle_commit(StringInfo s)
pgstat_report_activity(STATE_IDLE, NULL);
}+static void +apply_handle_prepare_txn(LogicalRepPrepareData * prepare_data) +{ + Assert(prepare_data->prepare_lsn == remote_final_lsn);Missing function comment to say this is called from apply_handle_prepare.
;
COMMENT
Line 798
+apply_handle_commit_prepared_txn(LogicalRepPrepareData * prepare_data)Missing function comment to say this is called from apply_handle_prepare.
;
COMMENT
Line 824
+apply_handle_rollback_prepared_txn(LogicalRepPrepareData * prepare_data)Missing function comment to say this is called from apply_handle_prepare.
Updated.
==========
Patch V6-0003, File: src/backend/replication/pgoutput/pgoutput.c
==========COMMENT Line 50 @@ -47,6 +47,12 @@ static void pgoutput_truncate(LogicalDecodingContext *ctx, ReorderBufferChange *change); static bool pgoutput_origin_filter(LogicalDecodingContext *ctx, RepOriginId origin_id); +static void pgoutput_prepare_txn(LogicalDecodingContext *ctx, + ReorderBufferTXN *txn, XLogRecPtr prepare_lsn);The parameter indentation (2nd lines) does not match everything else
in this context.;
COMMENT Line 152 @@ -143,6 +149,10 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb) cb->change_cb = pgoutput_change; cb->truncate_cb = pgoutput_truncate; cb->commit_cb = pgoutput_commit_txn; + + cb->prepare_cb = pgoutput_prepare_txn; + cb->commit_prepared_cb = pgoutput_commit_prepared_txn; + cb->abort_prepared_cb = pgoutput_abort_prepared_txn;Remove the unnecessary blank line.
;
QUESTION
Line 386
@@ -373,7 +383,49 @@ pgoutput_commit_txn(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
OutputPluginUpdateProgress(ctx);OutputPluginPrepareWrite(ctx, true); - logicalrep_write_commit(ctx->out, txn, commit_lsn); + logicalrep_write_commit(ctx->out, txn, commit_lsn, true);Is the is_commit parameter of logicalrep_write_commit ever passed as false?
If yes, where?
If no, the what is the point of it?
It was dead code from an earlier version. I have removed it, updated
accordingly.
;
COMMENT
Line 408
+pgoutput_commit_prepared_txn(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,Since all this function is identical to pg_output_prepare it might be
better to either
1. just leave this as a wrapper to delegate to that function
2. remove this one entirely and assign the callback to the common
pgoutput_prepare_txn;
I have not changed this as this might require re-factoring according to Amit.
COMMENT
Line 419
+pgoutput_abort_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,Since all this function is identical to pg_output_prepare if might be
better to either
1. just leave this as a wrapper to delegate to that function
2. remove this one entirely and assign the callback to the common
pgoutput_prepare_tx;
Same as above.
COMMENT
Line 419
+pgoutput_abort_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,Shouldn't this comment say be "ROLLBACK PREPARED"?
;
Updated.
==========
Patch V6-0003, File: src/include/replication/logicalproto.h
==========QUESTION
Line 101
@@ -87,20 +87,55 @@ typedef struct LogicalRepBeginData
TransactionId xid;
} LogicalRepBeginData;+/* Commit (and abort) information */
#define LOGICALREP_IS_ABORT 0x02
Is there a good reason why this is not called:
#define LOGICALREP_IS_ROLLBACK 0x02;
Removed.
COMMENT
Line 105((flags == LOGICALREP_IS_COMMIT) || (flags == LOGICALREP_IS_ABORT))
Macros would be safer if flags are in parentheses
(((flags) == LOGICALREP_IS_COMMIT) || ((flags) == LOGICALREP_IS_ABORT));
COMMENT
Line 115Unexpected whitespace for the typedef
"} LogicalRepPrepareData;";
COMMENT
Line 122
/* prepare can be exactly one of PREPARE, [COMMIT|ABORT] PREPARED*/
#define PrepareFlagsAreValid(flags) \
((flags == LOGICALREP_IS_PREPARE) || \
(flags == LOGICALREP_IS_COMMIT_PREPARED) || \
(flags == LOGICALREP_IS_ROLLBACK_PREPARED))There is confusing mixture in macros and comments of ABORT and ROLLBACK terms
"[COMMIT|ABORT] PREPARED" --> "[COMMIT|ROLLBACK] PREPARED"~
Also, it would be safer if flags are in parentheses
(((flags) == LOGICALREP_IS_PREPARE) || \
((flags) == LOGICALREP_IS_COMMIT_PREPARED) || \
((flags) == LOGICALREP_IS_ROLLBACK_PREPARED));
updated.
==========
Patch V6-0003, File: src/test/subscription/t/020_twophase.pl
==========COMMENT
Line 131 - # check inserts are visibleIsn't this supposed to be checking for rows 12 and 13, instead of 11 and 12?
;
Updated.
==========
Patch V6-0004, File: contrib/test_decoding/test_decoding.c
==========COMMENT Line 81 @@ -78,6 +78,15 @@ static void pg_decode_stream_stop(LogicalDecodingContext *ctx, static void pg_decode_stream_abort(LogicalDecodingContext *ctx, ReorderBufferTXN *txn, XLogRecPtr abort_lsn); +static void pg_decode_stream_prepare(LogicalDecodingContext *ctx, + ReorderBufferTXN *txn, + XLogRecPtr commit_lsn); +staticAll these functions have a 3rd parameter called commit_lsn. Even
though the functions are not commit related. It seems like a cut/paste
error.;
COMMENT Line 142 @@ -130,6 +139,9 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb) cb->stream_start_cb = pg_decode_stream_start; cb->stream_stop_cb = pg_decode_stream_stop; cb->stream_abort_cb = pg_decode_stream_abort; + cb->stream_prepare_cb = pg_decode_stream_prepare; + cb->stream_commit_prepared_cb = pg_decode_stream_commit_prepared; + cb->stream_abort_prepared_cb = pg_decode_stream_abort_prepared; cb->stream_commit_cb = pg_decode_stream_commit;Can the "cb->stream_abort_prepared_cb" be changed to
"cb->stream_rollback_prepared_cb"?;
COMMENT
Line 827
@@ -812,6 +824,78 @@ pg_decode_stream_abort(LogicalDecodingContext *ctx,
}static void +pg_decode_stream_prepare(LogicalDecodingContext *ctx, + ReorderBufferTXN *txn, + XLogRecPtr commit_lsn) +{ + TestDecodingData *data = ctx->output_plugin_prThe commit_lsn (3rd parameter) is unused and seems like a cut/paste name error.
;
COMMENT
Line 875
+pg_decode_stream_abort_prepared(LogicalDecodingContext *ctx,The commit_lsn (3rd parameter) is unused and seems like a cut/paste name error.
;
Updated.
==========
Patch V6-0004, File: doc/src/sgml/logicaldecoding.sgml
==========COMMENT 48.6.1 @@ -396,6 +396,9 @@ typedef struct OutputPluginCallbacks LogicalDecodeStreamStartCB stream_start_cb; LogicalDecodeStreamStopCB stream_stop_cb; LogicalDecodeStreamAbortCB stream_abort_cb; + LogicalDecodeStreamPrepareCB stream_prepare_cb; + LogicalDecodeStreamCommitPreparedCB stream_commit_prepared_cb; + LogicalDecodeStreamAbortPreparedCB stream_abort_prepared_cb;Same question from previous review comments - why using the
terminology "abort" instead of "rollback";
COMMENT 48.6.1 @@ -418,7 +421,9 @@ typedef void (*LogicalOutputPluginInit) (struct OutputPluginCallbacks *cb); in-progress transactions. The <function>stream_start_cb</function>, <function>stream_stop_cb</function>, <function>stream_abort_cb</function>, <function>stream_commit_cb</function> and <function>stream_change_cb</function> - are required, while <function>stream_message_cb</function> and + are required, while <function>stream_message_cb</function>, + <function>stream_prepare_cb</function>, <function>stream_commit_prepared_cb</function>, + <function>stream_abort_prepared_cb</function>,Missing "and".
... "stream_abort_prepared_cb, stream_truncate_cb are optional." -->
"stream_abort_prepared_cb, and stream_truncate_cb are optional.";
COMMENT
Section 48.6.4.16
Section 48.6.4.17
Section 48.6.4.18
@@ -839,6 +844,45 @@ typedef void (*LogicalDecodeStreamAbortCB)
(struct LogicalDecodingContext *ctx,
</para>
</sect3>+ <sect3 id="logicaldecoding-output-plugin-stream-prepare"> + <title>Stream Prepare Callback</title> + <para> + The <function>stream_prepare_cb</function> callback is called to prepare + a previously streamed transaction as part of a two phase commit. +<programlisting> +typedef void (*LogicalDecodeStreamPrepareCB) (struct LogicalDecodingContext *ctx, + ReorderBufferTXN *txn, + XLogRecPtr abort_lsn); +</programlisting> + </para> + </sect3> + + <sect3 id="logicaldecoding-output-plugin-stream-commit-prepared"> + <title>Stream Commit Prepared Callback</title> + <para> + The <function>stream_commit_prepared_cb</function> callback is called to commit prepared + a previously streamed transaction as part of a two phase commit. +<programlisting> +typedef void (*LogicalDecodeStreamCommitPreparedCB) (struct LogicalDecodingContext *ctx, + ReorderBufferTXN *txn, + XLogRecPtr abort_lsn); +</programlisting> + </para> + </sect3> + + <sect3 id="logicaldecoding-output-plugin-stream-abort-prepared"> + <title>Stream Abort Prepared Callback</title> + <para> + The <function>stream_abort_prepared_cb</function> callback is called to abort prepared + a previously streamed transaction as part of a two phase commit. +<programlisting> +typedef void (*LogicalDecodeStreamAbortPreparedCB) (struct LogicalDecodingContext *ctx, + ReorderBufferTXN *txn, + XLogRecPtr abort_lsn); +</programlisting> + </para> + </sect3>1. Everywhere it says "two phase" commit should be consistently
replaced to say "two-phase" commit (with the hyphen)2. Search for "abort_lsn" parameter. It seems to be overused
(cut/paste error) even when the API is unrelated to abort3. 48.6.4.17 and 48.6.4.18
Is this wording ok? Is the word "prepared" even necessary here?
- "... called to commit prepared a previously streamed transaction ..."
- "... called to abort prepared a previously streamed transaction ...";
Updated accordingly.
COMMENT Section 48.9 @@ -1017,9 +1061,13 @@ OutputPluginWrite(ctx, true); When streaming an in-progress transaction, the changes (and messages) are streamed in blocks demarcated by <function>stream_start_cb</function> and <function>stream_stop_cb</function> callbacks. Once all the decoded - changes are transmitted, the transaction is committed using the - <function>stream_commit_cb</function> callback (or possibly aborted using - the <function>stream_abort_cb</function> callback). + changes are transmitted, the transaction can be committed using the + the <function>stream_commit_cb</function> callback"two phase" --> "two-phase"
~
Also, Missing period on end of sentence.
"or aborted using the stream_abort_prepared_cb" --> "or aborted using
the stream_abort_prepared_cb.";
Updated accordingly.
==========
Patch V6-0004, File: src/backend/replication/logical/logical.c
==========COMMENT Line 84 @@ -81,6 +81,12 @@ static void stream_stop_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn, XLogRecPtr last_lsn); static void stream_abort_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn, XLogRecPtr abort_lsn); +static void stream_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn, + XLogRecPtr commit_lsn); +static void stream_commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn, + XLogRecPtr commit_lsn); +static void stream_abort_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn, + XLogRecPtr commit_lsn);The 3rd parameter is always "commit_lsn" even for API unrelated to
commit, so seems like cut/paste error.;
COMMENT
Line 1246
@@ -1231,6 +1243,105 @@ stream_abort_cb_wrapper(ReorderBuffer *cache,
ReorderBufferTXN *txn,
}static void +stream_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn, + XLogRecPtr commit_lsn) +{ + LogicalDecodingContext *ctx = cache->private_data; + LogicalErrorCallbackState state;Misnamed parameter "commit_lsn" ?
~
Also, Line 1272
There seem to be some missing integrity checking to make sure the
callback is not NULL.
A null callback will give NPE when wrapper attempts to call it;
COMMENT Line 1305 +static void +stream_commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,There seem to be some missing integrity checking to make sure the
callback is not NULL.
A null callback will give NPE when wrapper attempts to call it.;
COMMENT Line 1312 +static void +stream_abort_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,Misnamed parameter "commit_lsn" ?
~
Also, Line 1338
There seem to be some missing integrity checking to make sure the
callback is not NULL.
A null callback will give NPE when wrapper attempts to call it.
Updated accordingly.
==========
Patch V6-0004, File: src/backend/replication/logical/reorderbuffer.c
==========COMMENT
Line 2684
@@ -2672,15 +2681,31 @@ ReorderBufferFinishPrepared(ReorderBuffer *rb,
TransactionId xid,
txn->gid = palloc(strlen(gid) + 1); /* trailing '\0' */
strcpy(txn->gid, gid);- if (is_commit) + if (rbtxn_is_streamed(txn)) { - txn->txn_flags |= RBTXN_COMMIT_PREPARED; - rb->commit_prepared(rb, txn, commit_lsn); + if (is_commit) + { + txn->txn_flags |= RBTXN_COMMIT_PREPARED;The setting/checking of the flags could be refactored if you wanted to
write less code:
e.g.
if (is_commit)
txn->txn_flags |= RBTXN_COMMIT_PREPARED;
else
txn->txn_flags |= RBTXN_ROLLBACK_PREPARED;if (rbtxn_is_streamed(txn) && rbtxn_commit_prepared(txn))
rb->stream_commit_prepared(rb, txn, commit_lsn);
else if (rbtxn_is_streamed(txn) && rbtxn_rollback_prepared(txn))
rb->stream_abort_prepared(rb, txn, commit_lsn);
else if (rbtxn_commit_prepared(txn))
rb->commit_prepared(rb, txn, commit_lsn);
else if (rbtxn_rollback_prepared(txn))
rb->abort_prepared(rb, txn, commit_lsn);;
Updated accordingly.
==========
Patch V6-0004, File: src/include/replication/output_plugin.h
==========COMMENT
Line 171
@@ -157,6 +157,33 @@ typedef void (*LogicalDecodeStreamAbortCB)
(struct LogicalDecodingContext *ctx,
XLogRecPtr abort_lsn);/* + * Called to prepare changes streamed to remote node from in-progress + * transaction. This is called as part of a two-phase commit and only when + * two-phased commits are supported + */1. Missing period all these comments.
2. Is the part that says "and only where two-phased commits are
supported" necessary to say? Is seems redundant since comments already
says called as part of a two-phase commit.;
==========
Patch V6-0004, File: src/include/replication/reorderbuffer.h
==========COMMENT
Line 467
@@ -466,6 +466,24 @@ typedef void (*ReorderBufferStreamAbortCB) (
ReorderBufferTXN *txn,
XLogRecPtr abort_lsn);+/* prepare streamed transaction callback signature */ +typedef void (*ReorderBufferStreamPrepareCB) ( + ReorderBuffer *rb, + ReorderBufferTXN *txn, + XLogRecPtr commit_lsn); + +/* prepare streamed transaction callback signature */ +typedef void (*ReorderBufferStreamCommitPreparedCB) ( + ReorderBuffer *rb, + ReorderBufferTXN *txn, + XLogRecPtr commit_lsn); + +/* prepare streamed transaction callback signature */ +typedef void (*ReorderBufferStreamAbortPreparedCB) ( + ReorderBuffer *rb, + ReorderBufferTXN *txn, + XLogRecPtr commit_lsn);Cut/paste error - repeated same comment 3 times?
Updated Accordingly.
[END]
I believe I have addressed all of Peter's comments. Peter, do have a
look and let me know if I missed anything or if you find anythinge
else. Thanks for your comments, much appreciated.
regards,
Ajin Cherian
Fujitsu Australia
Attachments:
v8-0001-Support-decoding-of-two-phase-transactions.patchapplication/octet-stream; name=v8-0001-Support-decoding-of-two-phase-transactions.patchDownload
From 4b19f77075bcccb526ac7aa622b2c37bbcd0f702 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Wed, 14 Oct 2020 01:24:51 -0400
Subject: [PATCH v8] Support decoding of two-phase transactions
Until now two-phase transactions were decoded at COMMIT, just like
regular transaction. During replay, two-phase transactions were
translated into regular transactions on the subscriber, and the GID
was not forwarded to it.
This patch allows PREPARE-time decoding two-phase transactions (if
the output plugin supports this capability), in which case the
transactions are replayed at PREPARE and then committed later when
COMMIT PREPARED arrives.
Includes documentation changes.
---
contrib/test_decoding/Makefile | 2 +-
contrib/test_decoding/expected/two_phase.out | 219 ++++++++++++++++++++
contrib/test_decoding/sql/two_phase.sql | 117 +++++++++++
contrib/test_decoding/test_decoding.c | 159 +++++++++++++++
doc/src/sgml/logicaldecoding.sgml | 117 ++++++++++-
src/backend/replication/logical/decode.c | 250 +++++++++++++++++++++--
src/backend/replication/logical/logical.c | 184 +++++++++++++++++
src/backend/replication/logical/reorderbuffer.c | 258 ++++++++++++++++++++----
src/include/replication/logical.h | 5 +
src/include/replication/output_plugin.h | 37 ++++
src/include/replication/reorderbuffer.h | 74 +++++++
11 files changed, 1372 insertions(+), 50 deletions(-)
create mode 100644 contrib/test_decoding/expected/two_phase.out
create mode 100644 contrib/test_decoding/sql/two_phase.sql
diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index 9a4c76f..f0e4dbd 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -4,7 +4,7 @@ MODULES = test_decoding
PGFILEDESC = "test_decoding - example of a logical decoding output plugin"
REGRESS = ddl xact rewrite toast permissions decoding_in_xact \
- decoding_into_rel binary prepared replorigin time messages \
+ decoding_into_rel binary prepared replorigin time two_phase messages \
spill slot truncate stream stats
ISOLATION = mxact delayed_startup ondisk_startup concurrent_ddl_dml \
oldest_xmin snapshot_transfer subxact_without_top concurrent_stream
diff --git a/contrib/test_decoding/expected/two_phase.out b/contrib/test_decoding/expected/two_phase.out
new file mode 100644
index 0000000..3ac01a4
--- /dev/null
+++ b/contrib/test_decoding/expected/two_phase.out
@@ -0,0 +1,219 @@
+-- Test two-phased transactions, when two-phase-commit is enabled, transactions are
+-- decoded at PREPARE time rather than at COMMIT PREPARED time.
+SET synchronous_commit = on;
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+ ?column?
+----------
+ init
+(1 row)
+
+CREATE TABLE test_prepared1(id integer primary key);
+CREATE TABLE test_prepared2(id integer primary key);
+-- Test that commands in a two phase xact are only decoded at PREPARE.
+-- Decoding after COMMIT PREPARED should only have the COMMIT PREPARED command and not the
+-- rest of the commands in the transaction.
+BEGIN;
+INSERT INTO test_prepared1 VALUES (1);
+INSERT INTO test_prepared1 VALUES (2);
+-- should show nothing
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+------
+(0 rows)
+
+PREPARE TRANSACTION 'test_prepared#1';
+--
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+----------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:1
+ table public.test_prepared1: INSERT: id[integer]:2
+ PREPARE TRANSACTION 'test_prepared#1'
+(4 rows)
+
+COMMIT PREPARED 'test_prepared#1';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-----------------------------------
+ COMMIT PREPARED 'test_prepared#1'
+(1 row)
+
+-- test abort of a prepared xact
+BEGIN;
+INSERT INTO test_prepared1 VALUES (3);
+PREPARE TRANSACTION 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+----------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:3
+ PREPARE TRANSACTION 'test_prepared#2'
+(3 rows)
+
+ROLLBACK PREPARED 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-------------------------------------
+ ROLLBACK PREPARED 'test_prepared#2'
+(1 row)
+
+-- test prepared xact containing ddl
+BEGIN;
+ALTER TABLE test_prepared1 ADD COLUMN data text;
+INSERT INTO test_prepared1 VALUES (4, 'frakbar');
+PREPARE TRANSACTION 'test_prepared#3';
+-- confirm that exclusive lock from the ALTER command is held on test_prepared1 table
+SELECT 'test_prepared_1' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'test_prepared1'::regclass;
+ relation | locktype | mode
+-----------------+----------+---------------------
+ test_prepared_1 | relation | RowExclusiveLock
+ test_prepared_1 | relation | AccessExclusiveLock
+(2 rows)
+
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-------------------------------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:4 data[text]:'frakbar'
+ PREPARE TRANSACTION 'test_prepared#3'
+(3 rows)
+
+-- Test that we decode correctly while an uncommitted prepared xact
+-- with ddl exists.
+--
+-- Use a separate table for the concurrent transaction because the lock from
+-- the ALTER will stop us inserting into the other one.
+--
+INSERT INTO test_prepared2 VALUES (5);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+----------------------------------------------------
+ BEGIN
+ table public.test_prepared2: INSERT: id[integer]:5
+ COMMIT
+(3 rows)
+
+COMMIT PREPARED 'test_prepared#3';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-----------------------------------
+ COMMIT PREPARED 'test_prepared#3'
+(1 row)
+
+-- make sure stuff still works
+INSERT INTO test_prepared1 VALUES (6);
+INSERT INTO test_prepared2 VALUES (7);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+--------------------------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:6 data[text]:null
+ COMMIT
+ BEGIN
+ table public.test_prepared2: INSERT: id[integer]:7
+ COMMIT
+(6 rows)
+
+-- Check `CLUSTER` (as operation that hold exclusive lock) doesn't block
+-- logical decoding.
+BEGIN;
+INSERT INTO test_prepared1 VALUES (8, 'othercol');
+CLUSTER test_prepared1 USING test_prepared1_pkey;
+INSERT INTO test_prepared1 VALUES (9, 'othercol2');
+PREPARE TRANSACTION 'test_prepared_lock';
+SELECT 'test_prepared1' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'test_prepared1'::regclass;
+ relation | locktype | mode
+----------------+----------+---------------------
+ test_prepared1 | relation | RowExclusiveLock
+ test_prepared1 | relation | ShareLock
+ test_prepared1 | relation | AccessExclusiveLock
+(3 rows)
+
+-- The above CLUSTER command shouldn't cause a timeout on 2pc decoding. The call should return
+-- within a second.
+SET statement_timeout = '1s';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+---------------------------------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:8 data[text]:'othercol'
+ table public.test_prepared1: INSERT: id[integer]:9 data[text]:'othercol2'
+ PREPARE TRANSACTION 'test_prepared_lock'
+(4 rows)
+
+RESET statement_timeout;
+COMMIT PREPARED 'test_prepared_lock';
+-- consume the commit
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+--------------------------------------
+ COMMIT PREPARED 'test_prepared_lock'
+(1 row)
+
+-- test savepoints and sub-xacts as a result
+BEGIN;
+CREATE TABLE test_prepared_savepoint (a int);
+INSERT INTO test_prepared_savepoint VALUES (1);
+SAVEPOINT test_savepoint;
+INSERT INTO test_prepared_savepoint VALUES (2);
+ROLLBACK TO SAVEPOINT test_savepoint;
+PREPARE TRANSACTION 'test_prepared_savepoint';
+-- should show only 1, not 2
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+------------------------------------------------------------
+ BEGIN
+ table public.test_prepared_savepoint: INSERT: a[integer]:1
+ PREPARE TRANSACTION 'test_prepared_savepoint'
+(3 rows)
+
+COMMIT PREPARED 'test_prepared_savepoint';
+-- consume the commit
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-------------------------------------------
+ COMMIT PREPARED 'test_prepared_savepoint'
+(1 row)
+
+-- test that a GID containing "_nodecode" gets decoded at commit prepared time
+BEGIN;
+INSERT INTO test_prepared1 VALUES (20);
+PREPARE TRANSACTION 'test_prepared_nodecode';
+-- should show nothing
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+------
+(0 rows)
+
+COMMIT PREPARED 'test_prepared_nodecode';
+-- should be decoded now
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+---------------------------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:20 data[text]:null
+ COMMIT
+(3 rows)
+
+-- cleanup
+DROP TABLE test_prepared1;
+DROP TABLE test_prepared2;
+-- show results. There should be nothing to show
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+------
+(0 rows)
+
+SELECT pg_drop_replication_slot('regression_slot');
+ pg_drop_replication_slot
+--------------------------
+
+(1 row)
+
diff --git a/contrib/test_decoding/sql/two_phase.sql b/contrib/test_decoding/sql/two_phase.sql
new file mode 100644
index 0000000..e3e2690
--- /dev/null
+++ b/contrib/test_decoding/sql/two_phase.sql
@@ -0,0 +1,117 @@
+-- Test two-phased transactions, when two-phase-commit is enabled, transactions are
+-- decoded at PREPARE time rather than at COMMIT PREPARED time.
+SET synchronous_commit = on;
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+
+CREATE TABLE test_prepared1(id integer primary key);
+CREATE TABLE test_prepared2(id integer primary key);
+
+-- Test that commands in a two phase xact are only decoded at PREPARE.
+-- Decoding after COMMIT PREPARED should only have the COMMIT PREPARED command and not the
+-- rest of the commands in the transaction.
+BEGIN;
+INSERT INTO test_prepared1 VALUES (1);
+INSERT INTO test_prepared1 VALUES (2);
+-- should show nothing
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+PREPARE TRANSACTION 'test_prepared#1';
+--
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+COMMIT PREPARED 'test_prepared#1';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- test abort of a prepared xact
+BEGIN;
+INSERT INTO test_prepared1 VALUES (3);
+PREPARE TRANSACTION 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ROLLBACK PREPARED 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+
+-- test prepared xact containing ddl
+BEGIN;
+ALTER TABLE test_prepared1 ADD COLUMN data text;
+INSERT INTO test_prepared1 VALUES (4, 'frakbar');
+PREPARE TRANSACTION 'test_prepared#3';
+-- confirm that exclusive lock from the ALTER command is held on test_prepared1 table
+SELECT 'test_prepared_1' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'test_prepared1'::regclass;
+
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- Test that we decode correctly while an uncommitted prepared xact
+-- with ddl exists.
+--
+-- Use a separate table for the concurrent transaction because the lock from
+-- the ALTER will stop us inserting into the other one.
+--
+INSERT INTO test_prepared2 VALUES (5);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+COMMIT PREPARED 'test_prepared#3';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- make sure stuff still works
+INSERT INTO test_prepared1 VALUES (6);
+INSERT INTO test_prepared2 VALUES (7);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- Check `CLUSTER` (as operation that hold exclusive lock) doesn't block
+-- logical decoding.
+BEGIN;
+INSERT INTO test_prepared1 VALUES (8, 'othercol');
+CLUSTER test_prepared1 USING test_prepared1_pkey;
+INSERT INTO test_prepared1 VALUES (9, 'othercol2');
+PREPARE TRANSACTION 'test_prepared_lock';
+
+SELECT 'test_prepared1' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'test_prepared1'::regclass;
+
+-- The above CLUSTER command shouldn't cause a timeout on 2pc decoding. The call should return
+-- within a second.
+SET statement_timeout = '1s';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+RESET statement_timeout;
+
+COMMIT PREPARED 'test_prepared_lock';
+
+-- consume the commit
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- test savepoints and sub-xacts as a result
+BEGIN;
+CREATE TABLE test_prepared_savepoint (a int);
+INSERT INTO test_prepared_savepoint VALUES (1);
+SAVEPOINT test_savepoint;
+INSERT INTO test_prepared_savepoint VALUES (2);
+ROLLBACK TO SAVEPOINT test_savepoint;
+PREPARE TRANSACTION 'test_prepared_savepoint';
+-- should show only 1, not 2
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+COMMIT PREPARED 'test_prepared_savepoint';
+-- consume the commit
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- test that a GID containing "_nodecode" gets decoded at commit prepared time
+BEGIN;
+INSERT INTO test_prepared1 VALUES (20);
+PREPARE TRANSACTION 'test_prepared_nodecode';
+-- should show nothing
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+COMMIT PREPARED 'test_prepared_nodecode';
+-- should be decoded now
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- cleanup
+DROP TABLE test_prepared1;
+DROP TABLE test_prepared2;
+
+-- show results. There should be nothing to show
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+SELECT pg_drop_replication_slot('regression_slot');
diff --git a/contrib/test_decoding/test_decoding.c b/contrib/test_decoding/test_decoding.c
index e60ab34..6b8e502 100644
--- a/contrib/test_decoding/test_decoding.c
+++ b/contrib/test_decoding/test_decoding.c
@@ -11,12 +11,16 @@
*-------------------------------------------------------------------------
*/
#include "postgres.h"
+#include "miscadmin.h"
+#include "access/transam.h"
#include "catalog/pg_type.h"
#include "replication/logical.h"
#include "replication/origin.h"
+#include "storage/procarray.h"
+
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -36,6 +40,7 @@ typedef struct
bool skip_empty_xacts;
bool xact_wrote_changes;
bool only_local;
+ TransactionId check_xid_aborted; /* track abort of this txid */
} TestDecodingData;
static void pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
@@ -88,6 +93,18 @@ static void pg_decode_stream_truncate(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
int nrelations, Relation relations[],
ReorderBufferChange *change);
+static bool pg_decode_filter_prepare(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid);
+static void pg_decode_prepare_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+static void pg_decode_commit_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+static void pg_decode_rollback_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
void
_PG_init(void)
@@ -116,6 +133,10 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb)
cb->stream_change_cb = pg_decode_stream_change;
cb->stream_message_cb = pg_decode_stream_message;
cb->stream_truncate_cb = pg_decode_stream_truncate;
+ cb->filter_prepare_cb = pg_decode_filter_prepare;
+ cb->prepare_cb = pg_decode_prepare_txn;
+ cb->commit_prepared_cb = pg_decode_commit_prepared_txn;
+ cb->rollback_prepared_cb = pg_decode_rollback_prepared_txn;
}
@@ -127,6 +148,7 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
ListCell *option;
TestDecodingData *data;
bool enable_streaming = false;
+ bool enable_2pc = false;
data = palloc0(sizeof(TestDecodingData));
data->context = AllocSetContextCreate(ctx->context,
@@ -136,6 +158,7 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
data->include_timestamp = false;
data->skip_empty_xacts = false;
data->only_local = false;
+ data->check_xid_aborted = InvalidTransactionId;
ctx->output_plugin_private = data;
@@ -227,6 +250,35 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
errmsg("could not parse value \"%s\" for parameter \"%s\"",
strVal(elem->arg), elem->defname)));
}
+ else if (strcmp(elem->defname, "two-phase-commit") == 0)
+ {
+ if (elem->arg == NULL)
+ continue;
+ else if (!parse_bool(strVal(elem->arg), &enable_2pc))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("could not parse value \"%s\" for parameter \"%s\"",
+ strVal(elem->arg), elem->defname)));
+ }
+ else if (strcmp(elem->defname, "check-xid-aborted") == 0)
+ {
+ if (elem->arg == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("check-xid-aborted needs an input value")));
+ else
+ {
+ errno = 0;
+ data->check_xid_aborted = (TransactionId)
+ strtoul(strVal(elem->arg), NULL, 0);
+
+ if (!TransactionIdIsValid(data->check_xid_aborted))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("check-xid-aborted is not a valid xid: \"%s\"",
+ strVal(elem->arg))));
+ }
+ }
else
{
ereport(ERROR,
@@ -238,6 +290,7 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
}
ctx->streaming &= enable_streaming;
+ ctx->twophase &= enable_2pc;
}
/* cleanup this plugin's resources */
@@ -297,6 +350,93 @@ pg_decode_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
OutputPluginWrite(ctx, true);
}
+/*
+ * Filter out two-phase transactions.
+ *
+ * Each plugin can implement its own filtering logic. Here
+ * we demonstrate a simple logic by checking the GID. If the
+ * GID contains the "_nodecode" substring, then we filter
+ * it out.
+ */
+static bool
+pg_decode_filter_prepare(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid)
+{
+ if (strstr(gid, "_nodecode") != NULL)
+ return true;
+
+ return false;
+}
+
+/* PREPARE callback */
+static void
+pg_decode_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ if (data->skip_empty_xacts && !data->xact_wrote_changes)
+ return;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ appendStringInfo(ctx->out, "PREPARE TRANSACTION %s",
+ quote_literal_cstr(txn->gid));
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, " %u", txn->xid);
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
+/* COMMIT PREPARED callback */
+static void
+pg_decode_commit_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ appendStringInfo(ctx->out, "COMMIT PREPARED %s",
+ quote_literal_cstr(txn->gid));
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, " %u", txn->xid);
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
+/* ABORT PREPARED callback */
+static void
+pg_decode_rollback_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ appendStringInfo(ctx->out, "ROLLBACK PREPARED %s",
+ quote_literal_cstr(txn->gid));
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, " %u", txn->xid);
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
static bool
pg_decode_filter(LogicalDecodingContext *ctx,
RepOriginId origin_id)
@@ -455,6 +595,25 @@ pg_decode_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
}
data->xact_wrote_changes = true;
+ /* if check_xid_aborted is a valid xid, then it was passed in
+ * as an option to check if the transaction having this xid would be aborted.
+ * This is to test concurrent aborts.
+ */
+ if (TransactionIdIsValid(data->check_xid_aborted))
+ {
+ elog(LOG, "waiting for %u to abort", data->check_xid_aborted);
+ while (TransactionIdIsInProgress(data->check_xid_aborted))
+ {
+ CHECK_FOR_INTERRUPTS();
+ pg_usleep(10000L);
+ }
+ if (!TransactionIdIsInProgress(data->check_xid_aborted) &&
+ !TransactionIdDidCommit(data->check_xid_aborted))
+ elog(LOG, "%u aborted", data->check_xid_aborted);
+
+ Assert(TransactionIdDidAbort(data->check_xid_aborted));
+ }
+
class_form = RelationGetForm(relation);
tupdesc = RelationGetDescr(relation);
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index 813a037..4ad5dca 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -387,6 +387,10 @@ typedef struct OutputPluginCallbacks
LogicalDecodeTruncateCB truncate_cb;
LogicalDecodeCommitCB commit_cb;
LogicalDecodeMessageCB message_cb;
+ LogicalDecodeFilterPrepareCB filter_prepare_cb;
+ LogicalDecodePrepareCB prepare_cb;
+ LogicalDecodeCommitPreparedCB commit_prepared_cb;
+ LogicalDecodeRollbackPreparedCB rollback_prepared_cb;
LogicalDecodeFilterByOriginCB filter_by_origin_cb;
LogicalDecodeShutdownCB shutdown_cb;
LogicalDecodeStreamStartCB stream_start_cb;
@@ -417,6 +421,13 @@ typedef void (*LogicalOutputPluginInit) (struct OutputPluginCallbacks *cb);
are required, while <function>stream_message_cb</function> and
<function>stream_truncate_cb</function> are optional.
</para>
+
+ <para>
+ An output plugin may also define functions to support two-phase commits, which are
+ decoded on <command>PREPARE TRANSACTION</command>. The <function>prepare_cb</function>,
+ <function>commit_prepared_cb</function> and <function>abort_prepared_cb</function>
+ callbacks are required, while <function>filter_prepare_cb</function> is optional.
+ </para>
</sect2>
<sect2 id="logicaldecoding-capabilities">
@@ -477,7 +488,13 @@ CREATE TABLE another_catalog_table(data text) WITH (user_catalog_table = true);
never get
decoded. Successful savepoints are
folded into the transaction containing them in the order they were
- executed within that transaction.
+ executed within that transaction. A transaction that is prepared for
+ a two-phase commit using <command>PREPARE TRANSACTION</command> will
+ also be decoded if the output plugin callbacks needed for decoding
+ them are provided. It is possible that the current transaction which
+ is being decoded is aborted concurrently via a <command>ROLLBACK PREPARED</command>
+ command. In that case, the logical decoding of this transaction will
+ be aborted too.
</para>
<note>
@@ -578,6 +595,55 @@ typedef void (*LogicalDecodeCommitCB) (struct LogicalDecodingContext *ctx,
</para>
</sect3>
+ <sect3 id="logicaldecoding-output-plugin-prepare">
+ <title>Transaction Prepare Callback</title>
+
+ <para>
+ The optional <function>prepare_cb</function> callback is called whenever
+ a transaction which is prepared for two-phase commit has been
+ decoded. The <function>change_cb</function> callbacks for all modified
+ rows will have been called before this, if there have been any modified
+ rows.
+<programlisting>
+typedef void (*LogicalDecodePrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
+ <sect3 id="logicaldecoding-output-plugin-commit-prepared">
+ <title>Transaction Commit Prepared Callback</title>
+
+ <para>
+ The optional <function>commit_prepared_cb</function> callback is called whenever
+ a transaction commit prepared has been decoded. The <parameter>gid</parameter> field,
+ which is part of the <parameter>txn</parameter> parameter can be used in this
+ callback.
+<programlisting>
+typedef void (*LogicalDecodeCommitPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
+ <sect3 id="logicaldecoding-output-plugin-rollback-prepared">
+ <title>Transaction Rollback Prepared Callback</title>
+
+ <para>
+ The optional <function>rollback_prepared_cb</function> callback is called whenever
+ a transaction rollback prepared has been decoded. The <parameter>gid</parameter> field,
+ which is part of the <parameter>txn</parameter> parameter can be used in this
+ callback.
+<programlisting>
+typedef void (*LogicalDecodeRollbackPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
<sect3 id="logicaldecoding-output-plugin-change">
<title>Change Callback</title>
@@ -587,7 +653,13 @@ typedef void (*LogicalDecodeCommitCB) (struct LogicalDecodingContext *ctx,
an <command>INSERT</command>, <command>UPDATE</command>,
or <command>DELETE</command>. Even if the original command modified
several rows at once the callback will be called individually for each
- row.
+ row. The <function>change_cb</function> callback may access system or
+ user catalog tables to aid in the process of outputting the row
+ modification details. In case of decoding a prepared (but yet
+ uncommitted) transaction or decoding of an uncommitted transaction, this
+ change callback might also error out due to simultaneous rollback of
+ this very same transaction. In that case, the logical decoding of this
+ aborted transaction is stopped gracefully.
<programlisting>
typedef void (*LogicalDecodeChangeCB) (struct LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
@@ -664,6 +736,39 @@ typedef bool (*LogicalDecodeFilterByOriginCB) (struct LogicalDecodingContext *ct
</para>
</sect3>
+ <sect3 id="logicaldecoding-output-plugin-filter-prepare">
+ <title>Prepare Filter Callback</title>
+
+ <para>
+ The optional <function>filter_prepare_cb</function> callback
+ is called to determine whether data that is part of the current
+ two-phase commit transaction should be considered for decode
+ at this prepare stage or as a regular one-phase transaction at
+ <command>COMMIT PREPARED</command> time later. To signal that
+ decoding should be skipped, return <literal>true</literal>;
+ <literal>false</literal> otherwise. When the callback is not
+ defined, <literal>false</literal> is assumed (i.e. nothing is
+ filtered).
+<programlisting>
+typedef bool (*LogicalDecodeFilterPrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ TransactionId xid,
+ const char *gid);
+</programlisting>
+ The <parameter>ctx</parameter> parameter has the same contents
+ as for the other callbacks. The <parameter>txn</parameter> parameter
+ contains meta information about the transaction. The <parameter>xid</parameter>
+ contains the XID because <parameter>txn</parameter> can be NULL in some cases.
+ The <parameter>gid</parameter> is the identifier that later identifies this
+ transaction for <command>COMMIT PREPARED</command> or <command>ROLLBACK PREPARED</command>.
+ </para>
+ <para>
+ The callback has to provide the same static answer for a given combination of
+ <parameter>xid</parameter> and <parameter>gid</parameter> every time it is
+ called.
+ </para>
+ </sect3>
+
<sect3 id="logicaldecoding-output-plugin-message">
<title>Generic Message Callback</title>
@@ -685,7 +790,13 @@ typedef void (*LogicalDecodeMessageCB) (struct LogicalDecodingContext *ctx,
non-transactional and the XID was not assigned yet in the transaction
which logged the message. The <parameter>lsn</parameter> has WAL
location of the message. The <parameter>transactional</parameter> says
- if the message was sent as transactional or not.
+ if the message was sent as transactional or not. Similar to the change
+ callback, in case of decoding a prepared (but yet uncommitted)
+ transaction or decoding of an uncommitted transaction, this message
+ callback might also error out due to simultaneous rollback of
+ this very same transaction. In that case, the logical decoding of this
+ aborted transaction is stopped gracefully.
+
The <parameter>prefix</parameter> is arbitrary null-terminated prefix
which can be used for identifying interesting messages for the current
plugin. And finally the <parameter>message</parameter> parameter holds
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 3f84ee9..e011fd9 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -68,8 +68,15 @@ static void DecodeSpecConfirm(LogicalDecodingContext *ctx, XLogRecordBuffer *buf
static void DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
xl_xact_parsed_commit *parsed, TransactionId xid);
+static void DecodeCommitPrepared(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_commit *parsed, TransactionId xid);
static void DecodeAbort(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
xl_xact_parsed_abort *parsed, TransactionId xid);
+static void DecodeAbortPrepared(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_abort *parsed, TransactionId xid);
+static void DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_prepare * parsed);
+
/* common function to decode tuples */
static void DecodeXLogTuple(char *data, Size len, ReorderBufferTupleBuf *tup);
@@ -239,7 +246,6 @@ DecodeXactOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
switch (info)
{
case XLOG_XACT_COMMIT:
- case XLOG_XACT_COMMIT_PREPARED:
{
xl_xact_commit *xlrec;
xl_xact_parsed_commit parsed;
@@ -256,8 +262,24 @@ DecodeXactOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
DecodeCommit(ctx, buf, &parsed, xid);
break;
}
+ case XLOG_XACT_COMMIT_PREPARED:
+ {
+ xl_xact_commit *xlrec;
+ xl_xact_parsed_commit parsed;
+ TransactionId xid;
+
+ xlrec = (xl_xact_commit *) XLogRecGetData(r);
+ ParseCommitRecord(XLogRecGetInfo(buf->record), xlrec, &parsed);
+
+ if (!TransactionIdIsValid(parsed.twophase_xid))
+ xid = XLogRecGetXid(r);
+ else
+ xid = parsed.twophase_xid;
+
+ DecodeCommitPrepared(ctx, buf, &parsed, xid);
+ break;
+ }
case XLOG_XACT_ABORT:
- case XLOG_XACT_ABORT_PREPARED:
{
xl_xact_abort *xlrec;
xl_xact_parsed_abort parsed;
@@ -274,6 +296,23 @@ DecodeXactOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
DecodeAbort(ctx, buf, &parsed, xid);
break;
}
+ case XLOG_XACT_ABORT_PREPARED:
+ {
+ xl_xact_abort *xlrec;
+ xl_xact_parsed_abort parsed;
+ TransactionId xid;
+
+ xlrec = (xl_xact_abort *) XLogRecGetData(r);
+ ParseAbortRecord(XLogRecGetInfo(buf->record), xlrec, &parsed);
+
+ if (!TransactionIdIsValid(parsed.twophase_xid))
+ xid = XLogRecGetXid(r);
+ else
+ xid = parsed.twophase_xid;
+
+ DecodeAbortPrepared(ctx, buf, &parsed, xid);
+ break;
+ }
case XLOG_XACT_ASSIGNMENT:
/*
@@ -312,17 +351,35 @@ DecodeXactOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
}
break;
case XLOG_XACT_PREPARE:
+ {
+ xl_xact_parsed_prepare parsed;
+ xl_xact_prepare *xlrec;
- /*
- * Currently decoding ignores PREPARE TRANSACTION and will just
- * decode the transaction when the COMMIT PREPARED is sent or
- * throw away the transaction's contents when a ROLLBACK PREPARED
- * is received. In the future we could add code to expose prepared
- * transactions in the changestream allowing for a kind of
- * distributed 2PC.
- */
- ReorderBufferProcessXid(reorder, XLogRecGetXid(r), buf->origptr);
- break;
+ /* check that output plugin is capable of two-phase decoding */
+ if (!ctx->twophase)
+ {
+ ReorderBufferProcessXid(reorder, XLogRecGetXid(r), buf->origptr);
+ break;
+ }
+
+ /* ok, parse it */
+ xlrec = (xl_xact_prepare *)XLogRecGetData(r);
+ ParsePrepareRecord(XLogRecGetInfo(buf->record),
+ xlrec, &parsed);
+
+ /* does output plugin want this particular transaction? */
+ if (ctx->callbacks.filter_prepare_cb &&
+ ReorderBufferPrepareNeedSkip(reorder, parsed.twophase_xid,
+ parsed.twophase_gid))
+ {
+ ReorderBufferProcessXid(reorder, parsed.twophase_xid,
+ buf->origptr);
+ break;
+ }
+
+ DecodePrepare(ctx, buf, &parsed);
+ break;
+ }
default:
elog(ERROR, "unexpected RM_XACT_ID record type: %u", info);
}
@@ -659,6 +716,131 @@ DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
}
/*
+ * Consolidated commit record handling between the different form of commit
+ * records.
+ */
+static void
+DecodeCommitPrepared(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_commit *parsed, TransactionId xid)
+{
+ XLogRecPtr origin_lsn = InvalidXLogRecPtr;
+ TimestampTz commit_time = parsed->xact_time;
+ RepOriginId origin_id = XLogRecGetOrigin(buf->record);
+ int i;
+
+ if (parsed->xinfo & XACT_XINFO_HAS_ORIGIN)
+ {
+ origin_lsn = parsed->origin_lsn;
+ commit_time = parsed->origin_timestamp;
+ }
+
+ SnapBuildCommitTxn(ctx->snapshot_builder, buf->origptr, xid,
+ parsed->nsubxacts, parsed->subxacts);
+
+ /* ----
+ * Check whether we are interested in this specific transaction, and tell
+ * the reorderbuffer to forget the content of the (sub-)transactions
+ * if not.
+ *
+ * There can be several reasons we might not be interested in this
+ * transaction:
+ * 1) We might not be interested in decoding transactions up to this
+ * LSN. This can happen because we previously decoded it and now just
+ * are restarting or if we haven't assembled a consistent snapshot yet.
+ * 2) The transaction happened in another database.
+ * 3) The output plugin is not interested in the origin.
+ * 4) We are doing fast-forwarding
+ *
+ * We can't just use ReorderBufferAbort() here, because we need to execute
+ * the transaction's invalidations. This currently won't be needed if
+ * we're just skipping over the transaction because currently we only do
+ * so during startup, to get to the first transaction the client needs. As
+ * we have reset the catalog caches before starting to read WAL, and we
+ * haven't yet touched any catalogs, there can't be anything to invalidate.
+ * But if we're "forgetting" this commit because it's it happened in
+ * another database, the invalidations might be important, because they
+ * could be for shared catalogs and we might have loaded data into the
+ * relevant syscaches.
+ * ---
+ */
+ if (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
+ (parsed->dbId != InvalidOid && parsed->dbId != ctx->slot->data.database) ||
+ ctx->fast_forward || FilterByOrigin(ctx, origin_id))
+ {
+ for (i = 0; i < parsed->nsubxacts; i++)
+ {
+ ReorderBufferForget(ctx->reorder, parsed->subxacts[i], buf->origptr);
+ }
+ ReorderBufferForget(ctx->reorder, xid, buf->origptr);
+
+ return;
+ }
+
+ /* tell the reorderbuffer about the surviving subtransactions */
+ for (i = 0; i < parsed->nsubxacts; i++)
+ {
+ ReorderBufferCommitChild(ctx->reorder, xid, parsed->subxacts[i],
+ buf->origptr, buf->endptr);
+ }
+
+ /*
+ * For COMMIT PREPARED, the changes have already been replayed at
+ * PREPARE time, so we only need to notify the subscriber that the GID
+ * finally committed.
+ * If filter check present and this needs to be skipped, do a regular commit.
+ */
+ if (ctx->callbacks.filter_prepare_cb &&
+ ReorderBufferPrepareNeedSkip(ctx->reorder, xid, parsed->twophase_gid))
+ {
+ ReorderBufferCommit(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn);
+ }
+ else
+ {
+ ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn,
+ parsed->twophase_gid, true);
+ }
+
+}
+
+/*
+ * Decode PREPARE record. Similar logic as in COMMIT
+ */
+static void
+DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_prepare * parsed)
+{
+ XLogRecPtr origin_lsn = parsed->origin_lsn;
+ TimestampTz commit_time = parsed->origin_timestamp;
+ XLogRecPtr origin_id = XLogRecGetOrigin(buf->record);
+ int i;
+ TransactionId xid = parsed->twophase_xid;
+
+ if (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
+ (parsed->dbId != InvalidOid && parsed->dbId != ctx->slot->data.database) ||
+ ctx->fast_forward || FilterByOrigin(ctx, origin_id))
+ return;
+
+ /*
+ * Tell the reorderbuffer about the surviving subtransactions. We need to
+ * do this because the main transaction itself has not committed since we
+ * are in the prepare phase right now. So we need to be sure the snapshot
+ * is setup correctly for the main transaction in case all changes
+ * happened in subtransanctions
+ */
+ for (i = 0; i < parsed->nsubxacts; i++)
+ {
+ ReorderBufferCommitChild(ctx->reorder, xid, parsed->subxacts[i],
+ buf->origptr, buf->endptr);
+ }
+
+ /* replay actions of all transaction + subtransactions in order */
+ ReorderBufferPrepare(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn, parsed->twophase_gid);
+}
+
+/*
* Get the data from the various forms of abort records and pass it on to
* snapbuild.c and reorderbuffer.c
*/
@@ -681,6 +863,50 @@ DecodeAbort(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
}
/*
+ * Get the data from the various forms of abort records and pass it on to
+ * snapbuild.c and reorderbuffer.c
+ */
+static void
+DecodeAbortPrepared(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_abort *parsed, TransactionId xid)
+{
+ int i;
+ XLogRecPtr origin_lsn = InvalidXLogRecPtr;
+ TimestampTz commit_time = 0;
+ XLogRecPtr origin_id = XLogRecGetOrigin(buf->record);
+
+ if (parsed->xinfo & XACT_XINFO_HAS_ORIGIN)
+ {
+ origin_lsn = parsed->origin_lsn;
+ commit_time = parsed->origin_timestamp;
+ }
+
+ /*
+ * If it passes through the filters handle the ROLLBACK via callbacks
+ */
+ if(!FilterByOrigin(ctx, origin_id) &&
+ !SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) &&
+ !ReorderBufferPrepareNeedSkip(ctx->reorder, xid, parsed->twophase_gid))
+ {
+ Assert(TransactionIdIsValid(xid));
+ Assert(parsed->dbId == ctx->slot->data.database);
+
+ ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn,
+ parsed->twophase_gid, false);
+ return;
+ }
+
+ for (i = 0; i < parsed->nsubxacts; i++)
+ {
+ ReorderBufferAbort(ctx->reorder, parsed->subxacts[i],
+ buf->record->EndRecPtr);
+ }
+
+ ReorderBufferAbort(ctx->reorder, xid, buf->record->EndRecPtr);
+}
+
+/*
* Parse XLOG_HEAP_INSERT (not MULTI_INSERT!) records into tuplebufs.
*
* Deletes can contain the new tuple.
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 8675832..1c0f70d 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -59,6 +59,14 @@ static void shutdown_cb_wrapper(LogicalDecodingContext *ctx);
static void begin_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn);
static void commit_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
+static bool filter_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid);
+static void prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+static void commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+static void rollback_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
static void change_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
Relation relation, ReorderBufferChange *change);
static void truncate_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
@@ -207,6 +215,10 @@ StartupDecodingContext(List *output_plugin_options,
ctx->reorder->apply_change = change_cb_wrapper;
ctx->reorder->apply_truncate = truncate_cb_wrapper;
ctx->reorder->commit = commit_cb_wrapper;
+ ctx->reorder->filter_prepare = filter_prepare_cb_wrapper;
+ ctx->reorder->prepare = prepare_cb_wrapper;
+ ctx->reorder->commit_prepared = commit_prepared_cb_wrapper;
+ ctx->reorder->rollback_prepared = rollback_prepared_cb_wrapper;
ctx->reorder->message = message_cb_wrapper;
/*
@@ -227,6 +239,19 @@ StartupDecodingContext(List *output_plugin_options,
(ctx->callbacks.stream_truncate_cb != NULL);
/*
+ * To support two-phase logical decoding, we require prepare/commit-prepare/abort-prepare
+ * callbacks. The filter-prepare callback is optional. We however enable two phase logical
+ * decoding when at least one of the methods is enabled so that we can easily identify
+ * missing methods.
+ *
+ * We decide it here, but only check it later in the wrappers.
+ */
+ ctx->twophase = (ctx->callbacks.prepare_cb != NULL) ||
+ (ctx->callbacks.commit_prepared_cb != NULL) ||
+ (ctx->callbacks.rollback_prepared_cb != NULL) ||
+ (ctx->callbacks.filter_prepare_cb != NULL);
+
+ /*
* streaming callbacks
*
* stream_message and stream_truncate callbacks are optional, so we do not
@@ -783,6 +808,120 @@ commit_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
}
static void
+prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ /* We're only supposed to call this when two-phase commits are supported */
+ Assert(ctx->twophase);
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "prepare";
+ state.report_location = txn->final_lsn; /* beginning of commit record */
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn; /* points to the end of the record */
+
+ /* If the plugin supports two-phase commits then prepare callback is mandatory */
+ if (ctx->callbacks.prepare_cb == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("Output plugin did not register prepare_cb callback")));
+
+ /* do the actual work: call callback */
+ ctx->callbacks.prepare_cb(ctx, txn, prepare_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
+commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ /* We're only supposed to call this when two-phase commits are supported */
+ Assert(ctx->twophase);
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "commit_prepared";
+ state.report_location = txn->final_lsn; /* beginning of commit record */
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn; /* points to the end of the record */
+
+ /* If the plugin support 2 phase commits then commit prepared callback is mandatory */
+ if (ctx->callbacks.commit_prepared_cb == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("Output plugin did not register commit_prepared_cb callback")));
+
+ /* do the actual work: call callback */
+ ctx->callbacks.commit_prepared_cb(ctx, txn, commit_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
+rollback_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ /* We're only supposed to call this when two-phase commits are supported */
+ Assert(ctx->twophase);
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "rollback_prepared";
+ state.report_location = txn->final_lsn; /* beginning of commit record */
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn; /* points to the end of the record */
+
+ /* If the plugin support 2 phase commits then abort prepared callback is mandatory */
+ if (ctx->callbacks.rollback_prepared_cb == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("Output plugin did not register rollback_prepared_cb callback")));
+
+ /* do the actual work: call callback */
+ ctx->callbacks.rollback_prepared_cb(ctx, txn, abort_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
change_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
Relation relation, ReorderBufferChange *change)
{
@@ -859,6 +998,51 @@ truncate_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
error_context_stack = errcallback.previous;
}
+static bool
+filter_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+ bool ret;
+
+ /*
+ * Skip if decoding of two-phase at PREPARE time is not enabled. In that
+ * case all two-phase transactions are considered filtered out and will be
+ * applied as regular transactions at COMMIT PREPARED.
+ */
+ if (!ctx->twophase)
+ return true;
+
+ /*
+ * The filter_prepare callback is optional. When not supplied, all
+ * prepared transactions should go through.
+ */
+ if (!ctx->callbacks.filter_prepare_cb)
+ return false;
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "filter_prepare";
+ state.report_location = InvalidXLogRecPtr;
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = false;
+
+ /* do the actual work: call callback */
+ ret = ctx->callbacks.filter_prepare_cb(ctx, txn, xid, gid);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+
+ return ret;
+}
+
bool
filter_by_origin_cb_wrapper(LogicalDecodingContext *ctx, RepOriginId origin_id)
{
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 4cb27f2..8fc1301 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -251,7 +251,8 @@ static Size ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn
static void ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
char *change);
static void ReorderBufferRestoreCleanup(ReorderBuffer *rb, ReorderBufferTXN *txn);
-static void ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn);
+static void ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
+ bool txn_prepared);
static void ReorderBufferCleanupSerializedTXNs(const char *slotname);
static void ReorderBufferSerializedPath(char *path, ReplicationSlot *slot,
TransactionId xid, XLogSegNo segno);
@@ -418,6 +419,12 @@ ReorderBufferReturnTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
/* free data that's contained */
+ if (txn->gid != NULL)
+ {
+ pfree(txn->gid);
+ txn->gid = NULL;
+ }
+
if (txn->tuplecid_hash != NULL)
{
hash_destroy(txn->tuplecid_hash);
@@ -1506,12 +1513,14 @@ ReorderBufferCleanupTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
}
/*
- * Discard changes from a transaction (and subtransactions), after streaming
- * them. Keep the remaining info - transactions, tuplecids, invalidations and
- * snapshots.
+ * Discard changes from a transaction (and subtransactions), either after streaming or
+ * after a PREPARE.
+ * The flag txn_prepared indicates if this is called after a PREPARE.
+ * If streaming, keep the remaining info - transactions, tuplecids, invalidations and
+ * snapshots. If after a PREPARE, keep only the invalidations and snapshots.
*/
static void
-ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
+ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, bool txn_prepared)
{
dlist_mutable_iter iter;
@@ -1530,7 +1539,7 @@ ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
Assert(rbtxn_is_known_subxact(subtxn));
Assert(subtxn->nsubtxns == 0);
- ReorderBufferTruncateTXN(rb, subtxn);
+ ReorderBufferTruncateTXN(rb, subtxn, txn_prepared);
}
/* cleanup changes in the txn */
@@ -1564,9 +1573,33 @@ ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
* about the toplevel xact (we send the XID in all messages), but we never
* stream XIDs of empty subxacts.
*/
- if ((!txn->toptxn) || (txn->nentries_mem != 0))
+ if ((!txn_prepared) && ((!txn->toptxn) || (txn->nentries_mem != 0)))
txn->txn_flags |= RBTXN_IS_STREAMED;
+ if (txn_prepared)
+ {
+ /*
+ * If this is a prepared txn, cleanup the tuplecids we stored for decoding
+ * catalog snapshot access.
+ * They are always stored in the toplevel transaction.
+ */
+ dlist_foreach_modify(iter, &txn->tuplecids)
+ {
+ ReorderBufferChange *change;
+
+ change = dlist_container(ReorderBufferChange, node, iter.cur);
+
+ /* Check we're not mixing changes from different transactions. */
+ Assert(change->txn == txn);
+ Assert(change->action == REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID);
+
+ /* remove the change from its containing list */
+ dlist_delete(&change->node);
+
+ ReorderBufferReturnChange(rb, change, true);
+ }
+ }
+
/*
* Destroy the (relfilenode, ctid) hashtable, so that we don't leak any
* memory. We could also keep the hash table and update it with new ctid
@@ -1891,7 +1924,7 @@ ReorderBufferResetTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
ReorderBufferChange *specinsert)
{
/* Discard the changes that we just streamed */
- ReorderBufferTruncateTXN(rb, txn);
+ ReorderBufferTruncateTXN(rb, txn, false);
/* Free all resources allocated for toast reconstruction */
ReorderBufferToastReset(rb, txn);
@@ -1998,7 +2031,7 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
prev_lsn = change->lsn;
/* Set the current xid to detect concurrent aborts. */
- if (streaming)
+ if (streaming || rbtxn_prepared(change->txn))
{
curtxn = change->txn;
SetupCheckXidLive(curtxn->xid);
@@ -2289,7 +2322,16 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
}
}
else
- rb->commit(rb, txn, commit_lsn);
+ {
+ /*
+ * Call either PREPARE (for two-phase transactions) or COMMIT
+ * (for regular ones).
+ */
+ if (rbtxn_prepared(txn))
+ rb->prepare(rb, txn, commit_lsn);
+ else
+ rb->commit(rb, txn, commit_lsn);
+ }
/* this is just a sanity check against bad output plugin behaviour */
if (GetCurrentTransactionIdIfAny() != InvalidTransactionId)
@@ -2330,11 +2372,17 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
*/
if (streaming)
{
- ReorderBufferTruncateTXN(rb, txn);
+ ReorderBufferTruncateTXN(rb, txn, false);
/* Reset the CheckXidAlive */
CheckXidAlive = InvalidTransactionId;
}
+ else if (rbtxn_prepared(txn))
+ {
+ ReorderBufferTruncateTXN(rb, txn, true);
+ /* Reset the CheckXidAlive */
+ CheckXidAlive = InvalidTransactionId;
+ }
else
ReorderBufferCleanupTXN(rb, txn);
}
@@ -2363,17 +2411,18 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
/*
* The error code ERRCODE_TRANSACTION_ROLLBACK indicates a concurrent
- * abort of the (sub)transaction we are streaming. We need to do the
+ * abort of the (sub)transaction we are streaming or preparing. We need to do the
* cleanup and return gracefully on this error, see SetupCheckXidLive.
*/
if (errdata->sqlerrcode == ERRCODE_TRANSACTION_ROLLBACK)
{
/*
- * This error can only occur when we are sending the data in
- * streaming mode and the streaming is not finished yet.
+ * This error can only occur either when we are sending the data in
+ * streaming mode and the streaming is not finished yet or when we are
+ * sending the data out on a PREPARE during a two-phase commit.
*/
- Assert(streaming);
- Assert(stream_started);
+ Assert(streaming || rbtxn_prepared(txn));
+ Assert(stream_started || rbtxn_prepared(txn));
/* Cleanup the temporary error state. */
FlushErrorState();
@@ -2381,10 +2430,19 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
errdata = NULL;
curtxn->concurrent_abort = true;
- /* Reset the TXN so that it is allowed to stream remaining data. */
- ReorderBufferResetTXN(rb, txn, snapshot_now,
- command_id, prev_lsn,
- specinsert);
+ /* If streaming, reset the TXN so that it is allowed to stream remaining data. */
+ if (streaming)
+ {
+ ReorderBufferResetTXN(rb, txn, snapshot_now,
+ command_id, prev_lsn,
+ specinsert);
+ }
+ else
+ {
+ elog(LOG, "stopping decoding of %s (%u)",
+ txn->gid[0] != '\0'? txn->gid:"", txn->xid);
+ ReorderBufferTruncateTXN(rb, txn, true);
+ }
}
else
{
@@ -2406,23 +2464,16 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
* This interface is called once a toplevel commit is read for both streamed
* as well as non-streamed transactions.
*/
-void
-ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
- XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
- TimestampTz commit_time,
- RepOriginId origin_id, XLogRecPtr origin_lsn)
+static void
+ReorderBufferCommitInternal(ReorderBufferTXN *txn,
+ ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn)
{
- ReorderBufferTXN *txn;
Snapshot snapshot_now;
CommandId command_id = FirstCommandId;
- txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
- false);
-
- /* unknown transaction, nothing to replay */
- if (txn == NULL)
- return;
-
txn->final_lsn = commit_lsn;
txn->end_lsn = end_lsn;
txn->commit_time = commit_time;
@@ -2464,6 +2515,140 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
}
/*
+ * Ask output plugin whether we want to skip this PREPARE and send
+ * this transaction as a regular commit later.
+ */
+bool
+ReorderBufferPrepareNeedSkip(ReorderBuffer *rb, TransactionId xid, const char *gid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr, false);
+
+ return rb->filter_prepare(rb, txn, xid, gid);
+}
+
+
+/*
+ * Commit a transaction.
+ *
+ * See comments for ReorderBufferCommitInternal()
+ */
+void
+ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+
+ /* unknown transaction, nothing to replay */
+ if (txn == NULL)
+ return;
+
+ ReorderBufferCommitInternal(txn, rb, xid, commit_lsn, end_lsn,
+ commit_time, origin_id, origin_lsn);
+}
+
+/*
+ * Prepare a two-phase transaction. It calls ReorderBufferCommitInternal()
+ * since all prepared transactions need to be decoded at PREPARE time.
+ */
+void
+ReorderBufferPrepare(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+
+ /* unknown transaction, nothing to replay */
+ if (txn == NULL)
+ return;
+
+ txn->txn_flags |= RBTXN_PREPARE;
+ txn->gid = palloc(strlen(gid) + 1); /* trailing '\0' */
+ strcpy(txn->gid, gid);
+
+ ReorderBufferCommitInternal(txn, rb, xid, commit_lsn, end_lsn,
+ commit_time, origin_id, origin_lsn);
+}
+
+/*
+ * Check whether this transaction was sent as prepared to subscribers.
+ * Called while handling commit|abort prepared.
+ */
+bool
+ReorderBufferTxnIsPrepared(ReorderBuffer *rb, TransactionId xid,
+ const char *gid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+
+ /*
+ * Always call the prepare filter. It's the job of the prepare filter to
+ * give us the *same* response for a given xid across multiple calls
+ * (including ones on restart)
+ */
+ return !(rb->filter_prepare(rb, txn, xid, gid));
+}
+
+/*
+ * Send standalone xact event. This is used to handle COMMIT/ROLLBACK PREPARED.
+ */
+void
+ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid, bool is_commit)
+{
+ ReorderBufferTXN *txn;
+
+ /*
+ * The transaction may or may not exist (during restarts for example).
+ * Anyways, two-phase transactions do not contain any reorderbuffers. So allow
+ * it to be created below.
+ */
+ txn = ReorderBufferTXNByXid(rb, xid, true, NULL, commit_lsn,
+ true);
+
+ txn->final_lsn = commit_lsn;
+ txn->end_lsn = end_lsn;
+ txn->commit_time = commit_time;
+ txn->origin_id = origin_id;
+ txn->origin_lsn = origin_lsn;
+ /* this txn is obviously prepared */
+ txn->txn_flags |= RBTXN_PREPARE;
+ txn->gid = palloc(strlen(gid) + 1); /* trailing '\0' */
+ strcpy(txn->gid, gid);
+
+ if (is_commit)
+ {
+ txn->txn_flags |= RBTXN_COMMIT_PREPARED;
+ rb->commit_prepared(rb, txn, commit_lsn);
+ }
+ else
+ {
+ txn->txn_flags |= RBTXN_ROLLBACK_PREPARED;
+ rb->rollback_prepared(rb, txn, commit_lsn);
+ }
+
+ /* cleanup: make sure there's no cache pollution */
+ ReorderBufferExecuteInvalidations(rb, txn);
+ ReorderBufferCleanupTXN(rb, txn);
+}
+
+/*
* Abort a transaction that possibly has previous changes. Needs to be first
* called for subtransactions and then for the toplevel xid.
*
@@ -2506,7 +2691,12 @@ ReorderBufferAbort(ReorderBuffer *rb, TransactionId xid, XLogRecPtr lsn)
/* cosmetic... */
txn->final_lsn = lsn;
- /* remove potential on-disk data, and deallocate */
+ /*
+ * remove potential on-disk data, and deallocate.
+ *
+ * We remove it even for prepared transactions (GID is enough to
+ * commit/abort those later).
+ */
ReorderBufferCleanupTXN(rb, txn);
}
diff --git a/src/include/replication/logical.h b/src/include/replication/logical.h
index 40bab7e..a191e82 100644
--- a/src/include/replication/logical.h
+++ b/src/include/replication/logical.h
@@ -84,6 +84,11 @@ typedef struct LogicalDecodingContext
*/
bool streaming;
+ /*
+ * Does the output plugin support two-phase decoding, and is it enabled?
+ */
+ bool twophase;
+
/*
* State for writing output.
*/
diff --git a/src/include/replication/output_plugin.h b/src/include/replication/output_plugin.h
index b78c796..96acd01 100644
--- a/src/include/replication/output_plugin.h
+++ b/src/include/replication/output_plugin.h
@@ -77,6 +77,39 @@ typedef void (*LogicalDecodeCommitCB) (struct LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
+ /*
+ * Called before decoding of PREPARE record to decide whether this
+ * transaction should be decoded with separate calls to prepare and
+ * commit_prepared/rollback_prepared callbacks or wait till COMMIT PREPARED and
+ * sent as usual transaction.
+ */
+typedef bool (*LogicalDecodeFilterPrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ TransactionId xid,
+ const char *gid);
+
+/*
+ * Called for PREPARE record unless it was filtered by filter_prepare()
+ * callback.
+ */
+typedef void (*LogicalDecodePrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+
+/*
+ * Called for COMMIT PREPARED.
+ */
+typedef void (*LogicalDecodeCommitPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+/*
+ * Called for ROLLBACK PREPARED.
+ */
+typedef void (*LogicalDecodeRollbackPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+
/*
* Called for the generic logical decoding messages.
*/
@@ -171,6 +204,10 @@ typedef struct OutputPluginCallbacks
LogicalDecodeTruncateCB truncate_cb;
LogicalDecodeCommitCB commit_cb;
LogicalDecodeMessageCB message_cb;
+ LogicalDecodeFilterPrepareCB filter_prepare_cb;
+ LogicalDecodePrepareCB prepare_cb;
+ LogicalDecodeCommitPreparedCB commit_prepared_cb;
+ LogicalDecodeRollbackPreparedCB rollback_prepared_cb;
LogicalDecodeFilterByOriginCB filter_by_origin_cb;
LogicalDecodeShutdownCB shutdown_cb;
/* streaming of changes */
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 0cc3aeb..ca823e3 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -10,6 +10,7 @@
#define REORDERBUFFER_H
#include "access/htup_details.h"
+#include "access/twophase.h"
#include "lib/ilist.h"
#include "storage/sinval.h"
#include "utils/hsearch.h"
@@ -166,6 +167,9 @@ typedef struct ReorderBufferChange
#define RBTXN_IS_STREAMED 0x0010
#define RBTXN_HAS_TOAST_INSERT 0x0020
#define RBTXN_HAS_SPEC_INSERT 0x0040
+#define RBTXN_PREPARE 0x0080
+#define RBTXN_COMMIT_PREPARED 0x0100
+#define RBTXN_ROLLBACK_PREPARED 0x0200
/* Does the transaction have catalog changes? */
#define rbtxn_has_catalog_changes(txn) \
@@ -225,6 +229,24 @@ typedef struct ReorderBufferChange
((txn)->txn_flags & RBTXN_IS_STREAMED) != 0 \
)
+/* Has this transaction been prepared? */
+#define rbtxn_prepared(txn) \
+( \
+ ((txn)->txn_flags & RBTXN_PREPARE) != 0 \
+)
+
+/* Has this prepared transaction been committed? */
+#define rbtxn_commit_prepared(txn) \
+( \
+ ((txn)->txn_flags & RBTXN_COMMIT_PREPARED) != 0 \
+)
+
+/* Has this prepared transaction been rollbacked? */
+#define rbtxn_rollback_prepared(txn) \
+( \
+ ((txn)->txn_flags & RBTXN_ROLLBACK_PREPARED) != 0 \
+)
+
typedef struct ReorderBufferTXN
{
/* See above */
@@ -236,6 +258,9 @@ typedef struct ReorderBufferTXN
/* Xid of top-level transaction, if known */
TransactionId toplevel_xid;
+ /* In case of 2PC we need to pass GID to output plugin */
+ char *gid;
+
/*
* LSN of the first data carrying, WAL record with knowledge about this
* xid. This is allowed to *not* be first record adorned with this xid, if
@@ -397,6 +422,36 @@ typedef void (*ReorderBufferCommitCB) (ReorderBuffer *rb,
ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
+/* abort callback signature */
+typedef void (*ReorderBufferAbortCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+
+typedef bool (*ReorderBufferFilterPrepareCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ TransactionId xid,
+ const char *gid);
+
+/* prepare callback signature */
+typedef void (*ReorderBufferPrepareCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+
+/* commit prepared callback signature */
+typedef void (*ReorderBufferCommitPreparedCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+/* abort prepared callback signature */
+typedef void (*ReorderBufferRollbackPreparedCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+
/* message callback signature */
typedef void (*ReorderBufferMessageCB) (ReorderBuffer *rb,
ReorderBufferTXN *txn,
@@ -489,6 +544,11 @@ struct ReorderBuffer
ReorderBufferApplyChangeCB apply_change;
ReorderBufferApplyTruncateCB apply_truncate;
ReorderBufferCommitCB commit;
+ ReorderBufferAbortCB abort;
+ ReorderBufferFilterPrepareCB filter_prepare;
+ ReorderBufferPrepareCB prepare;
+ ReorderBufferCommitPreparedCB commit_prepared;
+ ReorderBufferRollbackPreparedCB rollback_prepared;
ReorderBufferMessageCB message;
/*
@@ -566,6 +626,11 @@ void ReorderBufferQueueMessage(ReorderBuffer *, TransactionId, Snapshot snapsho
void ReorderBufferCommit(ReorderBuffer *, TransactionId,
XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
TimestampTz commit_time, RepOriginId origin_id, XLogRecPtr origin_lsn);
+void ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid, bool is_commit);
void ReorderBufferAssignChild(ReorderBuffer *, TransactionId, TransactionId, XLogRecPtr commit_lsn);
void ReorderBufferCommitChild(ReorderBuffer *, TransactionId, TransactionId,
XLogRecPtr commit_lsn, XLogRecPtr end_lsn);
@@ -589,6 +654,15 @@ void ReorderBufferXidSetCatalogChanges(ReorderBuffer *, TransactionId xid, XLog
bool ReorderBufferXidHasCatalogChanges(ReorderBuffer *, TransactionId xid);
bool ReorderBufferXidHasBaseSnapshot(ReorderBuffer *, TransactionId xid);
+bool ReorderBufferPrepareNeedSkip(ReorderBuffer *rb, TransactionId xid,
+ const char *gid);
+bool ReorderBufferTxnIsPrepared(ReorderBuffer *rb, TransactionId xid,
+ const char *gid);
+void ReorderBufferPrepare(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid);
ReorderBufferTXN *ReorderBufferGetOldestTXN(ReorderBuffer *);
TransactionId ReorderBufferGetOldestXmin(ReorderBuffer *rb);
--
1.8.3.1
v8-0002-Tap-test-to-test-concurrent-aborts-during-2-phase.patchapplication/octet-stream; name=v8-0002-Tap-test-to-test-concurrent-aborts-during-2-phase.patchDownload
From 51b58178a02261f787790f93b0ce6e76c857f349 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Wed, 14 Oct 2020 01:26:00 -0400
Subject: [PATCH v8] Tap test to test concurrent aborts during 2 phase commits
This test is specifically for testing concurrent abort while logical decode
is ongoing. Pass in the xid of the 2PC to the plugin as an option.
On the receipt of a valid "check-xid", the change API in the test decoding
plugin will wait for it to be aborted.
---
contrib/test_decoding/Makefile | 2 +
contrib/test_decoding/t/001_twophase.pl | 119 ++++++++++++++++++++++++++++++++
2 files changed, 121 insertions(+)
create mode 100644 contrib/test_decoding/t/001_twophase.pl
diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index f0e4dbd..2abf3ce 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -9,6 +9,8 @@ REGRESS = ddl xact rewrite toast permissions decoding_in_xact \
ISOLATION = mxact delayed_startup ondisk_startup concurrent_ddl_dml \
oldest_xmin snapshot_transfer subxact_without_top concurrent_stream
+TAP_TESTS = 1
+
REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
ISOLATION_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
diff --git a/contrib/test_decoding/t/001_twophase.pl b/contrib/test_decoding/t/001_twophase.pl
new file mode 100644
index 0000000..51c6a35
--- /dev/null
+++ b/contrib/test_decoding/t/001_twophase.pl
@@ -0,0 +1,119 @@
+# logical replication of 2PC test
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 2;
+use Time::HiRes qw(usleep);
+use Scalar::Util qw(looks_like_number);
+
+# Initialize node
+my $node_logical = get_new_node('logical');
+$node_logical->init(allows_streaming => 'logical');
+$node_logical->append_conf(
+ 'postgresql.conf', qq(
+ max_prepared_transactions = 10
+));
+$node_logical->start;
+
+# Create some pre-existing content on logical
+$node_logical->safe_psql('postgres', "CREATE TABLE tab (a int PRIMARY KEY)");
+$node_logical->safe_psql('postgres',
+ "INSERT INTO tab SELECT generate_series(1,10)");
+$node_logical->safe_psql('postgres',
+ "SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');");
+
+# This test is specifically for testing concurrent abort while logical decode
+# is ongoing. We will pass in the xid of the 2PC to the plugin as an option.
+# On the receipt of a valid "check-xid-aborted", the change API in the test decoding
+# plugin will wait for it to be aborted.
+#
+# We will fire off a ROLLBACK from another session when this decode
+# is waiting.
+#
+# The status of "check-xid-aborted" will change from in-progress to not-committed
+# (hence aborted) and we will stop decoding because the subsequent
+# system catalog scan will error out.
+
+$node_logical->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab VALUES (11);
+ INSERT INTO tab VALUES (12);
+ ALTER TABLE tab ADD COLUMN b INT;
+ INSERT INTO tab VALUES (13,14);
+ PREPARE TRANSACTION 'test_prepared_tab';");
+# get XID of the above two-phase transaction
+my $xid2pc = $node_logical->safe_psql('postgres', "SELECT transaction FROM pg_prepared_xacts WHERE gid = 'test_prepared_tab'");
+is(looks_like_number($xid2pc), qq(1), 'Got a valid two-phase XID');
+
+# start decoding the above by passing the "check-xid-aborted"
+my $logical_connstr = $node_logical->connstr . ' dbname=postgres';
+
+# decode now, it should include an ABORT entry because of the ROLLBACK below
+system_log("psql -d \"$logical_connstr\" -c \"SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'check-xid-aborted', '$xid2pc');\" \&");
+
+# check that decode starts waiting for this $xid2pc
+poll_output_until("waiting for $xid2pc to abort")
+ or die "no wait happened for the abort";
+
+# rollback the prepared transaction
+$node_logical->safe_psql('postgres', "ROLLBACK PREPARED 'test_prepared_tab';");
+
+# check for occurrence of the log about stopping this decoding
+poll_output_until("stopping decoding of test_prepared_tab")
+ or die "no decoding stop for the rollback";
+
+# consume any remaining changes
+$node_logical->safe_psql('postgres', "SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');");
+
+# Check that commit prepared is decoded properly on immediate restart
+$node_logical->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab VALUES (11);
+ INSERT INTO tab VALUES (12);
+ ALTER TABLE tab ADD COLUMN b INT;
+ INSERT INTO tab VALUES (13, 11);
+ PREPARE TRANSACTION 'test_prepared_tab';");
+# consume changes
+$node_logical->safe_psql('postgres', "SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');");
+$node_logical->stop('immediate');
+$node_logical->start;
+
+# commit post the restart
+$node_logical->safe_psql('postgres', "COMMIT PREPARED 'test_prepared_tab';");
+$node_logical->safe_psql('postgres', "SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');");
+
+# check inserts are visible
+my $result = $node_logical->safe_psql('postgres', "SELECT count(*) FROM tab where a IN (11,12) OR b IN (11);");
+is($result, qq(3), 'Rows inserted via 2PC are visible on restart');
+
+$node_logical->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot');");
+$node_logical->stop('fast');
+
+sub poll_output_until
+{
+ my ($expected) = @_;
+
+ $expected = 'xxxxxx' unless defined($expected); # default junk value
+
+ my $max_attempts = 180 * 10;
+ my $attempts = 0;
+
+ my $output_file = '';
+ while ($attempts < $max_attempts)
+ {
+ $output_file = slurp_file($node_logical->logfile());
+
+ if ($output_file =~ $expected)
+ {
+ return 1;
+ }
+
+ # Wait 0.1 second before retrying.
+ usleep(100_000);
+ $attempts++;
+ }
+
+ # The output result didn't change in 180 seconds. Give up
+ return 0;
+}
--
1.8.3.1
v8-0004-Support-two-phase-commits-in-streaming-mode-in-lo.patchapplication/octet-stream; name=v8-0004-Support-two-phase-commits-in-streaming-mode-in-lo.patchDownload
From cd7c441e0c07e3a1f8428fed3111ef3a5aa50f75 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Wed, 14 Oct 2020 07:48:50 -0400
Subject: [PATCH v8] Support two phase commits in streaming mode in logical
decoding
Add APIs to the streaming APIS for PREPARE, COMMIT PREPARED and ROLLBACK PREPARED
---
contrib/test_decoding/test_decoding.c | 84 +++++++++++++++
doc/src/sgml/logicaldecoding.sgml | 62 +++++++++--
src/backend/replication/logical/logical.c | 132 +++++++++++++++++++++++-
src/backend/replication/logical/reorderbuffer.c | 29 ++++--
src/include/replication/output_plugin.h | 27 +++++
src/include/replication/reorderbuffer.h | 21 ++++
6 files changed, 339 insertions(+), 16 deletions(-)
diff --git a/contrib/test_decoding/test_decoding.c b/contrib/test_decoding/test_decoding.c
index 6b8e502..9f0cf10 100644
--- a/contrib/test_decoding/test_decoding.c
+++ b/contrib/test_decoding/test_decoding.c
@@ -78,6 +78,15 @@ static void pg_decode_stream_stop(LogicalDecodingContext *ctx,
static void pg_decode_stream_abort(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr abort_lsn);
+static void pg_decode_stream_prepare(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+static void pg_decode_stream_commit_prepared(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+static void pg_decode_stream_rollback_prepared(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr rollback_lsn);
static void pg_decode_stream_commit(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
@@ -129,6 +138,9 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb)
cb->stream_start_cb = pg_decode_stream_start;
cb->stream_stop_cb = pg_decode_stream_stop;
cb->stream_abort_cb = pg_decode_stream_abort;
+ cb->stream_prepare_cb = pg_decode_stream_prepare;
+ cb->stream_commit_prepared_cb = pg_decode_stream_commit_prepared;
+ cb->stream_rollback_prepared_cb = pg_decode_stream_rollback_prepared;
cb->stream_commit_cb = pg_decode_stream_commit;
cb->stream_change_cb = pg_decode_stream_change;
cb->stream_message_cb = pg_decode_stream_message;
@@ -805,6 +817,78 @@ pg_decode_stream_abort(LogicalDecodingContext *ctx,
}
static void
+pg_decode_stream_prepare(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ if (data->skip_empty_xacts && !data->xact_wrote_changes)
+ return;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, "preparing streamed transaction TXN %u", txn->xid);
+ else
+ appendStringInfo(ctx->out, "preparing streamed transaction");
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
+static void
+pg_decode_stream_commit_prepared(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ if (data->skip_empty_xacts && !data->xact_wrote_changes)
+ return;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, "commit prepared streamed transaction TXN %u", txn->xid);
+ else
+ appendStringInfo(ctx->out, "commit prepared streamed transaction");
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
+static void
+pg_decode_stream_rollback_prepared(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr rollback_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ if (data->skip_empty_xacts && !data->xact_wrote_changes)
+ return;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, "abort prepared streamed transaction TXN %u", txn->xid);
+ else
+ appendStringInfo(ctx->out, "abort prepared streamed transaction");
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
+static void
pg_decode_stream_commit(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr commit_lsn)
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index 4ad5dca..824c55a 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -396,6 +396,9 @@ typedef struct OutputPluginCallbacks
LogicalDecodeStreamStartCB stream_start_cb;
LogicalDecodeStreamStopCB stream_stop_cb;
LogicalDecodeStreamAbortCB stream_abort_cb;
+ LogicalDecodeStreamPrepareCB stream_prepare_cb;
+ LogicalDecodeStreamCommitPreparedCB stream_commit_prepared_cb;
+ LogicalDecodeStreamRollbackPreparedCB stream_rollback_prepared_cb;
LogicalDecodeStreamCommitCB stream_commit_cb;
LogicalDecodeStreamChangeCB stream_change_cb;
LogicalDecodeStreamMessageCB stream_message_cb;
@@ -418,14 +421,16 @@ typedef void (*LogicalOutputPluginInit) (struct OutputPluginCallbacks *cb);
in-progress transactions. The <function>stream_start_cb</function>,
<function>stream_stop_cb</function>, <function>stream_abort_cb</function>,
<function>stream_commit_cb</function> and <function>stream_change_cb</function>
- are required, while <function>stream_message_cb</function> and
+ are required, while <function>stream_message_cb</function>,
+ <function>stream_prepare_cb</function>, <function>stream_commit_prepared_cb</function>,
+ <function>stream_rollback_prepared_cb</function> and
<function>stream_truncate_cb</function> are optional.
</para>
<para>
An output plugin may also define functions to support two-phase commits, which are
decoded on <command>PREPARE TRANSACTION</command>. The <function>prepare_cb</function>,
- <function>commit_prepared_cb</function> and <function>abort_prepared_cb</function>
+ <function>commit_prepared_cb</function> and <function>rollback_prepared_cb</function>
callbacks are required, while <function>filter_prepare_cb</function> is optional.
</para>
</sect2>
@@ -638,8 +643,8 @@ typedef void (*LogicalDecodeCommitPreparedCB) (struct LogicalDecodingContext *ct
callback.
<programlisting>
typedef void (*LogicalDecodeRollbackPreparedCB) (struct LogicalDecodingContext *ctx,
- ReorderBufferTXN *txn,
- XLogRecPtr abort_lsn);
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
</programlisting>
</para>
</sect3>
@@ -846,6 +851,45 @@ typedef void (*LogicalDecodeStreamAbortCB) (struct LogicalDecodingContext *ctx,
</para>
</sect3>
+ <sect3 id="logicaldecoding-output-plugin-stream-prepare">
+ <title>Stream Prepare Callback</title>
+ <para>
+ The <function>stream_prepare_cb</function> callback is called to prepare
+ a previously streamed transaction as part of a two-phase commit.
+<programlisting>
+typedef void (*LogicalDecodeStreamPrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
+ <sect3 id="logicaldecoding-output-plugin-stream-commit-prepared">
+ <title>Stream Commit Prepared Callback</title>
+ <para>
+ The <function>stream_commit_prepared_cb</function> callback is called to commit
+ a previously streamed transaction as part of a two-phase commit.
+<programlisting>
+typedef void (*LogicalDecodeStreamCommitPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
+ <sect3 id="logicaldecoding-output-plugin-stream-rollback-prepared">
+ <title>Stream Rollback Prepared Callback</title>
+ <para>
+ The <function>stream_rollback_prepared_cb</function> callback is called to abort
+ a previously streamed transaction as part of a two-phase commit.
+<programlisting>
+typedef void (*LogicalDecodeStreamRollbackPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr rollback_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
<sect3 id="logicaldecoding-output-plugin-stream-commit">
<title>Stream Commit Callback</title>
<para>
@@ -1024,9 +1068,13 @@ OutputPluginWrite(ctx, true);
When streaming an in-progress transaction, the changes (and messages) are
streamed in blocks demarcated by <function>stream_start_cb</function>
and <function>stream_stop_cb</function> callbacks. Once all the decoded
- changes are transmitted, the transaction is committed using the
- <function>stream_commit_cb</function> callback (or possibly aborted using
- the <function>stream_abort_cb</function> callback).
+ changes are transmitted, the transaction can be committed using the
+ the <function>stream_commit_cb</function> callback
+ (or possibly aborted using the <function>stream_abort_cb</function> callback).
+ If two-phase commits are supported, the transaction can be prepared using the
+ <function>stream_prepare_cb</function> callback, commit prepared using the
+ <function>stream_commit_prepared_cb</function> callback or aborted using the
+ <function>stream_rollback_prepared_cb</function>.
</para>
<para>
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 1c0f70d..84a4751 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -82,6 +82,12 @@ static void stream_stop_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr last_lsn);
static void stream_abort_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr abort_lsn);
+static void stream_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+static void stream_commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+static void stream_rollback_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr rollback_lsn);
static void stream_commit_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
static void stream_change_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
@@ -233,6 +239,9 @@ StartupDecodingContext(List *output_plugin_options,
ctx->streaming = (ctx->callbacks.stream_start_cb != NULL) ||
(ctx->callbacks.stream_stop_cb != NULL) ||
(ctx->callbacks.stream_abort_cb != NULL) ||
+ (ctx->callbacks.stream_prepare_cb != NULL) ||
+ (ctx->callbacks.stream_commit_prepared_cb != NULL) ||
+ (ctx->callbacks.stream_rollback_prepared_cb != NULL) ||
(ctx->callbacks.stream_commit_cb != NULL) ||
(ctx->callbacks.stream_change_cb != NULL) ||
(ctx->callbacks.stream_message_cb != NULL) ||
@@ -262,6 +271,9 @@ StartupDecodingContext(List *output_plugin_options,
ctx->reorder->stream_start = stream_start_cb_wrapper;
ctx->reorder->stream_stop = stream_stop_cb_wrapper;
ctx->reorder->stream_abort = stream_abort_cb_wrapper;
+ ctx->reorder->stream_prepare = stream_prepare_cb_wrapper;
+ ctx->reorder->stream_commit_prepared = stream_commit_prepared_cb_wrapper;
+ ctx->reorder->stream_rollback_prepared = stream_rollback_prepared_cb_wrapper;
ctx->reorder->stream_commit = stream_commit_cb_wrapper;
ctx->reorder->stream_change = stream_change_cb_wrapper;
ctx->reorder->stream_message = stream_message_cb_wrapper;
@@ -885,7 +897,7 @@ commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
static void
rollback_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
- XLogRecPtr abort_lsn)
+ XLogRecPtr abort_lsn)
{
LogicalDecodingContext *ctx = cache->private_data;
LogicalErrorCallbackState state;
@@ -1241,6 +1253,124 @@ stream_abort_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
}
static void
+stream_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ Assert(!ctx->fast_forward);
+
+ /* We're only supposed to call this when streaming and two-phase commits are supported. */
+ Assert(ctx->streaming);
+ Assert(ctx->twophase);
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "stream_prepare";
+ state.report_location = txn->final_lsn;
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn;
+
+ /* in streaming mode with two-phase commits, stream_prepare_cb is required */
+ if (ctx->callbacks.stream_prepare_cb == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical streaming commits requires a stream_prepare_cb callback")));
+
+ ctx->callbacks.stream_prepare_cb(ctx, txn, prepare_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
+stream_commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ Assert(!ctx->fast_forward);
+
+ /* We're only supposed to call this when streaming is supported. */
+ Assert(ctx->streaming);
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "stream_commit_prepared";
+ state.report_location = txn->final_lsn;
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn;
+
+ /* in streaming mode with two-phase commits, stream_commit_prepared_cb is required */
+ if (ctx->callbacks.stream_prepare_cb == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical streaming requires a stream_commit_prepared_cb callback")));
+
+ ctx->callbacks.stream_commit_prepared_cb(ctx, txn, commit_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
+stream_rollback_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr rollback_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ Assert(!ctx->fast_forward);
+
+ /* We're only supposed to call this when streaming is supported. */
+ Assert(ctx->streaming);
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "stream_rollback_prepared";
+ state.report_location = txn->final_lsn;
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn;
+
+ /* in streaming mode with two-phase commits, stream_rollback_prepared_cb is required */
+ if (ctx->callbacks.stream_prepare_cb == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical streaming requires a stream_rollback_prepared_cb callback")));
+
+ ctx->callbacks.stream_rollback_prepared_cb(ctx, txn, rollback_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
stream_commit_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr commit_lsn)
{
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 8fc1301..d569d46 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -1793,9 +1793,18 @@ ReorderBufferStreamCommit(ReorderBuffer *rb, ReorderBufferTXN *txn)
ReorderBufferStreamTXN(rb, txn);
- rb->stream_commit(rb, txn, txn->final_lsn);
-
- ReorderBufferCleanupTXN(rb, txn);
+ if (rbtxn_prepared(txn))
+ {
+ rb->stream_prepare(rb, txn, txn->final_lsn);
+ ReorderBufferTruncateTXN(rb, txn, true);
+ /* Reset the CheckXidAlive */
+ CheckXidAlive = InvalidTransactionId;
+ }
+ else
+ {
+ rb->stream_commit(rb, txn, txn->final_lsn);
+ ReorderBufferCleanupTXN(rb, txn);
+ }
}
/*
@@ -2633,15 +2642,19 @@ ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
strcpy(txn->gid, gid);
if (is_commit)
- {
txn->txn_flags |= RBTXN_COMMIT_PREPARED;
- rb->commit_prepared(rb, txn, commit_lsn);
- }
else
- {
txn->txn_flags |= RBTXN_ROLLBACK_PREPARED;
+
+ if (rbtxn_is_streamed(txn) && rbtxn_commit_prepared(txn))
+ rb->stream_commit_prepared(rb, txn, commit_lsn);
+ else if (rbtxn_is_streamed(txn) && rbtxn_rollback_prepared(txn))
+ rb->stream_rollback_prepared(rb, txn, commit_lsn);
+ else if (rbtxn_commit_prepared(txn))
+ rb->commit_prepared(rb, txn, commit_lsn);
+ else if (rbtxn_rollback_prepared(txn))
rb->rollback_prepared(rb, txn, commit_lsn);
- }
+
/* cleanup: make sure there's no cache pollution */
ReorderBufferExecuteInvalidations(rb, txn);
diff --git a/src/include/replication/output_plugin.h b/src/include/replication/output_plugin.h
index 96acd01..dfcb577 100644
--- a/src/include/replication/output_plugin.h
+++ b/src/include/replication/output_plugin.h
@@ -157,6 +157,30 @@ typedef void (*LogicalDecodeStreamAbortCB) (struct LogicalDecodingContext *ctx,
XLogRecPtr abort_lsn);
/*
+ * Called to prepare changes streamed to remote node from in-progress
+ * transaction. This is called as part of a two-phase commit.
+ */
+typedef void (*LogicalDecodeStreamPrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+
+/*
+ * Called to commit prepared changes streamed to remote node from in-progress
+ * transaction. This is called as part of a two-phase commit.
+ */
+typedef void (*LogicalDecodeStreamCommitPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+/*
+ * Called to abort/rollback prepared changes streamed to remote node from in-progress
+ * transaction. This is called as part of a two-phase commit.
+ */
+typedef void (*LogicalDecodeStreamRollbackPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr rollback_lsn);
+
+/*
* Called to apply changes streamed to remote node from in-progress
* transaction.
*/
@@ -214,6 +238,9 @@ typedef struct OutputPluginCallbacks
LogicalDecodeStreamStartCB stream_start_cb;
LogicalDecodeStreamStopCB stream_stop_cb;
LogicalDecodeStreamAbortCB stream_abort_cb;
+ LogicalDecodeStreamPrepareCB stream_prepare_cb;
+ LogicalDecodeStreamCommitPreparedCB stream_commit_prepared_cb;
+ LogicalDecodeStreamRollbackPreparedCB stream_rollback_prepared_cb;
LogicalDecodeStreamCommitCB stream_commit_cb;
LogicalDecodeStreamChangeCB stream_change_cb;
LogicalDecodeStreamMessageCB stream_message_cb;
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index ca823e3..84e4840 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -478,6 +478,24 @@ typedef void (*ReorderBufferStreamAbortCB) (
ReorderBufferTXN *txn,
XLogRecPtr abort_lsn);
+/* prepare streamed transaction callback signature */
+typedef void (*ReorderBufferStreamPrepareCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+
+/* prepare streamed transaction callback signature */
+typedef void (*ReorderBufferStreamCommitPreparedCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+/* prepare streamed transaction callback signature */
+typedef void (*ReorderBufferStreamRollbackPreparedCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr rollback_lsn);
+
/* commit streamed transaction callback signature */
typedef void (*ReorderBufferStreamCommitCB) (
ReorderBuffer *rb,
@@ -557,6 +575,9 @@ struct ReorderBuffer
ReorderBufferStreamStartCB stream_start;
ReorderBufferStreamStopCB stream_stop;
ReorderBufferStreamAbortCB stream_abort;
+ ReorderBufferStreamPrepareCB stream_prepare;
+ ReorderBufferStreamCommitPreparedCB stream_commit_prepared;
+ ReorderBufferStreamRollbackPreparedCB stream_rollback_prepared;
ReorderBufferStreamCommitCB stream_commit;
ReorderBufferStreamChangeCB stream_change;
ReorderBufferStreamMessageCB stream_message;
--
1.8.3.1
v8-0003-pgoutput-output-plugin-support-for-logical-decodi.patchapplication/octet-stream; name=v8-0003-pgoutput-output-plugin-support-for-logical-decodi.patchDownload
From de50d2567bb7c8e593224875d2ed4b7f648e0b46 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Wed, 14 Oct 2020 06:04:59 -0400
Subject: [PATCH v8] pgoutput output plugin support for logical decoding of 2pc
Support decoding of two phase commit in pgoutput and on subscriber side.
---
src/backend/access/transam/twophase.c | 27 +++++
src/backend/replication/logical/proto.c | 74 ++++++++++++-
src/backend/replication/logical/worker.c | 153 +++++++++++++++++++++++++-
src/backend/replication/pgoutput/pgoutput.c | 51 +++++++++
src/include/access/twophase.h | 1 +
src/include/replication/logicalproto.h | 27 +++++
src/test/subscription/t/020_twophase.pl | 163 ++++++++++++++++++++++++++++
7 files changed, 494 insertions(+), 2 deletions(-)
create mode 100644 src/test/subscription/t/020_twophase.pl
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 7940060..36ba21f 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -548,6 +548,33 @@ MarkAsPrepared(GlobalTransaction gxact, bool lock_held)
}
/*
+ * LookupGXact
+ * Check if the prepared transaction with the given GID is around
+ */
+bool
+LookupGXact(const char *gid)
+{
+ int i;
+ bool found = false;
+
+ LWLockAcquire(TwoPhaseStateLock, LW_SHARED);
+ for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
+ {
+ GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
+
+ /* Ignore not-yet-valid GIDs */
+ if (gxact->valid && strcmp(gxact->gid, gid) == 0)
+ {
+ found = true;
+ break;
+ }
+
+ }
+ LWLockRelease(TwoPhaseStateLock);
+ return found;
+}
+
+/*
* LockGXact
* Locate the prepared transaction and mark it busy for COMMIT or PREPARE.
*/
diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index eb19142..1bd3987 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -78,7 +78,7 @@ logicalrep_write_commit(StringInfo out, ReorderBufferTXN *txn,
pq_sendbyte(out, 'C'); /* sending COMMIT */
- /* send the flags field (unused for now) */
+ /* send the flags field */
pq_sendbyte(out, flags);
/* send fields */
@@ -99,6 +99,7 @@ logicalrep_read_commit(StringInfo in, LogicalRepCommitData *commit_data)
if (flags != 0)
elog(ERROR, "unrecognized flags %u in commit message", flags);
+
/* read fields */
commit_data->commit_lsn = pq_getmsgint64(in);
commit_data->end_lsn = pq_getmsgint64(in);
@@ -106,6 +107,77 @@ logicalrep_read_commit(StringInfo in, LogicalRepCommitData *commit_data)
}
/*
+ * Write PREPARE to the output stream.
+ */
+void
+logicalrep_write_prepare(StringInfo out, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ uint8 flags = 0;
+
+ pq_sendbyte(out, 'P'); /* sending PREPARE protocol */
+
+ /*
+ * This should only ever happen for two-phase commit transactions. In which case we
+ * expect to have a non-empty GID.
+ */
+ Assert(rbtxn_prepared(txn));
+ Assert(strlen(txn->gid) > 0);
+
+ /*
+ * Flags are determined from the state of the transaction. We know we
+ * always get PREPARE first and then [COMMIT|ROLLBACK] PREPARED, so if
+ * it's already marked as committed then it has to be COMMIT PREPARED (and
+ * likewise for abort / ROLLBACK PREPARED).
+ */
+ if (rbtxn_commit_prepared(txn))
+ flags |= LOGICALREP_IS_COMMIT_PREPARED;
+ else if (rbtxn_rollback_prepared(txn))
+ flags |= LOGICALREP_IS_ROLLBACK_PREPARED;
+ else
+ flags |= LOGICALREP_IS_PREPARE;
+
+ /* Make sure exactly one of the expected flags is set. */
+ if (!PrepareFlagsAreValid(flags))
+ elog(ERROR, "unrecognized flags %u in prepare message", flags);
+
+ /* send the flags field */
+ pq_sendbyte(out, flags);
+
+ /* send fields */
+ pq_sendint64(out, prepare_lsn);
+ pq_sendint64(out, txn->end_lsn);
+ pq_sendint64(out, txn->commit_time);
+
+ /* send gid */
+ pq_sendstring(out, txn->gid);
+}
+
+/*
+ * Read transaction PREPARE from the stream.
+ */
+void
+logicalrep_read_prepare(StringInfo in, LogicalRepPrepareData * prepare_data)
+{
+ /* read flags */
+ uint8 flags = pq_getmsgbyte(in);
+
+ if (!PrepareFlagsAreValid(flags))
+ elog(ERROR, "unrecognized flags %u in prepare message", flags);
+
+ /* set the action (reuse the constants used for the flags) */
+ prepare_data->prepare_type = flags;
+
+ /* read fields */
+ prepare_data->prepare_lsn = pq_getmsgint64(in);
+ prepare_data->end_lsn = pq_getmsgint64(in);
+ prepare_data->preparetime = pq_getmsgint64(in);
+
+ /* read gid (copy it into a pre-allocated buffer) */
+ strcpy(prepare_data->gid, pq_getmsgstring(in));
+}
+
+/*
* Write ORIGIN to the output stream.
*/
void
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 8d5d9e0..b417b16 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -722,6 +722,7 @@ apply_handle_commit(StringInfo s)
replorigin_session_origin_timestamp = commit_data.committime;
CommitTransactionCommand();
+
pgstat_report_stat(false);
store_flush_position(commit_data.end_lsn);
@@ -742,6 +743,152 @@ apply_handle_commit(StringInfo s)
}
/*
+ * Called from apply_handle_prepare to handle a PREPARE TRANSACTION.
+ */
+static void
+apply_handle_prepare_txn(LogicalRepPrepareData * prepare_data)
+{
+ Assert(prepare_data->prepare_lsn == remote_final_lsn);
+
+ /* The synchronization worker runs in single transaction. */
+ if (IsTransactionState() && !am_tablesync_worker())
+ {
+ /* End the earlier transaction and start a new one */
+ BeginTransactionBlock();
+ CommitTransactionCommand();
+ StartTransactionCommand();
+
+ /*
+ * Update origin state so we can restart streaming from correct
+ * position in case of crash.
+ */
+ replorigin_session_origin_lsn = prepare_data->end_lsn;
+ replorigin_session_origin_timestamp = prepare_data->preparetime;
+
+ PrepareTransactionBlock(prepare_data->gid);
+ CommitTransactionCommand();
+ pgstat_report_stat(false);
+
+ store_flush_position(prepare_data->end_lsn);
+ }
+ else
+ {
+ /* Process any invalidation messages that might have accumulated. */
+ AcceptInvalidationMessages();
+ maybe_reread_subscription();
+ }
+
+ in_remote_transaction = false;
+
+ /* Process any tables that are being synchronized in parallel. */
+ process_syncing_tables(prepare_data->end_lsn);
+
+ pgstat_report_activity(STATE_IDLE, NULL);
+}
+
+/*
+ * Called from apply_handle_prepare to handle a COMMIT PREPARED of a previously
+ * PREPARED transaction.
+ */
+static void
+apply_handle_commit_prepared_txn(LogicalRepPrepareData * prepare_data)
+{
+ /* there is no transaction when COMMIT PREPARED is called */
+ ensure_transaction();
+
+ /*
+ * Update origin state so we can restart streaming from correct position
+ * in case of crash.
+ */
+ replorigin_session_origin_lsn = prepare_data->end_lsn;
+ replorigin_session_origin_timestamp = prepare_data->preparetime;
+
+ FinishPreparedTransaction(prepare_data->gid, true);
+ CommitTransactionCommand();
+ pgstat_report_stat(false);
+
+ store_flush_position(prepare_data->end_lsn);
+ in_remote_transaction = false;
+
+ /* Process any tables that are being synchronized in parallel. */
+ process_syncing_tables(prepare_data->end_lsn);
+
+ pgstat_report_activity(STATE_IDLE, NULL);
+}
+
+/*
+ * Called from apply_handle_prepare to handle a ROLLBACK PREPARED of a previously
+ * PREPARED TRANSACTION.
+ */
+static void
+apply_handle_rollback_prepared_txn(LogicalRepPrepareData * prepare_data)
+{
+ /*
+ * Update origin state so we can restart streaming from correct position
+ * in case of crash.
+ */
+ replorigin_session_origin_lsn = prepare_data->end_lsn;
+ replorigin_session_origin_timestamp = prepare_data->preparetime;
+
+ /*
+ * During logical decoding, on the apply side, it's possible that a
+ * prepared transaction got aborted while decoding. In that case, we stop
+ * the decoding and abort the transaction immediately. However the
+ * ROLLBACK prepared processing still reaches the subscriber. In that case
+ * it's ok to have a missing gid
+ */
+ if (LookupGXact(prepare_data->gid))
+ {
+ /* there is no transaction when ABORT/ROLLBACK PREPARED is called */
+ ensure_transaction();
+ FinishPreparedTransaction(prepare_data->gid, false);
+ CommitTransactionCommand();
+ }
+
+ pgstat_report_stat(false);
+
+ store_flush_position(prepare_data->end_lsn);
+ in_remote_transaction = false;
+
+ /* Process any tables that are being synchronized in parallel. */
+ process_syncing_tables(prepare_data->end_lsn);
+
+ pgstat_report_activity(STATE_IDLE, NULL);
+}
+
+/*
+ * Handle PREPARE message.
+ */
+static void
+apply_handle_prepare(StringInfo s)
+{
+ LogicalRepPrepareData prepare_data;
+
+ logicalrep_read_prepare(s, &prepare_data);
+
+ switch (prepare_data.prepare_type)
+ {
+ case LOGICALREP_IS_PREPARE:
+ apply_handle_prepare_txn(&prepare_data);
+ break;
+
+ case LOGICALREP_IS_COMMIT_PREPARED:
+ apply_handle_commit_prepared_txn(&prepare_data);
+ break;
+
+ case LOGICALREP_IS_ROLLBACK_PREPARED:
+ apply_handle_rollback_prepared_txn(&prepare_data);
+ break;
+
+ default:
+ ereport(ERROR,
+ (errcode(ERRCODE_PROTOCOL_VIOLATION),
+ errmsg("unexpected type of prepare message: %d",
+ prepare_data.prepare_type)));
+ }
+}
+
+/*
* Handle ORIGIN message.
*
* TODO, support tracking of multiple origins
@@ -1913,10 +2060,14 @@ apply_dispatch(StringInfo s)
case 'B':
apply_handle_begin(s);
break;
- /* COMMIT */
+ /* COMMIT/ABORT */
case 'C':
apply_handle_commit(s);
break;
+ /* PREPARE and [COMMIT|ROLLBACK] PREPARED */
+ case 'P':
+ apply_handle_prepare(s);
+ break;
/* INSERT */
case 'I':
apply_handle_insert(s);
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 9c997ae..4078cab 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -47,6 +47,12 @@ static void pgoutput_truncate(LogicalDecodingContext *ctx,
ReorderBufferChange *change);
static bool pgoutput_origin_filter(LogicalDecodingContext *ctx,
RepOriginId origin_id);
+static void pgoutput_prepare_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, XLogRecPtr prepare_lsn);
+static void pgoutput_commit_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, XLogRecPtr prepare_lsn);
+static void pgoutput_rollback_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, XLogRecPtr prepare_lsn);
static void pgoutput_stream_start(struct LogicalDecodingContext *ctx,
ReorderBufferTXN *txn);
static void pgoutput_stream_stop(struct LogicalDecodingContext *ctx,
@@ -143,6 +149,9 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb)
cb->change_cb = pgoutput_change;
cb->truncate_cb = pgoutput_truncate;
cb->commit_cb = pgoutput_commit_txn;
+ cb->prepare_cb = pgoutput_prepare_txn;
+ cb->commit_prepared_cb = pgoutput_commit_prepared_txn;
+ cb->rollback_prepared_cb = pgoutput_rollback_prepared_txn;
cb->filter_by_origin_cb = pgoutput_origin_filter;
cb->shutdown_cb = pgoutput_shutdown;
@@ -378,6 +387,48 @@ pgoutput_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
}
/*
+ * PREPARE callback
+ */
+static void
+pgoutput_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ OutputPluginUpdateProgress(ctx);
+
+ OutputPluginPrepareWrite(ctx, true);
+ logicalrep_write_prepare(ctx->out, txn, prepare_lsn);
+ OutputPluginWrite(ctx, true);
+}
+
+/*
+ * COMMIT PREPARED callback
+ */
+static void
+pgoutput_commit_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ OutputPluginUpdateProgress(ctx);
+
+ OutputPluginPrepareWrite(ctx, true);
+ logicalrep_write_prepare(ctx->out, txn, prepare_lsn);
+ OutputPluginWrite(ctx, true);
+}
+
+/*
+ * PREPARE callback
+ */
+static void
+pgoutput_rollback_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ OutputPluginUpdateProgress(ctx);
+
+ OutputPluginPrepareWrite(ctx, true);
+ logicalrep_write_prepare(ctx->out, txn, prepare_lsn);
+ OutputPluginWrite(ctx, true);
+}
+
+/*
* Write the current schema of the relation and its ancestor (if any) if not
* done yet.
*/
diff --git a/src/include/access/twophase.h b/src/include/access/twophase.h
index 2ca71c3..b2628ea 100644
--- a/src/include/access/twophase.h
+++ b/src/include/access/twophase.h
@@ -44,6 +44,7 @@ extern GlobalTransaction MarkAsPreparing(TransactionId xid, const char *gid,
extern void StartPrepare(GlobalTransaction gxact);
extern void EndPrepare(GlobalTransaction gxact);
extern bool StandbyTransactionIdIsPrepared(TransactionId xid);
+extern bool LookupGXact(const char *gid);
extern TransactionId PrescanPreparedTransactions(TransactionId **xids_p,
int *nxids_p);
diff --git a/src/include/replication/logicalproto.h b/src/include/replication/logicalproto.h
index 0c2cda2..2ca6d74 100644
--- a/src/include/replication/logicalproto.h
+++ b/src/include/replication/logicalproto.h
@@ -87,6 +87,7 @@ typedef struct LogicalRepBeginData
TransactionId xid;
} LogicalRepBeginData;
+/* Commit (and abort) information */
typedef struct LogicalRepCommitData
{
XLogRecPtr commit_lsn;
@@ -94,6 +95,28 @@ typedef struct LogicalRepCommitData
TimestampTz committime;
} LogicalRepCommitData;
+
+/* Prepare protocol information */
+typedef struct LogicalRepPrepareData
+{
+ uint8 prepare_type;
+ XLogRecPtr prepare_lsn;
+ XLogRecPtr end_lsn;
+ TimestampTz preparetime;
+ char gid[GIDSIZE];
+} LogicalRepPrepareData;
+
+/* types of the prepare protocol message */
+#define LOGICALREP_IS_PREPARE 0x01
+#define LOGICALREP_IS_COMMIT_PREPARED 0x02
+#define LOGICALREP_IS_ROLLBACK_PREPARED 0x04
+
+/* prepare can be exactly one of PREPARE, [COMMIT|ROLLBACK] PREPARED*/
+#define PrepareFlagsAreValid(flags) \
+ (((flags) == LOGICALREP_IS_PREPARE) || \
+ ((flags) == LOGICALREP_IS_COMMIT_PREPARED) || \
+ ((flags) == LOGICALREP_IS_ROLLBACK_PREPARED))
+
extern void logicalrep_write_begin(StringInfo out, ReorderBufferTXN *txn);
extern void logicalrep_read_begin(StringInfo in,
LogicalRepBeginData *begin_data);
@@ -101,6 +124,10 @@ extern void logicalrep_write_commit(StringInfo out, ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
extern void logicalrep_read_commit(StringInfo in,
LogicalRepCommitData *commit_data);
+extern void logicalrep_write_prepare(StringInfo out, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+extern void logicalrep_read_prepare(StringInfo in,
+ LogicalRepPrepareData * prepare_data);
extern void logicalrep_write_origin(StringInfo out, const char *origin,
XLogRecPtr origin_lsn);
extern char *logicalrep_read_origin(StringInfo in, XLogRecPtr *origin_lsn);
diff --git a/src/test/subscription/t/020_twophase.pl b/src/test/subscription/t/020_twophase.pl
new file mode 100644
index 0000000..3feb2c3
--- /dev/null
+++ b/src/test/subscription/t/020_twophase.pl
@@ -0,0 +1,163 @@
+# logical replication of 2PC test
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 12;
+
+# Initialize publisher node
+my $node_publisher = get_new_node('publisher');
+$node_publisher->init(allows_streaming => 'logical');
+$node_publisher->append_conf(
+ 'postgresql.conf', qq(
+ max_prepared_transactions = 10
+ ));
+$node_publisher->start;
+
+# Create subscriber node
+my $node_subscriber = get_new_node('subscriber');
+$node_subscriber->init(allows_streaming => 'logical');
+$node_subscriber->append_conf(
+ 'postgresql.conf', qq(max_prepared_transactions = 10));
+$node_subscriber->start;
+
+# Create some pre-existing content on publisher
+$node_publisher->safe_psql('postgres', "CREATE TABLE tab_full (a int PRIMARY KEY)");
+$node_publisher->safe_psql('postgres',
+ "INSERT INTO tab_full SELECT generate_series(1,10)");
+$node_publisher->safe_psql('postgres', "CREATE TABLE tab_full2 (x text)");
+$node_publisher->safe_psql('postgres',
+ "INSERT INTO tab_full2 VALUES ('a'), ('b'), ('b')");
+
+# Setup structure on subscriber
+$node_subscriber->safe_psql('postgres', "CREATE TABLE tab_full (a int PRIMARY KEY)");
+$node_subscriber->safe_psql('postgres', "CREATE TABLE tab_full2 (x text)");
+
+# Setup logical replication
+my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
+$node_publisher->safe_psql('postgres', "CREATE PUBLICATION tap_pub");
+$node_publisher->safe_psql('postgres',
+"ALTER PUBLICATION tap_pub ADD TABLE tab_full, tab_full2"
+);
+
+my $appname = 'tap_sub';
+$node_subscriber->safe_psql('postgres',
+"CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr application_name=$appname' PUBLICATION tap_pub"
+);
+
+# Wait for subscriber to finish initialization
+my $caughtup_query =
+"SELECT pg_current_wal_lsn() <= replay_lsn FROM pg_stat_replication WHERE application_name = '$appname';";
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# Also wait for initial table sync to finish
+my $synced_query =
+"SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT IN ('r', 's');";
+$node_subscriber->poll_query_until('postgres', $synced_query)
+ or die "Timed out while waiting for subscriber to synchronize data";
+
+# check that 2PC gets replicated to subscriber
+$node_publisher->safe_psql('postgres',
+ "BEGIN;INSERT INTO tab_full VALUES (11);PREPARE TRANSACTION 'test_prepared_tab_full';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is in prepared state on subscriber
+my $result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab_full';");
+ is($result, qq(1), 'transaction is prepared on subscriber');
+
+# check that 2PC gets committed on subscriber
+$node_publisher->safe_psql('postgres',
+ "COMMIT PREPARED 'test_prepared_tab_full';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is committed on subscriber
+$result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM tab_full where a = 11;");
+ is($result, qq(1), 'Row inserted via 2PC has committed on subscriber');
+$result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab_full';");
+ is($result, qq(0), 'transaction is committed on subscriber');
+
+# check that 2PC gets replicated to subscriber
+$node_publisher->safe_psql('postgres',
+ "BEGIN;INSERT INTO tab_full VALUES (12);PREPARE TRANSACTION 'test_prepared_tab_full';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is in prepared state on subscriber
+$result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab_full';");
+ is($result, qq(1), 'transaction is prepared on subscriber');
+
+# check that 2PC gets aborted on subscriber
+$node_publisher->safe_psql('postgres',
+ "ROLLBACK PREPARED 'test_prepared_tab_full';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is aborted on subscriber
+$result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM tab_full where a = 12;");
+ is($result, qq(0), 'Row inserted via 2PC is not present on subscriber');
+
+$result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab_full';");
+ is($result, qq(0), 'transaction is aborted on subscriber');
+
+# Check that commit prepared is decoded properly on crash restart
+$node_publisher->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab_full VALUES (12);
+ INSERT INTO tab_full VALUES (13);
+ PREPARE TRANSACTION 'test_prepared_tab';");
+$node_subscriber->stop('immediate');
+$node_publisher->stop('immediate');
+$node_publisher->start;
+$node_subscriber->start;
+
+# commit post the restart
+$node_publisher->safe_psql('postgres', "COMMIT PREPARED 'test_prepared_tab';");
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check inserts are visible
+$result = $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM tab_full where a IN (12,13);");
+is($result, qq(2), 'Rows inserted via 2PC are visible on the subscriber');
+
+# TODO add test cases involving DDL. This can be added after we add functionality
+# to replicate DDL changes to subscriber.
+
+# check all the cleanup
+$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_subscription");
+is($result, qq(0), 'check subscription was dropped on subscriber');
+
+$result = $node_publisher->safe_psql('postgres',
+ "SELECT count(*) FROM pg_replication_slots");
+is($result, qq(0), 'check replication slot was dropped on publisher');
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_subscription_rel");
+is($result, qq(0),
+ 'check subscription relation status was dropped on subscriber');
+
+$result = $node_publisher->safe_psql('postgres',
+ "SELECT count(*) FROM pg_replication_slots");
+is($result, qq(0), 'check replication slot was dropped on publisher');
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_replication_origin");
+is($result, qq(0), 'check replication origin was dropped on subscriber');
+
+$node_subscriber->stop('fast');
+$node_publisher->stop('fast');
--
1.8.3.1
On Wed, Oct 14, 2020 at 6:15 PM Ajin Cherian <itsajin@gmail.com> wrote:
I think it will be easier to review this work if we can split the
patches according to the changes made in different layers. The first
patch could be changes made in output plugin and the corresponding
changes in test_decoding, see the similar commit of in-progress
transactions [1]https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=45fdc9738b36d1068d3ad8fdb06436d6fd14436b. So you need to move corresponding changes from
v8-0001-Support-decoding-of-two-phase-transactions and
v8-0004-Support-two-phase-commits-in-streaming-mode-in-lo for this.
The second patch could be changes made in ReorderBuffer to support
this feature, see [2]https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=7259736a6e5b7c7588fff9578370736a6648acbb. The third patch could be changes made to
support pgoutput and subscriber-side stuff, see [3]https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=464824323e57dc4b397e8b05854d779908b55304. What do you
think?
[1]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=45fdc9738b36d1068d3ad8fdb06436d6fd14436b
[2]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=7259736a6e5b7c7588fff9578370736a6648acbb
[3]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=464824323e57dc4b397e8b05854d779908b55304
--
With Regards,
Amit Kapila.
On Thu, Oct 15, 2020 at 2:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Oct 14, 2020 at 6:15 PM Ajin Cherian <itsajin@gmail.com> wrote:
I think it will be easier to review this work if we can split the
patches according to the changes made in different layers. The first
patch could be changes made in output plugin and the corresponding
changes in test_decoding, see the similar commit of in-progress
transactions [1]. So you need to move corresponding changes from
v8-0001-Support-decoding-of-two-phase-transactions and
v8-0004-Support-two-phase-commits-in-streaming-mode-in-lo for this.
The second patch could be changes made in ReorderBuffer to support
this feature, see [2]. The third patch could be changes made to
support pgoutput and subscriber-side stuff, see [3]. What do you
think?
I agree. I have split the patches accordingly. Do have a look.
Pending work is:
1. Add pgoutput support for the new streaming two-phase commit APIs
2. Add test cases for two-phase commits with streaming for pub/sub and
test_decoding
3. Add CREATE SUBSCRIPTION command option to specify two-phase commits
rather than having it turned on by default.
regards,
Ajin Cherian
Fujitsu Australia
Attachments:
v9-0001-Support-decoding-of-two-phase-transactions.patchapplication/octet-stream; name=v9-0001-Support-decoding-of-two-phase-transactions.patchDownload
From b09f3bfb86aab07dfd378c2f3615a5a8c2adea91 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Thu, 15 Oct 2020 02:19:59 -0400
Subject: [PATCH v9] Support decoding of two-phase transactions
Until now two-phase transactions were decoded at COMMIT, just like
regular transaction. During replay, two-phase transactions were
translated into regular transactions on the subscriber, and the GID
was not forwarded to it.
This patch allows PREPARE-time decoding two-phase transactions (if
the output plugin supports this capability), in which case the
transactions are replayed at PREPARE and then committed later when
COMMIT PREPARED arrives.
Includes documentation changes.
---
contrib/test_decoding/test_decoding.c | 243 +++++++++++++++++++++++
doc/src/sgml/logicaldecoding.sgml | 173 +++++++++++++++-
src/backend/replication/logical/logical.c | 314 ++++++++++++++++++++++++++++++
src/include/replication/logical.h | 5 +
src/include/replication/output_plugin.h | 64 ++++++
src/include/replication/reorderbuffer.h | 95 +++++++++
6 files changed, 887 insertions(+), 7 deletions(-)
diff --git a/contrib/test_decoding/test_decoding.c b/contrib/test_decoding/test_decoding.c
index e60ab34..9f0cf10 100644
--- a/contrib/test_decoding/test_decoding.c
+++ b/contrib/test_decoding/test_decoding.c
@@ -11,12 +11,16 @@
*-------------------------------------------------------------------------
*/
#include "postgres.h"
+#include "miscadmin.h"
+#include "access/transam.h"
#include "catalog/pg_type.h"
#include "replication/logical.h"
#include "replication/origin.h"
+#include "storage/procarray.h"
+
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -36,6 +40,7 @@ typedef struct
bool skip_empty_xacts;
bool xact_wrote_changes;
bool only_local;
+ TransactionId check_xid_aborted; /* track abort of this txid */
} TestDecodingData;
static void pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
@@ -73,6 +78,15 @@ static void pg_decode_stream_stop(LogicalDecodingContext *ctx,
static void pg_decode_stream_abort(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr abort_lsn);
+static void pg_decode_stream_prepare(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+static void pg_decode_stream_commit_prepared(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+static void pg_decode_stream_rollback_prepared(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr rollback_lsn);
static void pg_decode_stream_commit(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
@@ -88,6 +102,18 @@ static void pg_decode_stream_truncate(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
int nrelations, Relation relations[],
ReorderBufferChange *change);
+static bool pg_decode_filter_prepare(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid);
+static void pg_decode_prepare_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+static void pg_decode_commit_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+static void pg_decode_rollback_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
void
_PG_init(void)
@@ -112,10 +138,17 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb)
cb->stream_start_cb = pg_decode_stream_start;
cb->stream_stop_cb = pg_decode_stream_stop;
cb->stream_abort_cb = pg_decode_stream_abort;
+ cb->stream_prepare_cb = pg_decode_stream_prepare;
+ cb->stream_commit_prepared_cb = pg_decode_stream_commit_prepared;
+ cb->stream_rollback_prepared_cb = pg_decode_stream_rollback_prepared;
cb->stream_commit_cb = pg_decode_stream_commit;
cb->stream_change_cb = pg_decode_stream_change;
cb->stream_message_cb = pg_decode_stream_message;
cb->stream_truncate_cb = pg_decode_stream_truncate;
+ cb->filter_prepare_cb = pg_decode_filter_prepare;
+ cb->prepare_cb = pg_decode_prepare_txn;
+ cb->commit_prepared_cb = pg_decode_commit_prepared_txn;
+ cb->rollback_prepared_cb = pg_decode_rollback_prepared_txn;
}
@@ -127,6 +160,7 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
ListCell *option;
TestDecodingData *data;
bool enable_streaming = false;
+ bool enable_2pc = false;
data = palloc0(sizeof(TestDecodingData));
data->context = AllocSetContextCreate(ctx->context,
@@ -136,6 +170,7 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
data->include_timestamp = false;
data->skip_empty_xacts = false;
data->only_local = false;
+ data->check_xid_aborted = InvalidTransactionId;
ctx->output_plugin_private = data;
@@ -227,6 +262,35 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
errmsg("could not parse value \"%s\" for parameter \"%s\"",
strVal(elem->arg), elem->defname)));
}
+ else if (strcmp(elem->defname, "two-phase-commit") == 0)
+ {
+ if (elem->arg == NULL)
+ continue;
+ else if (!parse_bool(strVal(elem->arg), &enable_2pc))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("could not parse value \"%s\" for parameter \"%s\"",
+ strVal(elem->arg), elem->defname)));
+ }
+ else if (strcmp(elem->defname, "check-xid-aborted") == 0)
+ {
+ if (elem->arg == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("check-xid-aborted needs an input value")));
+ else
+ {
+ errno = 0;
+ data->check_xid_aborted = (TransactionId)
+ strtoul(strVal(elem->arg), NULL, 0);
+
+ if (!TransactionIdIsValid(data->check_xid_aborted))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("check-xid-aborted is not a valid xid: \"%s\"",
+ strVal(elem->arg))));
+ }
+ }
else
{
ereport(ERROR,
@@ -238,6 +302,7 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
}
ctx->streaming &= enable_streaming;
+ ctx->twophase &= enable_2pc;
}
/* cleanup this plugin's resources */
@@ -297,6 +362,93 @@ pg_decode_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
OutputPluginWrite(ctx, true);
}
+/*
+ * Filter out two-phase transactions.
+ *
+ * Each plugin can implement its own filtering logic. Here
+ * we demonstrate a simple logic by checking the GID. If the
+ * GID contains the "_nodecode" substring, then we filter
+ * it out.
+ */
+static bool
+pg_decode_filter_prepare(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid)
+{
+ if (strstr(gid, "_nodecode") != NULL)
+ return true;
+
+ return false;
+}
+
+/* PREPARE callback */
+static void
+pg_decode_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ if (data->skip_empty_xacts && !data->xact_wrote_changes)
+ return;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ appendStringInfo(ctx->out, "PREPARE TRANSACTION %s",
+ quote_literal_cstr(txn->gid));
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, " %u", txn->xid);
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
+/* COMMIT PREPARED callback */
+static void
+pg_decode_commit_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ appendStringInfo(ctx->out, "COMMIT PREPARED %s",
+ quote_literal_cstr(txn->gid));
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, " %u", txn->xid);
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
+/* ABORT PREPARED callback */
+static void
+pg_decode_rollback_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ appendStringInfo(ctx->out, "ROLLBACK PREPARED %s",
+ quote_literal_cstr(txn->gid));
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, " %u", txn->xid);
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
static bool
pg_decode_filter(LogicalDecodingContext *ctx,
RepOriginId origin_id)
@@ -455,6 +607,25 @@ pg_decode_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
}
data->xact_wrote_changes = true;
+ /* if check_xid_aborted is a valid xid, then it was passed in
+ * as an option to check if the transaction having this xid would be aborted.
+ * This is to test concurrent aborts.
+ */
+ if (TransactionIdIsValid(data->check_xid_aborted))
+ {
+ elog(LOG, "waiting for %u to abort", data->check_xid_aborted);
+ while (TransactionIdIsInProgress(data->check_xid_aborted))
+ {
+ CHECK_FOR_INTERRUPTS();
+ pg_usleep(10000L);
+ }
+ if (!TransactionIdIsInProgress(data->check_xid_aborted) &&
+ !TransactionIdDidCommit(data->check_xid_aborted))
+ elog(LOG, "%u aborted", data->check_xid_aborted);
+
+ Assert(TransactionIdDidAbort(data->check_xid_aborted));
+ }
+
class_form = RelationGetForm(relation);
tupdesc = RelationGetDescr(relation);
@@ -646,6 +817,78 @@ pg_decode_stream_abort(LogicalDecodingContext *ctx,
}
static void
+pg_decode_stream_prepare(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ if (data->skip_empty_xacts && !data->xact_wrote_changes)
+ return;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, "preparing streamed transaction TXN %u", txn->xid);
+ else
+ appendStringInfo(ctx->out, "preparing streamed transaction");
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
+static void
+pg_decode_stream_commit_prepared(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ if (data->skip_empty_xacts && !data->xact_wrote_changes)
+ return;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, "commit prepared streamed transaction TXN %u", txn->xid);
+ else
+ appendStringInfo(ctx->out, "commit prepared streamed transaction");
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
+static void
+pg_decode_stream_rollback_prepared(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr rollback_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ if (data->skip_empty_xacts && !data->xact_wrote_changes)
+ return;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, "abort prepared streamed transaction TXN %u", txn->xid);
+ else
+ appendStringInfo(ctx->out, "abort prepared streamed transaction");
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
+static void
pg_decode_stream_commit(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr commit_lsn)
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index 813a037..824c55a 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -387,11 +387,18 @@ typedef struct OutputPluginCallbacks
LogicalDecodeTruncateCB truncate_cb;
LogicalDecodeCommitCB commit_cb;
LogicalDecodeMessageCB message_cb;
+ LogicalDecodeFilterPrepareCB filter_prepare_cb;
+ LogicalDecodePrepareCB prepare_cb;
+ LogicalDecodeCommitPreparedCB commit_prepared_cb;
+ LogicalDecodeRollbackPreparedCB rollback_prepared_cb;
LogicalDecodeFilterByOriginCB filter_by_origin_cb;
LogicalDecodeShutdownCB shutdown_cb;
LogicalDecodeStreamStartCB stream_start_cb;
LogicalDecodeStreamStopCB stream_stop_cb;
LogicalDecodeStreamAbortCB stream_abort_cb;
+ LogicalDecodeStreamPrepareCB stream_prepare_cb;
+ LogicalDecodeStreamCommitPreparedCB stream_commit_prepared_cb;
+ LogicalDecodeStreamRollbackPreparedCB stream_rollback_prepared_cb;
LogicalDecodeStreamCommitCB stream_commit_cb;
LogicalDecodeStreamChangeCB stream_change_cb;
LogicalDecodeStreamMessageCB stream_message_cb;
@@ -414,9 +421,18 @@ typedef void (*LogicalOutputPluginInit) (struct OutputPluginCallbacks *cb);
in-progress transactions. The <function>stream_start_cb</function>,
<function>stream_stop_cb</function>, <function>stream_abort_cb</function>,
<function>stream_commit_cb</function> and <function>stream_change_cb</function>
- are required, while <function>stream_message_cb</function> and
+ are required, while <function>stream_message_cb</function>,
+ <function>stream_prepare_cb</function>, <function>stream_commit_prepared_cb</function>,
+ <function>stream_rollback_prepared_cb</function> and
<function>stream_truncate_cb</function> are optional.
</para>
+
+ <para>
+ An output plugin may also define functions to support two-phase commits, which are
+ decoded on <command>PREPARE TRANSACTION</command>. The <function>prepare_cb</function>,
+ <function>commit_prepared_cb</function> and <function>rollback_prepared_cb</function>
+ callbacks are required, while <function>filter_prepare_cb</function> is optional.
+ </para>
</sect2>
<sect2 id="logicaldecoding-capabilities">
@@ -477,7 +493,13 @@ CREATE TABLE another_catalog_table(data text) WITH (user_catalog_table = true);
never get
decoded. Successful savepoints are
folded into the transaction containing them in the order they were
- executed within that transaction.
+ executed within that transaction. A transaction that is prepared for
+ a two-phase commit using <command>PREPARE TRANSACTION</command> will
+ also be decoded if the output plugin callbacks needed for decoding
+ them are provided. It is possible that the current transaction which
+ is being decoded is aborted concurrently via a <command>ROLLBACK PREPARED</command>
+ command. In that case, the logical decoding of this transaction will
+ be aborted too.
</para>
<note>
@@ -578,6 +600,55 @@ typedef void (*LogicalDecodeCommitCB) (struct LogicalDecodingContext *ctx,
</para>
</sect3>
+ <sect3 id="logicaldecoding-output-plugin-prepare">
+ <title>Transaction Prepare Callback</title>
+
+ <para>
+ The optional <function>prepare_cb</function> callback is called whenever
+ a transaction which is prepared for two-phase commit has been
+ decoded. The <function>change_cb</function> callbacks for all modified
+ rows will have been called before this, if there have been any modified
+ rows.
+<programlisting>
+typedef void (*LogicalDecodePrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
+ <sect3 id="logicaldecoding-output-plugin-commit-prepared">
+ <title>Transaction Commit Prepared Callback</title>
+
+ <para>
+ The optional <function>commit_prepared_cb</function> callback is called whenever
+ a transaction commit prepared has been decoded. The <parameter>gid</parameter> field,
+ which is part of the <parameter>txn</parameter> parameter can be used in this
+ callback.
+<programlisting>
+typedef void (*LogicalDecodeCommitPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
+ <sect3 id="logicaldecoding-output-plugin-rollback-prepared">
+ <title>Transaction Rollback Prepared Callback</title>
+
+ <para>
+ The optional <function>rollback_prepared_cb</function> callback is called whenever
+ a transaction rollback prepared has been decoded. The <parameter>gid</parameter> field,
+ which is part of the <parameter>txn</parameter> parameter can be used in this
+ callback.
+<programlisting>
+typedef void (*LogicalDecodeRollbackPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
<sect3 id="logicaldecoding-output-plugin-change">
<title>Change Callback</title>
@@ -587,7 +658,13 @@ typedef void (*LogicalDecodeCommitCB) (struct LogicalDecodingContext *ctx,
an <command>INSERT</command>, <command>UPDATE</command>,
or <command>DELETE</command>. Even if the original command modified
several rows at once the callback will be called individually for each
- row.
+ row. The <function>change_cb</function> callback may access system or
+ user catalog tables to aid in the process of outputting the row
+ modification details. In case of decoding a prepared (but yet
+ uncommitted) transaction or decoding of an uncommitted transaction, this
+ change callback might also error out due to simultaneous rollback of
+ this very same transaction. In that case, the logical decoding of this
+ aborted transaction is stopped gracefully.
<programlisting>
typedef void (*LogicalDecodeChangeCB) (struct LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
@@ -664,6 +741,39 @@ typedef bool (*LogicalDecodeFilterByOriginCB) (struct LogicalDecodingContext *ct
</para>
</sect3>
+ <sect3 id="logicaldecoding-output-plugin-filter-prepare">
+ <title>Prepare Filter Callback</title>
+
+ <para>
+ The optional <function>filter_prepare_cb</function> callback
+ is called to determine whether data that is part of the current
+ two-phase commit transaction should be considered for decode
+ at this prepare stage or as a regular one-phase transaction at
+ <command>COMMIT PREPARED</command> time later. To signal that
+ decoding should be skipped, return <literal>true</literal>;
+ <literal>false</literal> otherwise. When the callback is not
+ defined, <literal>false</literal> is assumed (i.e. nothing is
+ filtered).
+<programlisting>
+typedef bool (*LogicalDecodeFilterPrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ TransactionId xid,
+ const char *gid);
+</programlisting>
+ The <parameter>ctx</parameter> parameter has the same contents
+ as for the other callbacks. The <parameter>txn</parameter> parameter
+ contains meta information about the transaction. The <parameter>xid</parameter>
+ contains the XID because <parameter>txn</parameter> can be NULL in some cases.
+ The <parameter>gid</parameter> is the identifier that later identifies this
+ transaction for <command>COMMIT PREPARED</command> or <command>ROLLBACK PREPARED</command>.
+ </para>
+ <para>
+ The callback has to provide the same static answer for a given combination of
+ <parameter>xid</parameter> and <parameter>gid</parameter> every time it is
+ called.
+ </para>
+ </sect3>
+
<sect3 id="logicaldecoding-output-plugin-message">
<title>Generic Message Callback</title>
@@ -685,7 +795,13 @@ typedef void (*LogicalDecodeMessageCB) (struct LogicalDecodingContext *ctx,
non-transactional and the XID was not assigned yet in the transaction
which logged the message. The <parameter>lsn</parameter> has WAL
location of the message. The <parameter>transactional</parameter> says
- if the message was sent as transactional or not.
+ if the message was sent as transactional or not. Similar to the change
+ callback, in case of decoding a prepared (but yet uncommitted)
+ transaction or decoding of an uncommitted transaction, this message
+ callback might also error out due to simultaneous rollback of
+ this very same transaction. In that case, the logical decoding of this
+ aborted transaction is stopped gracefully.
+
The <parameter>prefix</parameter> is arbitrary null-terminated prefix
which can be used for identifying interesting messages for the current
plugin. And finally the <parameter>message</parameter> parameter holds
@@ -735,6 +851,45 @@ typedef void (*LogicalDecodeStreamAbortCB) (struct LogicalDecodingContext *ctx,
</para>
</sect3>
+ <sect3 id="logicaldecoding-output-plugin-stream-prepare">
+ <title>Stream Prepare Callback</title>
+ <para>
+ The <function>stream_prepare_cb</function> callback is called to prepare
+ a previously streamed transaction as part of a two-phase commit.
+<programlisting>
+typedef void (*LogicalDecodeStreamPrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
+ <sect3 id="logicaldecoding-output-plugin-stream-commit-prepared">
+ <title>Stream Commit Prepared Callback</title>
+ <para>
+ The <function>stream_commit_prepared_cb</function> callback is called to commit
+ a previously streamed transaction as part of a two-phase commit.
+<programlisting>
+typedef void (*LogicalDecodeStreamCommitPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
+ <sect3 id="logicaldecoding-output-plugin-stream-rollback-prepared">
+ <title>Stream Rollback Prepared Callback</title>
+ <para>
+ The <function>stream_rollback_prepared_cb</function> callback is called to abort
+ a previously streamed transaction as part of a two-phase commit.
+<programlisting>
+typedef void (*LogicalDecodeStreamRollbackPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr rollback_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
<sect3 id="logicaldecoding-output-plugin-stream-commit">
<title>Stream Commit Callback</title>
<para>
@@ -913,9 +1068,13 @@ OutputPluginWrite(ctx, true);
When streaming an in-progress transaction, the changes (and messages) are
streamed in blocks demarcated by <function>stream_start_cb</function>
and <function>stream_stop_cb</function> callbacks. Once all the decoded
- changes are transmitted, the transaction is committed using the
- <function>stream_commit_cb</function> callback (or possibly aborted using
- the <function>stream_abort_cb</function> callback).
+ changes are transmitted, the transaction can be committed using the
+ the <function>stream_commit_cb</function> callback
+ (or possibly aborted using the <function>stream_abort_cb</function> callback).
+ If two-phase commits are supported, the transaction can be prepared using the
+ <function>stream_prepare_cb</function> callback, commit prepared using the
+ <function>stream_commit_prepared_cb</function> callback or aborted using the
+ <function>stream_rollback_prepared_cb</function>.
</para>
<para>
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 8675832..84a4751 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -59,6 +59,14 @@ static void shutdown_cb_wrapper(LogicalDecodingContext *ctx);
static void begin_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn);
static void commit_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
+static bool filter_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid);
+static void prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+static void commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+static void rollback_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
static void change_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
Relation relation, ReorderBufferChange *change);
static void truncate_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
@@ -74,6 +82,12 @@ static void stream_stop_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr last_lsn);
static void stream_abort_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr abort_lsn);
+static void stream_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+static void stream_commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+static void stream_rollback_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr rollback_lsn);
static void stream_commit_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
static void stream_change_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
@@ -207,6 +221,10 @@ StartupDecodingContext(List *output_plugin_options,
ctx->reorder->apply_change = change_cb_wrapper;
ctx->reorder->apply_truncate = truncate_cb_wrapper;
ctx->reorder->commit = commit_cb_wrapper;
+ ctx->reorder->filter_prepare = filter_prepare_cb_wrapper;
+ ctx->reorder->prepare = prepare_cb_wrapper;
+ ctx->reorder->commit_prepared = commit_prepared_cb_wrapper;
+ ctx->reorder->rollback_prepared = rollback_prepared_cb_wrapper;
ctx->reorder->message = message_cb_wrapper;
/*
@@ -221,12 +239,28 @@ StartupDecodingContext(List *output_plugin_options,
ctx->streaming = (ctx->callbacks.stream_start_cb != NULL) ||
(ctx->callbacks.stream_stop_cb != NULL) ||
(ctx->callbacks.stream_abort_cb != NULL) ||
+ (ctx->callbacks.stream_prepare_cb != NULL) ||
+ (ctx->callbacks.stream_commit_prepared_cb != NULL) ||
+ (ctx->callbacks.stream_rollback_prepared_cb != NULL) ||
(ctx->callbacks.stream_commit_cb != NULL) ||
(ctx->callbacks.stream_change_cb != NULL) ||
(ctx->callbacks.stream_message_cb != NULL) ||
(ctx->callbacks.stream_truncate_cb != NULL);
/*
+ * To support two-phase logical decoding, we require prepare/commit-prepare/abort-prepare
+ * callbacks. The filter-prepare callback is optional. We however enable two phase logical
+ * decoding when at least one of the methods is enabled so that we can easily identify
+ * missing methods.
+ *
+ * We decide it here, but only check it later in the wrappers.
+ */
+ ctx->twophase = (ctx->callbacks.prepare_cb != NULL) ||
+ (ctx->callbacks.commit_prepared_cb != NULL) ||
+ (ctx->callbacks.rollback_prepared_cb != NULL) ||
+ (ctx->callbacks.filter_prepare_cb != NULL);
+
+ /*
* streaming callbacks
*
* stream_message and stream_truncate callbacks are optional, so we do not
@@ -237,6 +271,9 @@ StartupDecodingContext(List *output_plugin_options,
ctx->reorder->stream_start = stream_start_cb_wrapper;
ctx->reorder->stream_stop = stream_stop_cb_wrapper;
ctx->reorder->stream_abort = stream_abort_cb_wrapper;
+ ctx->reorder->stream_prepare = stream_prepare_cb_wrapper;
+ ctx->reorder->stream_commit_prepared = stream_commit_prepared_cb_wrapper;
+ ctx->reorder->stream_rollback_prepared = stream_rollback_prepared_cb_wrapper;
ctx->reorder->stream_commit = stream_commit_cb_wrapper;
ctx->reorder->stream_change = stream_change_cb_wrapper;
ctx->reorder->stream_message = stream_message_cb_wrapper;
@@ -783,6 +820,120 @@ commit_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
}
static void
+prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ /* We're only supposed to call this when two-phase commits are supported */
+ Assert(ctx->twophase);
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "prepare";
+ state.report_location = txn->final_lsn; /* beginning of commit record */
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn; /* points to the end of the record */
+
+ /* If the plugin supports two-phase commits then prepare callback is mandatory */
+ if (ctx->callbacks.prepare_cb == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("Output plugin did not register prepare_cb callback")));
+
+ /* do the actual work: call callback */
+ ctx->callbacks.prepare_cb(ctx, txn, prepare_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
+commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ /* We're only supposed to call this when two-phase commits are supported */
+ Assert(ctx->twophase);
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "commit_prepared";
+ state.report_location = txn->final_lsn; /* beginning of commit record */
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn; /* points to the end of the record */
+
+ /* If the plugin support 2 phase commits then commit prepared callback is mandatory */
+ if (ctx->callbacks.commit_prepared_cb == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("Output plugin did not register commit_prepared_cb callback")));
+
+ /* do the actual work: call callback */
+ ctx->callbacks.commit_prepared_cb(ctx, txn, commit_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
+rollback_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ /* We're only supposed to call this when two-phase commits are supported */
+ Assert(ctx->twophase);
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "rollback_prepared";
+ state.report_location = txn->final_lsn; /* beginning of commit record */
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn; /* points to the end of the record */
+
+ /* If the plugin support 2 phase commits then abort prepared callback is mandatory */
+ if (ctx->callbacks.rollback_prepared_cb == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("Output plugin did not register rollback_prepared_cb callback")));
+
+ /* do the actual work: call callback */
+ ctx->callbacks.rollback_prepared_cb(ctx, txn, abort_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
change_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
Relation relation, ReorderBufferChange *change)
{
@@ -859,6 +1010,51 @@ truncate_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
error_context_stack = errcallback.previous;
}
+static bool
+filter_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+ bool ret;
+
+ /*
+ * Skip if decoding of two-phase at PREPARE time is not enabled. In that
+ * case all two-phase transactions are considered filtered out and will be
+ * applied as regular transactions at COMMIT PREPARED.
+ */
+ if (!ctx->twophase)
+ return true;
+
+ /*
+ * The filter_prepare callback is optional. When not supplied, all
+ * prepared transactions should go through.
+ */
+ if (!ctx->callbacks.filter_prepare_cb)
+ return false;
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "filter_prepare";
+ state.report_location = InvalidXLogRecPtr;
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = false;
+
+ /* do the actual work: call callback */
+ ret = ctx->callbacks.filter_prepare_cb(ctx, txn, xid, gid);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+
+ return ret;
+}
+
bool
filter_by_origin_cb_wrapper(LogicalDecodingContext *ctx, RepOriginId origin_id)
{
@@ -1057,6 +1253,124 @@ stream_abort_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
}
static void
+stream_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ Assert(!ctx->fast_forward);
+
+ /* We're only supposed to call this when streaming and two-phase commits are supported. */
+ Assert(ctx->streaming);
+ Assert(ctx->twophase);
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "stream_prepare";
+ state.report_location = txn->final_lsn;
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn;
+
+ /* in streaming mode with two-phase commits, stream_prepare_cb is required */
+ if (ctx->callbacks.stream_prepare_cb == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical streaming commits requires a stream_prepare_cb callback")));
+
+ ctx->callbacks.stream_prepare_cb(ctx, txn, prepare_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
+stream_commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ Assert(!ctx->fast_forward);
+
+ /* We're only supposed to call this when streaming is supported. */
+ Assert(ctx->streaming);
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "stream_commit_prepared";
+ state.report_location = txn->final_lsn;
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn;
+
+ /* in streaming mode with two-phase commits, stream_commit_prepared_cb is required */
+ if (ctx->callbacks.stream_prepare_cb == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical streaming requires a stream_commit_prepared_cb callback")));
+
+ ctx->callbacks.stream_commit_prepared_cb(ctx, txn, commit_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
+stream_rollback_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr rollback_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ Assert(!ctx->fast_forward);
+
+ /* We're only supposed to call this when streaming is supported. */
+ Assert(ctx->streaming);
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "stream_rollback_prepared";
+ state.report_location = txn->final_lsn;
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn;
+
+ /* in streaming mode with two-phase commits, stream_rollback_prepared_cb is required */
+ if (ctx->callbacks.stream_prepare_cb == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical streaming requires a stream_rollback_prepared_cb callback")));
+
+ ctx->callbacks.stream_rollback_prepared_cb(ctx, txn, rollback_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
stream_commit_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr commit_lsn)
{
diff --git a/src/include/replication/logical.h b/src/include/replication/logical.h
index 40bab7e..a191e82 100644
--- a/src/include/replication/logical.h
+++ b/src/include/replication/logical.h
@@ -84,6 +84,11 @@ typedef struct LogicalDecodingContext
*/
bool streaming;
+ /*
+ * Does the output plugin support two-phase decoding, and is it enabled?
+ */
+ bool twophase;
+
/*
* State for writing output.
*/
diff --git a/src/include/replication/output_plugin.h b/src/include/replication/output_plugin.h
index b78c796..dfcb577 100644
--- a/src/include/replication/output_plugin.h
+++ b/src/include/replication/output_plugin.h
@@ -77,6 +77,39 @@ typedef void (*LogicalDecodeCommitCB) (struct LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
+ /*
+ * Called before decoding of PREPARE record to decide whether this
+ * transaction should be decoded with separate calls to prepare and
+ * commit_prepared/rollback_prepared callbacks or wait till COMMIT PREPARED and
+ * sent as usual transaction.
+ */
+typedef bool (*LogicalDecodeFilterPrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ TransactionId xid,
+ const char *gid);
+
+/*
+ * Called for PREPARE record unless it was filtered by filter_prepare()
+ * callback.
+ */
+typedef void (*LogicalDecodePrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+
+/*
+ * Called for COMMIT PREPARED.
+ */
+typedef void (*LogicalDecodeCommitPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+/*
+ * Called for ROLLBACK PREPARED.
+ */
+typedef void (*LogicalDecodeRollbackPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+
/*
* Called for the generic logical decoding messages.
*/
@@ -124,6 +157,30 @@ typedef void (*LogicalDecodeStreamAbortCB) (struct LogicalDecodingContext *ctx,
XLogRecPtr abort_lsn);
/*
+ * Called to prepare changes streamed to remote node from in-progress
+ * transaction. This is called as part of a two-phase commit.
+ */
+typedef void (*LogicalDecodeStreamPrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+
+/*
+ * Called to commit prepared changes streamed to remote node from in-progress
+ * transaction. This is called as part of a two-phase commit.
+ */
+typedef void (*LogicalDecodeStreamCommitPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+/*
+ * Called to abort/rollback prepared changes streamed to remote node from in-progress
+ * transaction. This is called as part of a two-phase commit.
+ */
+typedef void (*LogicalDecodeStreamRollbackPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr rollback_lsn);
+
+/*
* Called to apply changes streamed to remote node from in-progress
* transaction.
*/
@@ -171,12 +228,19 @@ typedef struct OutputPluginCallbacks
LogicalDecodeTruncateCB truncate_cb;
LogicalDecodeCommitCB commit_cb;
LogicalDecodeMessageCB message_cb;
+ LogicalDecodeFilterPrepareCB filter_prepare_cb;
+ LogicalDecodePrepareCB prepare_cb;
+ LogicalDecodeCommitPreparedCB commit_prepared_cb;
+ LogicalDecodeRollbackPreparedCB rollback_prepared_cb;
LogicalDecodeFilterByOriginCB filter_by_origin_cb;
LogicalDecodeShutdownCB shutdown_cb;
/* streaming of changes */
LogicalDecodeStreamStartCB stream_start_cb;
LogicalDecodeStreamStopCB stream_stop_cb;
LogicalDecodeStreamAbortCB stream_abort_cb;
+ LogicalDecodeStreamPrepareCB stream_prepare_cb;
+ LogicalDecodeStreamCommitPreparedCB stream_commit_prepared_cb;
+ LogicalDecodeStreamRollbackPreparedCB stream_rollback_prepared_cb;
LogicalDecodeStreamCommitCB stream_commit_cb;
LogicalDecodeStreamChangeCB stream_change_cb;
LogicalDecodeStreamMessageCB stream_message_cb;
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 1c77819..6cb4cb4 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -10,6 +10,7 @@
#define REORDERBUFFER_H
#include "access/htup_details.h"
+#include "access/twophase.h"
#include "lib/ilist.h"
#include "storage/sinval.h"
#include "utils/hsearch.h"
@@ -174,6 +175,9 @@ typedef struct ReorderBufferChange
#define RBTXN_IS_STREAMED 0x0010
#define RBTXN_HAS_TOAST_INSERT 0x0020
#define RBTXN_HAS_SPEC_INSERT 0x0040
+#define RBTXN_PREPARE 0x0080
+#define RBTXN_COMMIT_PREPARED 0x0100
+#define RBTXN_ROLLBACK_PREPARED 0x0200
/* Does the transaction have catalog changes? */
#define rbtxn_has_catalog_changes(txn) \
@@ -233,6 +237,24 @@ typedef struct ReorderBufferChange
((txn)->txn_flags & RBTXN_IS_STREAMED) != 0 \
)
+/* Has this transaction been prepared? */
+#define rbtxn_prepared(txn) \
+( \
+ ((txn)->txn_flags & RBTXN_PREPARE) != 0 \
+)
+
+/* Has this prepared transaction been committed? */
+#define rbtxn_commit_prepared(txn) \
+( \
+ ((txn)->txn_flags & RBTXN_COMMIT_PREPARED) != 0 \
+)
+
+/* Has this prepared transaction been rollbacked? */
+#define rbtxn_rollback_prepared(txn) \
+( \
+ ((txn)->txn_flags & RBTXN_ROLLBACK_PREPARED) != 0 \
+)
+
typedef struct ReorderBufferTXN
{
/* See above */
@@ -244,6 +266,9 @@ typedef struct ReorderBufferTXN
/* Xid of top-level transaction, if known */
TransactionId toplevel_xid;
+ /* In case of 2PC we need to pass GID to output plugin */
+ char *gid;
+
/*
* LSN of the first data carrying, WAL record with knowledge about this
* xid. This is allowed to *not* be first record adorned with this xid, if
@@ -405,6 +430,36 @@ typedef void (*ReorderBufferCommitCB) (ReorderBuffer *rb,
ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
+/* abort callback signature */
+typedef void (*ReorderBufferAbortCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+
+typedef bool (*ReorderBufferFilterPrepareCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ TransactionId xid,
+ const char *gid);
+
+/* prepare callback signature */
+typedef void (*ReorderBufferPrepareCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+
+/* commit prepared callback signature */
+typedef void (*ReorderBufferCommitPreparedCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+/* abort prepared callback signature */
+typedef void (*ReorderBufferRollbackPreparedCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+
/* message callback signature */
typedef void (*ReorderBufferMessageCB) (ReorderBuffer *rb,
ReorderBufferTXN *txn,
@@ -431,6 +486,24 @@ typedef void (*ReorderBufferStreamAbortCB) (
ReorderBufferTXN *txn,
XLogRecPtr abort_lsn);
+/* prepare streamed transaction callback signature */
+typedef void (*ReorderBufferStreamPrepareCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+
+/* prepare streamed transaction callback signature */
+typedef void (*ReorderBufferStreamCommitPreparedCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+/* prepare streamed transaction callback signature */
+typedef void (*ReorderBufferStreamRollbackPreparedCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr rollback_lsn);
+
/* commit streamed transaction callback signature */
typedef void (*ReorderBufferStreamCommitCB) (
ReorderBuffer *rb,
@@ -497,6 +570,11 @@ struct ReorderBuffer
ReorderBufferApplyChangeCB apply_change;
ReorderBufferApplyTruncateCB apply_truncate;
ReorderBufferCommitCB commit;
+ ReorderBufferAbortCB abort;
+ ReorderBufferFilterPrepareCB filter_prepare;
+ ReorderBufferPrepareCB prepare;
+ ReorderBufferCommitPreparedCB commit_prepared;
+ ReorderBufferRollbackPreparedCB rollback_prepared;
ReorderBufferMessageCB message;
/*
@@ -505,6 +583,9 @@ struct ReorderBuffer
ReorderBufferStreamStartCB stream_start;
ReorderBufferStreamStopCB stream_stop;
ReorderBufferStreamAbortCB stream_abort;
+ ReorderBufferStreamPrepareCB stream_prepare;
+ ReorderBufferStreamCommitPreparedCB stream_commit_prepared;
+ ReorderBufferStreamRollbackPreparedCB stream_rollback_prepared;
ReorderBufferStreamCommitCB stream_commit;
ReorderBufferStreamChangeCB stream_change;
ReorderBufferStreamMessageCB stream_message;
@@ -574,6 +655,11 @@ void ReorderBufferQueueMessage(ReorderBuffer *, TransactionId, Snapshot snapsho
void ReorderBufferCommit(ReorderBuffer *, TransactionId,
XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
TimestampTz commit_time, RepOriginId origin_id, XLogRecPtr origin_lsn);
+void ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid, bool is_commit);
void ReorderBufferAssignChild(ReorderBuffer *, TransactionId, TransactionId, XLogRecPtr commit_lsn);
void ReorderBufferCommitChild(ReorderBuffer *, TransactionId, TransactionId,
XLogRecPtr commit_lsn, XLogRecPtr end_lsn);
@@ -597,6 +683,15 @@ void ReorderBufferXidSetCatalogChanges(ReorderBuffer *, TransactionId xid, XLog
bool ReorderBufferXidHasCatalogChanges(ReorderBuffer *, TransactionId xid);
bool ReorderBufferXidHasBaseSnapshot(ReorderBuffer *, TransactionId xid);
+bool ReorderBufferPrepareNeedSkip(ReorderBuffer *rb, TransactionId xid,
+ const char *gid);
+bool ReorderBufferTxnIsPrepared(ReorderBuffer *rb, TransactionId xid,
+ const char *gid);
+void ReorderBufferPrepare(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid);
ReorderBufferTXN *ReorderBufferGetOldestTXN(ReorderBuffer *);
TransactionId ReorderBufferGetOldestXmin(ReorderBuffer *rb);
--
1.8.3.1
v9-0003-pgoutput-plugin-support-for-logical-decoding-of-t.patchapplication/octet-stream; name=v9-0003-pgoutput-plugin-support-for-logical-decoding-of-t.patchDownload
From baf959a60c76ea3ba5c29c8b2b7d597e31b5fdd9 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Thu, 15 Oct 2020 06:20:56 -0400
Subject: [PATCH v9] pgoutput plugin support for logical decoding of two-phase
commits
Support decoding of two phase commit in pgoutput and on subscriber side.
---
src/backend/access/transam/twophase.c | 27 +++++
src/backend/replication/logical/proto.c | 74 ++++++++++++-
src/backend/replication/logical/worker.c | 153 +++++++++++++++++++++++++-
src/backend/replication/pgoutput/pgoutput.c | 51 +++++++++
src/include/access/twophase.h | 1 +
src/include/replication/logicalproto.h | 27 +++++
src/test/subscription/t/020_twophase.pl | 163 ++++++++++++++++++++++++++++
7 files changed, 494 insertions(+), 2 deletions(-)
create mode 100644 src/test/subscription/t/020_twophase.pl
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 7940060..2e0a408 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -548,6 +548,33 @@ MarkAsPrepared(GlobalTransaction gxact, bool lock_held)
}
/*
+ * LookupGXact
+ * Check if the prepared transaction with the given GID is around
+ */
+bool
+LookupGXact(const char *gid)
+{
+ int i;
+ bool found = false;
+
+ LWLockAcquire(TwoPhaseStateLock, LW_SHARED);
+ for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
+ {
+ GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
+
+ /* Ignore not-yet-valid GIDs */
+ if (gxact->valid && strcmp(gxact->gid, gid) == 0)
+ {
+ found = true;
+ break;
+ }
+
+ }
+ LWLockRelease(TwoPhaseStateLock);
+ return found;
+}
+
+/*
* LockGXact
* Locate the prepared transaction and mark it busy for COMMIT or PREPARE.
*/
diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index eb19142..1bd3987 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -78,7 +78,7 @@ logicalrep_write_commit(StringInfo out, ReorderBufferTXN *txn,
pq_sendbyte(out, 'C'); /* sending COMMIT */
- /* send the flags field (unused for now) */
+ /* send the flags field */
pq_sendbyte(out, flags);
/* send fields */
@@ -99,6 +99,7 @@ logicalrep_read_commit(StringInfo in, LogicalRepCommitData *commit_data)
if (flags != 0)
elog(ERROR, "unrecognized flags %u in commit message", flags);
+
/* read fields */
commit_data->commit_lsn = pq_getmsgint64(in);
commit_data->end_lsn = pq_getmsgint64(in);
@@ -106,6 +107,77 @@ logicalrep_read_commit(StringInfo in, LogicalRepCommitData *commit_data)
}
/*
+ * Write PREPARE to the output stream.
+ */
+void
+logicalrep_write_prepare(StringInfo out, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ uint8 flags = 0;
+
+ pq_sendbyte(out, 'P'); /* sending PREPARE protocol */
+
+ /*
+ * This should only ever happen for two-phase commit transactions. In which case we
+ * expect to have a non-empty GID.
+ */
+ Assert(rbtxn_prepared(txn));
+ Assert(strlen(txn->gid) > 0);
+
+ /*
+ * Flags are determined from the state of the transaction. We know we
+ * always get PREPARE first and then [COMMIT|ROLLBACK] PREPARED, so if
+ * it's already marked as committed then it has to be COMMIT PREPARED (and
+ * likewise for abort / ROLLBACK PREPARED).
+ */
+ if (rbtxn_commit_prepared(txn))
+ flags |= LOGICALREP_IS_COMMIT_PREPARED;
+ else if (rbtxn_rollback_prepared(txn))
+ flags |= LOGICALREP_IS_ROLLBACK_PREPARED;
+ else
+ flags |= LOGICALREP_IS_PREPARE;
+
+ /* Make sure exactly one of the expected flags is set. */
+ if (!PrepareFlagsAreValid(flags))
+ elog(ERROR, "unrecognized flags %u in prepare message", flags);
+
+ /* send the flags field */
+ pq_sendbyte(out, flags);
+
+ /* send fields */
+ pq_sendint64(out, prepare_lsn);
+ pq_sendint64(out, txn->end_lsn);
+ pq_sendint64(out, txn->commit_time);
+
+ /* send gid */
+ pq_sendstring(out, txn->gid);
+}
+
+/*
+ * Read transaction PREPARE from the stream.
+ */
+void
+logicalrep_read_prepare(StringInfo in, LogicalRepPrepareData * prepare_data)
+{
+ /* read flags */
+ uint8 flags = pq_getmsgbyte(in);
+
+ if (!PrepareFlagsAreValid(flags))
+ elog(ERROR, "unrecognized flags %u in prepare message", flags);
+
+ /* set the action (reuse the constants used for the flags) */
+ prepare_data->prepare_type = flags;
+
+ /* read fields */
+ prepare_data->prepare_lsn = pq_getmsgint64(in);
+ prepare_data->end_lsn = pq_getmsgint64(in);
+ prepare_data->preparetime = pq_getmsgint64(in);
+
+ /* read gid (copy it into a pre-allocated buffer) */
+ strcpy(prepare_data->gid, pq_getmsgstring(in));
+}
+
+/*
* Write ORIGIN to the output stream.
*/
void
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 640409b..a02424b 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -722,6 +722,7 @@ apply_handle_commit(StringInfo s)
replorigin_session_origin_timestamp = commit_data.committime;
CommitTransactionCommand();
+
pgstat_report_stat(false);
store_flush_position(commit_data.end_lsn);
@@ -742,6 +743,152 @@ apply_handle_commit(StringInfo s)
}
/*
+ * Called from apply_handle_prepare to handle a PREPARE TRANSACTION.
+ */
+static void
+apply_handle_prepare_txn(LogicalRepPrepareData * prepare_data)
+{
+ Assert(prepare_data->prepare_lsn == remote_final_lsn);
+
+ /* The synchronization worker runs in single transaction. */
+ if (IsTransactionState() && !am_tablesync_worker())
+ {
+ /* End the earlier transaction and start a new one */
+ BeginTransactionBlock();
+ CommitTransactionCommand();
+ StartTransactionCommand();
+
+ /*
+ * Update origin state so we can restart streaming from correct
+ * position in case of crash.
+ */
+ replorigin_session_origin_lsn = prepare_data->end_lsn;
+ replorigin_session_origin_timestamp = prepare_data->preparetime;
+
+ PrepareTransactionBlock(prepare_data->gid);
+ CommitTransactionCommand();
+ pgstat_report_stat(false);
+
+ store_flush_position(prepare_data->end_lsn);
+ }
+ else
+ {
+ /* Process any invalidation messages that might have accumulated. */
+ AcceptInvalidationMessages();
+ maybe_reread_subscription();
+ }
+
+ in_remote_transaction = false;
+
+ /* Process any tables that are being synchronized in parallel. */
+ process_syncing_tables(prepare_data->end_lsn);
+
+ pgstat_report_activity(STATE_IDLE, NULL);
+}
+
+/*
+ * Called from apply_handle_prepare to handle a COMMIT PREPARED of a previously
+ * PREPARED transaction.
+ */
+static void
+apply_handle_commit_prepared_txn(LogicalRepPrepareData * prepare_data)
+{
+ /* there is no transaction when COMMIT PREPARED is called */
+ ensure_transaction();
+
+ /*
+ * Update origin state so we can restart streaming from correct position
+ * in case of crash.
+ */
+ replorigin_session_origin_lsn = prepare_data->end_lsn;
+ replorigin_session_origin_timestamp = prepare_data->preparetime;
+
+ FinishPreparedTransaction(prepare_data->gid, true);
+ CommitTransactionCommand();
+ pgstat_report_stat(false);
+
+ store_flush_position(prepare_data->end_lsn);
+ in_remote_transaction = false;
+
+ /* Process any tables that are being synchronized in parallel. */
+ process_syncing_tables(prepare_data->end_lsn);
+
+ pgstat_report_activity(STATE_IDLE, NULL);
+}
+
+/*
+ * Called from apply_handle_prepare to handle a ROLLBACK PREPARED of a previously
+ * PREPARED TRANSACTION.
+ */
+static void
+apply_handle_rollback_prepared_txn(LogicalRepPrepareData * prepare_data)
+{
+ /*
+ * Update origin state so we can restart streaming from correct position
+ * in case of crash.
+ */
+ replorigin_session_origin_lsn = prepare_data->end_lsn;
+ replorigin_session_origin_timestamp = prepare_data->preparetime;
+
+ /*
+ * During logical decoding, on the apply side, it's possible that a
+ * prepared transaction got aborted while decoding. In that case, we stop
+ * the decoding and abort the transaction immediately. However the
+ * ROLLBACK prepared processing still reaches the subscriber. In that case
+ * it's ok to have a missing gid
+ */
+ if (LookupGXact(prepare_data->gid))
+ {
+ /* there is no transaction when ABORT/ROLLBACK PREPARED is called */
+ ensure_transaction();
+ FinishPreparedTransaction(prepare_data->gid, false);
+ CommitTransactionCommand();
+ }
+
+ pgstat_report_stat(false);
+
+ store_flush_position(prepare_data->end_lsn);
+ in_remote_transaction = false;
+
+ /* Process any tables that are being synchronized in parallel. */
+ process_syncing_tables(prepare_data->end_lsn);
+
+ pgstat_report_activity(STATE_IDLE, NULL);
+}
+
+/*
+ * Handle PREPARE message.
+ */
+static void
+apply_handle_prepare(StringInfo s)
+{
+ LogicalRepPrepareData prepare_data;
+
+ logicalrep_read_prepare(s, &prepare_data);
+
+ switch (prepare_data.prepare_type)
+ {
+ case LOGICALREP_IS_PREPARE:
+ apply_handle_prepare_txn(&prepare_data);
+ break;
+
+ case LOGICALREP_IS_COMMIT_PREPARED:
+ apply_handle_commit_prepared_txn(&prepare_data);
+ break;
+
+ case LOGICALREP_IS_ROLLBACK_PREPARED:
+ apply_handle_rollback_prepared_txn(&prepare_data);
+ break;
+
+ default:
+ ereport(ERROR,
+ (errcode(ERRCODE_PROTOCOL_VIOLATION),
+ errmsg("unexpected type of prepare message: %d",
+ prepare_data.prepare_type)));
+ }
+}
+
+/*
* Handle ORIGIN message.
*
* TODO, support tracking of multiple origins
@@ -1908,10 +2055,14 @@ apply_dispatch(StringInfo s)
case 'B':
apply_handle_begin(s);
break;
- /* COMMIT */
+ /* COMMIT/ABORT */
case 'C':
apply_handle_commit(s);
break;
+ /* PREPARE and [COMMIT|ROLLBACK] PREPARED */
+ case 'P':
+ apply_handle_prepare(s);
+ break;
/* INSERT */
case 'I':
apply_handle_insert(s);
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 9c997ae..4078cab 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -47,6 +47,12 @@ static void pgoutput_truncate(LogicalDecodingContext *ctx,
ReorderBufferChange *change);
static bool pgoutput_origin_filter(LogicalDecodingContext *ctx,
RepOriginId origin_id);
+static void pgoutput_prepare_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, XLogRecPtr prepare_lsn);
+static void pgoutput_commit_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, XLogRecPtr prepare_lsn);
+static void pgoutput_rollback_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, XLogRecPtr prepare_lsn);
static void pgoutput_stream_start(struct LogicalDecodingContext *ctx,
ReorderBufferTXN *txn);
static void pgoutput_stream_stop(struct LogicalDecodingContext *ctx,
@@ -143,6 +149,9 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb)
cb->change_cb = pgoutput_change;
cb->truncate_cb = pgoutput_truncate;
cb->commit_cb = pgoutput_commit_txn;
+ cb->prepare_cb = pgoutput_prepare_txn;
+ cb->commit_prepared_cb = pgoutput_commit_prepared_txn;
+ cb->rollback_prepared_cb = pgoutput_rollback_prepared_txn;
cb->filter_by_origin_cb = pgoutput_origin_filter;
cb->shutdown_cb = pgoutput_shutdown;
@@ -378,6 +387,48 @@ pgoutput_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
}
/*
+ * PREPARE callback
+ */
+static void
+pgoutput_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ OutputPluginUpdateProgress(ctx);
+
+ OutputPluginPrepareWrite(ctx, true);
+ logicalrep_write_prepare(ctx->out, txn, prepare_lsn);
+ OutputPluginWrite(ctx, true);
+}
+
+/*
+ * COMMIT PREPARED callback
+ */
+static void
+pgoutput_commit_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ OutputPluginUpdateProgress(ctx);
+
+ OutputPluginPrepareWrite(ctx, true);
+ logicalrep_write_prepare(ctx->out, txn, prepare_lsn);
+ OutputPluginWrite(ctx, true);
+}
+
+/*
+ * PREPARE callback
+ */
+static void
+pgoutput_rollback_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ OutputPluginUpdateProgress(ctx);
+
+ OutputPluginPrepareWrite(ctx, true);
+ logicalrep_write_prepare(ctx->out, txn, prepare_lsn);
+ OutputPluginWrite(ctx, true);
+}
+
+/*
* Write the current schema of the relation and its ancestor (if any) if not
* done yet.
*/
diff --git a/src/include/access/twophase.h b/src/include/access/twophase.h
index 2ca71c3..b2628ea 100644
--- a/src/include/access/twophase.h
+++ b/src/include/access/twophase.h
@@ -44,6 +44,7 @@ extern GlobalTransaction MarkAsPreparing(TransactionId xid, const char *gid,
extern void StartPrepare(GlobalTransaction gxact);
extern void EndPrepare(GlobalTransaction gxact);
extern bool StandbyTransactionIdIsPrepared(TransactionId xid);
+extern bool LookupGXact(const char *gid);
extern TransactionId PrescanPreparedTransactions(TransactionId **xids_p,
int *nxids_p);
diff --git a/src/include/replication/logicalproto.h b/src/include/replication/logicalproto.h
index 0c2cda2..2ca6d74 100644
--- a/src/include/replication/logicalproto.h
+++ b/src/include/replication/logicalproto.h
@@ -87,6 +87,7 @@ typedef struct LogicalRepBeginData
TransactionId xid;
} LogicalRepBeginData;
+/* Commit (and abort) information */
typedef struct LogicalRepCommitData
{
XLogRecPtr commit_lsn;
@@ -94,6 +95,28 @@ typedef struct LogicalRepCommitData
TimestampTz committime;
} LogicalRepCommitData;
+
+/* Prepare protocol information */
+typedef struct LogicalRepPrepareData
+{
+ uint8 prepare_type;
+ XLogRecPtr prepare_lsn;
+ XLogRecPtr end_lsn;
+ TimestampTz preparetime;
+ char gid[GIDSIZE];
+} LogicalRepPrepareData;
+
+/* types of the prepare protocol message */
+#define LOGICALREP_IS_PREPARE 0x01
+#define LOGICALREP_IS_COMMIT_PREPARED 0x02
+#define LOGICALREP_IS_ROLLBACK_PREPARED 0x04
+
+/* prepare can be exactly one of PREPARE, [COMMIT|ROLLBACK] PREPARED*/
+#define PrepareFlagsAreValid(flags) \
+ ((flags == LOGICALREP_IS_PREPARE) || \
+ (flags == LOGICALREP_IS_COMMIT_PREPARED) || \
+ (flags == LOGICALREP_IS_ROLLBACK_PREPARED))
+
extern void logicalrep_write_begin(StringInfo out, ReorderBufferTXN *txn);
extern void logicalrep_read_begin(StringInfo in,
LogicalRepBeginData *begin_data);
@@ -101,6 +124,10 @@ extern void logicalrep_write_commit(StringInfo out, ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
extern void logicalrep_read_commit(StringInfo in,
LogicalRepCommitData *commit_data);
+extern void logicalrep_write_prepare(StringInfo out, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+extern void logicalrep_read_prepare(StringInfo in,
+ LogicalRepPrepareData * prepare_data);
extern void logicalrep_write_origin(StringInfo out, const char *origin,
XLogRecPtr origin_lsn);
extern char *logicalrep_read_origin(StringInfo in, XLogRecPtr *origin_lsn);
diff --git a/src/test/subscription/t/020_twophase.pl b/src/test/subscription/t/020_twophase.pl
new file mode 100644
index 0000000..3feb2c3
--- /dev/null
+++ b/src/test/subscription/t/020_twophase.pl
@@ -0,0 +1,163 @@
+# logical replication of 2PC test
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 12;
+
+# Initialize publisher node
+my $node_publisher = get_new_node('publisher');
+$node_publisher->init(allows_streaming => 'logical');
+$node_publisher->append_conf(
+ 'postgresql.conf', qq(
+ max_prepared_transactions = 10
+ ));
+$node_publisher->start;
+
+# Create subscriber node
+my $node_subscriber = get_new_node('subscriber');
+$node_subscriber->init(allows_streaming => 'logical');
+$node_subscriber->append_conf(
+ 'postgresql.conf', qq(max_prepared_transactions = 10));
+$node_subscriber->start;
+
+# Create some pre-existing content on publisher
+$node_publisher->safe_psql('postgres', "CREATE TABLE tab_full (a int PRIMARY KEY)");
+$node_publisher->safe_psql('postgres',
+ "INSERT INTO tab_full SELECT generate_series(1,10)");
+$node_publisher->safe_psql('postgres', "CREATE TABLE tab_full2 (x text)");
+$node_publisher->safe_psql('postgres',
+ "INSERT INTO tab_full2 VALUES ('a'), ('b'), ('b')");
+
+# Setup structure on subscriber
+$node_subscriber->safe_psql('postgres', "CREATE TABLE tab_full (a int PRIMARY KEY)");
+$node_subscriber->safe_psql('postgres', "CREATE TABLE tab_full2 (x text)");
+
+# Setup logical replication
+my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
+$node_publisher->safe_psql('postgres', "CREATE PUBLICATION tap_pub");
+$node_publisher->safe_psql('postgres',
+"ALTER PUBLICATION tap_pub ADD TABLE tab_full, tab_full2"
+);
+
+my $appname = 'tap_sub';
+$node_subscriber->safe_psql('postgres',
+"CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr application_name=$appname' PUBLICATION tap_pub"
+);
+
+# Wait for subscriber to finish initialization
+my $caughtup_query =
+"SELECT pg_current_wal_lsn() <= replay_lsn FROM pg_stat_replication WHERE application_name = '$appname';";
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# Also wait for initial table sync to finish
+my $synced_query =
+"SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT IN ('r', 's');";
+$node_subscriber->poll_query_until('postgres', $synced_query)
+ or die "Timed out while waiting for subscriber to synchronize data";
+
+# check that 2PC gets replicated to subscriber
+$node_publisher->safe_psql('postgres',
+ "BEGIN;INSERT INTO tab_full VALUES (11);PREPARE TRANSACTION 'test_prepared_tab_full';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is in prepared state on subscriber
+my $result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab_full';");
+ is($result, qq(1), 'transaction is prepared on subscriber');
+
+# check that 2PC gets committed on subscriber
+$node_publisher->safe_psql('postgres',
+ "COMMIT PREPARED 'test_prepared_tab_full';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is committed on subscriber
+$result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM tab_full where a = 11;");
+ is($result, qq(1), 'Row inserted via 2PC has committed on subscriber');
+$result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab_full';");
+ is($result, qq(0), 'transaction is committed on subscriber');
+
+# check that 2PC gets replicated to subscriber
+$node_publisher->safe_psql('postgres',
+ "BEGIN;INSERT INTO tab_full VALUES (12);PREPARE TRANSACTION 'test_prepared_tab_full';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is in prepared state on subscriber
+$result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab_full';");
+ is($result, qq(1), 'transaction is prepared on subscriber');
+
+# check that 2PC gets aborted on subscriber
+$node_publisher->safe_psql('postgres',
+ "ROLLBACK PREPARED 'test_prepared_tab_full';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is aborted on subscriber
+$result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM tab_full where a = 12;");
+ is($result, qq(0), 'Row inserted via 2PC is not present on subscriber');
+
+$result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab_full';");
+ is($result, qq(0), 'transaction is aborted on subscriber');
+
+# Check that commit prepared is decoded properly on crash restart
+$node_publisher->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab_full VALUES (12);
+ INSERT INTO tab_full VALUES (13);
+ PREPARE TRANSACTION 'test_prepared_tab';");
+$node_subscriber->stop('immediate');
+$node_publisher->stop('immediate');
+$node_publisher->start;
+$node_subscriber->start;
+
+# commit post the restart
+$node_publisher->safe_psql('postgres', "COMMIT PREPARED 'test_prepared_tab';");
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check inserts are visible
+$result = $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM tab_full where a IN (12,13);");
+is($result, qq(2), 'Rows inserted via 2PC are visible on the subscriber');
+
+# TODO add test cases involving DDL. This can be added after we add functionality
+# to replicate DDL changes to subscriber.
+
+# check all the cleanup
+$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_subscription");
+is($result, qq(0), 'check subscription was dropped on subscriber');
+
+$result = $node_publisher->safe_psql('postgres',
+ "SELECT count(*) FROM pg_replication_slots");
+is($result, qq(0), 'check replication slot was dropped on publisher');
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_subscription_rel");
+is($result, qq(0),
+ 'check subscription relation status was dropped on subscriber');
+
+$result = $node_publisher->safe_psql('postgres',
+ "SELECT count(*) FROM pg_replication_slots");
+is($result, qq(0), 'check replication slot was dropped on publisher');
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_replication_origin");
+is($result, qq(0), 'check replication origin was dropped on subscriber');
+
+$node_subscriber->stop('fast');
+$node_publisher->stop('fast');
--
1.8.3.1
v9-0002-Backend-support-for-logical-decoding-of-two-phase.patchapplication/octet-stream; name=v9-0002-Backend-support-for-logical-decoding-of-two-phase.patchDownload
From f644b9bcdc29fb77e3869b873c2905f0d8c795a9 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Thu, 15 Oct 2020 06:09:30 -0400
Subject: [PATCH v9] Backend support for logical decoding of two-phase commits
Until now two-phase transactions were decoded at COMMIT, just like
regular transaction. During replay, two-phase transactions were
translated into regular transactions on the subscriber, and the GID
was not forwarded to it.
This patch allows PREPARE-time decoding two-phase transactions (if
the output plugin supports this capability), in which case the
transactions are replayed at PREPARE and then committed later when
COMMIT PREPARED arrives.
---
contrib/test_decoding/Makefile | 4 +-
contrib/test_decoding/expected/two_phase.out | 219 +++++++++++++++++++
contrib/test_decoding/sql/two_phase.sql | 117 ++++++++++
contrib/test_decoding/t/001_twophase.pl | 119 ++++++++++
src/backend/replication/logical/decode.c | 250 ++++++++++++++++++++-
src/backend/replication/logical/reorderbuffer.c | 278 ++++++++++++++++++++----
6 files changed, 937 insertions(+), 50 deletions(-)
create mode 100644 contrib/test_decoding/expected/two_phase.out
create mode 100644 contrib/test_decoding/sql/two_phase.sql
create mode 100644 contrib/test_decoding/t/001_twophase.pl
diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index 9a4c76f..2abf3ce 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -4,11 +4,13 @@ MODULES = test_decoding
PGFILEDESC = "test_decoding - example of a logical decoding output plugin"
REGRESS = ddl xact rewrite toast permissions decoding_in_xact \
- decoding_into_rel binary prepared replorigin time messages \
+ decoding_into_rel binary prepared replorigin time two_phase messages \
spill slot truncate stream stats
ISOLATION = mxact delayed_startup ondisk_startup concurrent_ddl_dml \
oldest_xmin snapshot_transfer subxact_without_top concurrent_stream
+TAP_TESTS = 1
+
REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
ISOLATION_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
diff --git a/contrib/test_decoding/expected/two_phase.out b/contrib/test_decoding/expected/two_phase.out
new file mode 100644
index 0000000..3ac01a4
--- /dev/null
+++ b/contrib/test_decoding/expected/two_phase.out
@@ -0,0 +1,219 @@
+-- Test two-phased transactions, when two-phase-commit is enabled, transactions are
+-- decoded at PREPARE time rather than at COMMIT PREPARED time.
+SET synchronous_commit = on;
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+ ?column?
+----------
+ init
+(1 row)
+
+CREATE TABLE test_prepared1(id integer primary key);
+CREATE TABLE test_prepared2(id integer primary key);
+-- Test that commands in a two phase xact are only decoded at PREPARE.
+-- Decoding after COMMIT PREPARED should only have the COMMIT PREPARED command and not the
+-- rest of the commands in the transaction.
+BEGIN;
+INSERT INTO test_prepared1 VALUES (1);
+INSERT INTO test_prepared1 VALUES (2);
+-- should show nothing
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+------
+(0 rows)
+
+PREPARE TRANSACTION 'test_prepared#1';
+--
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+----------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:1
+ table public.test_prepared1: INSERT: id[integer]:2
+ PREPARE TRANSACTION 'test_prepared#1'
+(4 rows)
+
+COMMIT PREPARED 'test_prepared#1';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-----------------------------------
+ COMMIT PREPARED 'test_prepared#1'
+(1 row)
+
+-- test abort of a prepared xact
+BEGIN;
+INSERT INTO test_prepared1 VALUES (3);
+PREPARE TRANSACTION 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+----------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:3
+ PREPARE TRANSACTION 'test_prepared#2'
+(3 rows)
+
+ROLLBACK PREPARED 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-------------------------------------
+ ROLLBACK PREPARED 'test_prepared#2'
+(1 row)
+
+-- test prepared xact containing ddl
+BEGIN;
+ALTER TABLE test_prepared1 ADD COLUMN data text;
+INSERT INTO test_prepared1 VALUES (4, 'frakbar');
+PREPARE TRANSACTION 'test_prepared#3';
+-- confirm that exclusive lock from the ALTER command is held on test_prepared1 table
+SELECT 'test_prepared_1' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'test_prepared1'::regclass;
+ relation | locktype | mode
+-----------------+----------+---------------------
+ test_prepared_1 | relation | RowExclusiveLock
+ test_prepared_1 | relation | AccessExclusiveLock
+(2 rows)
+
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-------------------------------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:4 data[text]:'frakbar'
+ PREPARE TRANSACTION 'test_prepared#3'
+(3 rows)
+
+-- Test that we decode correctly while an uncommitted prepared xact
+-- with ddl exists.
+--
+-- Use a separate table for the concurrent transaction because the lock from
+-- the ALTER will stop us inserting into the other one.
+--
+INSERT INTO test_prepared2 VALUES (5);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+----------------------------------------------------
+ BEGIN
+ table public.test_prepared2: INSERT: id[integer]:5
+ COMMIT
+(3 rows)
+
+COMMIT PREPARED 'test_prepared#3';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-----------------------------------
+ COMMIT PREPARED 'test_prepared#3'
+(1 row)
+
+-- make sure stuff still works
+INSERT INTO test_prepared1 VALUES (6);
+INSERT INTO test_prepared2 VALUES (7);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+--------------------------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:6 data[text]:null
+ COMMIT
+ BEGIN
+ table public.test_prepared2: INSERT: id[integer]:7
+ COMMIT
+(6 rows)
+
+-- Check `CLUSTER` (as operation that hold exclusive lock) doesn't block
+-- logical decoding.
+BEGIN;
+INSERT INTO test_prepared1 VALUES (8, 'othercol');
+CLUSTER test_prepared1 USING test_prepared1_pkey;
+INSERT INTO test_prepared1 VALUES (9, 'othercol2');
+PREPARE TRANSACTION 'test_prepared_lock';
+SELECT 'test_prepared1' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'test_prepared1'::regclass;
+ relation | locktype | mode
+----------------+----------+---------------------
+ test_prepared1 | relation | RowExclusiveLock
+ test_prepared1 | relation | ShareLock
+ test_prepared1 | relation | AccessExclusiveLock
+(3 rows)
+
+-- The above CLUSTER command shouldn't cause a timeout on 2pc decoding. The call should return
+-- within a second.
+SET statement_timeout = '1s';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+---------------------------------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:8 data[text]:'othercol'
+ table public.test_prepared1: INSERT: id[integer]:9 data[text]:'othercol2'
+ PREPARE TRANSACTION 'test_prepared_lock'
+(4 rows)
+
+RESET statement_timeout;
+COMMIT PREPARED 'test_prepared_lock';
+-- consume the commit
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+--------------------------------------
+ COMMIT PREPARED 'test_prepared_lock'
+(1 row)
+
+-- test savepoints and sub-xacts as a result
+BEGIN;
+CREATE TABLE test_prepared_savepoint (a int);
+INSERT INTO test_prepared_savepoint VALUES (1);
+SAVEPOINT test_savepoint;
+INSERT INTO test_prepared_savepoint VALUES (2);
+ROLLBACK TO SAVEPOINT test_savepoint;
+PREPARE TRANSACTION 'test_prepared_savepoint';
+-- should show only 1, not 2
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+------------------------------------------------------------
+ BEGIN
+ table public.test_prepared_savepoint: INSERT: a[integer]:1
+ PREPARE TRANSACTION 'test_prepared_savepoint'
+(3 rows)
+
+COMMIT PREPARED 'test_prepared_savepoint';
+-- consume the commit
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-------------------------------------------
+ COMMIT PREPARED 'test_prepared_savepoint'
+(1 row)
+
+-- test that a GID containing "_nodecode" gets decoded at commit prepared time
+BEGIN;
+INSERT INTO test_prepared1 VALUES (20);
+PREPARE TRANSACTION 'test_prepared_nodecode';
+-- should show nothing
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+------
+(0 rows)
+
+COMMIT PREPARED 'test_prepared_nodecode';
+-- should be decoded now
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+---------------------------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:20 data[text]:null
+ COMMIT
+(3 rows)
+
+-- cleanup
+DROP TABLE test_prepared1;
+DROP TABLE test_prepared2;
+-- show results. There should be nothing to show
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+------
+(0 rows)
+
+SELECT pg_drop_replication_slot('regression_slot');
+ pg_drop_replication_slot
+--------------------------
+
+(1 row)
+
diff --git a/contrib/test_decoding/sql/two_phase.sql b/contrib/test_decoding/sql/two_phase.sql
new file mode 100644
index 0000000..e3e2690
--- /dev/null
+++ b/contrib/test_decoding/sql/two_phase.sql
@@ -0,0 +1,117 @@
+-- Test two-phased transactions, when two-phase-commit is enabled, transactions are
+-- decoded at PREPARE time rather than at COMMIT PREPARED time.
+SET synchronous_commit = on;
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+
+CREATE TABLE test_prepared1(id integer primary key);
+CREATE TABLE test_prepared2(id integer primary key);
+
+-- Test that commands in a two phase xact are only decoded at PREPARE.
+-- Decoding after COMMIT PREPARED should only have the COMMIT PREPARED command and not the
+-- rest of the commands in the transaction.
+BEGIN;
+INSERT INTO test_prepared1 VALUES (1);
+INSERT INTO test_prepared1 VALUES (2);
+-- should show nothing
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+PREPARE TRANSACTION 'test_prepared#1';
+--
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+COMMIT PREPARED 'test_prepared#1';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- test abort of a prepared xact
+BEGIN;
+INSERT INTO test_prepared1 VALUES (3);
+PREPARE TRANSACTION 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ROLLBACK PREPARED 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+
+-- test prepared xact containing ddl
+BEGIN;
+ALTER TABLE test_prepared1 ADD COLUMN data text;
+INSERT INTO test_prepared1 VALUES (4, 'frakbar');
+PREPARE TRANSACTION 'test_prepared#3';
+-- confirm that exclusive lock from the ALTER command is held on test_prepared1 table
+SELECT 'test_prepared_1' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'test_prepared1'::regclass;
+
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- Test that we decode correctly while an uncommitted prepared xact
+-- with ddl exists.
+--
+-- Use a separate table for the concurrent transaction because the lock from
+-- the ALTER will stop us inserting into the other one.
+--
+INSERT INTO test_prepared2 VALUES (5);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+COMMIT PREPARED 'test_prepared#3';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- make sure stuff still works
+INSERT INTO test_prepared1 VALUES (6);
+INSERT INTO test_prepared2 VALUES (7);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- Check `CLUSTER` (as operation that hold exclusive lock) doesn't block
+-- logical decoding.
+BEGIN;
+INSERT INTO test_prepared1 VALUES (8, 'othercol');
+CLUSTER test_prepared1 USING test_prepared1_pkey;
+INSERT INTO test_prepared1 VALUES (9, 'othercol2');
+PREPARE TRANSACTION 'test_prepared_lock';
+
+SELECT 'test_prepared1' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'test_prepared1'::regclass;
+
+-- The above CLUSTER command shouldn't cause a timeout on 2pc decoding. The call should return
+-- within a second.
+SET statement_timeout = '1s';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+RESET statement_timeout;
+
+COMMIT PREPARED 'test_prepared_lock';
+
+-- consume the commit
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- test savepoints and sub-xacts as a result
+BEGIN;
+CREATE TABLE test_prepared_savepoint (a int);
+INSERT INTO test_prepared_savepoint VALUES (1);
+SAVEPOINT test_savepoint;
+INSERT INTO test_prepared_savepoint VALUES (2);
+ROLLBACK TO SAVEPOINT test_savepoint;
+PREPARE TRANSACTION 'test_prepared_savepoint';
+-- should show only 1, not 2
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+COMMIT PREPARED 'test_prepared_savepoint';
+-- consume the commit
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- test that a GID containing "_nodecode" gets decoded at commit prepared time
+BEGIN;
+INSERT INTO test_prepared1 VALUES (20);
+PREPARE TRANSACTION 'test_prepared_nodecode';
+-- should show nothing
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+COMMIT PREPARED 'test_prepared_nodecode';
+-- should be decoded now
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- cleanup
+DROP TABLE test_prepared1;
+DROP TABLE test_prepared2;
+
+-- show results. There should be nothing to show
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+SELECT pg_drop_replication_slot('regression_slot');
diff --git a/contrib/test_decoding/t/001_twophase.pl b/contrib/test_decoding/t/001_twophase.pl
new file mode 100644
index 0000000..51c6a35
--- /dev/null
+++ b/contrib/test_decoding/t/001_twophase.pl
@@ -0,0 +1,119 @@
+# logical replication of 2PC test
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 2;
+use Time::HiRes qw(usleep);
+use Scalar::Util qw(looks_like_number);
+
+# Initialize node
+my $node_logical = get_new_node('logical');
+$node_logical->init(allows_streaming => 'logical');
+$node_logical->append_conf(
+ 'postgresql.conf', qq(
+ max_prepared_transactions = 10
+));
+$node_logical->start;
+
+# Create some pre-existing content on logical
+$node_logical->safe_psql('postgres', "CREATE TABLE tab (a int PRIMARY KEY)");
+$node_logical->safe_psql('postgres',
+ "INSERT INTO tab SELECT generate_series(1,10)");
+$node_logical->safe_psql('postgres',
+ "SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');");
+
+# This test is specifically for testing concurrent abort while logical decode
+# is ongoing. We will pass in the xid of the 2PC to the plugin as an option.
+# On the receipt of a valid "check-xid-aborted", the change API in the test decoding
+# plugin will wait for it to be aborted.
+#
+# We will fire off a ROLLBACK from another session when this decode
+# is waiting.
+#
+# The status of "check-xid-aborted" will change from in-progress to not-committed
+# (hence aborted) and we will stop decoding because the subsequent
+# system catalog scan will error out.
+
+$node_logical->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab VALUES (11);
+ INSERT INTO tab VALUES (12);
+ ALTER TABLE tab ADD COLUMN b INT;
+ INSERT INTO tab VALUES (13,14);
+ PREPARE TRANSACTION 'test_prepared_tab';");
+# get XID of the above two-phase transaction
+my $xid2pc = $node_logical->safe_psql('postgres', "SELECT transaction FROM pg_prepared_xacts WHERE gid = 'test_prepared_tab'");
+is(looks_like_number($xid2pc), qq(1), 'Got a valid two-phase XID');
+
+# start decoding the above by passing the "check-xid-aborted"
+my $logical_connstr = $node_logical->connstr . ' dbname=postgres';
+
+# decode now, it should include an ABORT entry because of the ROLLBACK below
+system_log("psql -d \"$logical_connstr\" -c \"SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'check-xid-aborted', '$xid2pc');\" \&");
+
+# check that decode starts waiting for this $xid2pc
+poll_output_until("waiting for $xid2pc to abort")
+ or die "no wait happened for the abort";
+
+# rollback the prepared transaction
+$node_logical->safe_psql('postgres', "ROLLBACK PREPARED 'test_prepared_tab';");
+
+# check for occurrence of the log about stopping this decoding
+poll_output_until("stopping decoding of test_prepared_tab")
+ or die "no decoding stop for the rollback";
+
+# consume any remaining changes
+$node_logical->safe_psql('postgres', "SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');");
+
+# Check that commit prepared is decoded properly on immediate restart
+$node_logical->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab VALUES (11);
+ INSERT INTO tab VALUES (12);
+ ALTER TABLE tab ADD COLUMN b INT;
+ INSERT INTO tab VALUES (13, 11);
+ PREPARE TRANSACTION 'test_prepared_tab';");
+# consume changes
+$node_logical->safe_psql('postgres', "SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');");
+$node_logical->stop('immediate');
+$node_logical->start;
+
+# commit post the restart
+$node_logical->safe_psql('postgres', "COMMIT PREPARED 'test_prepared_tab';");
+$node_logical->safe_psql('postgres', "SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');");
+
+# check inserts are visible
+my $result = $node_logical->safe_psql('postgres', "SELECT count(*) FROM tab where a IN (11,12) OR b IN (11);");
+is($result, qq(3), 'Rows inserted via 2PC are visible on restart');
+
+$node_logical->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot');");
+$node_logical->stop('fast');
+
+sub poll_output_until
+{
+ my ($expected) = @_;
+
+ $expected = 'xxxxxx' unless defined($expected); # default junk value
+
+ my $max_attempts = 180 * 10;
+ my $attempts = 0;
+
+ my $output_file = '';
+ while ($attempts < $max_attempts)
+ {
+ $output_file = slurp_file($node_logical->logfile());
+
+ if ($output_file =~ $expected)
+ {
+ return 1;
+ }
+
+ # Wait 0.1 second before retrying.
+ usleep(100_000);
+ $attempts++;
+ }
+
+ # The output result didn't change in 180 seconds. Give up
+ return 0;
+}
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 3f84ee9..e011fd9 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -68,8 +68,15 @@ static void DecodeSpecConfirm(LogicalDecodingContext *ctx, XLogRecordBuffer *buf
static void DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
xl_xact_parsed_commit *parsed, TransactionId xid);
+static void DecodeCommitPrepared(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_commit *parsed, TransactionId xid);
static void DecodeAbort(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
xl_xact_parsed_abort *parsed, TransactionId xid);
+static void DecodeAbortPrepared(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_abort *parsed, TransactionId xid);
+static void DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_prepare * parsed);
+
/* common function to decode tuples */
static void DecodeXLogTuple(char *data, Size len, ReorderBufferTupleBuf *tup);
@@ -239,7 +246,6 @@ DecodeXactOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
switch (info)
{
case XLOG_XACT_COMMIT:
- case XLOG_XACT_COMMIT_PREPARED:
{
xl_xact_commit *xlrec;
xl_xact_parsed_commit parsed;
@@ -256,8 +262,24 @@ DecodeXactOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
DecodeCommit(ctx, buf, &parsed, xid);
break;
}
+ case XLOG_XACT_COMMIT_PREPARED:
+ {
+ xl_xact_commit *xlrec;
+ xl_xact_parsed_commit parsed;
+ TransactionId xid;
+
+ xlrec = (xl_xact_commit *) XLogRecGetData(r);
+ ParseCommitRecord(XLogRecGetInfo(buf->record), xlrec, &parsed);
+
+ if (!TransactionIdIsValid(parsed.twophase_xid))
+ xid = XLogRecGetXid(r);
+ else
+ xid = parsed.twophase_xid;
+
+ DecodeCommitPrepared(ctx, buf, &parsed, xid);
+ break;
+ }
case XLOG_XACT_ABORT:
- case XLOG_XACT_ABORT_PREPARED:
{
xl_xact_abort *xlrec;
xl_xact_parsed_abort parsed;
@@ -274,6 +296,23 @@ DecodeXactOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
DecodeAbort(ctx, buf, &parsed, xid);
break;
}
+ case XLOG_XACT_ABORT_PREPARED:
+ {
+ xl_xact_abort *xlrec;
+ xl_xact_parsed_abort parsed;
+ TransactionId xid;
+
+ xlrec = (xl_xact_abort *) XLogRecGetData(r);
+ ParseAbortRecord(XLogRecGetInfo(buf->record), xlrec, &parsed);
+
+ if (!TransactionIdIsValid(parsed.twophase_xid))
+ xid = XLogRecGetXid(r);
+ else
+ xid = parsed.twophase_xid;
+
+ DecodeAbortPrepared(ctx, buf, &parsed, xid);
+ break;
+ }
case XLOG_XACT_ASSIGNMENT:
/*
@@ -312,17 +351,35 @@ DecodeXactOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
}
break;
case XLOG_XACT_PREPARE:
+ {
+ xl_xact_parsed_prepare parsed;
+ xl_xact_prepare *xlrec;
- /*
- * Currently decoding ignores PREPARE TRANSACTION and will just
- * decode the transaction when the COMMIT PREPARED is sent or
- * throw away the transaction's contents when a ROLLBACK PREPARED
- * is received. In the future we could add code to expose prepared
- * transactions in the changestream allowing for a kind of
- * distributed 2PC.
- */
- ReorderBufferProcessXid(reorder, XLogRecGetXid(r), buf->origptr);
- break;
+ /* check that output plugin is capable of two-phase decoding */
+ if (!ctx->twophase)
+ {
+ ReorderBufferProcessXid(reorder, XLogRecGetXid(r), buf->origptr);
+ break;
+ }
+
+ /* ok, parse it */
+ xlrec = (xl_xact_prepare *)XLogRecGetData(r);
+ ParsePrepareRecord(XLogRecGetInfo(buf->record),
+ xlrec, &parsed);
+
+ /* does output plugin want this particular transaction? */
+ if (ctx->callbacks.filter_prepare_cb &&
+ ReorderBufferPrepareNeedSkip(reorder, parsed.twophase_xid,
+ parsed.twophase_gid))
+ {
+ ReorderBufferProcessXid(reorder, parsed.twophase_xid,
+ buf->origptr);
+ break;
+ }
+
+ DecodePrepare(ctx, buf, &parsed);
+ break;
+ }
default:
elog(ERROR, "unexpected RM_XACT_ID record type: %u", info);
}
@@ -659,6 +716,131 @@ DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
}
/*
+ * Consolidated commit record handling between the different form of commit
+ * records.
+ */
+static void
+DecodeCommitPrepared(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_commit *parsed, TransactionId xid)
+{
+ XLogRecPtr origin_lsn = InvalidXLogRecPtr;
+ TimestampTz commit_time = parsed->xact_time;
+ RepOriginId origin_id = XLogRecGetOrigin(buf->record);
+ int i;
+
+ if (parsed->xinfo & XACT_XINFO_HAS_ORIGIN)
+ {
+ origin_lsn = parsed->origin_lsn;
+ commit_time = parsed->origin_timestamp;
+ }
+
+ SnapBuildCommitTxn(ctx->snapshot_builder, buf->origptr, xid,
+ parsed->nsubxacts, parsed->subxacts);
+
+ /* ----
+ * Check whether we are interested in this specific transaction, and tell
+ * the reorderbuffer to forget the content of the (sub-)transactions
+ * if not.
+ *
+ * There can be several reasons we might not be interested in this
+ * transaction:
+ * 1) We might not be interested in decoding transactions up to this
+ * LSN. This can happen because we previously decoded it and now just
+ * are restarting or if we haven't assembled a consistent snapshot yet.
+ * 2) The transaction happened in another database.
+ * 3) The output plugin is not interested in the origin.
+ * 4) We are doing fast-forwarding
+ *
+ * We can't just use ReorderBufferAbort() here, because we need to execute
+ * the transaction's invalidations. This currently won't be needed if
+ * we're just skipping over the transaction because currently we only do
+ * so during startup, to get to the first transaction the client needs. As
+ * we have reset the catalog caches before starting to read WAL, and we
+ * haven't yet touched any catalogs, there can't be anything to invalidate.
+ * But if we're "forgetting" this commit because it's it happened in
+ * another database, the invalidations might be important, because they
+ * could be for shared catalogs and we might have loaded data into the
+ * relevant syscaches.
+ * ---
+ */
+ if (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
+ (parsed->dbId != InvalidOid && parsed->dbId != ctx->slot->data.database) ||
+ ctx->fast_forward || FilterByOrigin(ctx, origin_id))
+ {
+ for (i = 0; i < parsed->nsubxacts; i++)
+ {
+ ReorderBufferForget(ctx->reorder, parsed->subxacts[i], buf->origptr);
+ }
+ ReorderBufferForget(ctx->reorder, xid, buf->origptr);
+
+ return;
+ }
+
+ /* tell the reorderbuffer about the surviving subtransactions */
+ for (i = 0; i < parsed->nsubxacts; i++)
+ {
+ ReorderBufferCommitChild(ctx->reorder, xid, parsed->subxacts[i],
+ buf->origptr, buf->endptr);
+ }
+
+ /*
+ * For COMMIT PREPARED, the changes have already been replayed at
+ * PREPARE time, so we only need to notify the subscriber that the GID
+ * finally committed.
+ * If filter check present and this needs to be skipped, do a regular commit.
+ */
+ if (ctx->callbacks.filter_prepare_cb &&
+ ReorderBufferPrepareNeedSkip(ctx->reorder, xid, parsed->twophase_gid))
+ {
+ ReorderBufferCommit(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn);
+ }
+ else
+ {
+ ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn,
+ parsed->twophase_gid, true);
+ }
+
+}
+
+/*
+ * Decode PREPARE record. Similar logic as in COMMIT
+ */
+static void
+DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_prepare * parsed)
+{
+ XLogRecPtr origin_lsn = parsed->origin_lsn;
+ TimestampTz commit_time = parsed->origin_timestamp;
+ XLogRecPtr origin_id = XLogRecGetOrigin(buf->record);
+ int i;
+ TransactionId xid = parsed->twophase_xid;
+
+ if (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
+ (parsed->dbId != InvalidOid && parsed->dbId != ctx->slot->data.database) ||
+ ctx->fast_forward || FilterByOrigin(ctx, origin_id))
+ return;
+
+ /*
+ * Tell the reorderbuffer about the surviving subtransactions. We need to
+ * do this because the main transaction itself has not committed since we
+ * are in the prepare phase right now. So we need to be sure the snapshot
+ * is setup correctly for the main transaction in case all changes
+ * happened in subtransanctions
+ */
+ for (i = 0; i < parsed->nsubxacts; i++)
+ {
+ ReorderBufferCommitChild(ctx->reorder, xid, parsed->subxacts[i],
+ buf->origptr, buf->endptr);
+ }
+
+ /* replay actions of all transaction + subtransactions in order */
+ ReorderBufferPrepare(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn, parsed->twophase_gid);
+}
+
+/*
* Get the data from the various forms of abort records and pass it on to
* snapbuild.c and reorderbuffer.c
*/
@@ -681,6 +863,50 @@ DecodeAbort(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
}
/*
+ * Get the data from the various forms of abort records and pass it on to
+ * snapbuild.c and reorderbuffer.c
+ */
+static void
+DecodeAbortPrepared(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_abort *parsed, TransactionId xid)
+{
+ int i;
+ XLogRecPtr origin_lsn = InvalidXLogRecPtr;
+ TimestampTz commit_time = 0;
+ XLogRecPtr origin_id = XLogRecGetOrigin(buf->record);
+
+ if (parsed->xinfo & XACT_XINFO_HAS_ORIGIN)
+ {
+ origin_lsn = parsed->origin_lsn;
+ commit_time = parsed->origin_timestamp;
+ }
+
+ /*
+ * If it passes through the filters handle the ROLLBACK via callbacks
+ */
+ if(!FilterByOrigin(ctx, origin_id) &&
+ !SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) &&
+ !ReorderBufferPrepareNeedSkip(ctx->reorder, xid, parsed->twophase_gid))
+ {
+ Assert(TransactionIdIsValid(xid));
+ Assert(parsed->dbId == ctx->slot->data.database);
+
+ ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn,
+ parsed->twophase_gid, false);
+ return;
+ }
+
+ for (i = 0; i < parsed->nsubxacts; i++)
+ {
+ ReorderBufferAbort(ctx->reorder, parsed->subxacts[i],
+ buf->record->EndRecPtr);
+ }
+
+ ReorderBufferAbort(ctx->reorder, xid, buf->record->EndRecPtr);
+}
+
+/*
* Parse XLOG_HEAP_INSERT (not MULTI_INSERT!) records into tuplebufs.
*
* Deletes can contain the new tuple.
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 7a8bf76..9df7ff7 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -251,7 +251,8 @@ static Size ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn
static void ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
char *change);
static void ReorderBufferRestoreCleanup(ReorderBuffer *rb, ReorderBufferTXN *txn);
-static void ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn);
+static void ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
+ bool txn_prepared);
static void ReorderBufferCleanupSerializedTXNs(const char *slotname);
static void ReorderBufferSerializedPath(char *path, ReplicationSlot *slot,
TransactionId xid, XLogSegNo segno);
@@ -418,6 +419,12 @@ ReorderBufferReturnTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
/* free data that's contained */
+ if (txn->gid != NULL)
+ {
+ pfree(txn->gid);
+ txn->gid = NULL;
+ }
+
if (txn->tuplecid_hash != NULL)
{
hash_destroy(txn->tuplecid_hash);
@@ -1511,12 +1518,14 @@ ReorderBufferCleanupTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
}
/*
- * Discard changes from a transaction (and subtransactions), after streaming
- * them. Keep the remaining info - transactions, tuplecids, invalidations and
- * snapshots.
+ * Discard changes from a transaction (and subtransactions), either after streaming or
+ * after a PREPARE.
+ * The flag txn_prepared indicates if this is called after a PREPARE.
+ * If streaming, keep the remaining info - transactions, tuplecids, invalidations and
+ * snapshots. If after a PREPARE, keep only the invalidations and snapshots.
*/
static void
-ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
+ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, bool txn_prepared)
{
dlist_mutable_iter iter;
@@ -1535,7 +1544,7 @@ ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
Assert(rbtxn_is_known_subxact(subtxn));
Assert(subtxn->nsubtxns == 0);
- ReorderBufferTruncateTXN(rb, subtxn);
+ ReorderBufferTruncateTXN(rb, subtxn, txn_prepared);
}
/* cleanup changes in the txn */
@@ -1569,9 +1578,33 @@ ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
* about the toplevel xact (we send the XID in all messages), but we never
* stream XIDs of empty subxacts.
*/
- if ((!txn->toptxn) || (txn->nentries_mem != 0))
+ if ((!txn_prepared) && ((!txn->toptxn) || (txn->nentries_mem != 0)))
txn->txn_flags |= RBTXN_IS_STREAMED;
+ if (txn_prepared)
+ {
+ /*
+ * If this is a prepared txn, cleanup the tuplecids we stored for decoding
+ * catalog snapshot access.
+ * They are always stored in the toplevel transaction.
+ */
+ dlist_foreach_modify(iter, &txn->tuplecids)
+ {
+ ReorderBufferChange *change;
+
+ change = dlist_container(ReorderBufferChange, node, iter.cur);
+
+ /* Check we're not mixing changes from different transactions. */
+ Assert(change->txn == txn);
+ Assert(change->action == REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID);
+
+ /* Remove the change from its containing list. */
+ dlist_delete(&change->node);
+
+ ReorderBufferReturnChange(rb, change, true);
+ }
+ }
+
/*
* Destroy the (relfilenode, ctid) hashtable, so that we don't leak any
* memory. We could also keep the hash table and update it with new ctid
@@ -1765,9 +1798,18 @@ ReorderBufferStreamCommit(ReorderBuffer *rb, ReorderBufferTXN *txn)
ReorderBufferStreamTXN(rb, txn);
- rb->stream_commit(rb, txn, txn->final_lsn);
-
- ReorderBufferCleanupTXN(rb, txn);
+ if (rbtxn_prepared(txn))
+ {
+ rb->stream_prepare(rb, txn, txn->final_lsn);
+ ReorderBufferTruncateTXN(rb, txn, true);
+ /* Reset the CheckXidAlive */
+ CheckXidAlive = InvalidTransactionId;
+ }
+ else
+ {
+ rb->stream_commit(rb, txn, txn->final_lsn);
+ ReorderBufferCleanupTXN(rb, txn);
+ }
}
/*
@@ -1896,7 +1938,7 @@ ReorderBufferResetTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
ReorderBufferChange *specinsert)
{
/* Discard the changes that we just streamed */
- ReorderBufferTruncateTXN(rb, txn);
+ ReorderBufferTruncateTXN(rb, txn, false);
/* Free all resources allocated for toast reconstruction */
ReorderBufferToastReset(rb, txn);
@@ -2003,7 +2045,7 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
prev_lsn = change->lsn;
/* Set the current xid to detect concurrent aborts. */
- if (streaming)
+ if (streaming || rbtxn_prepared(change->txn))
{
curtxn = change->txn;
SetupCheckXidLive(curtxn->xid);
@@ -2294,7 +2336,16 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
}
}
else
- rb->commit(rb, txn, commit_lsn);
+ {
+ /*
+ * Call either PREPARE (for two-phase transactions) or COMMIT
+ * (for regular ones).
+ */
+ if (rbtxn_prepared(txn))
+ rb->prepare(rb, txn, commit_lsn);
+ else
+ rb->commit(rb, txn, commit_lsn);
+ }
/* this is just a sanity check against bad output plugin behaviour */
if (GetCurrentTransactionIdIfAny() != InvalidTransactionId)
@@ -2335,11 +2386,17 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
*/
if (streaming)
{
- ReorderBufferTruncateTXN(rb, txn);
+ ReorderBufferTruncateTXN(rb, txn, false);
/* Reset the CheckXidAlive */
CheckXidAlive = InvalidTransactionId;
}
+ else if (rbtxn_prepared(txn))
+ {
+ ReorderBufferTruncateTXN(rb, txn, true);
+ /* Reset the CheckXidAlive */
+ CheckXidAlive = InvalidTransactionId;
+ }
else
ReorderBufferCleanupTXN(rb, txn);
}
@@ -2369,17 +2426,18 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
/*
* The error code ERRCODE_TRANSACTION_ROLLBACK indicates a concurrent
- * abort of the (sub)transaction we are streaming. We need to do the
+ * abort of the (sub)transaction we are streaming or preparing. We need to do the
* cleanup and return gracefully on this error, see SetupCheckXidLive.
*/
if (errdata->sqlerrcode == ERRCODE_TRANSACTION_ROLLBACK)
{
/*
- * This error can only occur when we are sending the data in
- * streaming mode and the streaming is not finished yet.
+ * This error can only occur either when we are sending the data in
+ * streaming mode and the streaming is not finished yet or when we are
+ * sending the data out on a PREPARE during a two-phase commit.
*/
- Assert(streaming);
- Assert(stream_started);
+ Assert(streaming || rbtxn_prepared(txn));
+ Assert(stream_started || rbtxn_prepared(txn));
/* Cleanup the temporary error state. */
FlushErrorState();
@@ -2387,10 +2445,19 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
errdata = NULL;
curtxn->concurrent_abort = true;
- /* Reset the TXN so that it is allowed to stream remaining data. */
- ReorderBufferResetTXN(rb, txn, snapshot_now,
- command_id, prev_lsn,
- specinsert);
+ /* If streaming, reset the TXN so that it is allowed to stream remaining data. */
+ if (streaming)
+ {
+ ReorderBufferResetTXN(rb, txn, snapshot_now,
+ command_id, prev_lsn,
+ specinsert);
+ }
+ else
+ {
+ elog(LOG, "stopping decoding of %s (%u)",
+ txn->gid[0] != '\0'? txn->gid:"", txn->xid);
+ ReorderBufferTruncateTXN(rb, txn, true);
+ }
}
else
{
@@ -2412,23 +2479,16 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
* This interface is called once a toplevel commit is read for both streamed
* as well as non-streamed transactions.
*/
-void
-ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
- XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
- TimestampTz commit_time,
- RepOriginId origin_id, XLogRecPtr origin_lsn)
+static void
+ReorderBufferCommitInternal(ReorderBufferTXN *txn,
+ ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn)
{
- ReorderBufferTXN *txn;
Snapshot snapshot_now;
CommandId command_id = FirstCommandId;
- txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
- false);
-
- /* unknown transaction, nothing to replay */
- if (txn == NULL)
- return;
-
txn->final_lsn = commit_lsn;
txn->end_lsn = end_lsn;
txn->commit_time = commit_time;
@@ -2470,6 +2530,145 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
}
/*
+ * Ask output plugin whether we want to skip this PREPARE and send
+ * this transaction as a regular commit later.
+ */
+bool
+ReorderBufferPrepareNeedSkip(ReorderBuffer *rb, TransactionId xid, const char *gid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr, false);
+
+ return rb->filter_prepare(rb, txn, xid, gid);
+}
+
+
+/*
+ * Commit a transaction.
+ *
+ * See comments for ReorderBufferCommitInternal()
+ */
+void
+ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+
+ /* unknown transaction, nothing to replay */
+ if (txn == NULL)
+ return;
+
+ ReorderBufferCommitInternal(txn, rb, xid, commit_lsn, end_lsn,
+ commit_time, origin_id, origin_lsn);
+}
+
+/*
+ * Prepare a two-phase transaction. It calls ReorderBufferCommitInternal()
+ * since all prepared transactions need to be decoded at PREPARE time.
+ */
+void
+ReorderBufferPrepare(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+
+ /* unknown transaction, nothing to replay */
+ if (txn == NULL)
+ return;
+
+ txn->txn_flags |= RBTXN_PREPARE;
+ txn->gid = palloc(strlen(gid) + 1); /* trailing '\0' */
+ strcpy(txn->gid, gid);
+
+ ReorderBufferCommitInternal(txn, rb, xid, commit_lsn, end_lsn,
+ commit_time, origin_id, origin_lsn);
+}
+
+/*
+ * Check whether this transaction was sent as prepared to subscribers.
+ * Called while handling commit|abort prepared.
+ */
+bool
+ReorderBufferTxnIsPrepared(ReorderBuffer *rb, TransactionId xid,
+ const char *gid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+
+ /*
+ * Always call the prepare filter. It's the job of the prepare filter to
+ * give us the *same* response for a given xid across multiple calls
+ * (including ones on restart)
+ */
+ return !(rb->filter_prepare(rb, txn, xid, gid));
+}
+
+/*
+ * Send standalone xact event. This is used to handle COMMIT/ROLLBACK PREPARED.
+ */
+void
+ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid, bool is_commit)
+{
+ ReorderBufferTXN *txn;
+
+ /*
+ * The transaction may or may not exist (during restarts for example).
+ * Anyways, two-phase transactions do not contain any reorderbuffers. So allow
+ * it to be created below.
+ */
+ txn = ReorderBufferTXNByXid(rb, xid, true, NULL, commit_lsn,
+ true);
+
+ txn->final_lsn = commit_lsn;
+ txn->end_lsn = end_lsn;
+ txn->commit_time = commit_time;
+ txn->origin_id = origin_id;
+ txn->origin_lsn = origin_lsn;
+ /* this txn is obviously prepared */
+ txn->txn_flags |= RBTXN_PREPARE;
+ txn->gid = palloc(strlen(gid) + 1); /* trailing '\0' */
+ strcpy(txn->gid, gid);
+
+ if (is_commit)
+ txn->txn_flags |= RBTXN_COMMIT_PREPARED;
+ else
+ txn->txn_flags |= RBTXN_ROLLBACK_PREPARED;
+
+ if (rbtxn_is_streamed(txn) && rbtxn_commit_prepared(txn))
+ rb->stream_commit_prepared(rb, txn, commit_lsn);
+ else if (rbtxn_is_streamed(txn) && rbtxn_rollback_prepared(txn))
+ rb->stream_rollback_prepared(rb, txn, commit_lsn);
+ else if (rbtxn_commit_prepared(txn))
+ rb->commit_prepared(rb, txn, commit_lsn);
+ else if (rbtxn_rollback_prepared(txn))
+ rb->rollback_prepared(rb, txn, commit_lsn);
+
+
+ /* cleanup: make sure there's no cache pollution */
+ ReorderBufferExecuteInvalidations(txn->ninvalidations,
+ txn->invalidations);
+ ReorderBufferCleanupTXN(rb, txn);
+}
+
+/*
* Abort a transaction that possibly has previous changes. Needs to be first
* called for subtransactions and then for the toplevel xid.
*
@@ -2512,7 +2711,12 @@ ReorderBufferAbort(ReorderBuffer *rb, TransactionId xid, XLogRecPtr lsn)
/* cosmetic... */
txn->final_lsn = lsn;
- /* remove potential on-disk data, and deallocate */
+ /*
+ * remove potential on-disk data, and deallocate.
+ *
+ * We remove it even for prepared transactions (GID is enough to
+ * commit/abort those later).
+ */
ReorderBufferCleanupTXN(rb, txn);
}
--
1.8.3.1
Hello Ajin,
The v9 patches provided support for two-phase transactions for NON-streaming.
Now I have added STREAM support for two-phase transactions, and bumped
all patches to version v10.
(The 0001 and 0002 patches are unchanged. Only 0003 is changed).
--
There are a few TODO/FIXME comments in the code highlighting parts
needing some attention.
There is a #define DEBUG_STREAM_2PC useful for debugging, which I can
remove later.
All the patches have some whitespaces issues when applied. We can
resolve them as we go.
Please let me know any comments/feedback.
Kind Regards
Peter Smith.
Fujitsu Australia.
Attachments:
v10-0001-Support-2PC-txn-base.patchapplication/octet-stream; name=v10-0001-Support-2PC-txn-base.patchDownload
From 5df1c97cddf3539ecdbdcb18f55a341ee2cd410a Mon Sep 17 00:00:00 2001
From: postgres <postgres@CentOS7-x64.fritz.box>
Date: Fri, 16 Oct 2020 14:19:59 +1100
Subject: [PATCH v10] =?UTF-8?q?Support=202PC=20txn=20=E2=80=93=20base.?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Until now two-phase transaction commands were translated into regular transactions
on the subscriber, and the GID was not forwarded to it. None of the two-phase semantics
were communicated to the subscriber.
This patch provides infrastructure for logical decoding plugins to be informed of
two-phase commands Like PREPARE TRANSACTION, COMMIT PREPARED
and ROLLBACK PREPARED commands with the corresponding GID.
Include logical decoding plugin API infrastructure changes.
Includes contrib/test_decoding changes.
Includes documentation changes.
---
contrib/test_decoding/test_decoding.c | 243 +++++++++++++++++++++++
doc/src/sgml/logicaldecoding.sgml | 173 +++++++++++++++-
src/backend/replication/logical/logical.c | 314 ++++++++++++++++++++++++++++++
src/include/replication/logical.h | 5 +
src/include/replication/output_plugin.h | 64 ++++++
src/include/replication/reorderbuffer.h | 95 +++++++++
6 files changed, 887 insertions(+), 7 deletions(-)
diff --git a/contrib/test_decoding/test_decoding.c b/contrib/test_decoding/test_decoding.c
index 8e33614..cf7d674 100644
--- a/contrib/test_decoding/test_decoding.c
+++ b/contrib/test_decoding/test_decoding.c
@@ -11,12 +11,16 @@
*-------------------------------------------------------------------------
*/
#include "postgres.h"
+#include "miscadmin.h"
+#include "access/transam.h"
#include "catalog/pg_type.h"
#include "replication/logical.h"
#include "replication/origin.h"
+#include "storage/procarray.h"
+
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -36,6 +40,7 @@ typedef struct
bool skip_empty_xacts;
bool xact_wrote_changes;
bool only_local;
+ TransactionId check_xid_aborted; /* track abort of this txid */
} TestDecodingData;
static void pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
@@ -73,6 +78,15 @@ static void pg_decode_stream_stop(LogicalDecodingContext *ctx,
static void pg_decode_stream_abort(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr abort_lsn);
+static void pg_decode_stream_prepare(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+static void pg_decode_stream_commit_prepared(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+static void pg_decode_stream_rollback_prepared(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr rollback_lsn);
static void pg_decode_stream_commit(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
@@ -88,6 +102,18 @@ static void pg_decode_stream_truncate(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
int nrelations, Relation relations[],
ReorderBufferChange *change);
+static bool pg_decode_filter_prepare(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid);
+static void pg_decode_prepare_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+static void pg_decode_commit_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+static void pg_decode_rollback_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
void
_PG_init(void)
@@ -112,10 +138,17 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb)
cb->stream_start_cb = pg_decode_stream_start;
cb->stream_stop_cb = pg_decode_stream_stop;
cb->stream_abort_cb = pg_decode_stream_abort;
+ cb->stream_prepare_cb = pg_decode_stream_prepare;
+ cb->stream_commit_prepared_cb = pg_decode_stream_commit_prepared;
+ cb->stream_rollback_prepared_cb = pg_decode_stream_rollback_prepared;
cb->stream_commit_cb = pg_decode_stream_commit;
cb->stream_change_cb = pg_decode_stream_change;
cb->stream_message_cb = pg_decode_stream_message;
cb->stream_truncate_cb = pg_decode_stream_truncate;
+ cb->filter_prepare_cb = pg_decode_filter_prepare;
+ cb->prepare_cb = pg_decode_prepare_txn;
+ cb->commit_prepared_cb = pg_decode_commit_prepared_txn;
+ cb->rollback_prepared_cb = pg_decode_rollback_prepared_txn;
}
@@ -127,6 +160,7 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
ListCell *option;
TestDecodingData *data;
bool enable_streaming = false;
+ bool enable_2pc = false;
data = palloc0(sizeof(TestDecodingData));
data->context = AllocSetContextCreate(ctx->context,
@@ -136,6 +170,7 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
data->include_timestamp = false;
data->skip_empty_xacts = false;
data->only_local = false;
+ data->check_xid_aborted = InvalidTransactionId;
ctx->output_plugin_private = data;
@@ -227,6 +262,35 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
errmsg("could not parse value \"%s\" for parameter \"%s\"",
strVal(elem->arg), elem->defname)));
}
+ else if (strcmp(elem->defname, "two-phase-commit") == 0)
+ {
+ if (elem->arg == NULL)
+ continue;
+ else if (!parse_bool(strVal(elem->arg), &enable_2pc))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("could not parse value \"%s\" for parameter \"%s\"",
+ strVal(elem->arg), elem->defname)));
+ }
+ else if (strcmp(elem->defname, "check-xid-aborted") == 0)
+ {
+ if (elem->arg == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("check-xid-aborted needs an input value")));
+ else
+ {
+ errno = 0;
+ data->check_xid_aborted = (TransactionId)
+ strtoul(strVal(elem->arg), NULL, 0);
+
+ if (!TransactionIdIsValid(data->check_xid_aborted))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("check-xid-aborted is not a valid xid: \"%s\"",
+ strVal(elem->arg))));
+ }
+ }
else
{
ereport(ERROR,
@@ -238,6 +302,7 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
}
ctx->streaming &= enable_streaming;
+ ctx->twophase &= enable_2pc;
}
/* cleanup this plugin's resources */
@@ -297,6 +362,93 @@ pg_decode_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
OutputPluginWrite(ctx, true);
}
+/*
+ * Filter out two-phase transactions.
+ *
+ * Each plugin can implement its own filtering logic. Here
+ * we demonstrate a simple logic by checking the GID. If the
+ * GID contains the "_nodecode" substring, then we filter
+ * it out.
+ */
+static bool
+pg_decode_filter_prepare(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid)
+{
+ if (strstr(gid, "_nodecode") != NULL)
+ return true;
+
+ return false;
+}
+
+/* PREPARE callback */
+static void
+pg_decode_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ if (data->skip_empty_xacts && !data->xact_wrote_changes)
+ return;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ appendStringInfo(ctx->out, "PREPARE TRANSACTION %s",
+ quote_literal_cstr(txn->gid));
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, " %u", txn->xid);
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
+/* COMMIT PREPARED callback */
+static void
+pg_decode_commit_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ appendStringInfo(ctx->out, "COMMIT PREPARED %s",
+ quote_literal_cstr(txn->gid));
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, " %u", txn->xid);
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
+/* ABORT PREPARED callback */
+static void
+pg_decode_rollback_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ appendStringInfo(ctx->out, "ROLLBACK PREPARED %s",
+ quote_literal_cstr(txn->gid));
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, " %u", txn->xid);
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
static bool
pg_decode_filter(LogicalDecodingContext *ctx,
RepOriginId origin_id)
@@ -455,6 +607,25 @@ pg_decode_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
}
data->xact_wrote_changes = true;
+ /* if check_xid_aborted is a valid xid, then it was passed in
+ * as an option to check if the transaction having this xid would be aborted.
+ * This is to test concurrent aborts.
+ */
+ if (TransactionIdIsValid(data->check_xid_aborted))
+ {
+ elog(LOG, "waiting for %u to abort", data->check_xid_aborted);
+ while (TransactionIdIsInProgress(data->check_xid_aborted))
+ {
+ CHECK_FOR_INTERRUPTS();
+ pg_usleep(10000L);
+ }
+ if (!TransactionIdIsInProgress(data->check_xid_aborted) &&
+ !TransactionIdDidCommit(data->check_xid_aborted))
+ elog(LOG, "%u aborted", data->check_xid_aborted);
+
+ Assert(TransactionIdDidAbort(data->check_xid_aborted));
+ }
+
class_form = RelationGetForm(relation);
tupdesc = RelationGetDescr(relation);
@@ -646,6 +817,78 @@ pg_decode_stream_abort(LogicalDecodingContext *ctx,
}
static void
+pg_decode_stream_prepare(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ if (data->skip_empty_xacts && !data->xact_wrote_changes)
+ return;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, "preparing streamed transaction TXN %u", txn->xid);
+ else
+ appendStringInfo(ctx->out, "preparing streamed transaction");
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
+static void
+pg_decode_stream_commit_prepared(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ if (data->skip_empty_xacts && !data->xact_wrote_changes)
+ return;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, "commit prepared streamed transaction TXN %u", txn->xid);
+ else
+ appendStringInfo(ctx->out, "commit prepared streamed transaction");
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
+static void
+pg_decode_stream_rollback_prepared(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr rollback_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ if (data->skip_empty_xacts && !data->xact_wrote_changes)
+ return;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, "abort prepared streamed transaction TXN %u", txn->xid);
+ else
+ appendStringInfo(ctx->out, "abort prepared streamed transaction");
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
+static void
pg_decode_stream_commit(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr commit_lsn)
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index 813a037..824c55a 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -387,11 +387,18 @@ typedef struct OutputPluginCallbacks
LogicalDecodeTruncateCB truncate_cb;
LogicalDecodeCommitCB commit_cb;
LogicalDecodeMessageCB message_cb;
+ LogicalDecodeFilterPrepareCB filter_prepare_cb;
+ LogicalDecodePrepareCB prepare_cb;
+ LogicalDecodeCommitPreparedCB commit_prepared_cb;
+ LogicalDecodeRollbackPreparedCB rollback_prepared_cb;
LogicalDecodeFilterByOriginCB filter_by_origin_cb;
LogicalDecodeShutdownCB shutdown_cb;
LogicalDecodeStreamStartCB stream_start_cb;
LogicalDecodeStreamStopCB stream_stop_cb;
LogicalDecodeStreamAbortCB stream_abort_cb;
+ LogicalDecodeStreamPrepareCB stream_prepare_cb;
+ LogicalDecodeStreamCommitPreparedCB stream_commit_prepared_cb;
+ LogicalDecodeStreamRollbackPreparedCB stream_rollback_prepared_cb;
LogicalDecodeStreamCommitCB stream_commit_cb;
LogicalDecodeStreamChangeCB stream_change_cb;
LogicalDecodeStreamMessageCB stream_message_cb;
@@ -414,9 +421,18 @@ typedef void (*LogicalOutputPluginInit) (struct OutputPluginCallbacks *cb);
in-progress transactions. The <function>stream_start_cb</function>,
<function>stream_stop_cb</function>, <function>stream_abort_cb</function>,
<function>stream_commit_cb</function> and <function>stream_change_cb</function>
- are required, while <function>stream_message_cb</function> and
+ are required, while <function>stream_message_cb</function>,
+ <function>stream_prepare_cb</function>, <function>stream_commit_prepared_cb</function>,
+ <function>stream_rollback_prepared_cb</function> and
<function>stream_truncate_cb</function> are optional.
</para>
+
+ <para>
+ An output plugin may also define functions to support two-phase commits, which are
+ decoded on <command>PREPARE TRANSACTION</command>. The <function>prepare_cb</function>,
+ <function>commit_prepared_cb</function> and <function>rollback_prepared_cb</function>
+ callbacks are required, while <function>filter_prepare_cb</function> is optional.
+ </para>
</sect2>
<sect2 id="logicaldecoding-capabilities">
@@ -477,7 +493,13 @@ CREATE TABLE another_catalog_table(data text) WITH (user_catalog_table = true);
never get
decoded. Successful savepoints are
folded into the transaction containing them in the order they were
- executed within that transaction.
+ executed within that transaction. A transaction that is prepared for
+ a two-phase commit using <command>PREPARE TRANSACTION</command> will
+ also be decoded if the output plugin callbacks needed for decoding
+ them are provided. It is possible that the current transaction which
+ is being decoded is aborted concurrently via a <command>ROLLBACK PREPARED</command>
+ command. In that case, the logical decoding of this transaction will
+ be aborted too.
</para>
<note>
@@ -578,6 +600,55 @@ typedef void (*LogicalDecodeCommitCB) (struct LogicalDecodingContext *ctx,
</para>
</sect3>
+ <sect3 id="logicaldecoding-output-plugin-prepare">
+ <title>Transaction Prepare Callback</title>
+
+ <para>
+ The optional <function>prepare_cb</function> callback is called whenever
+ a transaction which is prepared for two-phase commit has been
+ decoded. The <function>change_cb</function> callbacks for all modified
+ rows will have been called before this, if there have been any modified
+ rows.
+<programlisting>
+typedef void (*LogicalDecodePrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
+ <sect3 id="logicaldecoding-output-plugin-commit-prepared">
+ <title>Transaction Commit Prepared Callback</title>
+
+ <para>
+ The optional <function>commit_prepared_cb</function> callback is called whenever
+ a transaction commit prepared has been decoded. The <parameter>gid</parameter> field,
+ which is part of the <parameter>txn</parameter> parameter can be used in this
+ callback.
+<programlisting>
+typedef void (*LogicalDecodeCommitPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
+ <sect3 id="logicaldecoding-output-plugin-rollback-prepared">
+ <title>Transaction Rollback Prepared Callback</title>
+
+ <para>
+ The optional <function>rollback_prepared_cb</function> callback is called whenever
+ a transaction rollback prepared has been decoded. The <parameter>gid</parameter> field,
+ which is part of the <parameter>txn</parameter> parameter can be used in this
+ callback.
+<programlisting>
+typedef void (*LogicalDecodeRollbackPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
<sect3 id="logicaldecoding-output-plugin-change">
<title>Change Callback</title>
@@ -587,7 +658,13 @@ typedef void (*LogicalDecodeCommitCB) (struct LogicalDecodingContext *ctx,
an <command>INSERT</command>, <command>UPDATE</command>,
or <command>DELETE</command>. Even if the original command modified
several rows at once the callback will be called individually for each
- row.
+ row. The <function>change_cb</function> callback may access system or
+ user catalog tables to aid in the process of outputting the row
+ modification details. In case of decoding a prepared (but yet
+ uncommitted) transaction or decoding of an uncommitted transaction, this
+ change callback might also error out due to simultaneous rollback of
+ this very same transaction. In that case, the logical decoding of this
+ aborted transaction is stopped gracefully.
<programlisting>
typedef void (*LogicalDecodeChangeCB) (struct LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
@@ -664,6 +741,39 @@ typedef bool (*LogicalDecodeFilterByOriginCB) (struct LogicalDecodingContext *ct
</para>
</sect3>
+ <sect3 id="logicaldecoding-output-plugin-filter-prepare">
+ <title>Prepare Filter Callback</title>
+
+ <para>
+ The optional <function>filter_prepare_cb</function> callback
+ is called to determine whether data that is part of the current
+ two-phase commit transaction should be considered for decode
+ at this prepare stage or as a regular one-phase transaction at
+ <command>COMMIT PREPARED</command> time later. To signal that
+ decoding should be skipped, return <literal>true</literal>;
+ <literal>false</literal> otherwise. When the callback is not
+ defined, <literal>false</literal> is assumed (i.e. nothing is
+ filtered).
+<programlisting>
+typedef bool (*LogicalDecodeFilterPrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ TransactionId xid,
+ const char *gid);
+</programlisting>
+ The <parameter>ctx</parameter> parameter has the same contents
+ as for the other callbacks. The <parameter>txn</parameter> parameter
+ contains meta information about the transaction. The <parameter>xid</parameter>
+ contains the XID because <parameter>txn</parameter> can be NULL in some cases.
+ The <parameter>gid</parameter> is the identifier that later identifies this
+ transaction for <command>COMMIT PREPARED</command> or <command>ROLLBACK PREPARED</command>.
+ </para>
+ <para>
+ The callback has to provide the same static answer for a given combination of
+ <parameter>xid</parameter> and <parameter>gid</parameter> every time it is
+ called.
+ </para>
+ </sect3>
+
<sect3 id="logicaldecoding-output-plugin-message">
<title>Generic Message Callback</title>
@@ -685,7 +795,13 @@ typedef void (*LogicalDecodeMessageCB) (struct LogicalDecodingContext *ctx,
non-transactional and the XID was not assigned yet in the transaction
which logged the message. The <parameter>lsn</parameter> has WAL
location of the message. The <parameter>transactional</parameter> says
- if the message was sent as transactional or not.
+ if the message was sent as transactional or not. Similar to the change
+ callback, in case of decoding a prepared (but yet uncommitted)
+ transaction or decoding of an uncommitted transaction, this message
+ callback might also error out due to simultaneous rollback of
+ this very same transaction. In that case, the logical decoding of this
+ aborted transaction is stopped gracefully.
+
The <parameter>prefix</parameter> is arbitrary null-terminated prefix
which can be used for identifying interesting messages for the current
plugin. And finally the <parameter>message</parameter> parameter holds
@@ -735,6 +851,45 @@ typedef void (*LogicalDecodeStreamAbortCB) (struct LogicalDecodingContext *ctx,
</para>
</sect3>
+ <sect3 id="logicaldecoding-output-plugin-stream-prepare">
+ <title>Stream Prepare Callback</title>
+ <para>
+ The <function>stream_prepare_cb</function> callback is called to prepare
+ a previously streamed transaction as part of a two-phase commit.
+<programlisting>
+typedef void (*LogicalDecodeStreamPrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
+ <sect3 id="logicaldecoding-output-plugin-stream-commit-prepared">
+ <title>Stream Commit Prepared Callback</title>
+ <para>
+ The <function>stream_commit_prepared_cb</function> callback is called to commit
+ a previously streamed transaction as part of a two-phase commit.
+<programlisting>
+typedef void (*LogicalDecodeStreamCommitPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
+ <sect3 id="logicaldecoding-output-plugin-stream-rollback-prepared">
+ <title>Stream Rollback Prepared Callback</title>
+ <para>
+ The <function>stream_rollback_prepared_cb</function> callback is called to abort
+ a previously streamed transaction as part of a two-phase commit.
+<programlisting>
+typedef void (*LogicalDecodeStreamRollbackPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr rollback_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
<sect3 id="logicaldecoding-output-plugin-stream-commit">
<title>Stream Commit Callback</title>
<para>
@@ -913,9 +1068,13 @@ OutputPluginWrite(ctx, true);
When streaming an in-progress transaction, the changes (and messages) are
streamed in blocks demarcated by <function>stream_start_cb</function>
and <function>stream_stop_cb</function> callbacks. Once all the decoded
- changes are transmitted, the transaction is committed using the
- <function>stream_commit_cb</function> callback (or possibly aborted using
- the <function>stream_abort_cb</function> callback).
+ changes are transmitted, the transaction can be committed using the
+ the <function>stream_commit_cb</function> callback
+ (or possibly aborted using the <function>stream_abort_cb</function> callback).
+ If two-phase commits are supported, the transaction can be prepared using the
+ <function>stream_prepare_cb</function> callback, commit prepared using the
+ <function>stream_commit_prepared_cb</function> callback or aborted using the
+ <function>stream_rollback_prepared_cb</function>.
</para>
<para>
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 8675832..84a4751 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -59,6 +59,14 @@ static void shutdown_cb_wrapper(LogicalDecodingContext *ctx);
static void begin_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn);
static void commit_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
+static bool filter_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid);
+static void prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+static void commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+static void rollback_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
static void change_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
Relation relation, ReorderBufferChange *change);
static void truncate_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
@@ -74,6 +82,12 @@ static void stream_stop_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr last_lsn);
static void stream_abort_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr abort_lsn);
+static void stream_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+static void stream_commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+static void stream_rollback_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr rollback_lsn);
static void stream_commit_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
static void stream_change_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
@@ -207,6 +221,10 @@ StartupDecodingContext(List *output_plugin_options,
ctx->reorder->apply_change = change_cb_wrapper;
ctx->reorder->apply_truncate = truncate_cb_wrapper;
ctx->reorder->commit = commit_cb_wrapper;
+ ctx->reorder->filter_prepare = filter_prepare_cb_wrapper;
+ ctx->reorder->prepare = prepare_cb_wrapper;
+ ctx->reorder->commit_prepared = commit_prepared_cb_wrapper;
+ ctx->reorder->rollback_prepared = rollback_prepared_cb_wrapper;
ctx->reorder->message = message_cb_wrapper;
/*
@@ -221,12 +239,28 @@ StartupDecodingContext(List *output_plugin_options,
ctx->streaming = (ctx->callbacks.stream_start_cb != NULL) ||
(ctx->callbacks.stream_stop_cb != NULL) ||
(ctx->callbacks.stream_abort_cb != NULL) ||
+ (ctx->callbacks.stream_prepare_cb != NULL) ||
+ (ctx->callbacks.stream_commit_prepared_cb != NULL) ||
+ (ctx->callbacks.stream_rollback_prepared_cb != NULL) ||
(ctx->callbacks.stream_commit_cb != NULL) ||
(ctx->callbacks.stream_change_cb != NULL) ||
(ctx->callbacks.stream_message_cb != NULL) ||
(ctx->callbacks.stream_truncate_cb != NULL);
/*
+ * To support two-phase logical decoding, we require prepare/commit-prepare/abort-prepare
+ * callbacks. The filter-prepare callback is optional. We however enable two phase logical
+ * decoding when at least one of the methods is enabled so that we can easily identify
+ * missing methods.
+ *
+ * We decide it here, but only check it later in the wrappers.
+ */
+ ctx->twophase = (ctx->callbacks.prepare_cb != NULL) ||
+ (ctx->callbacks.commit_prepared_cb != NULL) ||
+ (ctx->callbacks.rollback_prepared_cb != NULL) ||
+ (ctx->callbacks.filter_prepare_cb != NULL);
+
+ /*
* streaming callbacks
*
* stream_message and stream_truncate callbacks are optional, so we do not
@@ -237,6 +271,9 @@ StartupDecodingContext(List *output_plugin_options,
ctx->reorder->stream_start = stream_start_cb_wrapper;
ctx->reorder->stream_stop = stream_stop_cb_wrapper;
ctx->reorder->stream_abort = stream_abort_cb_wrapper;
+ ctx->reorder->stream_prepare = stream_prepare_cb_wrapper;
+ ctx->reorder->stream_commit_prepared = stream_commit_prepared_cb_wrapper;
+ ctx->reorder->stream_rollback_prepared = stream_rollback_prepared_cb_wrapper;
ctx->reorder->stream_commit = stream_commit_cb_wrapper;
ctx->reorder->stream_change = stream_change_cb_wrapper;
ctx->reorder->stream_message = stream_message_cb_wrapper;
@@ -783,6 +820,120 @@ commit_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
}
static void
+prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ /* We're only supposed to call this when two-phase commits are supported */
+ Assert(ctx->twophase);
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "prepare";
+ state.report_location = txn->final_lsn; /* beginning of commit record */
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn; /* points to the end of the record */
+
+ /* If the plugin supports two-phase commits then prepare callback is mandatory */
+ if (ctx->callbacks.prepare_cb == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("Output plugin did not register prepare_cb callback")));
+
+ /* do the actual work: call callback */
+ ctx->callbacks.prepare_cb(ctx, txn, prepare_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
+commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ /* We're only supposed to call this when two-phase commits are supported */
+ Assert(ctx->twophase);
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "commit_prepared";
+ state.report_location = txn->final_lsn; /* beginning of commit record */
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn; /* points to the end of the record */
+
+ /* If the plugin support 2 phase commits then commit prepared callback is mandatory */
+ if (ctx->callbacks.commit_prepared_cb == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("Output plugin did not register commit_prepared_cb callback")));
+
+ /* do the actual work: call callback */
+ ctx->callbacks.commit_prepared_cb(ctx, txn, commit_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
+rollback_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ /* We're only supposed to call this when two-phase commits are supported */
+ Assert(ctx->twophase);
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "rollback_prepared";
+ state.report_location = txn->final_lsn; /* beginning of commit record */
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn; /* points to the end of the record */
+
+ /* If the plugin support 2 phase commits then abort prepared callback is mandatory */
+ if (ctx->callbacks.rollback_prepared_cb == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("Output plugin did not register rollback_prepared_cb callback")));
+
+ /* do the actual work: call callback */
+ ctx->callbacks.rollback_prepared_cb(ctx, txn, abort_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
change_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
Relation relation, ReorderBufferChange *change)
{
@@ -859,6 +1010,51 @@ truncate_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
error_context_stack = errcallback.previous;
}
+static bool
+filter_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+ bool ret;
+
+ /*
+ * Skip if decoding of two-phase at PREPARE time is not enabled. In that
+ * case all two-phase transactions are considered filtered out and will be
+ * applied as regular transactions at COMMIT PREPARED.
+ */
+ if (!ctx->twophase)
+ return true;
+
+ /*
+ * The filter_prepare callback is optional. When not supplied, all
+ * prepared transactions should go through.
+ */
+ if (!ctx->callbacks.filter_prepare_cb)
+ return false;
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "filter_prepare";
+ state.report_location = InvalidXLogRecPtr;
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = false;
+
+ /* do the actual work: call callback */
+ ret = ctx->callbacks.filter_prepare_cb(ctx, txn, xid, gid);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+
+ return ret;
+}
+
bool
filter_by_origin_cb_wrapper(LogicalDecodingContext *ctx, RepOriginId origin_id)
{
@@ -1057,6 +1253,124 @@ stream_abort_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
}
static void
+stream_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ Assert(!ctx->fast_forward);
+
+ /* We're only supposed to call this when streaming and two-phase commits are supported. */
+ Assert(ctx->streaming);
+ Assert(ctx->twophase);
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "stream_prepare";
+ state.report_location = txn->final_lsn;
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn;
+
+ /* in streaming mode with two-phase commits, stream_prepare_cb is required */
+ if (ctx->callbacks.stream_prepare_cb == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical streaming commits requires a stream_prepare_cb callback")));
+
+ ctx->callbacks.stream_prepare_cb(ctx, txn, prepare_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
+stream_commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ Assert(!ctx->fast_forward);
+
+ /* We're only supposed to call this when streaming is supported. */
+ Assert(ctx->streaming);
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "stream_commit_prepared";
+ state.report_location = txn->final_lsn;
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn;
+
+ /* in streaming mode with two-phase commits, stream_commit_prepared_cb is required */
+ if (ctx->callbacks.stream_prepare_cb == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical streaming requires a stream_commit_prepared_cb callback")));
+
+ ctx->callbacks.stream_commit_prepared_cb(ctx, txn, commit_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
+stream_rollback_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr rollback_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ Assert(!ctx->fast_forward);
+
+ /* We're only supposed to call this when streaming is supported. */
+ Assert(ctx->streaming);
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "stream_rollback_prepared";
+ state.report_location = txn->final_lsn;
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn;
+
+ /* in streaming mode with two-phase commits, stream_rollback_prepared_cb is required */
+ if (ctx->callbacks.stream_prepare_cb == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical streaming requires a stream_rollback_prepared_cb callback")));
+
+ ctx->callbacks.stream_rollback_prepared_cb(ctx, txn, rollback_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
stream_commit_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr commit_lsn)
{
diff --git a/src/include/replication/logical.h b/src/include/replication/logical.h
index 40bab7e..a191e82 100644
--- a/src/include/replication/logical.h
+++ b/src/include/replication/logical.h
@@ -84,6 +84,11 @@ typedef struct LogicalDecodingContext
*/
bool streaming;
+ /*
+ * Does the output plugin support two-phase decoding, and is it enabled?
+ */
+ bool twophase;
+
/*
* State for writing output.
*/
diff --git a/src/include/replication/output_plugin.h b/src/include/replication/output_plugin.h
index b78c796..dfcb577 100644
--- a/src/include/replication/output_plugin.h
+++ b/src/include/replication/output_plugin.h
@@ -77,6 +77,39 @@ typedef void (*LogicalDecodeCommitCB) (struct LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
+ /*
+ * Called before decoding of PREPARE record to decide whether this
+ * transaction should be decoded with separate calls to prepare and
+ * commit_prepared/rollback_prepared callbacks or wait till COMMIT PREPARED and
+ * sent as usual transaction.
+ */
+typedef bool (*LogicalDecodeFilterPrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ TransactionId xid,
+ const char *gid);
+
+/*
+ * Called for PREPARE record unless it was filtered by filter_prepare()
+ * callback.
+ */
+typedef void (*LogicalDecodePrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+
+/*
+ * Called for COMMIT PREPARED.
+ */
+typedef void (*LogicalDecodeCommitPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+/*
+ * Called for ROLLBACK PREPARED.
+ */
+typedef void (*LogicalDecodeRollbackPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+
/*
* Called for the generic logical decoding messages.
*/
@@ -124,6 +157,30 @@ typedef void (*LogicalDecodeStreamAbortCB) (struct LogicalDecodingContext *ctx,
XLogRecPtr abort_lsn);
/*
+ * Called to prepare changes streamed to remote node from in-progress
+ * transaction. This is called as part of a two-phase commit.
+ */
+typedef void (*LogicalDecodeStreamPrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+
+/*
+ * Called to commit prepared changes streamed to remote node from in-progress
+ * transaction. This is called as part of a two-phase commit.
+ */
+typedef void (*LogicalDecodeStreamCommitPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+/*
+ * Called to abort/rollback prepared changes streamed to remote node from in-progress
+ * transaction. This is called as part of a two-phase commit.
+ */
+typedef void (*LogicalDecodeStreamRollbackPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr rollback_lsn);
+
+/*
* Called to apply changes streamed to remote node from in-progress
* transaction.
*/
@@ -171,12 +228,19 @@ typedef struct OutputPluginCallbacks
LogicalDecodeTruncateCB truncate_cb;
LogicalDecodeCommitCB commit_cb;
LogicalDecodeMessageCB message_cb;
+ LogicalDecodeFilterPrepareCB filter_prepare_cb;
+ LogicalDecodePrepareCB prepare_cb;
+ LogicalDecodeCommitPreparedCB commit_prepared_cb;
+ LogicalDecodeRollbackPreparedCB rollback_prepared_cb;
LogicalDecodeFilterByOriginCB filter_by_origin_cb;
LogicalDecodeShutdownCB shutdown_cb;
/* streaming of changes */
LogicalDecodeStreamStartCB stream_start_cb;
LogicalDecodeStreamStopCB stream_stop_cb;
LogicalDecodeStreamAbortCB stream_abort_cb;
+ LogicalDecodeStreamPrepareCB stream_prepare_cb;
+ LogicalDecodeStreamCommitPreparedCB stream_commit_prepared_cb;
+ LogicalDecodeStreamRollbackPreparedCB stream_rollback_prepared_cb;
LogicalDecodeStreamCommitCB stream_commit_cb;
LogicalDecodeStreamChangeCB stream_change_cb;
LogicalDecodeStreamMessageCB stream_message_cb;
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 1c77819..6cb4cb4 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -10,6 +10,7 @@
#define REORDERBUFFER_H
#include "access/htup_details.h"
+#include "access/twophase.h"
#include "lib/ilist.h"
#include "storage/sinval.h"
#include "utils/hsearch.h"
@@ -174,6 +175,9 @@ typedef struct ReorderBufferChange
#define RBTXN_IS_STREAMED 0x0010
#define RBTXN_HAS_TOAST_INSERT 0x0020
#define RBTXN_HAS_SPEC_INSERT 0x0040
+#define RBTXN_PREPARE 0x0080
+#define RBTXN_COMMIT_PREPARED 0x0100
+#define RBTXN_ROLLBACK_PREPARED 0x0200
/* Does the transaction have catalog changes? */
#define rbtxn_has_catalog_changes(txn) \
@@ -233,6 +237,24 @@ typedef struct ReorderBufferChange
((txn)->txn_flags & RBTXN_IS_STREAMED) != 0 \
)
+/* Has this transaction been prepared? */
+#define rbtxn_prepared(txn) \
+( \
+ ((txn)->txn_flags & RBTXN_PREPARE) != 0 \
+)
+
+/* Has this prepared transaction been committed? */
+#define rbtxn_commit_prepared(txn) \
+( \
+ ((txn)->txn_flags & RBTXN_COMMIT_PREPARED) != 0 \
+)
+
+/* Has this prepared transaction been rollbacked? */
+#define rbtxn_rollback_prepared(txn) \
+( \
+ ((txn)->txn_flags & RBTXN_ROLLBACK_PREPARED) != 0 \
+)
+
typedef struct ReorderBufferTXN
{
/* See above */
@@ -244,6 +266,9 @@ typedef struct ReorderBufferTXN
/* Xid of top-level transaction, if known */
TransactionId toplevel_xid;
+ /* In case of 2PC we need to pass GID to output plugin */
+ char *gid;
+
/*
* LSN of the first data carrying, WAL record with knowledge about this
* xid. This is allowed to *not* be first record adorned with this xid, if
@@ -405,6 +430,36 @@ typedef void (*ReorderBufferCommitCB) (ReorderBuffer *rb,
ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
+/* abort callback signature */
+typedef void (*ReorderBufferAbortCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+
+typedef bool (*ReorderBufferFilterPrepareCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ TransactionId xid,
+ const char *gid);
+
+/* prepare callback signature */
+typedef void (*ReorderBufferPrepareCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+
+/* commit prepared callback signature */
+typedef void (*ReorderBufferCommitPreparedCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+/* abort prepared callback signature */
+typedef void (*ReorderBufferRollbackPreparedCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+
/* message callback signature */
typedef void (*ReorderBufferMessageCB) (ReorderBuffer *rb,
ReorderBufferTXN *txn,
@@ -431,6 +486,24 @@ typedef void (*ReorderBufferStreamAbortCB) (
ReorderBufferTXN *txn,
XLogRecPtr abort_lsn);
+/* prepare streamed transaction callback signature */
+typedef void (*ReorderBufferStreamPrepareCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+
+/* prepare streamed transaction callback signature */
+typedef void (*ReorderBufferStreamCommitPreparedCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+/* prepare streamed transaction callback signature */
+typedef void (*ReorderBufferStreamRollbackPreparedCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr rollback_lsn);
+
/* commit streamed transaction callback signature */
typedef void (*ReorderBufferStreamCommitCB) (
ReorderBuffer *rb,
@@ -497,6 +570,11 @@ struct ReorderBuffer
ReorderBufferApplyChangeCB apply_change;
ReorderBufferApplyTruncateCB apply_truncate;
ReorderBufferCommitCB commit;
+ ReorderBufferAbortCB abort;
+ ReorderBufferFilterPrepareCB filter_prepare;
+ ReorderBufferPrepareCB prepare;
+ ReorderBufferCommitPreparedCB commit_prepared;
+ ReorderBufferRollbackPreparedCB rollback_prepared;
ReorderBufferMessageCB message;
/*
@@ -505,6 +583,9 @@ struct ReorderBuffer
ReorderBufferStreamStartCB stream_start;
ReorderBufferStreamStopCB stream_stop;
ReorderBufferStreamAbortCB stream_abort;
+ ReorderBufferStreamPrepareCB stream_prepare;
+ ReorderBufferStreamCommitPreparedCB stream_commit_prepared;
+ ReorderBufferStreamRollbackPreparedCB stream_rollback_prepared;
ReorderBufferStreamCommitCB stream_commit;
ReorderBufferStreamChangeCB stream_change;
ReorderBufferStreamMessageCB stream_message;
@@ -574,6 +655,11 @@ void ReorderBufferQueueMessage(ReorderBuffer *, TransactionId, Snapshot snapsho
void ReorderBufferCommit(ReorderBuffer *, TransactionId,
XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
TimestampTz commit_time, RepOriginId origin_id, XLogRecPtr origin_lsn);
+void ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid, bool is_commit);
void ReorderBufferAssignChild(ReorderBuffer *, TransactionId, TransactionId, XLogRecPtr commit_lsn);
void ReorderBufferCommitChild(ReorderBuffer *, TransactionId, TransactionId,
XLogRecPtr commit_lsn, XLogRecPtr end_lsn);
@@ -597,6 +683,15 @@ void ReorderBufferXidSetCatalogChanges(ReorderBuffer *, TransactionId xid, XLog
bool ReorderBufferXidHasCatalogChanges(ReorderBuffer *, TransactionId xid);
bool ReorderBufferXidHasBaseSnapshot(ReorderBuffer *, TransactionId xid);
+bool ReorderBufferPrepareNeedSkip(ReorderBuffer *rb, TransactionId xid,
+ const char *gid);
+bool ReorderBufferTxnIsPrepared(ReorderBuffer *rb, TransactionId xid,
+ const char *gid);
+void ReorderBufferPrepare(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid);
ReorderBufferTXN *ReorderBufferGetOldestTXN(ReorderBuffer *);
TransactionId ReorderBufferGetOldestXmin(ReorderBuffer *rb);
--
1.8.3.1
v10-0002-Support-2PC-txn-backend-and-tests.patchapplication/octet-stream; name=v10-0002-Support-2PC-txn-backend-and-tests.patchDownload
From 6ab90d1c917070a9ac6cc6c055a5d9ff755027a8 Mon Sep 17 00:00:00 2001
From: postgres <postgres@CentOS7-x64.fritz.box>
Date: Fri, 16 Oct 2020 14:35:00 +1100
Subject: [PATCH v10] =?UTF-8?q?Support=202PC=20txn=20=E2=80=93=20backend?=
=?UTF-8?q?=20and=20tests.?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Until now two-phase transactions were decoded at COMMIT, just like
regular transaction. During replay, two-phase transactions were
translated into regular transactions on the subscriber, and the GID
was not forwarded to it.
This patch allows PREPARE-time decoding two-phase transactions (if
the output plugin supports this capability), in which case the
transactions are replayed at PREPARE and then committed later when
COMMIT PREPARED arrives.
Includes backend changes to support decoding of PREPARE TRANSACTION,
COMMIT PREPARED and ROLLBACK PREPARED.
Includes two-phase commit test code (for test_decoding).
---
contrib/test_decoding/Makefile | 4 +-
contrib/test_decoding/expected/two_phase.out | 219 +++++++++++++++++++
contrib/test_decoding/sql/two_phase.sql | 117 ++++++++++
contrib/test_decoding/t/001_twophase.pl | 119 ++++++++++
src/backend/replication/logical/decode.c | 250 ++++++++++++++++++++-
src/backend/replication/logical/reorderbuffer.c | 278 ++++++++++++++++++++----
6 files changed, 937 insertions(+), 50 deletions(-)
create mode 100644 contrib/test_decoding/expected/two_phase.out
create mode 100644 contrib/test_decoding/sql/two_phase.sql
create mode 100644 contrib/test_decoding/t/001_twophase.pl
diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index 9a4c76f..2abf3ce 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -4,11 +4,13 @@ MODULES = test_decoding
PGFILEDESC = "test_decoding - example of a logical decoding output plugin"
REGRESS = ddl xact rewrite toast permissions decoding_in_xact \
- decoding_into_rel binary prepared replorigin time messages \
+ decoding_into_rel binary prepared replorigin time two_phase messages \
spill slot truncate stream stats
ISOLATION = mxact delayed_startup ondisk_startup concurrent_ddl_dml \
oldest_xmin snapshot_transfer subxact_without_top concurrent_stream
+TAP_TESTS = 1
+
REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
ISOLATION_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
diff --git a/contrib/test_decoding/expected/two_phase.out b/contrib/test_decoding/expected/two_phase.out
new file mode 100644
index 0000000..3ac01a4
--- /dev/null
+++ b/contrib/test_decoding/expected/two_phase.out
@@ -0,0 +1,219 @@
+-- Test two-phased transactions, when two-phase-commit is enabled, transactions are
+-- decoded at PREPARE time rather than at COMMIT PREPARED time.
+SET synchronous_commit = on;
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+ ?column?
+----------
+ init
+(1 row)
+
+CREATE TABLE test_prepared1(id integer primary key);
+CREATE TABLE test_prepared2(id integer primary key);
+-- Test that commands in a two phase xact are only decoded at PREPARE.
+-- Decoding after COMMIT PREPARED should only have the COMMIT PREPARED command and not the
+-- rest of the commands in the transaction.
+BEGIN;
+INSERT INTO test_prepared1 VALUES (1);
+INSERT INTO test_prepared1 VALUES (2);
+-- should show nothing
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+------
+(0 rows)
+
+PREPARE TRANSACTION 'test_prepared#1';
+--
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+----------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:1
+ table public.test_prepared1: INSERT: id[integer]:2
+ PREPARE TRANSACTION 'test_prepared#1'
+(4 rows)
+
+COMMIT PREPARED 'test_prepared#1';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-----------------------------------
+ COMMIT PREPARED 'test_prepared#1'
+(1 row)
+
+-- test abort of a prepared xact
+BEGIN;
+INSERT INTO test_prepared1 VALUES (3);
+PREPARE TRANSACTION 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+----------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:3
+ PREPARE TRANSACTION 'test_prepared#2'
+(3 rows)
+
+ROLLBACK PREPARED 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-------------------------------------
+ ROLLBACK PREPARED 'test_prepared#2'
+(1 row)
+
+-- test prepared xact containing ddl
+BEGIN;
+ALTER TABLE test_prepared1 ADD COLUMN data text;
+INSERT INTO test_prepared1 VALUES (4, 'frakbar');
+PREPARE TRANSACTION 'test_prepared#3';
+-- confirm that exclusive lock from the ALTER command is held on test_prepared1 table
+SELECT 'test_prepared_1' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'test_prepared1'::regclass;
+ relation | locktype | mode
+-----------------+----------+---------------------
+ test_prepared_1 | relation | RowExclusiveLock
+ test_prepared_1 | relation | AccessExclusiveLock
+(2 rows)
+
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-------------------------------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:4 data[text]:'frakbar'
+ PREPARE TRANSACTION 'test_prepared#3'
+(3 rows)
+
+-- Test that we decode correctly while an uncommitted prepared xact
+-- with ddl exists.
+--
+-- Use a separate table for the concurrent transaction because the lock from
+-- the ALTER will stop us inserting into the other one.
+--
+INSERT INTO test_prepared2 VALUES (5);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+----------------------------------------------------
+ BEGIN
+ table public.test_prepared2: INSERT: id[integer]:5
+ COMMIT
+(3 rows)
+
+COMMIT PREPARED 'test_prepared#3';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-----------------------------------
+ COMMIT PREPARED 'test_prepared#3'
+(1 row)
+
+-- make sure stuff still works
+INSERT INTO test_prepared1 VALUES (6);
+INSERT INTO test_prepared2 VALUES (7);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+--------------------------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:6 data[text]:null
+ COMMIT
+ BEGIN
+ table public.test_prepared2: INSERT: id[integer]:7
+ COMMIT
+(6 rows)
+
+-- Check `CLUSTER` (as operation that hold exclusive lock) doesn't block
+-- logical decoding.
+BEGIN;
+INSERT INTO test_prepared1 VALUES (8, 'othercol');
+CLUSTER test_prepared1 USING test_prepared1_pkey;
+INSERT INTO test_prepared1 VALUES (9, 'othercol2');
+PREPARE TRANSACTION 'test_prepared_lock';
+SELECT 'test_prepared1' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'test_prepared1'::regclass;
+ relation | locktype | mode
+----------------+----------+---------------------
+ test_prepared1 | relation | RowExclusiveLock
+ test_prepared1 | relation | ShareLock
+ test_prepared1 | relation | AccessExclusiveLock
+(3 rows)
+
+-- The above CLUSTER command shouldn't cause a timeout on 2pc decoding. The call should return
+-- within a second.
+SET statement_timeout = '1s';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+---------------------------------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:8 data[text]:'othercol'
+ table public.test_prepared1: INSERT: id[integer]:9 data[text]:'othercol2'
+ PREPARE TRANSACTION 'test_prepared_lock'
+(4 rows)
+
+RESET statement_timeout;
+COMMIT PREPARED 'test_prepared_lock';
+-- consume the commit
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+--------------------------------------
+ COMMIT PREPARED 'test_prepared_lock'
+(1 row)
+
+-- test savepoints and sub-xacts as a result
+BEGIN;
+CREATE TABLE test_prepared_savepoint (a int);
+INSERT INTO test_prepared_savepoint VALUES (1);
+SAVEPOINT test_savepoint;
+INSERT INTO test_prepared_savepoint VALUES (2);
+ROLLBACK TO SAVEPOINT test_savepoint;
+PREPARE TRANSACTION 'test_prepared_savepoint';
+-- should show only 1, not 2
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+------------------------------------------------------------
+ BEGIN
+ table public.test_prepared_savepoint: INSERT: a[integer]:1
+ PREPARE TRANSACTION 'test_prepared_savepoint'
+(3 rows)
+
+COMMIT PREPARED 'test_prepared_savepoint';
+-- consume the commit
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-------------------------------------------
+ COMMIT PREPARED 'test_prepared_savepoint'
+(1 row)
+
+-- test that a GID containing "_nodecode" gets decoded at commit prepared time
+BEGIN;
+INSERT INTO test_prepared1 VALUES (20);
+PREPARE TRANSACTION 'test_prepared_nodecode';
+-- should show nothing
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+------
+(0 rows)
+
+COMMIT PREPARED 'test_prepared_nodecode';
+-- should be decoded now
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+---------------------------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:20 data[text]:null
+ COMMIT
+(3 rows)
+
+-- cleanup
+DROP TABLE test_prepared1;
+DROP TABLE test_prepared2;
+-- show results. There should be nothing to show
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+------
+(0 rows)
+
+SELECT pg_drop_replication_slot('regression_slot');
+ pg_drop_replication_slot
+--------------------------
+
+(1 row)
+
diff --git a/contrib/test_decoding/sql/two_phase.sql b/contrib/test_decoding/sql/two_phase.sql
new file mode 100644
index 0000000..e3e2690
--- /dev/null
+++ b/contrib/test_decoding/sql/two_phase.sql
@@ -0,0 +1,117 @@
+-- Test two-phased transactions, when two-phase-commit is enabled, transactions are
+-- decoded at PREPARE time rather than at COMMIT PREPARED time.
+SET synchronous_commit = on;
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+
+CREATE TABLE test_prepared1(id integer primary key);
+CREATE TABLE test_prepared2(id integer primary key);
+
+-- Test that commands in a two phase xact are only decoded at PREPARE.
+-- Decoding after COMMIT PREPARED should only have the COMMIT PREPARED command and not the
+-- rest of the commands in the transaction.
+BEGIN;
+INSERT INTO test_prepared1 VALUES (1);
+INSERT INTO test_prepared1 VALUES (2);
+-- should show nothing
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+PREPARE TRANSACTION 'test_prepared#1';
+--
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+COMMIT PREPARED 'test_prepared#1';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- test abort of a prepared xact
+BEGIN;
+INSERT INTO test_prepared1 VALUES (3);
+PREPARE TRANSACTION 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ROLLBACK PREPARED 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+
+-- test prepared xact containing ddl
+BEGIN;
+ALTER TABLE test_prepared1 ADD COLUMN data text;
+INSERT INTO test_prepared1 VALUES (4, 'frakbar');
+PREPARE TRANSACTION 'test_prepared#3';
+-- confirm that exclusive lock from the ALTER command is held on test_prepared1 table
+SELECT 'test_prepared_1' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'test_prepared1'::regclass;
+
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- Test that we decode correctly while an uncommitted prepared xact
+-- with ddl exists.
+--
+-- Use a separate table for the concurrent transaction because the lock from
+-- the ALTER will stop us inserting into the other one.
+--
+INSERT INTO test_prepared2 VALUES (5);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+COMMIT PREPARED 'test_prepared#3';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- make sure stuff still works
+INSERT INTO test_prepared1 VALUES (6);
+INSERT INTO test_prepared2 VALUES (7);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- Check `CLUSTER` (as operation that hold exclusive lock) doesn't block
+-- logical decoding.
+BEGIN;
+INSERT INTO test_prepared1 VALUES (8, 'othercol');
+CLUSTER test_prepared1 USING test_prepared1_pkey;
+INSERT INTO test_prepared1 VALUES (9, 'othercol2');
+PREPARE TRANSACTION 'test_prepared_lock';
+
+SELECT 'test_prepared1' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'test_prepared1'::regclass;
+
+-- The above CLUSTER command shouldn't cause a timeout on 2pc decoding. The call should return
+-- within a second.
+SET statement_timeout = '1s';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+RESET statement_timeout;
+
+COMMIT PREPARED 'test_prepared_lock';
+
+-- consume the commit
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- test savepoints and sub-xacts as a result
+BEGIN;
+CREATE TABLE test_prepared_savepoint (a int);
+INSERT INTO test_prepared_savepoint VALUES (1);
+SAVEPOINT test_savepoint;
+INSERT INTO test_prepared_savepoint VALUES (2);
+ROLLBACK TO SAVEPOINT test_savepoint;
+PREPARE TRANSACTION 'test_prepared_savepoint';
+-- should show only 1, not 2
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+COMMIT PREPARED 'test_prepared_savepoint';
+-- consume the commit
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- test that a GID containing "_nodecode" gets decoded at commit prepared time
+BEGIN;
+INSERT INTO test_prepared1 VALUES (20);
+PREPARE TRANSACTION 'test_prepared_nodecode';
+-- should show nothing
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+COMMIT PREPARED 'test_prepared_nodecode';
+-- should be decoded now
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- cleanup
+DROP TABLE test_prepared1;
+DROP TABLE test_prepared2;
+
+-- show results. There should be nothing to show
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+SELECT pg_drop_replication_slot('regression_slot');
diff --git a/contrib/test_decoding/t/001_twophase.pl b/contrib/test_decoding/t/001_twophase.pl
new file mode 100644
index 0000000..51c6a35
--- /dev/null
+++ b/contrib/test_decoding/t/001_twophase.pl
@@ -0,0 +1,119 @@
+# logical replication of 2PC test
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 2;
+use Time::HiRes qw(usleep);
+use Scalar::Util qw(looks_like_number);
+
+# Initialize node
+my $node_logical = get_new_node('logical');
+$node_logical->init(allows_streaming => 'logical');
+$node_logical->append_conf(
+ 'postgresql.conf', qq(
+ max_prepared_transactions = 10
+));
+$node_logical->start;
+
+# Create some pre-existing content on logical
+$node_logical->safe_psql('postgres', "CREATE TABLE tab (a int PRIMARY KEY)");
+$node_logical->safe_psql('postgres',
+ "INSERT INTO tab SELECT generate_series(1,10)");
+$node_logical->safe_psql('postgres',
+ "SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');");
+
+# This test is specifically for testing concurrent abort while logical decode
+# is ongoing. We will pass in the xid of the 2PC to the plugin as an option.
+# On the receipt of a valid "check-xid-aborted", the change API in the test decoding
+# plugin will wait for it to be aborted.
+#
+# We will fire off a ROLLBACK from another session when this decode
+# is waiting.
+#
+# The status of "check-xid-aborted" will change from in-progress to not-committed
+# (hence aborted) and we will stop decoding because the subsequent
+# system catalog scan will error out.
+
+$node_logical->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab VALUES (11);
+ INSERT INTO tab VALUES (12);
+ ALTER TABLE tab ADD COLUMN b INT;
+ INSERT INTO tab VALUES (13,14);
+ PREPARE TRANSACTION 'test_prepared_tab';");
+# get XID of the above two-phase transaction
+my $xid2pc = $node_logical->safe_psql('postgres', "SELECT transaction FROM pg_prepared_xacts WHERE gid = 'test_prepared_tab'");
+is(looks_like_number($xid2pc), qq(1), 'Got a valid two-phase XID');
+
+# start decoding the above by passing the "check-xid-aborted"
+my $logical_connstr = $node_logical->connstr . ' dbname=postgres';
+
+# decode now, it should include an ABORT entry because of the ROLLBACK below
+system_log("psql -d \"$logical_connstr\" -c \"SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'check-xid-aborted', '$xid2pc');\" \&");
+
+# check that decode starts waiting for this $xid2pc
+poll_output_until("waiting for $xid2pc to abort")
+ or die "no wait happened for the abort";
+
+# rollback the prepared transaction
+$node_logical->safe_psql('postgres', "ROLLBACK PREPARED 'test_prepared_tab';");
+
+# check for occurrence of the log about stopping this decoding
+poll_output_until("stopping decoding of test_prepared_tab")
+ or die "no decoding stop for the rollback";
+
+# consume any remaining changes
+$node_logical->safe_psql('postgres', "SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');");
+
+# Check that commit prepared is decoded properly on immediate restart
+$node_logical->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab VALUES (11);
+ INSERT INTO tab VALUES (12);
+ ALTER TABLE tab ADD COLUMN b INT;
+ INSERT INTO tab VALUES (13, 11);
+ PREPARE TRANSACTION 'test_prepared_tab';");
+# consume changes
+$node_logical->safe_psql('postgres', "SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');");
+$node_logical->stop('immediate');
+$node_logical->start;
+
+# commit post the restart
+$node_logical->safe_psql('postgres', "COMMIT PREPARED 'test_prepared_tab';");
+$node_logical->safe_psql('postgres', "SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');");
+
+# check inserts are visible
+my $result = $node_logical->safe_psql('postgres', "SELECT count(*) FROM tab where a IN (11,12) OR b IN (11);");
+is($result, qq(3), 'Rows inserted via 2PC are visible on restart');
+
+$node_logical->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot');");
+$node_logical->stop('fast');
+
+sub poll_output_until
+{
+ my ($expected) = @_;
+
+ $expected = 'xxxxxx' unless defined($expected); # default junk value
+
+ my $max_attempts = 180 * 10;
+ my $attempts = 0;
+
+ my $output_file = '';
+ while ($attempts < $max_attempts)
+ {
+ $output_file = slurp_file($node_logical->logfile());
+
+ if ($output_file =~ $expected)
+ {
+ return 1;
+ }
+
+ # Wait 0.1 second before retrying.
+ usleep(100_000);
+ $attempts++;
+ }
+
+ # The output result didn't change in 180 seconds. Give up
+ return 0;
+}
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 3f84ee9..e011fd9 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -68,8 +68,15 @@ static void DecodeSpecConfirm(LogicalDecodingContext *ctx, XLogRecordBuffer *buf
static void DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
xl_xact_parsed_commit *parsed, TransactionId xid);
+static void DecodeCommitPrepared(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_commit *parsed, TransactionId xid);
static void DecodeAbort(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
xl_xact_parsed_abort *parsed, TransactionId xid);
+static void DecodeAbortPrepared(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_abort *parsed, TransactionId xid);
+static void DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_prepare * parsed);
+
/* common function to decode tuples */
static void DecodeXLogTuple(char *data, Size len, ReorderBufferTupleBuf *tup);
@@ -239,7 +246,6 @@ DecodeXactOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
switch (info)
{
case XLOG_XACT_COMMIT:
- case XLOG_XACT_COMMIT_PREPARED:
{
xl_xact_commit *xlrec;
xl_xact_parsed_commit parsed;
@@ -256,8 +262,24 @@ DecodeXactOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
DecodeCommit(ctx, buf, &parsed, xid);
break;
}
+ case XLOG_XACT_COMMIT_PREPARED:
+ {
+ xl_xact_commit *xlrec;
+ xl_xact_parsed_commit parsed;
+ TransactionId xid;
+
+ xlrec = (xl_xact_commit *) XLogRecGetData(r);
+ ParseCommitRecord(XLogRecGetInfo(buf->record), xlrec, &parsed);
+
+ if (!TransactionIdIsValid(parsed.twophase_xid))
+ xid = XLogRecGetXid(r);
+ else
+ xid = parsed.twophase_xid;
+
+ DecodeCommitPrepared(ctx, buf, &parsed, xid);
+ break;
+ }
case XLOG_XACT_ABORT:
- case XLOG_XACT_ABORT_PREPARED:
{
xl_xact_abort *xlrec;
xl_xact_parsed_abort parsed;
@@ -274,6 +296,23 @@ DecodeXactOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
DecodeAbort(ctx, buf, &parsed, xid);
break;
}
+ case XLOG_XACT_ABORT_PREPARED:
+ {
+ xl_xact_abort *xlrec;
+ xl_xact_parsed_abort parsed;
+ TransactionId xid;
+
+ xlrec = (xl_xact_abort *) XLogRecGetData(r);
+ ParseAbortRecord(XLogRecGetInfo(buf->record), xlrec, &parsed);
+
+ if (!TransactionIdIsValid(parsed.twophase_xid))
+ xid = XLogRecGetXid(r);
+ else
+ xid = parsed.twophase_xid;
+
+ DecodeAbortPrepared(ctx, buf, &parsed, xid);
+ break;
+ }
case XLOG_XACT_ASSIGNMENT:
/*
@@ -312,17 +351,35 @@ DecodeXactOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
}
break;
case XLOG_XACT_PREPARE:
+ {
+ xl_xact_parsed_prepare parsed;
+ xl_xact_prepare *xlrec;
- /*
- * Currently decoding ignores PREPARE TRANSACTION and will just
- * decode the transaction when the COMMIT PREPARED is sent or
- * throw away the transaction's contents when a ROLLBACK PREPARED
- * is received. In the future we could add code to expose prepared
- * transactions in the changestream allowing for a kind of
- * distributed 2PC.
- */
- ReorderBufferProcessXid(reorder, XLogRecGetXid(r), buf->origptr);
- break;
+ /* check that output plugin is capable of two-phase decoding */
+ if (!ctx->twophase)
+ {
+ ReorderBufferProcessXid(reorder, XLogRecGetXid(r), buf->origptr);
+ break;
+ }
+
+ /* ok, parse it */
+ xlrec = (xl_xact_prepare *)XLogRecGetData(r);
+ ParsePrepareRecord(XLogRecGetInfo(buf->record),
+ xlrec, &parsed);
+
+ /* does output plugin want this particular transaction? */
+ if (ctx->callbacks.filter_prepare_cb &&
+ ReorderBufferPrepareNeedSkip(reorder, parsed.twophase_xid,
+ parsed.twophase_gid))
+ {
+ ReorderBufferProcessXid(reorder, parsed.twophase_xid,
+ buf->origptr);
+ break;
+ }
+
+ DecodePrepare(ctx, buf, &parsed);
+ break;
+ }
default:
elog(ERROR, "unexpected RM_XACT_ID record type: %u", info);
}
@@ -659,6 +716,131 @@ DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
}
/*
+ * Consolidated commit record handling between the different form of commit
+ * records.
+ */
+static void
+DecodeCommitPrepared(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_commit *parsed, TransactionId xid)
+{
+ XLogRecPtr origin_lsn = InvalidXLogRecPtr;
+ TimestampTz commit_time = parsed->xact_time;
+ RepOriginId origin_id = XLogRecGetOrigin(buf->record);
+ int i;
+
+ if (parsed->xinfo & XACT_XINFO_HAS_ORIGIN)
+ {
+ origin_lsn = parsed->origin_lsn;
+ commit_time = parsed->origin_timestamp;
+ }
+
+ SnapBuildCommitTxn(ctx->snapshot_builder, buf->origptr, xid,
+ parsed->nsubxacts, parsed->subxacts);
+
+ /* ----
+ * Check whether we are interested in this specific transaction, and tell
+ * the reorderbuffer to forget the content of the (sub-)transactions
+ * if not.
+ *
+ * There can be several reasons we might not be interested in this
+ * transaction:
+ * 1) We might not be interested in decoding transactions up to this
+ * LSN. This can happen because we previously decoded it and now just
+ * are restarting or if we haven't assembled a consistent snapshot yet.
+ * 2) The transaction happened in another database.
+ * 3) The output plugin is not interested in the origin.
+ * 4) We are doing fast-forwarding
+ *
+ * We can't just use ReorderBufferAbort() here, because we need to execute
+ * the transaction's invalidations. This currently won't be needed if
+ * we're just skipping over the transaction because currently we only do
+ * so during startup, to get to the first transaction the client needs. As
+ * we have reset the catalog caches before starting to read WAL, and we
+ * haven't yet touched any catalogs, there can't be anything to invalidate.
+ * But if we're "forgetting" this commit because it's it happened in
+ * another database, the invalidations might be important, because they
+ * could be for shared catalogs and we might have loaded data into the
+ * relevant syscaches.
+ * ---
+ */
+ if (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
+ (parsed->dbId != InvalidOid && parsed->dbId != ctx->slot->data.database) ||
+ ctx->fast_forward || FilterByOrigin(ctx, origin_id))
+ {
+ for (i = 0; i < parsed->nsubxacts; i++)
+ {
+ ReorderBufferForget(ctx->reorder, parsed->subxacts[i], buf->origptr);
+ }
+ ReorderBufferForget(ctx->reorder, xid, buf->origptr);
+
+ return;
+ }
+
+ /* tell the reorderbuffer about the surviving subtransactions */
+ for (i = 0; i < parsed->nsubxacts; i++)
+ {
+ ReorderBufferCommitChild(ctx->reorder, xid, parsed->subxacts[i],
+ buf->origptr, buf->endptr);
+ }
+
+ /*
+ * For COMMIT PREPARED, the changes have already been replayed at
+ * PREPARE time, so we only need to notify the subscriber that the GID
+ * finally committed.
+ * If filter check present and this needs to be skipped, do a regular commit.
+ */
+ if (ctx->callbacks.filter_prepare_cb &&
+ ReorderBufferPrepareNeedSkip(ctx->reorder, xid, parsed->twophase_gid))
+ {
+ ReorderBufferCommit(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn);
+ }
+ else
+ {
+ ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn,
+ parsed->twophase_gid, true);
+ }
+
+}
+
+/*
+ * Decode PREPARE record. Similar logic as in COMMIT
+ */
+static void
+DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_prepare * parsed)
+{
+ XLogRecPtr origin_lsn = parsed->origin_lsn;
+ TimestampTz commit_time = parsed->origin_timestamp;
+ XLogRecPtr origin_id = XLogRecGetOrigin(buf->record);
+ int i;
+ TransactionId xid = parsed->twophase_xid;
+
+ if (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
+ (parsed->dbId != InvalidOid && parsed->dbId != ctx->slot->data.database) ||
+ ctx->fast_forward || FilterByOrigin(ctx, origin_id))
+ return;
+
+ /*
+ * Tell the reorderbuffer about the surviving subtransactions. We need to
+ * do this because the main transaction itself has not committed since we
+ * are in the prepare phase right now. So we need to be sure the snapshot
+ * is setup correctly for the main transaction in case all changes
+ * happened in subtransanctions
+ */
+ for (i = 0; i < parsed->nsubxacts; i++)
+ {
+ ReorderBufferCommitChild(ctx->reorder, xid, parsed->subxacts[i],
+ buf->origptr, buf->endptr);
+ }
+
+ /* replay actions of all transaction + subtransactions in order */
+ ReorderBufferPrepare(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn, parsed->twophase_gid);
+}
+
+/*
* Get the data from the various forms of abort records and pass it on to
* snapbuild.c and reorderbuffer.c
*/
@@ -681,6 +863,50 @@ DecodeAbort(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
}
/*
+ * Get the data from the various forms of abort records and pass it on to
+ * snapbuild.c and reorderbuffer.c
+ */
+static void
+DecodeAbortPrepared(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_abort *parsed, TransactionId xid)
+{
+ int i;
+ XLogRecPtr origin_lsn = InvalidXLogRecPtr;
+ TimestampTz commit_time = 0;
+ XLogRecPtr origin_id = XLogRecGetOrigin(buf->record);
+
+ if (parsed->xinfo & XACT_XINFO_HAS_ORIGIN)
+ {
+ origin_lsn = parsed->origin_lsn;
+ commit_time = parsed->origin_timestamp;
+ }
+
+ /*
+ * If it passes through the filters handle the ROLLBACK via callbacks
+ */
+ if(!FilterByOrigin(ctx, origin_id) &&
+ !SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) &&
+ !ReorderBufferPrepareNeedSkip(ctx->reorder, xid, parsed->twophase_gid))
+ {
+ Assert(TransactionIdIsValid(xid));
+ Assert(parsed->dbId == ctx->slot->data.database);
+
+ ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn,
+ parsed->twophase_gid, false);
+ return;
+ }
+
+ for (i = 0; i < parsed->nsubxacts; i++)
+ {
+ ReorderBufferAbort(ctx->reorder, parsed->subxacts[i],
+ buf->record->EndRecPtr);
+ }
+
+ ReorderBufferAbort(ctx->reorder, xid, buf->record->EndRecPtr);
+}
+
+/*
* Parse XLOG_HEAP_INSERT (not MULTI_INSERT!) records into tuplebufs.
*
* Deletes can contain the new tuple.
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 7a8bf76..9df7ff7 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -251,7 +251,8 @@ static Size ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn
static void ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
char *change);
static void ReorderBufferRestoreCleanup(ReorderBuffer *rb, ReorderBufferTXN *txn);
-static void ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn);
+static void ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
+ bool txn_prepared);
static void ReorderBufferCleanupSerializedTXNs(const char *slotname);
static void ReorderBufferSerializedPath(char *path, ReplicationSlot *slot,
TransactionId xid, XLogSegNo segno);
@@ -418,6 +419,12 @@ ReorderBufferReturnTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
/* free data that's contained */
+ if (txn->gid != NULL)
+ {
+ pfree(txn->gid);
+ txn->gid = NULL;
+ }
+
if (txn->tuplecid_hash != NULL)
{
hash_destroy(txn->tuplecid_hash);
@@ -1511,12 +1518,14 @@ ReorderBufferCleanupTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
}
/*
- * Discard changes from a transaction (and subtransactions), after streaming
- * them. Keep the remaining info - transactions, tuplecids, invalidations and
- * snapshots.
+ * Discard changes from a transaction (and subtransactions), either after streaming or
+ * after a PREPARE.
+ * The flag txn_prepared indicates if this is called after a PREPARE.
+ * If streaming, keep the remaining info - transactions, tuplecids, invalidations and
+ * snapshots. If after a PREPARE, keep only the invalidations and snapshots.
*/
static void
-ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
+ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, bool txn_prepared)
{
dlist_mutable_iter iter;
@@ -1535,7 +1544,7 @@ ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
Assert(rbtxn_is_known_subxact(subtxn));
Assert(subtxn->nsubtxns == 0);
- ReorderBufferTruncateTXN(rb, subtxn);
+ ReorderBufferTruncateTXN(rb, subtxn, txn_prepared);
}
/* cleanup changes in the txn */
@@ -1569,9 +1578,33 @@ ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
* about the toplevel xact (we send the XID in all messages), but we never
* stream XIDs of empty subxacts.
*/
- if ((!txn->toptxn) || (txn->nentries_mem != 0))
+ if ((!txn_prepared) && ((!txn->toptxn) || (txn->nentries_mem != 0)))
txn->txn_flags |= RBTXN_IS_STREAMED;
+ if (txn_prepared)
+ {
+ /*
+ * If this is a prepared txn, cleanup the tuplecids we stored for decoding
+ * catalog snapshot access.
+ * They are always stored in the toplevel transaction.
+ */
+ dlist_foreach_modify(iter, &txn->tuplecids)
+ {
+ ReorderBufferChange *change;
+
+ change = dlist_container(ReorderBufferChange, node, iter.cur);
+
+ /* Check we're not mixing changes from different transactions. */
+ Assert(change->txn == txn);
+ Assert(change->action == REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID);
+
+ /* Remove the change from its containing list. */
+ dlist_delete(&change->node);
+
+ ReorderBufferReturnChange(rb, change, true);
+ }
+ }
+
/*
* Destroy the (relfilenode, ctid) hashtable, so that we don't leak any
* memory. We could also keep the hash table and update it with new ctid
@@ -1765,9 +1798,18 @@ ReorderBufferStreamCommit(ReorderBuffer *rb, ReorderBufferTXN *txn)
ReorderBufferStreamTXN(rb, txn);
- rb->stream_commit(rb, txn, txn->final_lsn);
-
- ReorderBufferCleanupTXN(rb, txn);
+ if (rbtxn_prepared(txn))
+ {
+ rb->stream_prepare(rb, txn, txn->final_lsn);
+ ReorderBufferTruncateTXN(rb, txn, true);
+ /* Reset the CheckXidAlive */
+ CheckXidAlive = InvalidTransactionId;
+ }
+ else
+ {
+ rb->stream_commit(rb, txn, txn->final_lsn);
+ ReorderBufferCleanupTXN(rb, txn);
+ }
}
/*
@@ -1896,7 +1938,7 @@ ReorderBufferResetTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
ReorderBufferChange *specinsert)
{
/* Discard the changes that we just streamed */
- ReorderBufferTruncateTXN(rb, txn);
+ ReorderBufferTruncateTXN(rb, txn, false);
/* Free all resources allocated for toast reconstruction */
ReorderBufferToastReset(rb, txn);
@@ -2003,7 +2045,7 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
prev_lsn = change->lsn;
/* Set the current xid to detect concurrent aborts. */
- if (streaming)
+ if (streaming || rbtxn_prepared(change->txn))
{
curtxn = change->txn;
SetupCheckXidLive(curtxn->xid);
@@ -2294,7 +2336,16 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
}
}
else
- rb->commit(rb, txn, commit_lsn);
+ {
+ /*
+ * Call either PREPARE (for two-phase transactions) or COMMIT
+ * (for regular ones).
+ */
+ if (rbtxn_prepared(txn))
+ rb->prepare(rb, txn, commit_lsn);
+ else
+ rb->commit(rb, txn, commit_lsn);
+ }
/* this is just a sanity check against bad output plugin behaviour */
if (GetCurrentTransactionIdIfAny() != InvalidTransactionId)
@@ -2335,11 +2386,17 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
*/
if (streaming)
{
- ReorderBufferTruncateTXN(rb, txn);
+ ReorderBufferTruncateTXN(rb, txn, false);
/* Reset the CheckXidAlive */
CheckXidAlive = InvalidTransactionId;
}
+ else if (rbtxn_prepared(txn))
+ {
+ ReorderBufferTruncateTXN(rb, txn, true);
+ /* Reset the CheckXidAlive */
+ CheckXidAlive = InvalidTransactionId;
+ }
else
ReorderBufferCleanupTXN(rb, txn);
}
@@ -2369,17 +2426,18 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
/*
* The error code ERRCODE_TRANSACTION_ROLLBACK indicates a concurrent
- * abort of the (sub)transaction we are streaming. We need to do the
+ * abort of the (sub)transaction we are streaming or preparing. We need to do the
* cleanup and return gracefully on this error, see SetupCheckXidLive.
*/
if (errdata->sqlerrcode == ERRCODE_TRANSACTION_ROLLBACK)
{
/*
- * This error can only occur when we are sending the data in
- * streaming mode and the streaming is not finished yet.
+ * This error can only occur either when we are sending the data in
+ * streaming mode and the streaming is not finished yet or when we are
+ * sending the data out on a PREPARE during a two-phase commit.
*/
- Assert(streaming);
- Assert(stream_started);
+ Assert(streaming || rbtxn_prepared(txn));
+ Assert(stream_started || rbtxn_prepared(txn));
/* Cleanup the temporary error state. */
FlushErrorState();
@@ -2387,10 +2445,19 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
errdata = NULL;
curtxn->concurrent_abort = true;
- /* Reset the TXN so that it is allowed to stream remaining data. */
- ReorderBufferResetTXN(rb, txn, snapshot_now,
- command_id, prev_lsn,
- specinsert);
+ /* If streaming, reset the TXN so that it is allowed to stream remaining data. */
+ if (streaming)
+ {
+ ReorderBufferResetTXN(rb, txn, snapshot_now,
+ command_id, prev_lsn,
+ specinsert);
+ }
+ else
+ {
+ elog(LOG, "stopping decoding of %s (%u)",
+ txn->gid[0] != '\0'? txn->gid:"", txn->xid);
+ ReorderBufferTruncateTXN(rb, txn, true);
+ }
}
else
{
@@ -2412,23 +2479,16 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
* This interface is called once a toplevel commit is read for both streamed
* as well as non-streamed transactions.
*/
-void
-ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
- XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
- TimestampTz commit_time,
- RepOriginId origin_id, XLogRecPtr origin_lsn)
+static void
+ReorderBufferCommitInternal(ReorderBufferTXN *txn,
+ ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn)
{
- ReorderBufferTXN *txn;
Snapshot snapshot_now;
CommandId command_id = FirstCommandId;
- txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
- false);
-
- /* unknown transaction, nothing to replay */
- if (txn == NULL)
- return;
-
txn->final_lsn = commit_lsn;
txn->end_lsn = end_lsn;
txn->commit_time = commit_time;
@@ -2470,6 +2530,145 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
}
/*
+ * Ask output plugin whether we want to skip this PREPARE and send
+ * this transaction as a regular commit later.
+ */
+bool
+ReorderBufferPrepareNeedSkip(ReorderBuffer *rb, TransactionId xid, const char *gid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr, false);
+
+ return rb->filter_prepare(rb, txn, xid, gid);
+}
+
+
+/*
+ * Commit a transaction.
+ *
+ * See comments for ReorderBufferCommitInternal()
+ */
+void
+ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+
+ /* unknown transaction, nothing to replay */
+ if (txn == NULL)
+ return;
+
+ ReorderBufferCommitInternal(txn, rb, xid, commit_lsn, end_lsn,
+ commit_time, origin_id, origin_lsn);
+}
+
+/*
+ * Prepare a two-phase transaction. It calls ReorderBufferCommitInternal()
+ * since all prepared transactions need to be decoded at PREPARE time.
+ */
+void
+ReorderBufferPrepare(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+
+ /* unknown transaction, nothing to replay */
+ if (txn == NULL)
+ return;
+
+ txn->txn_flags |= RBTXN_PREPARE;
+ txn->gid = palloc(strlen(gid) + 1); /* trailing '\0' */
+ strcpy(txn->gid, gid);
+
+ ReorderBufferCommitInternal(txn, rb, xid, commit_lsn, end_lsn,
+ commit_time, origin_id, origin_lsn);
+}
+
+/*
+ * Check whether this transaction was sent as prepared to subscribers.
+ * Called while handling commit|abort prepared.
+ */
+bool
+ReorderBufferTxnIsPrepared(ReorderBuffer *rb, TransactionId xid,
+ const char *gid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+
+ /*
+ * Always call the prepare filter. It's the job of the prepare filter to
+ * give us the *same* response for a given xid across multiple calls
+ * (including ones on restart)
+ */
+ return !(rb->filter_prepare(rb, txn, xid, gid));
+}
+
+/*
+ * Send standalone xact event. This is used to handle COMMIT/ROLLBACK PREPARED.
+ */
+void
+ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid, bool is_commit)
+{
+ ReorderBufferTXN *txn;
+
+ /*
+ * The transaction may or may not exist (during restarts for example).
+ * Anyways, two-phase transactions do not contain any reorderbuffers. So allow
+ * it to be created below.
+ */
+ txn = ReorderBufferTXNByXid(rb, xid, true, NULL, commit_lsn,
+ true);
+
+ txn->final_lsn = commit_lsn;
+ txn->end_lsn = end_lsn;
+ txn->commit_time = commit_time;
+ txn->origin_id = origin_id;
+ txn->origin_lsn = origin_lsn;
+ /* this txn is obviously prepared */
+ txn->txn_flags |= RBTXN_PREPARE;
+ txn->gid = palloc(strlen(gid) + 1); /* trailing '\0' */
+ strcpy(txn->gid, gid);
+
+ if (is_commit)
+ txn->txn_flags |= RBTXN_COMMIT_PREPARED;
+ else
+ txn->txn_flags |= RBTXN_ROLLBACK_PREPARED;
+
+ if (rbtxn_is_streamed(txn) && rbtxn_commit_prepared(txn))
+ rb->stream_commit_prepared(rb, txn, commit_lsn);
+ else if (rbtxn_is_streamed(txn) && rbtxn_rollback_prepared(txn))
+ rb->stream_rollback_prepared(rb, txn, commit_lsn);
+ else if (rbtxn_commit_prepared(txn))
+ rb->commit_prepared(rb, txn, commit_lsn);
+ else if (rbtxn_rollback_prepared(txn))
+ rb->rollback_prepared(rb, txn, commit_lsn);
+
+
+ /* cleanup: make sure there's no cache pollution */
+ ReorderBufferExecuteInvalidations(txn->ninvalidations,
+ txn->invalidations);
+ ReorderBufferCleanupTXN(rb, txn);
+}
+
+/*
* Abort a transaction that possibly has previous changes. Needs to be first
* called for subtransactions and then for the toplevel xid.
*
@@ -2512,7 +2711,12 @@ ReorderBufferAbort(ReorderBuffer *rb, TransactionId xid, XLogRecPtr lsn)
/* cosmetic... */
txn->final_lsn = lsn;
- /* remove potential on-disk data, and deallocate */
+ /*
+ * remove potential on-disk data, and deallocate.
+ *
+ * We remove it even for prepared transactions (GID is enough to
+ * commit/abort those later).
+ */
ReorderBufferCleanupTXN(rb, txn);
}
--
1.8.3.1
v10-0003-Support-2PC-txn-pgoutput.patchapplication/octet-stream; name=v10-0003-Support-2PC-txn-pgoutput.patchDownload
From a5e0b59a6d7d9a3cf1e091d41814597cce36e5f1 Mon Sep 17 00:00:00 2001
From: postgres <postgres@CentOS7-x64.fritz.box>
Date: Fri, 16 Oct 2020 16:53:24 +1100
Subject: [PATCH v10] Support 2PC txn - pgoutput.
This patch adds support in the pgoutput plugin and subscriber for handling
of two-phase commits.
Includes pgoutput changes.
Includes subscriber changes.
Includes two-phase commit test code (streaming and not streaming).
---
src/backend/access/transam/twophase.c | 27 ++
src/backend/replication/logical/proto.c | 173 +++++++-
src/backend/replication/logical/worker.c | 466 +++++++++++++++++++++-
src/backend/replication/pgoutput/pgoutput.c | 130 ++++++
src/include/access/twophase.h | 1 +
src/include/replication/logicalproto.h | 39 ++
src/test/subscription/t/020_twophase.pl | 163 ++++++++
src/test/subscription/t/021_twophase_streaming.pl | 366 +++++++++++++++++
8 files changed, 1363 insertions(+), 2 deletions(-)
create mode 100644 src/test/subscription/t/020_twophase.pl
create mode 100644 src/test/subscription/t/021_twophase_streaming.pl
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 7940060..2e0a408 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -548,6 +548,33 @@ MarkAsPrepared(GlobalTransaction gxact, bool lock_held)
}
/*
+ * LookupGXact
+ * Check if the prepared transaction with the given GID is around
+ */
+bool
+LookupGXact(const char *gid)
+{
+ int i;
+ bool found = false;
+
+ LWLockAcquire(TwoPhaseStateLock, LW_SHARED);
+ for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
+ {
+ GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
+
+ /* Ignore not-yet-valid GIDs */
+ if (gxact->valid && strcmp(gxact->gid, gid) == 0)
+ {
+ found = true;
+ break;
+ }
+
+ }
+ LWLockRelease(TwoPhaseStateLock);
+ return found;
+}
+
+/*
* LockGXact
* Locate the prepared transaction and mark it busy for COMMIT or PREPARE.
*/
diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index eb19142..c0b83da 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -78,7 +78,7 @@ logicalrep_write_commit(StringInfo out, ReorderBufferTXN *txn,
pq_sendbyte(out, 'C'); /* sending COMMIT */
- /* send the flags field (unused for now) */
+ /* send the flags field */
pq_sendbyte(out, flags);
/* send fields */
@@ -99,6 +99,7 @@ logicalrep_read_commit(StringInfo in, LogicalRepCommitData *commit_data)
if (flags != 0)
elog(ERROR, "unrecognized flags %u in commit message", flags);
+
/* read fields */
commit_data->commit_lsn = pq_getmsgint64(in);
commit_data->end_lsn = pq_getmsgint64(in);
@@ -106,6 +107,176 @@ logicalrep_read_commit(StringInfo in, LogicalRepCommitData *commit_data)
}
/*
+ * Write PREPARE to the output stream.
+ */
+void
+logicalrep_write_prepare(StringInfo out, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ uint8 flags = 0;
+
+ pq_sendbyte(out, 'P'); /* sending PREPARE protocol */
+
+ /*
+ * This should only ever happen for two-phase commit transactions. In which case we
+ * expect to have a non-empty GID.
+ */
+ Assert(rbtxn_prepared(txn));
+ Assert(strlen(txn->gid) > 0);
+
+ /*
+ * Flags are determined from the state of the transaction. We know we
+ * always get PREPARE first and then [COMMIT|ROLLBACK] PREPARED, so if
+ * it's already marked as committed then it has to be COMMIT PREPARED (and
+ * likewise for abort / ROLLBACK PREPARED).
+ */
+ if (rbtxn_commit_prepared(txn))
+ flags |= LOGICALREP_IS_COMMIT_PREPARED;
+ else if (rbtxn_rollback_prepared(txn))
+ flags |= LOGICALREP_IS_ROLLBACK_PREPARED;
+ else
+ flags |= LOGICALREP_IS_PREPARE;
+
+ /* Make sure exactly one of the expected flags is set. */
+ if (!PrepareFlagsAreValid(flags))
+ elog(ERROR, "unrecognized flags %u in prepare message", flags);
+
+ /* send the flags field */
+ pq_sendbyte(out, flags);
+
+ /* send fields */
+ pq_sendint64(out, prepare_lsn);
+ pq_sendint64(out, txn->end_lsn);
+ pq_sendint64(out, txn->commit_time);
+
+ /* send gid */
+ pq_sendstring(out, txn->gid);
+}
+
+/*
+ * Read transaction PREPARE from the stream.
+ */
+void
+logicalrep_read_prepare(StringInfo in, LogicalRepPrepareData * prepare_data)
+{
+ /* read flags */
+ uint8 flags = pq_getmsgbyte(in);
+
+ if (!PrepareFlagsAreValid(flags))
+ elog(ERROR, "unrecognized flags %u in prepare message", flags);
+
+ /* set the action (reuse the constants used for the flags) */
+ prepare_data->prepare_type = flags;
+
+ /* read fields */
+ prepare_data->prepare_lsn = pq_getmsgint64(in);
+ prepare_data->end_lsn = pq_getmsgint64(in);
+ prepare_data->preparetime = pq_getmsgint64(in);
+
+ /* read gid (copy it into a pre-allocated buffer) */
+ strcpy(prepare_data->gid, pq_getmsgstring(in));
+}
+
+/*
+ * Write STREAM PREPARE to the output stream.
+ *
+ * (For stream PREPARE, stream COMMIT PREPARED, stream ROLLBACK PREPARED)
+ *
+ * TODO- This is mostly cut/paste from logicalrep_write_prepare. Consider refactoring for commonality.
+ */
+void
+logicalrep_write_stream_prepare(StringInfo out,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ uint8 flags = 0;
+
+#ifdef DEBUG_STREAM_2PC
+ elog(LOG, "proto: logicalrep_write_stream_prepare");
+#endif
+
+ pq_sendbyte(out, 'p'); /* sending STREAM PREPARE protocol */
+
+ /*
+ * This should only ever happen for two-phase transactions. In which case we
+ * expect to have a non-empty GID.
+ */
+ Assert(rbtxn_prepared(txn));
+ Assert(strlen(txn->gid) > 0);
+
+ /*
+ * Flags are determined from the state of the transaction. We know we
+ * always get PREPARE first and then [COMMIT|ROLLBACK] PREPARED, so if
+ * it's already marked as committed then it has to be COMMIT PREPARED (and
+ * likewise for abort / ROLLBACK PREPARED).
+ */
+ if (rbtxn_commit_prepared(txn))
+ flags |= LOGICALREP_IS_COMMIT_PREPARED;
+ else if (rbtxn_rollback_prepared(txn))
+ flags |= LOGICALREP_IS_ROLLBACK_PREPARED;
+ else
+ flags |= LOGICALREP_IS_PREPARE;
+
+ /* Make sure exactly one of the expected flags is set. */
+ if (!PrepareFlagsAreValid(flags))
+ elog(ERROR, "unrecognized flags %u in prepare message", flags);
+
+ /* transaction ID */
+ Assert(TransactionIdIsValid(txn->xid));
+ pq_sendint32(out, txn->xid);
+
+ /* send the flags field */
+ pq_sendbyte(out, flags);
+
+ /* send fields */
+ pq_sendint64(out, prepare_lsn);
+ pq_sendint64(out, txn->end_lsn);
+ pq_sendint64(out, txn->commit_time);
+
+ /* send gid */
+ pq_sendstring(out, txn->gid);
+}
+
+/*
+ * Read STREAM PREPARE from the output stream.
+ *
+ * (For stream PREPARE, stream COMMIT PREPARED, stream ROLLBACK PREPARED)
+ *
+ * TODO - This is mostly cut/paste from logicalrep_read_prepare. Consider refactoring for commonality.
+ */
+TransactionId
+logicalrep_read_stream_prepare(StringInfo in, LogicalRepPrepareData *prepare_data)
+{
+ TransactionId xid;
+ uint8 flags;
+
+#ifdef DEBUG_STREAM_2PC
+ elog(LOG, "proto: logicalrep_read_stream_prepare");
+#endif
+
+ xid = pq_getmsgint(in, 4);
+
+ /* read flags */
+ flags = pq_getmsgbyte(in);
+
+ if (!PrepareFlagsAreValid(flags))
+ elog(ERROR, "unrecognized flags %u in prepare message", flags);
+
+ /* set the action (reuse the constants used for the flags) */
+ prepare_data->prepare_type = flags;
+
+ /* read fields */
+ prepare_data->prepare_lsn = pq_getmsgint64(in);
+ prepare_data->end_lsn = pq_getmsgint64(in);
+ prepare_data->preparetime = pq_getmsgint64(in);
+
+ /* read gid (copy it into a pre-allocated buffer) */
+ strcpy(prepare_data->gid, pq_getmsgstring(in));
+
+ return xid;
+}
+
+/*
* Write ORIGIN to the output stream.
*/
void
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index b8e297c..fbb7acf 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -722,6 +722,7 @@ apply_handle_commit(StringInfo s)
replorigin_session_origin_timestamp = commit_data.committime;
CommitTransactionCommand();
+
pgstat_report_stat(false);
store_flush_position(commit_data.end_lsn);
@@ -742,6 +743,461 @@ apply_handle_commit(StringInfo s)
}
/*
+ * Called from apply_handle_prepare to handle a PREPARE TRANSACTION.
+ */
+static void
+apply_handle_prepare_txn(LogicalRepPrepareData * prepare_data)
+{
+ Assert(prepare_data->prepare_lsn == remote_final_lsn);
+
+ /* The synchronization worker runs in single transaction. */
+ if (IsTransactionState() && !am_tablesync_worker())
+ {
+ /* End the earlier transaction and start a new one */
+ BeginTransactionBlock();
+ CommitTransactionCommand();
+ StartTransactionCommand();
+
+ /*
+ * Update origin state so we can restart streaming from correct
+ * position in case of crash.
+ */
+ replorigin_session_origin_lsn = prepare_data->end_lsn;
+ replorigin_session_origin_timestamp = prepare_data->preparetime;
+
+ PrepareTransactionBlock(prepare_data->gid);
+ CommitTransactionCommand();
+ pgstat_report_stat(false);
+
+ store_flush_position(prepare_data->end_lsn);
+ }
+ else
+ {
+ /* Process any invalidation messages that might have accumulated. */
+ AcceptInvalidationMessages();
+ maybe_reread_subscription();
+ }
+
+ in_remote_transaction = false;
+
+ /* Process any tables that are being synchronized in parallel. */
+ process_syncing_tables(prepare_data->end_lsn);
+
+ pgstat_report_activity(STATE_IDLE, NULL);
+}
+
+/*
+ * Called from apply_handle_prepare to handle a COMMIT PREPARED of a previously
+ * PREPARED transaction.
+ */
+static void
+apply_handle_commit_prepared_txn(LogicalRepPrepareData * prepare_data)
+{
+ /* there is no transaction when COMMIT PREPARED is called */
+ ensure_transaction();
+
+ /*
+ * Update origin state so we can restart streaming from correct position
+ * in case of crash.
+ */
+ replorigin_session_origin_lsn = prepare_data->end_lsn;
+ replorigin_session_origin_timestamp = prepare_data->preparetime;
+
+ FinishPreparedTransaction(prepare_data->gid, true);
+ CommitTransactionCommand();
+ pgstat_report_stat(false);
+
+ store_flush_position(prepare_data->end_lsn);
+ in_remote_transaction = false;
+
+ /* Process any tables that are being synchronized in parallel. */
+ process_syncing_tables(prepare_data->end_lsn);
+
+ pgstat_report_activity(STATE_IDLE, NULL);
+}
+
+/*
+ * Called from apply_handle_prepare to handle a ROLLBACK PREPARED of a previously
+ * PREPARED TRANSACTION.
+ */
+static void
+apply_handle_rollback_prepared_txn(LogicalRepPrepareData * prepare_data)
+{
+ /*
+ * Update origin state so we can restart streaming from correct position
+ * in case of crash.
+ */
+ replorigin_session_origin_lsn = prepare_data->end_lsn;
+ replorigin_session_origin_timestamp = prepare_data->preparetime;
+
+ /*
+ * During logical decoding, on the apply side, it's possible that a
+ * prepared transaction got aborted while decoding. In that case, we stop
+ * the decoding and abort the transaction immediately. However the
+ * ROLLBACK prepared processing still reaches the subscriber. In that case
+ * it's ok to have a missing gid
+ */
+ if (LookupGXact(prepare_data->gid))
+ {
+ /* there is no transaction when ABORT/ROLLBACK PREPARED is called */
+ ensure_transaction();
+ FinishPreparedTransaction(prepare_data->gid, false);
+ CommitTransactionCommand();
+ }
+
+ pgstat_report_stat(false);
+
+ store_flush_position(prepare_data->end_lsn);
+ in_remote_transaction = false;
+
+ /* Process any tables that are being synchronized in parallel. */
+ process_syncing_tables(prepare_data->end_lsn);
+
+ pgstat_report_activity(STATE_IDLE, NULL);
+}
+
+/*
+ * Handle PREPARE message.
+ */
+static void
+apply_handle_prepare(StringInfo s)
+{
+ LogicalRepPrepareData prepare_data;
+
+ logicalrep_read_prepare(s, &prepare_data);
+
+ switch (prepare_data.prepare_type)
+ {
+ case LOGICALREP_IS_PREPARE:
+ apply_handle_prepare_txn(&prepare_data);
+ break;
+
+ case LOGICALREP_IS_COMMIT_PREPARED:
+ apply_handle_commit_prepared_txn(&prepare_data);
+ break;
+
+ case LOGICALREP_IS_ROLLBACK_PREPARED:
+ apply_handle_rollback_prepared_txn(&prepare_data);
+ break;
+
+ default:
+ ereport(ERROR,
+ (errcode(ERRCODE_PROTOCOL_VIOLATION),
+ errmsg("unexpected type of prepare message: %d",
+ prepare_data.prepare_type)));
+ }
+}
+
+/* Called from apply_handle_stream_prepare to handle STREAM PREPARE. */
+static void
+apply_handle_stream_prepare_txn(TransactionId xid, LogicalRepPrepareData *prepare_data)
+{
+ StringInfoData s2;
+ int nchanges;
+ char path[MAXPGPATH];
+ char *buffer = NULL;
+ bool found;
+ StreamXidHash *ent;
+ MemoryContext oldcxt;
+ BufFile *fd;
+
+#ifdef DEBUG_STREAM_2PC
+ elog(LOG, "worker: apply_handle_stream_prepare_txn");
+#endif
+
+ /*
+ * FIXME - Following condition was in apply_handle_prepare_txn except I found it was ALWAYS IsTransactionState() == false
+ * The synchronization worker runs in single transaction. *
+ if (IsTransactionState() && !am_tablesync_worker())
+ */
+ if (!am_tablesync_worker())
+ {
+ /*
+ * ==================================================================================================
+ * The following chunk of code is largely cut/paste from the existing apply_handle_prepare_commit_txn
+ * which was handling the non-two-phase streaming commit by applying the operations of the spooled file.
+ *
+ * Differences are:
+ * - Here the xid is known already because apply_handle_stream_prepare already called
+ * locicalrep_read_stream_prepare
+ *
+ * TODO - This is possible candidate for refactoring since so much of it is the same.
+ * ==================================================================================================
+ */
+
+ Assert(!in_streamed_transaction);
+
+ elog(DEBUG1, "received prepare for streamed transaction %u", xid);
+
+ ensure_transaction();
+
+ /*
+ * Allocate file handle and memory required to process all the messages in
+ * TopTransactionContext to avoid them getting reset after each message is
+ * processed.
+ */
+ oldcxt = MemoryContextSwitchTo(TopTransactionContext);
+
+ /* open the spool file for the committed transaction */
+ changes_filename(path, MyLogicalRepWorker->subid, xid);
+ elog(DEBUG1, "replaying changes from file \"%s\"", path);
+#ifdef DEBUG_STREAM_2PC
+ elog(LOG, "worker: replaying changes from file \"%s\"", path);
+#endif
+ ent = (StreamXidHash *) hash_search(xidhash,
+ (void *) &xid,
+ HASH_FIND,
+ &found);
+ Assert(found);
+ fd = BufFileOpenShared(ent->stream_fileset, path, O_RDONLY);
+
+ buffer = palloc(BLCKSZ);
+ initStringInfo(&s2);
+
+ MemoryContextSwitchTo(oldcxt);
+
+ remote_final_lsn = prepare_data->prepare_lsn;
+
+ /*
+ * Make sure the handle apply_dispatch methods are aware we're in a remote
+ * transaction.
+ */
+ in_remote_transaction = true;
+ pgstat_report_activity(STATE_RUNNING, NULL);
+
+ /*
+ * Read the entries one by one and pass them through the same logic as in
+ * apply_dispatch.
+ */
+ nchanges = 0;
+ while (true)
+ {
+ int nbytes;
+ int len;
+
+ CHECK_FOR_INTERRUPTS();
+
+ /* read length of the on-disk record */
+ nbytes = BufFileRead(fd, &len, sizeof(len));
+
+ /* have we reached end of the file? */
+ if (nbytes == 0)
+ break;
+
+ /* do we have a correct length? */
+ if (nbytes != sizeof(len))
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not read from streaming transaction's changes file \"%s\": %m",
+ path)));
+
+ Assert(len > 0);
+
+ /* make sure we have sufficiently large buffer */
+ buffer = repalloc(buffer, len);
+
+ /* and finally read the data into the buffer */
+ if (BufFileRead(fd, buffer, len) != len)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not read from streaming transaction's changes file \"%s\": %m",
+ path)));
+
+ /* copy the buffer to the stringinfo and call apply_dispatch */
+ resetStringInfo(&s2);
+ appendBinaryStringInfo(&s2, buffer, len);
+
+ /* Ensure we are reading the data into our memory context. */
+ oldcxt = MemoryContextSwitchTo(ApplyMessageContext);
+
+ apply_dispatch(&s2);
+
+ MemoryContextReset(ApplyMessageContext);
+
+ MemoryContextSwitchTo(oldcxt);
+
+ nchanges++;
+
+ if (nchanges % 1000 == 0)
+ elog(DEBUG1, "replayed %d changes from file '%s'",
+ nchanges, path);
+ }
+
+ BufFileClose(fd);
+
+ pfree(buffer);
+ pfree(s2.data);
+
+ /*
+ * ==================================================================================================
+ * The following chunk of code is cut/paste from the existing apply_handle_prepare_txn
+ * which was handling the two-phase prepare of the non-streamed tx
+ * ==================================================================================================
+ */
+#ifdef DEBUG_STREAM_2PC
+ elog(LOG, "worker: call PrepareTransactionBlock()");
+#endif
+ BeginTransactionBlock();
+ CommitTransactionCommand();
+ StartTransactionCommand();
+
+ /*
+ * Update origin state so we can restart streaming from correct position
+ * in case of crash.
+ */
+ replorigin_session_origin_lsn = prepare_data->end_lsn;
+ replorigin_session_origin_timestamp = prepare_data->preparetime;
+
+ PrepareTransactionBlock(prepare_data->gid);
+ /* End of copied prepare code */
+
+ CommitTransactionCommand();
+ pgstat_report_stat(false);
+
+ store_flush_position(prepare_data->end_lsn);
+
+ elog(DEBUG1, "replayed %d (all) changes from file \"%s\"", nchanges, path);
+#ifdef DEBUG_STREAM_2PC
+ elog(LOG, "worker: replayed %d (all) changes from file \"%s\"", nchanges, path);
+#endif
+
+ /* ============================================================================================== */
+ }
+ else
+ {
+ /* Process any invalidation messages that might have accumulated. */
+ AcceptInvalidationMessages();
+ maybe_reread_subscription();
+ }
+
+ in_remote_transaction = false;
+
+ /* Process any tables that are being synchronized in parallel. */
+ process_syncing_tables(prepare_data->end_lsn);
+
+ /* unlink the files with serialized changes and subxact info */
+ /* FIXME - OK to do this here (outside of if/else). */
+ stream_cleanup_files(MyLogicalRepWorker->subid, xid);
+
+ pgstat_report_activity(STATE_IDLE, NULL);
+}
+
+/*
+ * Called from apply_handle_stream_prepare to handle STREAM COMMIT PREPARED.
+ * NOTE. Following code is exactly same as apply_handle_commit_prepared_txn.
+ */
+static void
+apply_handle_stream_commit_prepared_txn(LogicalRepPrepareData *prepare_data)
+{
+#ifdef DEBUG_STREAM_2PC
+ elog(LOG, "worker: apply_handle_stream_commit_prepared_txn");
+#endif
+
+ /* there is no transaction when COMMIT PREPARED is called */
+ ensure_transaction();
+
+ /*
+ * Update origin state so we can restart streaming from correct position
+ * in case of crash.
+ */
+ replorigin_session_origin_lsn = prepare_data->end_lsn;
+ replorigin_session_origin_timestamp = prepare_data->preparetime;
+
+ FinishPreparedTransaction(prepare_data->gid, true);
+ CommitTransactionCommand();
+ pgstat_report_stat(false);
+
+ store_flush_position(prepare_data->end_lsn);
+ in_remote_transaction = false;
+
+ /* Process any tables that are being synchronized in parallel. */
+ process_syncing_tables(prepare_data->end_lsn);
+
+ pgstat_report_activity(STATE_IDLE, NULL);
+}
+
+/*
+ * Called from apply_handle_stream_prepare to handle STREAM ROLLBACK PREPARED.
+ * NOTE. Following code is exactly same as apply_handle_commit_prepared_txn.
+ */
+static void
+apply_handle_stream_rollback_prepared_txn(LogicalRepPrepareData *prepare_data)
+{
+#ifdef DEBUG_STREAM_2PC
+ elog(LOG, "worker: apply_handle_stream_rollback_prepared_txn");
+#endif
+
+ /*
+ * Update origin state so we can restart streaming from correct position
+ * in case of crash.
+ */
+ replorigin_session_origin_lsn = prepare_data->end_lsn;
+ replorigin_session_origin_timestamp = prepare_data->preparetime;
+
+ /*
+ * During logical decoding, on the apply side, it's possible that a
+ * prepared transaction got aborted while decoding. In that case, we stop
+ * the decoding and abort the transaction immediately. However the
+ * ROLLBACK prepared processing still reaches the subscriber. In that case
+ * it's ok to have a missing gid
+ */
+ if (LookupGXact(prepare_data->gid))
+ {
+ /* there is no transaction when ABORT/ROLLBACK PREPARED is called */
+ ensure_transaction();
+ FinishPreparedTransaction(prepare_data->gid, false);
+ CommitTransactionCommand();
+ }
+
+ pgstat_report_stat(false);
+
+ store_flush_position(prepare_data->end_lsn);
+ in_remote_transaction = false;
+
+ /* Process any tables that are being synchronized in parallel. */
+ process_syncing_tables(prepare_data->end_lsn);
+
+ pgstat_report_activity(STATE_IDLE, NULL);
+}
+
+/*
+ * Handle STREAM PREPARE message.
+ */
+static void
+apply_handle_stream_prepare(StringInfo s)
+{
+ LogicalRepPrepareData prepare_data;
+ TransactionId xid;
+
+ xid = logicalrep_read_stream_prepare(s, &prepare_data);
+
+#ifdef DEBUG_STREAM_2PC
+ elog(LOG, "worker: apply_handle_stream_prepare (xid=%u, prepare_type=%u)", xid, prepare_data.prepare_type);
+#endif
+
+ switch (prepare_data.prepare_type)
+ {
+ case LOGICALREP_IS_PREPARE:
+ apply_handle_stream_prepare_txn(xid, &prepare_data);
+ break;
+
+ case LOGICALREP_IS_COMMIT_PREPARED:
+ apply_handle_stream_commit_prepared_txn(&prepare_data);
+ break;
+
+ case LOGICALREP_IS_ROLLBACK_PREPARED:
+ apply_handle_stream_rollback_prepared_txn(&prepare_data);
+ break;
+
+ default:
+ ereport(ERROR,
+ (errcode(ERRCODE_PROTOCOL_VIOLATION),
+ errmsg("unexpected type of stream prepare message: %d",
+ prepare_data.prepare_type)));
+ }
+}
+
+/*
* Handle ORIGIN message.
*
* TODO, support tracking of multiple origins
@@ -1908,10 +2364,14 @@ apply_dispatch(StringInfo s)
case 'B':
apply_handle_begin(s);
break;
- /* COMMIT */
+ /* COMMIT/ABORT */
case 'C':
apply_handle_commit(s);
break;
+ /* PREPARE and [COMMIT|ROLLBACK] PREPARED */
+ case 'P':
+ apply_handle_prepare(s);
+ break;
/* INSERT */
case 'I':
apply_handle_insert(s);
@@ -1956,6 +2416,10 @@ apply_dispatch(StringInfo s)
case 'c':
apply_handle_stream_commit(s);
break;
+ /* STREAM PREPARE and [COMMIT|ROLLBACK] PREPARED */
+ case 'p':
+ apply_handle_stream_prepare(s);
+ break;
default:
ereport(ERROR,
(errcode(ERRCODE_PROTOCOL_VIOLATION),
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 9c997ae..240fca6 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -47,6 +47,12 @@ static void pgoutput_truncate(LogicalDecodingContext *ctx,
ReorderBufferChange *change);
static bool pgoutput_origin_filter(LogicalDecodingContext *ctx,
RepOriginId origin_id);
+static void pgoutput_prepare_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, XLogRecPtr prepare_lsn);
+static void pgoutput_commit_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, XLogRecPtr prepare_lsn);
+static void pgoutput_rollback_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, XLogRecPtr prepare_lsn);
static void pgoutput_stream_start(struct LogicalDecodingContext *ctx,
ReorderBufferTXN *txn);
static void pgoutput_stream_stop(struct LogicalDecodingContext *ctx,
@@ -57,6 +63,12 @@ static void pgoutput_stream_abort(struct LogicalDecodingContext *ctx,
static void pgoutput_stream_commit(struct LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
+static void pgoutput_stream_prepare_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, XLogRecPtr prepare_lsn);
+static void pgoutput_stream_commit_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, XLogRecPtr commit_lsn);
+static void pgoutput_stream_rollback_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, XLogRecPtr abort_lsn);
static bool publications_valid;
static bool in_streaming;
@@ -143,6 +155,10 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb)
cb->change_cb = pgoutput_change;
cb->truncate_cb = pgoutput_truncate;
cb->commit_cb = pgoutput_commit_txn;
+
+ cb->prepare_cb = pgoutput_prepare_txn;
+ cb->commit_prepared_cb = pgoutput_commit_prepared_txn;
+ cb->rollback_prepared_cb = pgoutput_rollback_prepared_txn;
cb->filter_by_origin_cb = pgoutput_origin_filter;
cb->shutdown_cb = pgoutput_shutdown;
@@ -153,6 +169,10 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb)
cb->stream_commit_cb = pgoutput_stream_commit;
cb->stream_change_cb = pgoutput_change;
cb->stream_truncate_cb = pgoutput_truncate;
+ /* transaction streaming - two-phase commit */
+ cb->stream_prepare_cb = pgoutput_stream_prepare_txn;
+ cb->stream_commit_prepared_cb = pgoutput_stream_commit_prepared_txn;
+ cb->stream_rollback_prepared_cb = pgoutput_stream_rollback_prepared_txn;
}
static void
@@ -378,6 +398,48 @@ pgoutput_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
}
/*
+ * PREPARE callback
+ */
+static void
+pgoutput_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ OutputPluginUpdateProgress(ctx);
+
+ OutputPluginPrepareWrite(ctx, true);
+ logicalrep_write_prepare(ctx->out, txn, prepare_lsn);
+ OutputPluginWrite(ctx, true);
+}
+
+/*
+ * COMMIT PREPARED callback
+ */
+static void
+pgoutput_commit_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ OutputPluginUpdateProgress(ctx);
+
+ OutputPluginPrepareWrite(ctx, true);
+ logicalrep_write_prepare(ctx->out, txn, prepare_lsn);
+ OutputPluginWrite(ctx, true);
+}
+
+/*
+ * PREPARE callback
+ */
+static void
+pgoutput_rollback_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ OutputPluginUpdateProgress(ctx);
+
+ OutputPluginPrepareWrite(ctx, true);
+ logicalrep_write_prepare(ctx->out, txn, prepare_lsn);
+ OutputPluginWrite(ctx, true);
+}
+
+/*
* Write the current schema of the relation and its ancestor (if any) if not
* done yet.
*/
@@ -857,6 +919,74 @@ pgoutput_stream_commit(struct LogicalDecodingContext *ctx,
}
/*
+ * PREPARE callback (for streaming two-phase commit).
+ *
+ * Notify the downstream to prepare the transaction.
+ */
+static void
+pgoutput_stream_prepare_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+#ifdef DEBUG_STREAM_2PC
+ elog(LOG, "pgoutput: pgoutput_stream_prepare_txn");
+#endif
+
+ Assert(rbtxn_is_streamed(txn));
+
+ OutputPluginUpdateProgress(ctx);
+ OutputPluginPrepareWrite(ctx, true);
+ logicalrep_write_stream_prepare(ctx->out, txn, prepare_lsn);
+ OutputPluginWrite(ctx, true);
+}
+
+/*
+ * COMMIT PREPARED callback (for streaming two-phase commit).
+ *
+ * Notify the downstream to commit the previously prepared transaction.
+ */
+static void
+pgoutput_stream_commit_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+#ifdef DEBUG_STREAM_2PC
+ elog(LOG, "pgoutput: pgoutput_stream_commit_prepared_txn");
+#endif
+
+ Assert(rbtxn_is_streamed(txn));
+ Assert(rbtxn_prepared(txn));
+
+ OutputPluginUpdateProgress(ctx);
+ OutputPluginPrepareWrite(ctx, true);
+ logicalrep_write_stream_prepare(ctx->out, txn, commit_lsn);
+ OutputPluginWrite(ctx, true);
+}
+
+/*
+ * ROLLBACK PREPARED callback (for streaming two-phase commit).
+ *
+ * Notify the downstream to abort the previously prepared transaction.
+ */
+static void
+pgoutput_stream_rollback_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn)
+{
+#ifdef DEBUG_STREAM_2PC
+ elog(LOG, "pgoutput: pgoutput_stream_rollback_prepared_txn");
+#endif
+
+ Assert(rbtxn_is_streamed(txn));
+ Assert(rbtxn_prepared(txn));
+
+ OutputPluginUpdateProgress(ctx);
+ OutputPluginPrepareWrite(ctx, true);
+ logicalrep_write_stream_prepare(ctx->out, txn, abort_lsn);
+ OutputPluginWrite(ctx, true);
+}
+
+/*
* Initialize the relation schema sync cache for a decoding session.
*
* The hash table is destroyed at the end of a decoding session. While
diff --git a/src/include/access/twophase.h b/src/include/access/twophase.h
index 2ca71c3..b2628ea 100644
--- a/src/include/access/twophase.h
+++ b/src/include/access/twophase.h
@@ -44,6 +44,7 @@ extern GlobalTransaction MarkAsPreparing(TransactionId xid, const char *gid,
extern void StartPrepare(GlobalTransaction gxact);
extern void EndPrepare(GlobalTransaction gxact);
extern bool StandbyTransactionIdIsPrepared(TransactionId xid);
+extern bool LookupGXact(const char *gid);
extern TransactionId PrescanPreparedTransactions(TransactionId **xids_p,
int *nxids_p);
diff --git a/src/include/replication/logicalproto.h b/src/include/replication/logicalproto.h
index 0c2cda2..162e41a 100644
--- a/src/include/replication/logicalproto.h
+++ b/src/include/replication/logicalproto.h
@@ -87,6 +87,7 @@ typedef struct LogicalRepBeginData
TransactionId xid;
} LogicalRepBeginData;
+/* Commit (and abort) information */
typedef struct LogicalRepCommitData
{
XLogRecPtr commit_lsn;
@@ -94,6 +95,28 @@ typedef struct LogicalRepCommitData
TimestampTz committime;
} LogicalRepCommitData;
+
+/* Prepare protocol information */
+typedef struct LogicalRepPrepareData
+{
+ uint8 prepare_type;
+ XLogRecPtr prepare_lsn;
+ XLogRecPtr end_lsn;
+ TimestampTz preparetime;
+ char gid[GIDSIZE];
+} LogicalRepPrepareData;
+
+/* types of the prepare protocol message */
+#define LOGICALREP_IS_PREPARE 0x01
+#define LOGICALREP_IS_COMMIT_PREPARED 0x02
+#define LOGICALREP_IS_ROLLBACK_PREPARED 0x04
+
+/* prepare can be exactly one of PREPARE, [COMMIT|ROLLBACK] PREPARED*/
+#define PrepareFlagsAreValid(flags) \
+ ((flags == LOGICALREP_IS_PREPARE) || \
+ (flags == LOGICALREP_IS_COMMIT_PREPARED) || \
+ (flags == LOGICALREP_IS_ROLLBACK_PREPARED))
+
extern void logicalrep_write_begin(StringInfo out, ReorderBufferTXN *txn);
extern void logicalrep_read_begin(StringInfo in,
LogicalRepBeginData *begin_data);
@@ -101,6 +124,10 @@ extern void logicalrep_write_commit(StringInfo out, ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
extern void logicalrep_read_commit(StringInfo in,
LogicalRepCommitData *commit_data);
+extern void logicalrep_write_prepare(StringInfo out, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+extern void logicalrep_read_prepare(StringInfo in,
+ LogicalRepPrepareData * prepare_data);
extern void logicalrep_write_origin(StringInfo out, const char *origin,
XLogRecPtr origin_lsn);
extern char *logicalrep_read_origin(StringInfo in, XLogRecPtr *origin_lsn);
@@ -144,4 +171,16 @@ extern void logicalrep_write_stream_abort(StringInfo out, TransactionId xid,
extern void logicalrep_read_stream_abort(StringInfo in, TransactionId *xid,
TransactionId *subxid);
+/*
+ * FIXME - Uncomment this to see more lgging for streamed two-phase transactions.
+ *
+ * #define DEBUG_STREAM_2PC
+ */
+
+extern void logicalrep_write_stream_prepare(StringInfo out, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+extern TransactionId logicalrep_read_stream_prepare(StringInfo in,
+ LogicalRepPrepareData *prepare_data);
+
+
#endif /* LOGICAL_PROTO_H */
diff --git a/src/test/subscription/t/020_twophase.pl b/src/test/subscription/t/020_twophase.pl
new file mode 100644
index 0000000..3feb2c3
--- /dev/null
+++ b/src/test/subscription/t/020_twophase.pl
@@ -0,0 +1,163 @@
+# logical replication of 2PC test
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 12;
+
+# Initialize publisher node
+my $node_publisher = get_new_node('publisher');
+$node_publisher->init(allows_streaming => 'logical');
+$node_publisher->append_conf(
+ 'postgresql.conf', qq(
+ max_prepared_transactions = 10
+ ));
+$node_publisher->start;
+
+# Create subscriber node
+my $node_subscriber = get_new_node('subscriber');
+$node_subscriber->init(allows_streaming => 'logical');
+$node_subscriber->append_conf(
+ 'postgresql.conf', qq(max_prepared_transactions = 10));
+$node_subscriber->start;
+
+# Create some pre-existing content on publisher
+$node_publisher->safe_psql('postgres', "CREATE TABLE tab_full (a int PRIMARY KEY)");
+$node_publisher->safe_psql('postgres',
+ "INSERT INTO tab_full SELECT generate_series(1,10)");
+$node_publisher->safe_psql('postgres', "CREATE TABLE tab_full2 (x text)");
+$node_publisher->safe_psql('postgres',
+ "INSERT INTO tab_full2 VALUES ('a'), ('b'), ('b')");
+
+# Setup structure on subscriber
+$node_subscriber->safe_psql('postgres', "CREATE TABLE tab_full (a int PRIMARY KEY)");
+$node_subscriber->safe_psql('postgres', "CREATE TABLE tab_full2 (x text)");
+
+# Setup logical replication
+my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
+$node_publisher->safe_psql('postgres', "CREATE PUBLICATION tap_pub");
+$node_publisher->safe_psql('postgres',
+"ALTER PUBLICATION tap_pub ADD TABLE tab_full, tab_full2"
+);
+
+my $appname = 'tap_sub';
+$node_subscriber->safe_psql('postgres',
+"CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr application_name=$appname' PUBLICATION tap_pub"
+);
+
+# Wait for subscriber to finish initialization
+my $caughtup_query =
+"SELECT pg_current_wal_lsn() <= replay_lsn FROM pg_stat_replication WHERE application_name = '$appname';";
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# Also wait for initial table sync to finish
+my $synced_query =
+"SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT IN ('r', 's');";
+$node_subscriber->poll_query_until('postgres', $synced_query)
+ or die "Timed out while waiting for subscriber to synchronize data";
+
+# check that 2PC gets replicated to subscriber
+$node_publisher->safe_psql('postgres',
+ "BEGIN;INSERT INTO tab_full VALUES (11);PREPARE TRANSACTION 'test_prepared_tab_full';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is in prepared state on subscriber
+my $result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab_full';");
+ is($result, qq(1), 'transaction is prepared on subscriber');
+
+# check that 2PC gets committed on subscriber
+$node_publisher->safe_psql('postgres',
+ "COMMIT PREPARED 'test_prepared_tab_full';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is committed on subscriber
+$result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM tab_full where a = 11;");
+ is($result, qq(1), 'Row inserted via 2PC has committed on subscriber');
+$result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab_full';");
+ is($result, qq(0), 'transaction is committed on subscriber');
+
+# check that 2PC gets replicated to subscriber
+$node_publisher->safe_psql('postgres',
+ "BEGIN;INSERT INTO tab_full VALUES (12);PREPARE TRANSACTION 'test_prepared_tab_full';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is in prepared state on subscriber
+$result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab_full';");
+ is($result, qq(1), 'transaction is prepared on subscriber');
+
+# check that 2PC gets aborted on subscriber
+$node_publisher->safe_psql('postgres',
+ "ROLLBACK PREPARED 'test_prepared_tab_full';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is aborted on subscriber
+$result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM tab_full where a = 12;");
+ is($result, qq(0), 'Row inserted via 2PC is not present on subscriber');
+
+$result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab_full';");
+ is($result, qq(0), 'transaction is aborted on subscriber');
+
+# Check that commit prepared is decoded properly on crash restart
+$node_publisher->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab_full VALUES (12);
+ INSERT INTO tab_full VALUES (13);
+ PREPARE TRANSACTION 'test_prepared_tab';");
+$node_subscriber->stop('immediate');
+$node_publisher->stop('immediate');
+$node_publisher->start;
+$node_subscriber->start;
+
+# commit post the restart
+$node_publisher->safe_psql('postgres', "COMMIT PREPARED 'test_prepared_tab';");
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check inserts are visible
+$result = $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM tab_full where a IN (12,13);");
+is($result, qq(2), 'Rows inserted via 2PC are visible on the subscriber');
+
+# TODO add test cases involving DDL. This can be added after we add functionality
+# to replicate DDL changes to subscriber.
+
+# check all the cleanup
+$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_subscription");
+is($result, qq(0), 'check subscription was dropped on subscriber');
+
+$result = $node_publisher->safe_psql('postgres',
+ "SELECT count(*) FROM pg_replication_slots");
+is($result, qq(0), 'check replication slot was dropped on publisher');
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_subscription_rel");
+is($result, qq(0),
+ 'check subscription relation status was dropped on subscriber');
+
+$result = $node_publisher->safe_psql('postgres',
+ "SELECT count(*) FROM pg_replication_slots");
+is($result, qq(0), 'check replication slot was dropped on publisher');
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_replication_origin");
+is($result, qq(0), 'check replication origin was dropped on subscriber');
+
+$node_subscriber->stop('fast');
+$node_publisher->stop('fast');
diff --git a/src/test/subscription/t/021_twophase_streaming.pl b/src/test/subscription/t/021_twophase_streaming.pl
new file mode 100644
index 0000000..7dfc965
--- /dev/null
+++ b/src/test/subscription/t/021_twophase_streaming.pl
@@ -0,0 +1,366 @@
+# logical replication of 2PC test
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 23;
+
+# Initialize publisher node
+my $node_publisher = get_new_node('publisher');
+$node_publisher->init(allows_streaming => 'logical');
+$node_publisher->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_publisher->append_conf('postgresql.conf', qq(logical_decoding_work_mem = 64kB));
+$node_publisher->start;
+
+# Create subscriber node
+my $node_subscriber = get_new_node('subscriber');
+$node_subscriber->init(allows_streaming => 'logical');
+$node_subscriber->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_subscriber->start;
+
+# Create some pre-existing content on publisher (uses same DDL as 015_stream test)
+$node_publisher->safe_psql('postgres',
+ "CREATE TABLE test_tab (a int primary key, b varchar)");
+$node_publisher->safe_psql('postgres',
+ "INSERT INTO test_tab VALUES (1, 'foo'), (2, 'bar')");
+
+# Setup structure on subscriber (columns a and b are compatible with same table name on publisher)
+$node_subscriber->safe_psql('postgres',
+ "CREATE TABLE test_tab (a int primary key, b text, c timestamptz DEFAULT now(), d bigint DEFAULT 999)");
+
+# Setup logical replication (streaming = on)
+my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
+$node_publisher->safe_psql('postgres', "CREATE PUBLICATION tap_pub FOR TABLE test_tab");
+
+my $appname = 'tap_sub';
+$node_subscriber->safe_psql('postgres',
+ "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr application_name=$appname' PUBLICATION tap_pub WITH (streaming = on)");
+
+# Wait for subscriber to finish initialization
+my $caughtup_query =
+ "SELECT pg_current_wal_lsn() <= replay_lsn FROM pg_stat_replication WHERE application_name = '$appname';";
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# Also wait for initial table sync to finish
+my $synced_query =
+ "SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT IN ('r', 's');";
+$node_subscriber->poll_query_until('postgres', $synced_query)
+ or die "Timed out while waiting for subscriber to synchronize data";
+
+my $result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*), count(c), count(d = 999) FROM test_tab");
+is($result, qq(2|2|2), 'check initial data was copied to subscriber');
+
+# --------------------------------------------------------------
+# 2PC PREPARE / COMMIT PREPARED test.
+#
+# Mass data is streamed as a 2PC transaction.
+# Then there is a commit prepared.
+# Expect all data is replicated on subscriber side after the commit.
+# --------------------------------------------------------------
+#
+# check that 2PC gets replicated to subscriber
+# Insert, update and delete enough rows to exceed the 64kB limit.
+$node_publisher->safe_psql('postgres', q{
+ BEGIN;
+ INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(3, 5000) s(i);
+ UPDATE test_tab SET b = md5(b) WHERE mod(a,2) = 0;
+ DELETE FROM test_tab WHERE mod(a,3) = 0;
+ PREPARE TRANSACTION 'test_prepared_tab';});
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is in prepared state on subscriber
+$result =
+ $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab';");
+is($result, qq(1), 'transaction is prepared on subscriber');
+
+# 2PC gets committed
+$node_publisher->safe_psql('postgres',
+ "COMMIT PREPARED 'test_prepared_tab';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is committed on subscriber
+$result =
+ $node_subscriber->safe_psql('postgres',
+ "SELECT count(*), count(c), count(d = 999) FROM test_tab");
+is($result, qq(3334|3334|3334), 'Rows inserted by 2PC have committed on subscriber, and extra columns contain local defaults');
+$result =
+ $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab';");
+is($result, qq(0), 'transaction is committed on subscriber');
+
+# --------------------------------------------------------------
+# 2PC PREPARE / ROLLBACK PREPARED test.
+#
+# Table is deleted back to 2 rows which are replicated on subscriber.
+# Mass data is streamed using 2PC but then there is a rollback prepared.
+# Expect data rolls back leaving only the 2 rows on the subscriber.
+# --------------------------------------------------------------
+#
+# First, delete the data (delete will be replicated)
+# Then insert, update and delete enough rows to exceed the 64kB limit.
+$node_publisher->safe_psql('postgres', q{
+ DELETE FROM test_tab WHERE a > 2;});
+$node_publisher->safe_psql('postgres', q{
+ BEGIN;
+ INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(3, 5000) s(i);
+ UPDATE test_tab SET b = md5(b) WHERE mod(a,2) = 0;
+ DELETE FROM test_tab WHERE mod(a,3) = 0;
+ PREPARE TRANSACTION 'test_prepared_tab';});
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is in prepared state on subscriber
+$result =
+ $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab';");
+is($result, qq(1), 'transaction is prepared on subscriber');
+
+# 2PC gets aborted
+$node_publisher->safe_psql('postgres',
+ "ROLLBACK PREPARED 'test_prepared_tab';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is aborted on subscriber
+$result =
+ $node_subscriber->safe_psql('postgres',
+ "SELECT count(*), count(c), count(d = 999) FROM test_tab");
+is($result, qq(2|2|2), 'check initial data was copied to subscriber');
+
+$result =
+ $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab';");
+is($result, qq(0), 'transaction is aborted on subscriber');
+
+# --------------------------------------------------------------
+# Check that 2PC commit prepared is decoded properly on crash restart.
+#
+# insert, update and delete enough rows to exceed the 64kB limit.
+# Then server crashes before the 2PC transaction is committed.
+# After servers are restarted the pending transaction is committed.
+# Expect all data is replicated on subscriber side after the commit.
+# --------------------------------------------------------------
+$node_publisher->safe_psql('postgres', q{
+ BEGIN;
+ INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(3, 5000) s(i);
+ UPDATE test_tab SET b = md5(b) WHERE mod(a,2) = 0;
+ DELETE FROM test_tab WHERE mod(a,3) = 0;
+ PREPARE TRANSACTION 'test_prepared_tab';});
+$node_subscriber->stop('immediate');
+$node_publisher->stop('immediate');
+$node_publisher->start;
+$node_subscriber->start;
+
+# commit post the restart
+$node_publisher->safe_psql('postgres',
+ "COMMIT PREPARED 'test_prepared_tab';");
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check inserts are visible
+$result =
+ $node_subscriber->safe_psql('postgres',
+ "SELECT count(*), count(c), count(d = 999) FROM test_tab");
+is($result, qq(3334|3334|3334), 'Rows inserted by 2PC have committed on subscriber, and extra columns contain local defaults');
+
+# --------------------------------------------------------------
+# Do INSERT outside of 2PC but before ROLLBACK PREPARED.
+#
+# Table is deleted back to 2 rows which are replicated on subscriber.
+# Mass data is streamed using 2PC.
+# A single row INSERT is done which is outside of the 2PC transaction
+# Then there is a rollback prepared.
+# Expect 2PC data rolls back leaving only 3 rows on the subscriber.
+# --------------------------------------------------------------
+#
+# First, delete the data (delete will be replicated)
+# Then insert, update and delete enough rows to exceed the 64kB limit.
+$node_publisher->safe_psql('postgres', q{
+ DELETE FROM test_tab WHERE a > 2;});
+$node_publisher->safe_psql('postgres', q{
+ BEGIN;
+ INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(3, 5000) s(i);
+ UPDATE test_tab SET b = md5(b) WHERE mod(a,2) = 0;
+ DELETE FROM test_tab WHERE mod(a,3) = 0;
+ PREPARE TRANSACTION 'test_prepared_tab';});
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is in prepared state on subscriber
+$result =
+ $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab';");
+is($result, qq(1), 'transaction is prepared on subscriber');
+
+# Insert a different record (now we are outside of the 2PC tx)
+# FIXME - this works OK but if INSERT overlaps pk of a row participating in the 2PC then it hangs
+$node_publisher->safe_psql('postgres',
+ "INSERT INTO test_tab VALUES (99999, 'foobar')");
+
+# 2PC gets aborted
+$node_publisher->safe_psql('postgres',
+ "ROLLBACK PREPARED 'test_prepared_tab';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is aborted on subscriber
+# but the extra INSERT outside of the 2PC still happened
+$result =
+ $node_subscriber->safe_psql('postgres',
+ "SELECT count(*), count(c), count(d = 999) FROM test_tab");
+is($result, qq(3|3|3), 'check the outside insert was copied to subscriber');
+
+$result =
+ $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab';");
+is($result, qq(0), 'transaction is aborted on subscriber');
+
+# --------------------------------------------------------------
+# Do INSERT outside of 2PC but before COMMIT PREPARED.
+#
+# Table is deleted back to 2 rows which are replicated on subscriber.
+# Mass data is streamed using 2PC.
+# A single row INSERT is done which is outside of the 2PC transaction
+# Then there is a commit prepared.
+# Expect 2PC data + the extra row are on the subscriber.
+# --------------------------------------------------------------
+#
+# First, delete the data (delete will be replicated)
+# Then insert, update and delete enough rows to exceed the 64kB limit.
+$node_publisher->safe_psql('postgres', q{
+ DELETE FROM test_tab WHERE a > 2;});
+$node_publisher->safe_psql('postgres', q{
+ BEGIN;
+ INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(3, 5000) s(i);
+ UPDATE test_tab SET b = md5(b) WHERE mod(a,2) = 0;
+ DELETE FROM test_tab WHERE mod(a,3) = 0;
+ PREPARE TRANSACTION 'test_prepared_tab';});
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is in prepared state on subscriber
+$result =
+ $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab';");
+is($result, qq(1), 'transaction is prepared on subscriber');
+
+# Insert a different record (now we are outside of the 2PC tx)
+# FIXME - this works OK but if INSERT overlaps pk of a row participating in the 2PC then it hangs
+$node_publisher->safe_psql('postgres',
+ "INSERT INTO test_tab VALUES (99999, 'foobar')");
+
+# 2PC gets committed
+$node_publisher->safe_psql('postgres',
+ "COMMIT PREPARED 'test_prepared_tab';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is committed on subscriber
+$result =
+ $node_subscriber->safe_psql('postgres',
+ "SELECT count(*), count(c), count(d = 999) FROM test_tab");
+is($result, qq(3335|3335|3335), 'Rows inserted by 2PC (as well as outside insert) have committed on subscriber, and extra columns contain local defaults');
+$result =
+ $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab';");
+is($result, qq(0), 'transaction is committed on subscriber');
+
+# --------------------------------------------------------------
+# Do DELETE outside of 2PC but before COMMIT PREPARED.
+#
+# Table is deleted back to 2 rows which are replicated on subscriber.
+# Mass data is streamed using 2PC.
+# A single row DELETE is done for one of the records of the 2PC transaction
+# Then there is a commit prepared.
+# Expect all the 2PC data rows on the subscriber (because delete would have failed).
+# --------------------------------------------------------------
+#
+# First, delete the data (delete will be replicated)
+# Then insert, update and delete enough rows to exceed the 64kB limit.
+$node_publisher->safe_psql('postgres', q{
+ DELETE FROM test_tab WHERE a > 2;});
+$node_publisher->safe_psql('postgres', q{
+ BEGIN;
+ INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(3, 5000) s(i);
+ UPDATE test_tab SET b = md5(b) WHERE mod(a,2) = 0;
+ DELETE FROM test_tab WHERE mod(a,3) = 0;
+ PREPARE TRANSACTION 'test_prepared_tab';});
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is in prepared state on subscriber
+$result =
+ $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab';");
+is($result, qq(1), 'transaction is prepared on subscriber');
+
+# DELETE one of the prepared 2PC records before they get committed (we are outside of the 2PC tx)
+$node_publisher->safe_psql('postgres',
+ "DELETE FROM test_tab WHERE a = 5");
+
+# 2PC gets committed
+$node_publisher->safe_psql('postgres',
+ "COMMIT PREPARED 'test_prepared_tab';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is committed on subscriber
+$result =
+ $node_subscriber->safe_psql('postgres',
+ "SELECT count(*), count(c), count(d = 999) FROM test_tab");
+is($result, qq(3334|3334|3334), 'Rows inserted by 2PC have committed on subscriber. Nothing was deleted');
+$result =
+ $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM test_tab WHERE a = 5");
+is($result, qq(1), 'The row we deleted before the commit till exists on subscriber.');
+$result =
+ $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab';");
+is($result, qq(0), 'transaction is committed on subscriber');
+
+# --------------------------------------------------------------
+
+# TODO add test cases involving DDL. This can be added after we add functionality
+# to replicate DDL changes to subscriber.
+
+# check all the cleanup
+$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_subscription");
+is($result, qq(0), 'check subscription was dropped on subscriber');
+
+$result = $node_publisher->safe_psql('postgres',
+ "SELECT count(*) FROM pg_replication_slots");
+is($result, qq(0), 'check replication slot was dropped on publisher');
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_subscription_rel");
+is($result, qq(0),
+ 'check subscription relation status was dropped on subscriber');
+
+$result = $node_publisher->safe_psql('postgres',
+ "SELECT count(*) FROM pg_replication_slots");
+is($result, qq(0), 'check replication slot was dropped on publisher');
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_replication_origin");
+is($result, qq(0), 'check replication origin was dropped on subscriber');
+
+$node_subscriber->stop('fast');
+$node_publisher->stop('fast');
--
1.8.3.1
Hello Ajin.
I have gone through the v10 patches to verify if and how my previous
v6 review comments got addressed.
Some issues remain, and there are a few newly introduced ones.
Mostly it is all very minor stuff.
Please find my revised review comments below.
Kind Regards.
Peter Smith
Fujitsu Australia
---
V10 REVIEW COMMENTS FOLLOW
==========
Patch v10-0001, File: contrib/test_decoding/test_decoding.c
==========
COMMENT
Line 285
+ {
+ errno = 0;
+ data->check_xid_aborted = (TransactionId)
+ strtoul(strVal(elem->arg), NULL, 0);
+
+ if (!TransactionIdIsValid(data->check_xid_aborted))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("check-xid-aborted is not a valid xid: \"%s\"",
+ strVal(elem->arg))));
+ }
I think it is risky to assign strtoul directly to the
check_xid_aborted member because it makes some internal assumption
that the invalid transaction is the same as the error return from
strtoul.
Maybe better to do in 2 steps like below:
BEFORE
errno = 0;
data->check_xid_aborted = (TransactionId)strtoul(strVal(elem->arg), NULL, 0);
AFTER
long xid;
errno = 0;
xid = strtoul(strVal(elem->arg), NULL, 0);
if (xid == 0 || errno != 0)
data->check_xid_aborted = InvalidTransactionId;
else
data->check_xid_aborted =(TransactionId)xid;
---
COMMENT
Line 430
+
+/* ABORT PREPARED callback */
+static void
+pg_decode_rollback_prepared_txn(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn)
Fix comment "ABORT PREPARED" --> "ROLLBACK PREPARED"
==========
Patch v10-0001, File: doc/src/sgml/logicaldecoding.sgml
==========
COMMENT
Section 48.6.1
Says:
An output plugin may also define functions to support streaming of
large, in-progress transactions. The stream_start_cb, stream_stop_cb,
stream_abort_cb, stream_commit_cb and stream_change_cb are required,
while stream_message_cb, stream_prepare_cb, stream_commit_prepared_cb,
stream_rollback_prepared_cb and stream_truncate_cb are optional.
An output plugin may also define functions to support two-phase
commits, which are decoded on PREPARE TRANSACTION. The prepare_cb,
commit_prepared_cb and rollback_prepared_cb callbacks are required,
while filter_prepare_cb is optional.
-
But is that correct? It seems strange/inconsistent to say that the 2PC
callbacks are mandatory for the non-streaming, but that they are
optional for streaming.
---
COMMENT
48.6.4.5 "Transaction Prepare Callback"
48.6.4.6 "Transaction Commit Prepared Callback"
48.6.4.7 "Transaction Rollback Prepared Callback"
There seems some confusion about what is optional and what is
mandatory. e.g. Why are the non-stream 2PC callbacks mandatory but the
stream 2PC callbacks are not? And also there is some inconsistency
with what is said in the paragraph at the top of the page versus what
each of the callback sections says wrt optional/mandatory.
The sub-sections 49.6.4.5, 49.6.4.6, 49.6.4.7 say those callbacks are
optional which IIUC Amit said is incorrect. This is similar to the
previous review comment
---
COMMENT
Section 48.6.4.7 "Transaction Rollback Prepared Callback"
parameter "abort_lsn" probably should be "rollback_lsn"
---
COMMENT
Section 49.6.4.18. "Stream Rollback Prepared Callback"
Says:
The stream_rollback_prepared_cb callback is called to abort a
previously streamed transaction as part of a two-phase commit.
maybe should say "is called to rollback"
==========
Patch v10-0001, File: src/backend/replication/logical/logical.c
==========
COMMENT
Line 252
Says: We however enable two phase logical...
"two phase" --> "two-phase"
--
COMMENT
Line 885
Line 923
Says: If the plugin support 2 phase commits...
"support 2 phase" --> "supports two-phase" in the comment. Same issue
occurs twice.
---
COMMENT
Line 830
Line 868
Line 906
Says:
/* We're only supposed to call this when two-phase commits are supported */
There is an extra space between the "are" and "supported" in the comment.
Same issue occurs 3 times.
---
COMMENT
Line 1023
+ /*
+ * Skip if decoding of two-phase at PREPARE time is not enabled. In that
+ * case all two-phase transactions are considered filtered out and will be
+ * applied as regular transactions at COMMIT PREPARED.
+ */
Comment still is missing the word "transactions"
"Skip if decoding of two-phase at PREPARE time is not enabled."
-> "Skip if decoding of two-phase transactions at PREPARE time is not enabled.
==========
Patch v10-0001, File: src/include/replication/reorderbuffer.h
==========
COMMENT
Line 459
/* abort prepared callback signature */
typedef void (*ReorderBufferRollbackPreparedCB) (
ReorderBuffer *rb,
ReorderBufferTXN *txn,
XLogRecPtr abort_lsn);
There is no alignment consistency here for
ReorderBufferRollbackPreparedCB. Some function args are directly under
the "(" and some are on the same line. This function code is neither.
---
COMMENT
Line 638
@@ -431,6 +486,24 @@ typedef void (*ReorderBufferStreamAbortCB) (
ReorderBufferTXN *txn,
XLogRecPtr abort_lsn);
+/* prepare streamed transaction callback signature */
+typedef void (*ReorderBufferStreamPrepareCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+
+/* prepare streamed transaction callback signature */
+typedef void (*ReorderBufferStreamCommitPreparedCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+/* prepare streamed transaction callback signature */
+typedef void (*ReorderBufferStreamRollbackPreparedCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr rollback_lsn);
There is no inconsistent alignment with the arguments (compare how
other functions are aligned)
See:
- for ReorderBufferStreamCommitPreparedCB
- for ReorderBufferStreamRollbackPreparedCB
- for ReorderBufferPrepareNeedSkip
- for ReorderBufferTxnIsPrepared
- for ReorderBufferPrepare
---
COMMENT
Line 489
Line 495
Line 501
/* prepare streamed transaction callback signature */
Same comment cut/paste 3 times?
- for ReorderBufferStreamPrepareCB
- for ReorderBufferStreamCommitPreparedCB
- for ReorderBufferStreamRollbackPreparedCB
---
COMMENT
Line 457
/* abort prepared callback signature */
typedef void (*ReorderBufferRollbackPreparedCB) (
ReorderBuffer *rb,
ReorderBufferTXN *txn,
XLogRecPtr abort_lsn);
"abort" --> "rollback" in the function comment.
---
COMMENT
Line 269
/* In case of 2PC we need to pass GID to output plugin */
"2PC" --> "two-phase commit"
==========
Patch v10-0002, File: contrib/test_decoding/expected/two_phase.out (and .sql)
==========
COMMENT
General
It is a bit hard to see what are the main tests here are what are just
sub-parts of some test case.
e.g. It seems like the main tests are.
1. Test that decoding happens at PREPARE time
2. Test decoding of an aborted tx
3. Test a prepared tx which contains some DDL
4. Test decoding works while an uncommitted prepared tx with DDL exists
5. Test operations holding exclusive locks won't block decoding
6. Test savepoints and sub-transactions
7. Test "_nodecode" will defer the decoding until the commit time
Can the comments be made more obvious so it is easy to distinguish the
main tests from the steps of those tests?
---
COMMENT
Line 1
-- Test two-phased transactions, when two-phase-commit is enabled,
transactions are
-- decoded at PREPARE time rather than at COMMIT PREPARED time.
Some commas to be removed and this comment to be split into several sentences.
---
COMMENT
Line 19
-- should show nothing
Comment could be more informative. E.g. "Should show nothing because
the PREPARE has not happened yet"
---
COMMENT
Line 77
Looks like there is a missing comment about here that should say
something like "Show that the DDL does not appear in the decoding"
---
COMMENT
Line 160
-- test savepoints and sub-xacts as a result
The subsequent test is testing savepoints. But is it testing sub
transactions like the comment says?
==========
Patch v10-0002, File: contrib/test_decoding/t/001_twophase.pl
==========
COMMENT
General
I think basically there are only 2 tests in this file.
1. to check that the concurrent abort works.
2. to check that the prepared tx can span a server shutdown/restart
But the tests comments do not make this clear at all.
e.g. All the "#" comments look equally important although most of them
are just steps of each test case.
Can the comments be better to distinguish the tests versus the steps
of each test?
==========
Patch v10-0002, File: src/backend/replication/logical/decode.c
==========
COMMENT
Line 71
static void DecodeCommitPrepared(LogicalDecodingContext *ctx,
XLogRecordBuffer *buf,
xl_xact_parsed_commit *parsed, TransactionId xid);
static void DecodeAbort(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
xl_xact_parsed_abort *parsed, TransactionId xid);
static void DecodeAbortPrepared(LogicalDecodingContext *ctx,
XLogRecordBuffer *buf,
xl_xact_parsed_abort *parsed, TransactionId xid);
static void DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
xl_xact_parsed_prepare * parsed);
The 2nd line or args are not aligned properly.
- for DecodeCommitPrepared
- for DecodeAbortPrepared
- for DecodePrepare
==========
Patch v10-0002, File: src/backend/replication/logical/reorderbuffer.c
==========
COMMENT
There are some parts of the code where in my v6 review I had a doubt
about the mutually exclusive treatment of the "streaming" flag and the
"rbtxn_prepared(txn)" state.
Basically I did not see how some parts of the code are treating NOT
streaming as implying 2PC etc because it defies my understanding that
2PC can also work in streaming mode. Perhaps the "streaming" flag has
a different meaning to how I interpret it? Or perhaps some functions
are guarding higher up and can only be called under certain
conditions?
Anyway, this confusion manifests in several parts of the code, none of
which was changed after my v6 review.
Affected code includes the following:
CASE 1
Wherever the ReorderBufferTruncateTXN(...) "prepared" flag (third
parameter) is hardwired true/false, I think there must be some
preceding Assert to guarantee the prepared state condition holds true.
There can't be any room for doubts like "but what will it do for
streamed 2PC..."
Line 1805 - ReorderBufferTruncateTXN(rb, txn, true); // if rbtxn_prepared(txn)
Line 1941 - ReorderBufferTruncateTXN(rb, txn, false); // state ??
Line 2389 - ReorderBufferTruncateTXN(rb, txn, false); // if streaming
Line 2396 - ReorderBufferTruncateTXN(rb, txn, true); // if not
streaming and if rbtxm_prepared(txn)
Line 2459 - ReorderBufferTruncateTXN(rb, txn, true); // if not streaming
~
CASE 2
Wherever the "streaming" flag is tested I don't really understand how
NOT streaming can automatically imply 2PC.
Line 2330 - if (streaming) // what about if it is streaming AND 2PC at
the same time?
Line 2387 - if (streaming) // what about if it is streaming AND 2PC at
the same time?
Line 2449 - if (streaming) // what about if it is streaming AND 2PC at
the same time?
~
Case 1 and Case 2 above overlap a fair bit. I just listed them so they
all get checked again.
Even if the code is thought to be currently OK I do still think
something should be done like:
a) add some more substantial comments to explain WHY the combination
of streaming and 2PC is not valid in the context
b) the Asserts to be strengthened to 100% guarantee that the streaming
and prepared states really are exclusive (if indeed they are). For
this point I thought the following Assert condition could be better:
Assert(streaming || rbtxn_prepared(txn));
Assert(stream_started || rbtxn_prepared(txn));
because as it is you still are left wondering if both streaming AND
rbtxn_prepared(txn) can be possible at the same time...
---
COMMENT
Line 2634
* Anyways, two-phase transactions do not contain any reorderbuffers.
"Anyways" --> "Anyway"
==========
Patch v10-0003, File: src/backend/access/transam/twophase.c
==========
COMMENT
Line 557
@@ -548,6 +548,33 @@ MarkAsPrepared(GlobalTransaction gxact, bool lock_held)
}
/*
+ * LookupGXact
+ * Check if the prepared transaction with the given GID is around
+ */
+bool
+LookupGXact(const char *gid)
+{
+ int i;
+ bool found = false;
The variable declarations (i and found) are not aligned.
==========
Patch v10-0003, File: src/backend/replication/logical/proto.c
==========
COMMENT
Line 125
Line 205
Assert(strlen(txn->gid) > 0);
I suggested that the assertion should also check txn->gid is not NULL.
You replied "In this case txn->gid has to be non NULL".
But that is exactly what I said :-)
If it HAS to be non-NULL then why not just Assert that in code instead
of leaving the reader wondering?
"Assert(strlen(txn->gid) > 0);" --> "Assert(tdx->gid && strlen(txn->gid) > 0);"
Same occurs several times.
---
COMMENT
Line 133
Line 213
if (rbtxn_commit_prepared(txn))
flags |= LOGICALREP_IS_COMMIT_PREPARED;
else if (rbtxn_rollback_prepared(txn))
flags |= LOGICALREP_IS_ROLLBACK_PREPARED;
else
flags |= LOGICALREP_IS_PREPARE;
Previously I wrote that the use of the bit flags on assignment in the
logicalrep_write_prepare was inconsistent with the way they are
treated when they are read. Really it should be using a direct
assignment instead of bit flags.
You said this is skipped anticipating a possible refactor. But IMO
this leaves the code in a half/half state. I think it is better to fix
it properly and if refactoring happens then deal with that at the
time.
The last comment I saw from Amit said to use my 1st proposal of direct
assignment instead of bit flag assignment.
(applies to both non-stream and stream functions)
- see logicalrep_write_prepare
- see logicalrep_write_stream_prepare
==========
Patch v10-0003, File: src/backend/replication/pgoutput/pgoutput.c
==========
COMMENT
Line 429
/*
* PREPARE callback
*/
static void
pgoutput_rollback_prepared_txn(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr prepare_lsn)
The function comment looks wrong.
Shouldn't this comment say be "ROLLBACK PREPARED callback"?
==========
Patch v10-0003, File: src/include/replication/logicalproto.h
==========
Line 115
#define PrepareFlagsAreValid(flags) \
((flags == LOGICALREP_IS_PREPARE) || \
(flags == LOGICALREP_IS_COMMIT_PREPARED) || \
(flags == LOGICALREP_IS_ROLLBACK_PREPARED))
Would be safer if all the references to flags are in parentheses
e.g. "flags" --> "(flags)"
[END]
On Fri, Oct 16, 2020 at 5:21 PM Peter Smith <smithpb2250@gmail.com> wrote:
Hello Ajin,
The v9 patches provided support for two-phase transactions for NON-streaming.
Now I have added STREAM support for two-phase transactions, and bumped
all patches to version v10.(The 0001 and 0002 patches are unchanged. Only 0003 is changed).
--
There are a few TODO/FIXME comments in the code highlighting parts
needing some attention.There is a #define DEBUG_STREAM_2PC useful for debugging, which I can
remove later.All the patches have some whitespaces issues when applied. We can
resolve them as we go.Please let me know any comments/feedback.
Hi Peter,
Thanks for your patch. Some comments for your patch:
Comments:
src/backend/replication/logical/worker.c
@@ -888,6 +888,319 @@ apply_handle_prepare(StringInfo s)
+ /*
+ * FIXME - Following condition was in apply_handle_prepare_txn except
I found it was ALWAYS IsTransactionState() == false
+ * The synchronization worker runs in single transaction. *
+ if (IsTransactionState() && !am_tablesync_worker())
+ */
+ if (!am_tablesync_worker())
Comment: I dont think a tablesync worker will use streaming, none of
the other stream APIs check this, this might not be relevant for
stream_prepare either.
+ /*
+ * ==================================================================================================
+ * The following chunk of code is largely cut/paste from the existing
apply_handle_prepare_commit_txn
Comment: Here, I think you meant apply_handle_stream_commit. Also
rather than duplicating this chunk of code, you could put it in a new
function.
+ /* open the spool file for the committed transaction */
+ changes_filename(path, MyLogicalRepWorker->subid, xid);
Comment: Here the comment should read "committed/prepared" rather than
"committed"
+ else
+ {
+ /* Process any invalidation messages that might have accumulated. */
+ AcceptInvalidationMessages();
+ maybe_reread_subscription();
+ }
Comment: This else block might not be necessary as a tablesync worker
will not initiate the streaming APIs.
+ BeginTransactionBlock();
+ CommitTransactionCommand();
+ StartTransactionCommand();
Comment: Rereading the code and the transaction state description in
src/backend/access/transam/README. I am not entirely sure if the
BeginTransactionBlock followed by CommitTransactionBlock is really
needed here.
I understand this code was copied over from apply_handle_prepare_txn,
but now looking back I'm not so sure if it is correct. The transaction
would have already begin as part of applying the changes, why begin it
again?
Maybe Amit could confirm this.
END
regards,
Ajin Cherian
Fujitsu Australia
The PG docs for PREPARE TRANSACTION [1]https://www.postgresql.org/docs/current/sql-prepare-transaction.html don't say anything about an
empty (zero length) transaction-id.
e.g. PREPARE TRANSACTION '';
[1]: https://www.postgresql.org/docs/current/sql-prepare-transaction.html
~
Meanwhile, during testing I found the 2PC prepare hangs when an empty
id is used.
Now I am not sure does this represent some bug within the 2PC code, or
in fact should the PREPARE never have allowed an empty transaction-id
to be specified in the first place?
Thoughts?
Kind Regards
Peter Smith.
Fujitsu Australia.
On Tue, Oct 20, 2020 at 4:32 PM Ajin Cherian <itsajin@gmail.com> wrote:
On Fri, Oct 16, 2020 at 5:21 PM Peter Smith <smithpb2250@gmail.com> wrote:
Comments:
src/backend/replication/logical/worker.c @@ -888,6 +888,319 @@ apply_handle_prepare(StringInfo s) + /* + * FIXME - Following condition was in apply_handle_prepare_txn except I found it was ALWAYS IsTransactionState() == false + * The synchronization worker runs in single transaction. * + if (IsTransactionState() && !am_tablesync_worker()) + */ + if (!am_tablesync_worker())Comment: I dont think a tablesync worker will use streaming, none of
the other stream APIs check this, this might not be relevant for
stream_prepare either.
Yes, I think this is right. See pgoutput_startup where we are
disabling the streaming for init phase. But it is always good to once
test this and ensure the same.
+ /* + * ================================================================================================== + * The following chunk of code is largely cut/paste from the existing apply_handle_prepare_commit_txnComment: Here, I think you meant apply_handle_stream_commit. Also
rather than duplicating this chunk of code, you could put it in a new
function.+ /* open the spool file for the committed transaction */ + changes_filename(path, MyLogicalRepWorker->subid, xid);Comment: Here the comment should read "committed/prepared" rather than
"committed"+ else + { + /* Process any invalidation messages that might have accumulated. */ + AcceptInvalidationMessages(); + maybe_reread_subscription(); + }Comment: This else block might not be necessary as a tablesync worker
will not initiate the streaming APIs.
I think it is better to have an Assert here for streaming-mode?
+ BeginTransactionBlock(); + CommitTransactionCommand(); + StartTransactionCommand();Comment: Rereading the code and the transaction state description in
src/backend/access/transam/README. I am not entirely sure if the
BeginTransactionBlock followed by CommitTransactionBlock is really
needed here.
Yeah, I also find this strange. I guess the patch is doing so because
it needs to call PrepareTransactionBlock later but I am not sure. How
can we call CommitTransactionCommand(), won't it commit the on-going
transaction and make it visible before even it is visible on the
publisher. I think you can verify by having a breakpoint after
CommitTransactionCommand() and see if the changes for which we are
doing prepare become visible.
I understand this code was copied over from apply_handle_prepare_txn,
but now looking back I'm not so sure if it is correct. The transaction
would have already begin as part of applying the changes, why begin it
again?
Maybe Amit could confirm this.
I hope the above suggestions will help to proceed here.
--
With Regards,
Amit Kapila.
On Wed, Oct 21, 2020 at 1:38 PM Peter Smith <smithpb2250@gmail.com> wrote:
The PG docs for PREPARE TRANSACTION [1] don't say anything about an
empty (zero length) transaction-id.
e.g. PREPARE TRANSACTION '';
[1] https://www.postgresql.org/docs/current/sql-prepare-transaction.html~
Meanwhile, during testing I found the 2PC prepare hangs when an empty
id is used.
Can you please take an example to explain what you are trying to say?
I have tried below and doesn't face any problem:
postgres=# Begin;
BEGIN
postgres=*# select txid_current();
txid_current
--------------
534
(1 row)
postgres=*# Prepare Transaction 'foo';
PREPARE TRANSACTION
postgres=# Commit Prepared 'foo';
COMMIT PREPARED
postgres=# Begin;
BEGIN
postgres=*# Prepare Transaction 'foo';
PREPARE TRANSACTION
postgres=# Commit Prepared 'foo';
COMMIT PREPARED
--
With Regards,
Amit Kapila.
On Tue, Oct 20, 2020 at 9:46 AM Peter Smith <smithpb2250@gmail.com> wrote:
==========
Patch v10-0002, File: src/backend/replication/logical/reorderbuffer.c
==========COMMENT
There are some parts of the code where in my v6 review I had a doubt
about the mutually exclusive treatment of the "streaming" flag and the
"rbtxn_prepared(txn)" state.
I am not sure about the exact specifics here but we can always prepare
a transaction that is streamed. I have to raise one more point in this
regard. Why do we need stream_commit_prepared_cb,
stream_rollback_prepared_cb callbacks? Do we need to do something
separate in pgoutput or otherwise for these APIs? If not, can't we use
a non-stream version of these APIs instead? There appears to be a
use-case for stream_prepare_cb which is to apply the existing changes
on subscriber and call prepare but I can't see usecase for the other
two APIs.
One minor comment:
v10-0001-Support-2PC-txn-base
1.
@@ -574,6 +655,11 @@ void ReorderBufferQueueMessage(ReorderBuffer *,
TransactionId, Snapshot snapsho
void ReorderBufferCommit(ReorderBuffer *, TransactionId,
XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
TimestampTz commit_time, RepOriginId origin_id, XLogRecPtr origin_lsn);
+void ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid, bool is_commit);
void ReorderBufferAssignChild(ReorderBuffer *, TransactionId,
TransactionId, XLogRecPtr commit_lsn);
void ReorderBufferCommitChild(ReorderBuffer *, TransactionId, TransactionId,
XLogRecPtr commit_lsn, XLogRecPtr end_lsn);
@@ -597,6 +683,15 @@ void
ReorderBufferXidSetCatalogChanges(ReorderBuffer *, TransactionId xid,
XLog
bool ReorderBufferXidHasCatalogChanges(ReorderBuffer *, TransactionId xid);
bool ReorderBufferXidHasBaseSnapshot(ReorderBuffer *, TransactionId xid);
+bool ReorderBufferPrepareNeedSkip(ReorderBuffer *rb, TransactionId xid,
+ const char *gid);
+bool ReorderBufferTxnIsPrepared(ReorderBuffer *rb, TransactionId xid,
+ const char *gid);
+void ReorderBufferPrepare(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid);
ReorderBufferTXN *ReorderBufferGetOldestTXN(ReorderBuf
I don't think these changes belong to this patch as the definition of
these functions is not part of this patch.
--
With Regards,
Amit Kapila.
On Wed, Oct 21, 2020 at 7:42 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Oct 21, 2020 at 1:38 PM Peter Smith <smithpb2250@gmail.com> wrote:
The PG docs for PREPARE TRANSACTION [1] don't say anything about an
empty (zero length) transaction-id.
e.g. PREPARE TRANSACTION '';
[1] https://www.postgresql.org/docs/current/sql-prepare-transaction.html~
Meanwhile, during testing I found the 2PC prepare hangs when an empty
id is used.Can you please take an example to explain what you are trying to say?
I was referring to an empty (zero length) transaction ID, not an empty
transaction.
The example was already given as PREPARE TRANSACTION '';
A longer example from my regress test is shown below. Using 2PC
pub/sub this will currently hang:
# --------------------
# Test using empty GID
# --------------------
# check that 2PC gets replicated to subscriber
$node_publisher->safe_psql('postgres',
"BEGIN;INSERT INTO tab_full VALUES (51);PREPARE TRANSACTION '';");
$node_publisher->poll_query_until('postgres', $caughtup_query)
or die "Timed out while waiting for subscriber to catch up";
# check that transaction is in prepared state on subscriber
$result =
$node_subscriber->safe_psql('postgres', "SELECT count(*) FROM
pg_prepared_xacts where gid = '';");
is($result, qq(1), 'transaction is prepared on subscriber');
# ROLLBACK
$node_publisher->safe_psql('postgres',
"ROLLBACK PREPARED '';");
# check that 2PC gets aborted on subscriber
$node_publisher->poll_query_until('postgres', $caughtup_query)
or die "Timed out while waiting for subscriber to catch up";
$result =
$node_subscriber->safe_psql('postgres', "SELECT count(*) FROM
pg_prepared_xacts where gid = '';");
is($result, qq(0), 'transaction is aborted on subscriber');
~
Is that something that should be made to work for 2PC pub/sub, or was
Postgres PREPARE TRANSACTION statement wrong to allow the user to
specify an empty transaction ID in the first place?
Kind Regards
Peter Smith.
Fujitsu Australia.
On Thu, Oct 22, 2020 at 4:58 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Wed, Oct 21, 2020 at 7:42 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Oct 21, 2020 at 1:38 PM Peter Smith <smithpb2250@gmail.com> wrote:
The PG docs for PREPARE TRANSACTION [1] don't say anything about an
empty (zero length) transaction-id.
e.g. PREPARE TRANSACTION '';
[1] https://www.postgresql.org/docs/current/sql-prepare-transaction.html~
Meanwhile, during testing I found the 2PC prepare hangs when an empty
id is used.Can you please take an example to explain what you are trying to say?
I was referring to an empty (zero length) transaction ID, not an empty
transaction.
oh, I got it confused with the system generated 32-bit TransactionId.
But now, I got what you were referring to.
The example was already given as PREPARE TRANSACTION '';
Is that something that should be made to work for 2PC pub/sub, or was
Postgres PREPARE TRANSACTION statement wrong to allow the user to
specify an empty transaction ID in the first place?
I don't see any problem with the empty transaction identifier used in
Prepare Transaction. This is just used as an identifier to uniquely
identify the transaction. If you try to use an empty string ('') more
than once for Prepare Transaction, it will give an error like below:
postgres=*# prepare transaction '';
ERROR: transaction identifier "" is already in use
So, I think this should work for pub/sub as well. Did you find out the
reason of hang?
--
With Regards,
Amit Kapila.
On Tue, Oct 20, 2020 at 3:15 PM Peter Smith <smithpb2250@gmail.com> wrote:
Hello Ajin.
I have gone through the v10 patches to verify if and how my previous
v6 review comments got addressed.Some issues remain, and there are a few newly introduced ones.
Mostly it is all very minor stuff.
Please find my revised review comments below.
Kind Regards.
Peter Smith
Fujitsu Australia---
V10 REVIEW COMMENTS FOLLOW
==========
Patch v10-0001, File: contrib/test_decoding/test_decoding.c
==========COMMENT Line 285 + { + errno = 0; + data->check_xid_aborted = (TransactionId) + strtoul(strVal(elem->arg), NULL, 0); + + if (!TransactionIdIsValid(data->check_xid_aborted)) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("check-xid-aborted is not a valid xid: \"%s\"", + strVal(elem->arg)))); + }I think it is risky to assign strtoul directly to the
check_xid_aborted member because it makes some internal assumption
that the invalid transaction is the same as the error return from
strtoul.Maybe better to do in 2 steps like below:
BEFORE
errno = 0;
data->check_xid_aborted = (TransactionId)strtoul(strVal(elem->arg), NULL, 0);AFTER
long xid;
errno = 0;
xid = strtoul(strVal(elem->arg), NULL, 0);
if (xid == 0 || errno != 0)
data->check_xid_aborted = InvalidTransactionId;
else
data->check_xid_aborted =(TransactionId)xid;---
Updated accordingly.
COMMENT Line 430 + +/* ABORT PREPARED callback */ +static void +pg_decode_rollback_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn, + XLogRecPtr abort_lsn)Fix comment "ABORT PREPARED" --> "ROLLBACK PREPARED"
Updated accordingly.
==========
Patch v10-0001, File: doc/src/sgml/logicaldecoding.sgml
==========COMMENT
Section 48.6.1
Says:
An output plugin may also define functions to support streaming of
large, in-progress transactions. The stream_start_cb, stream_stop_cb,
stream_abort_cb, stream_commit_cb and stream_change_cb are required,
while stream_message_cb, stream_prepare_cb, stream_commit_prepared_cb,
stream_rollback_prepared_cb and stream_truncate_cb are optional.An output plugin may also define functions to support two-phase
commits, which are decoded on PREPARE TRANSACTION. The prepare_cb,
commit_prepared_cb and rollback_prepared_cb callbacks are required,
while filter_prepare_cb is optional.-
But is that correct? It seems strange/inconsistent to say that the 2PC
callbacks are mandatory for the non-streaming, but that they are
optional for streaming.
Updated making all the 2PC callbacks mandatory.
---
COMMENT
48.6.4.5 "Transaction Prepare Callback"
48.6.4.6 "Transaction Commit Prepared Callback"
48.6.4.7 "Transaction Rollback Prepared Callback"There seems some confusion about what is optional and what is
mandatory. e.g. Why are the non-stream 2PC callbacks mandatory but the
stream 2PC callbacks are not? And also there is some inconsistency
with what is said in the paragraph at the top of the page versus what
each of the callback sections says wrt optional/mandatory.The sub-sections 49.6.4.5, 49.6.4.6, 49.6.4.7 say those callbacks are
optional which IIUC Amit said is incorrect. This is similar to the
previous review comment---
Updated making all the 2PC callbacks mandatory.
COMMENT
Section 48.6.4.7 "Transaction Rollback Prepared Callback"parameter "abort_lsn" probably should be "rollback_lsn"
---
COMMENT
Section 49.6.4.18. "Stream Rollback Prepared Callback"
Says:
The stream_rollback_prepared_cb callback is called to abort a
previously streamed transaction as part of a two-phase commit.maybe should say "is called to rollback"
==========
Patch v10-0001, File: src/backend/replication/logical/logical.c
==========COMMENT
Line 252
Says: We however enable two phase logical..."two phase" --> "two-phase"
--
COMMENT
Line 885
Line 923
Says: If the plugin support 2 phase commits..."support 2 phase" --> "supports two-phase" in the comment. Same issue
occurs twice.---
COMMENT
Line 830
Line 868
Line 906
Says:
/* We're only supposed to call this when two-phase commits are supported */There is an extra space between the "are" and "supported" in the comment.
Same issue occurs 3 times.---
COMMENT Line 1023 + /* + * Skip if decoding of two-phase at PREPARE time is not enabled. In that + * case all two-phase transactions are considered filtered out and will be + * applied as regular transactions at COMMIT PREPARED. + */Comment still is missing the word "transactions"
"Skip if decoding of two-phase at PREPARE time is not enabled."
-> "Skip if decoding of two-phase transactions at PREPARE time is not enabled.
Updated accordingly.
==========
Patch v10-0001, File: src/include/replication/reorderbuffer.h
==========COMMENT
Line 459
/* abort prepared callback signature */
typedef void (*ReorderBufferRollbackPreparedCB) (
ReorderBuffer *rb,
ReorderBufferTXN *txn,
XLogRecPtr abort_lsn);There is no alignment consistency here for
ReorderBufferRollbackPreparedCB. Some function args are directly under
the "(" and some are on the same line. This function code is neither.---
COMMENT
Line 638
@@ -431,6 +486,24 @@ typedef void (*ReorderBufferStreamAbortCB) (
ReorderBufferTXN *txn,
XLogRecPtr abort_lsn);+/* prepare streamed transaction callback signature */ +typedef void (*ReorderBufferStreamPrepareCB) ( + ReorderBuffer *rb, + ReorderBufferTXN *txn, + XLogRecPtr prepare_lsn); + +/* prepare streamed transaction callback signature */ +typedef void (*ReorderBufferStreamCommitPreparedCB) ( + ReorderBuffer *rb, + ReorderBufferTXN *txn, + XLogRecPtr commit_lsn); + +/* prepare streamed transaction callback signature */ +typedef void (*ReorderBufferStreamRollbackPreparedCB) ( + ReorderBuffer *rb, + ReorderBufferTXN *txn, + XLogRecPtr rollback_lsn);There is no inconsistent alignment with the arguments (compare how
other functions are aligned)See:
- for ReorderBufferStreamCommitPreparedCB
- for ReorderBufferStreamRollbackPreparedCB
- for ReorderBufferPrepareNeedSkip
- for ReorderBufferTxnIsPrepared
- for ReorderBufferPrepare---
COMMENT
Line 489
Line 495
Line 501
/* prepare streamed transaction callback signature */Same comment cut/paste 3 times?
- for ReorderBufferStreamPrepareCB
- for ReorderBufferStreamCommitPreparedCB
- for ReorderBufferStreamRollbackPreparedCB---
COMMENT
Line 457
/* abort prepared callback signature */
typedef void (*ReorderBufferRollbackPreparedCB) (
ReorderBuffer *rb,
ReorderBufferTXN *txn,
XLogRecPtr abort_lsn);"abort" --> "rollback" in the function comment.
---
COMMENT
Line 269
/* In case of 2PC we need to pass GID to output plugin */"2PC" --> "two-phase commit"
Updated accordingly.
==========
Patch v10-0002, File: contrib/test_decoding/expected/two_phase.out (and .sql)
==========COMMENT
GeneralIt is a bit hard to see what are the main tests here are what are just
sub-parts of some test case.e.g. It seems like the main tests are.
1. Test that decoding happens at PREPARE time
2. Test decoding of an aborted tx
3. Test a prepared tx which contains some DDL
4. Test decoding works while an uncommitted prepared tx with DDL exists
5. Test operations holding exclusive locks won't block decoding
6. Test savepoints and sub-transactions
7. Test "_nodecode" will defer the decoding until the commit timeCan the comments be made more obvious so it is easy to distinguish the
main tests from the steps of those tests?---
COMMENT
Line 1
-- Test two-phased transactions, when two-phase-commit is enabled,
transactions are
-- decoded at PREPARE time rather than at COMMIT PREPARED time.Some commas to be removed and this comment to be split into several sentences.
---
COMMENT
Line 19
-- should show nothingComment could be more informative. E.g. "Should show nothing because
the PREPARE has not happened yet"---
COMMENT
Line 77Looks like there is a missing comment about here that should say
something like "Show that the DDL does not appear in the decoding"---
COMMENT
Line 160
-- test savepoints and sub-xacts as a resultThe subsequent test is testing savepoints. But is it testing sub
transactions like the comment says?
Updated accordingly.
==========
Patch v10-0002, File: contrib/test_decoding/t/001_twophase.pl
==========COMMENT
GeneralI think basically there are only 2 tests in this file.
1. to check that the concurrent abort works.
2. to check that the prepared tx can span a server shutdown/restartBut the tests comments do not make this clear at all.
e.g. All the "#" comments look equally important although most of them
are just steps of each test case.
Can the comments be better to distinguish the tests versus the steps
of each test?
Updated accordingly.
==========
Patch v10-0002, File: src/backend/replication/logical/decode.c
==========COMMENT
Line 71
static void DecodeCommitPrepared(LogicalDecodingContext *ctx,
XLogRecordBuffer *buf,
xl_xact_parsed_commit *parsed, TransactionId xid);
static void DecodeAbort(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
xl_xact_parsed_abort *parsed, TransactionId xid);
static void DecodeAbortPrepared(LogicalDecodingContext *ctx,
XLogRecordBuffer *buf,
xl_xact_parsed_abort *parsed, TransactionId xid);
static void DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
xl_xact_parsed_prepare * parsed);The 2nd line or args are not aligned properly.
- for DecodeCommitPrepared
- for DecodeAbortPrepared
- for DecodePrepare
Updated accordingly.
==========
Patch v10-0002, File: src/backend/replication/logical/reorderbuffer.c
==========COMMENT
There are some parts of the code where in my v6 review I had a doubt
about the mutually exclusive treatment of the "streaming" flag and the
"rbtxn_prepared(txn)" state.Basically I did not see how some parts of the code are treating NOT
streaming as implying 2PC etc because it defies my understanding that
2PC can also work in streaming mode. Perhaps the "streaming" flag has
a different meaning to how I interpret it? Or perhaps some functions
are guarding higher up and can only be called under certain
conditions?Anyway, this confusion manifests in several parts of the code, none of
which was changed after my v6 review.Affected code includes the following:
CASE 1
Wherever the ReorderBufferTruncateTXN(...) "prepared" flag (third
parameter) is hardwired true/false, I think there must be some
preceding Assert to guarantee the prepared state condition holds true.
There can't be any room for doubts like "but what will it do for
streamed 2PC..."
Line 1805 - ReorderBufferTruncateTXN(rb, txn, true); // if rbtxn_prepared(txn)
Line 1941 - ReorderBufferTruncateTXN(rb, txn, false); // state ??
Line 2389 - ReorderBufferTruncateTXN(rb, txn, false); // if streaming
Line 2396 - ReorderBufferTruncateTXN(rb, txn, true); // if not
streaming and if rbtxm_prepared(txn)
Line 2459 - ReorderBufferTruncateTXN(rb, txn, true); // if not streaming~
CASE 2
Wherever the "streaming" flag is tested I don't really understand how
NOT streaming can automatically imply 2PC.
Line 2330 - if (streaming) // what about if it is streaming AND 2PC at
the same time?
Line 2387 - if (streaming) // what about if it is streaming AND 2PC at
the same time?
Line 2449 - if (streaming) // what about if it is streaming AND 2PC at
the same time?~
Case 1 and Case 2 above overlap a fair bit. I just listed them so they
all get checked again.Even if the code is thought to be currently OK I do still think
something should be done like:
a) add some more substantial comments to explain WHY the combination
of streaming and 2PC is not valid in the context
b) the Asserts to be strengthened to 100% guarantee that the streaming
and prepared states really are exclusive (if indeed they are). For
this point I thought the following Assert condition could be better:
Assert(streaming || rbtxn_prepared(txn));
Assert(stream_started || rbtxn_prepared(txn));
because as it is you still are left wondering if both streaming AND
rbtxn_prepared(txn) can be possible at the same time...---
Updated with more comments and a new Assert.
COMMENT
Line 2634
* Anyways, two-phase transactions do not contain any reorderbuffers."Anyways" --> "Anyway"
Updated.
==========
Patch v10-0003, File: src/backend/access/transam/twophase.c
==========COMMENT
Line 557
@@ -548,6 +548,33 @@ MarkAsPrepared(GlobalTransaction gxact, bool lock_held)
}/* + * LookupGXact + * Check if the prepared transaction with the given GID is around + */ +bool +LookupGXact(const char *gid) +{ + int i; + bool found = false;The variable declarations (i and found) are not aligned.
Updated.
==========
Patch v10-0003, File: src/backend/replication/logical/proto.c
==========COMMENT
Line 125
Line 205
Assert(strlen(txn->gid) > 0);I suggested that the assertion should also check txn->gid is not NULL.
You replied "In this case txn->gid has to be non NULL".But that is exactly what I said :-)
If it HAS to be non-NULL then why not just Assert that in code instead
of leaving the reader wondering?"Assert(strlen(txn->gid) > 0);" --> "Assert(tdx->gid && strlen(txn->gid) > 0);"
Same occurs several times.---
Updated checking that gid is non-NULL as zero strlen is actually a valid case.
COMMENT
Line 133
Line 213
if (rbtxn_commit_prepared(txn))
flags |= LOGICALREP_IS_COMMIT_PREPARED;
else if (rbtxn_rollback_prepared(txn))
flags |= LOGICALREP_IS_ROLLBACK_PREPARED;
else
flags |= LOGICALREP_IS_PREPARE;Previously I wrote that the use of the bit flags on assignment in the
logicalrep_write_prepare was inconsistent with the way they are
treated when they are read. Really it should be using a direct
assignment instead of bit flags.You said this is skipped anticipating a possible refactor. But IMO
this leaves the code in a half/half state. I think it is better to fix
it properly and if refactoring happens then deal with that at the
time.The last comment I saw from Amit said to use my 1st proposal of direct
assignment instead of bit flag assignment.(applies to both non-stream and stream functions)
- see logicalrep_write_prepare
- see logicalrep_write_stream_prepare
Updated accordingly.
==========
Patch v10-0003, File: src/backend/replication/pgoutput/pgoutput.c
==========COMMENT
Line 429
/*
* PREPARE callback
*/
static void
pgoutput_rollback_prepared_txn(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr prepare_lsn)
The function comment looks wrong.
Shouldn't this comment say be "ROLLBACK PREPARED callback"?==========
Patch v10-0003, File: src/include/replication/logicalproto.h
==========Line 115
#define PrepareFlagsAreValid(flags) \
((flags == LOGICALREP_IS_PREPARE) || \
(flags == LOGICALREP_IS_COMMIT_PREPARED) || \
(flags == LOGICALREP_IS_ROLLBACK_PREPARED))Would be safer if all the references to flags are in parentheses
e.g. "flags" --> "(flags)"
Updated accordingly.
Amit,
I have also modified the stream callback APIs to not include
stream_commit_prpeared and stream_rollback_prepared, instead use the
non-stream APIs for the same functionality.
I have also updated the test_decoding and pgoutput plugins accordingly.
regards,
Ajin Cherian
Fujitsu Australia
Attachments:
v11-0003-Support-2PC-txn-pgoutput.patchapplication/octet-stream; name=v11-0003-Support-2PC-txn-pgoutput.patchDownload
From e7b21894ccd62ecb6c447ffb370dc9c2184fd55f Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Fri, 23 Oct 2020 05:37:43 -0400
Subject: [PATCH v11] Support 2PC txn - pgoutput
This patch adds support in the pgoutput plugin and subscriber for handling
of two-phase commits.
Includes pgoutput changes.
Includes subscriber changes.
Includes two-phase commit test code (streaming and not streaming).
---
src/backend/access/transam/twophase.c | 27 ++
src/backend/replication/logical/proto.c | 162 +++++++++-
src/backend/replication/logical/worker.c | 364 ++++++++++++++++++++-
src/backend/replication/pgoutput/pgoutput.c | 79 ++++-
src/include/access/twophase.h | 1 +
src/include/replication/logicalproto.h | 39 +++
src/test/subscription/t/020_twophase.pl | 163 ++++++++++
src/test/subscription/t/021_twophase_streaming.pl | 366 ++++++++++++++++++++++
8 files changed, 1198 insertions(+), 3 deletions(-)
create mode 100644 src/test/subscription/t/020_twophase.pl
create mode 100644 src/test/subscription/t/021_twophase_streaming.pl
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 7940060..2e0a408 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -548,6 +548,33 @@ MarkAsPrepared(GlobalTransaction gxact, bool lock_held)
}
/*
+ * LookupGXact
+ * Check if the prepared transaction with the given GID is around
+ */
+bool
+LookupGXact(const char *gid)
+{
+ int i;
+ bool found = false;
+
+ LWLockAcquire(TwoPhaseStateLock, LW_SHARED);
+ for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
+ {
+ GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
+
+ /* Ignore not-yet-valid GIDs */
+ if (gxact->valid && strcmp(gxact->gid, gid) == 0)
+ {
+ found = true;
+ break;
+ }
+
+ }
+ LWLockRelease(TwoPhaseStateLock);
+ return found;
+}
+
+/*
* LockGXact
* Locate the prepared transaction and mark it busy for COMMIT or PREPARE.
*/
diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index eb19142..691ef0c 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -78,7 +78,7 @@ logicalrep_write_commit(StringInfo out, ReorderBufferTXN *txn,
pq_sendbyte(out, 'C'); /* sending COMMIT */
- /* send the flags field (unused for now) */
+ /* send the flags field */
pq_sendbyte(out, flags);
/* send fields */
@@ -99,6 +99,7 @@ logicalrep_read_commit(StringInfo in, LogicalRepCommitData *commit_data)
if (flags != 0)
elog(ERROR, "unrecognized flags %u in commit message", flags);
+
/* read fields */
commit_data->commit_lsn = pq_getmsgint64(in);
commit_data->end_lsn = pq_getmsgint64(in);
@@ -106,6 +107,165 @@ logicalrep_read_commit(StringInfo in, LogicalRepCommitData *commit_data)
}
/*
+ * Write PREPARE to the output stream.
+ */
+void
+logicalrep_write_prepare(StringInfo out, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ uint8 flags = 0;
+
+ pq_sendbyte(out, 'P'); /* sending PREPARE protocol */
+
+ /*
+ * This should only ever happen for two-phase commit transactions. In which case we
+ * expect to have a non-empty GID.
+ */
+ Assert(rbtxn_prepared(txn));
+ Assert(txn->gid != NULL);
+
+ /*
+ * Flags are determined from the state of the transaction. We know we
+ * always get PREPARE first and then [COMMIT|ROLLBACK] PREPARED, so if
+ * it's already marked as committed then it has to be COMMIT PREPARED (and
+ * likewise for abort / ROLLBACK PREPARED).
+ */
+ if (rbtxn_commit_prepared(txn))
+ flags = LOGICALREP_IS_COMMIT_PREPARED;
+ else if (rbtxn_rollback_prepared(txn))
+ flags = LOGICALREP_IS_ROLLBACK_PREPARED;
+ else
+ flags = LOGICALREP_IS_PREPARE;
+
+ /* Make sure exactly one of the expected flags is set. */
+ if (!PrepareFlagsAreValid(flags))
+ elog(ERROR, "unrecognized flags %u in prepare message", flags);
+
+ /* send the flags field */
+ pq_sendbyte(out, flags);
+
+ /* send fields */
+ pq_sendint64(out, prepare_lsn);
+ pq_sendint64(out, txn->end_lsn);
+ pq_sendint64(out, txn->commit_time);
+
+ /* send gid */
+ pq_sendstring(out, txn->gid);
+}
+
+/*
+ * Read transaction PREPARE from the stream.
+ */
+void
+logicalrep_read_prepare(StringInfo in, LogicalRepPrepareData * prepare_data)
+{
+ /* read flags */
+ uint8 flags = pq_getmsgbyte(in);
+
+ if (!PrepareFlagsAreValid(flags))
+ elog(ERROR, "unrecognized flags %u in prepare message", flags);
+
+ /* set the action (reuse the constants used for the flags) */
+ prepare_data->prepare_type = flags;
+
+ /* read fields */
+ prepare_data->prepare_lsn = pq_getmsgint64(in);
+ prepare_data->end_lsn = pq_getmsgint64(in);
+ prepare_data->preparetime = pq_getmsgint64(in);
+
+ /* read gid (copy it into a pre-allocated buffer) */
+ strcpy(prepare_data->gid, pq_getmsgstring(in));
+}
+
+/*
+ * Write STREAM PREPARE to the output stream.
+ *
+ * (For stream PREPARE, stream COMMIT PREPARED, stream ROLLBACK PREPARED)
+ *
+ * TODO- This is mostly cut/paste from logicalrep_write_prepare. Consider refactoring for commonality.
+ */
+void
+logicalrep_write_stream_prepare(StringInfo out,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ uint8 flags = 0;
+
+#ifdef DEBUG_STREAM_2PC
+ elog(LOG, "proto: logicalrep_write_stream_prepare");
+#endif
+
+ pq_sendbyte(out, 'p'); /* sending STREAM PREPARE protocol */
+
+ /*
+ * This should only ever happen for two-phase transactions. In which case we
+ * expect to have a non-empty GID.
+ */
+ Assert(rbtxn_prepared(txn));
+ Assert(txn->gid != NULL);
+
+ /*
+ * For streaming APIs only PREPARE is supported. [COMMIT|ROLLBACK] PREPARED
+ * uses non-streaming APIs
+ */
+ flags = LOGICALREP_IS_PREPARE;
+
+ /* transaction ID */
+ Assert(TransactionIdIsValid(txn->xid));
+ pq_sendint32(out, txn->xid);
+
+ /* send the flags field */
+ pq_sendbyte(out, flags);
+
+ /* send fields */
+ pq_sendint64(out, prepare_lsn);
+ pq_sendint64(out, txn->end_lsn);
+ pq_sendint64(out, txn->commit_time);
+
+ /* send gid */
+ pq_sendstring(out, txn->gid);
+}
+
+/*
+ * Read STREAM PREPARE from the output stream.
+ *
+ * (For stream PREPARE, stream COMMIT PREPARED, stream ROLLBACK PREPARED)
+ *
+ * TODO - This is mostly cut/paste from logicalrep_read_prepare. Consider refactoring for commonality.
+ */
+TransactionId
+logicalrep_read_stream_prepare(StringInfo in, LogicalRepPrepareData *prepare_data)
+{
+ TransactionId xid;
+ uint8 flags;
+
+#ifdef DEBUG_STREAM_2PC
+ elog(LOG, "proto: logicalrep_read_stream_prepare");
+#endif
+
+ xid = pq_getmsgint(in, 4);
+
+ /* read flags */
+ flags = pq_getmsgbyte(in);
+
+ if (!PrepareFlagsAreValid(flags))
+ elog(ERROR, "unrecognized flags %u in prepare message", flags);
+
+ /* set the action (reuse the constants used for the flags) */
+ prepare_data->prepare_type = flags;
+
+ /* read fields */
+ prepare_data->prepare_lsn = pq_getmsgint64(in);
+ prepare_data->end_lsn = pq_getmsgint64(in);
+ prepare_data->preparetime = pq_getmsgint64(in);
+
+ /* read gid (copy it into a pre-allocated buffer) */
+ strcpy(prepare_data->gid, pq_getmsgstring(in));
+
+ return xid;
+}
+
+/*
* Write ORIGIN to the output stream.
*/
void
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 3a5b733..26ad790 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -722,6 +722,7 @@ apply_handle_commit(StringInfo s)
replorigin_session_origin_timestamp = commit_data.committime;
CommitTransactionCommand();
+
pgstat_report_stat(false);
store_flush_position(commit_data.end_lsn);
@@ -742,6 +743,359 @@ apply_handle_commit(StringInfo s)
}
/*
+ * Called from apply_handle_prepare to handle a PREPARE TRANSACTION.
+ */
+static void
+apply_handle_prepare_txn(LogicalRepPrepareData * prepare_data)
+{
+ Assert(prepare_data->prepare_lsn == remote_final_lsn);
+
+ /* The synchronization worker runs in single transaction. */
+ if (IsTransactionState() && !am_tablesync_worker())
+ {
+ /* End the earlier transaction and start a new one */
+ BeginTransactionBlock();
+ CommitTransactionCommand();
+ StartTransactionCommand();
+
+ /*
+ * Update origin state so we can restart streaming from correct
+ * position in case of crash.
+ */
+ replorigin_session_origin_lsn = prepare_data->end_lsn;
+ replorigin_session_origin_timestamp = prepare_data->preparetime;
+
+ PrepareTransactionBlock(prepare_data->gid);
+ CommitTransactionCommand();
+ pgstat_report_stat(false);
+
+ store_flush_position(prepare_data->end_lsn);
+ }
+ else
+ {
+ /* Process any invalidation messages that might have accumulated. */
+ AcceptInvalidationMessages();
+ maybe_reread_subscription();
+ }
+
+ in_remote_transaction = false;
+
+ /* Process any tables that are being synchronized in parallel. */
+ process_syncing_tables(prepare_data->end_lsn);
+
+ pgstat_report_activity(STATE_IDLE, NULL);
+}
+
+/*
+ * Called from apply_handle_prepare to handle a COMMIT PREPARED of a previously
+ * PREPARED transaction.
+ */
+static void
+apply_handle_commit_prepared_txn(LogicalRepPrepareData * prepare_data)
+{
+ /* there is no transaction when COMMIT PREPARED is called */
+ ensure_transaction();
+
+ /*
+ * Update origin state so we can restart streaming from correct position
+ * in case of crash.
+ */
+ replorigin_session_origin_lsn = prepare_data->end_lsn;
+ replorigin_session_origin_timestamp = prepare_data->preparetime;
+
+ FinishPreparedTransaction(prepare_data->gid, true);
+ CommitTransactionCommand();
+ pgstat_report_stat(false);
+
+ store_flush_position(prepare_data->end_lsn);
+ in_remote_transaction = false;
+
+ /* Process any tables that are being synchronized in parallel. */
+ process_syncing_tables(prepare_data->end_lsn);
+
+ pgstat_report_activity(STATE_IDLE, NULL);
+}
+
+/*
+ * Called from apply_handle_prepare to handle a ROLLBACK PREPARED of a previously
+ * PREPARED TRANSACTION.
+ */
+static void
+apply_handle_rollback_prepared_txn(LogicalRepPrepareData * prepare_data)
+{
+ /*
+ * Update origin state so we can restart streaming from correct position
+ * in case of crash.
+ */
+ replorigin_session_origin_lsn = prepare_data->end_lsn;
+ replorigin_session_origin_timestamp = prepare_data->preparetime;
+
+ /*
+ * During logical decoding, on the apply side, it's possible that a
+ * prepared transaction got aborted while decoding. In that case, we stop
+ * the decoding and abort the transaction immediately. However the
+ * ROLLBACK prepared processing still reaches the subscriber. In that case
+ * it's ok to have a missing gid
+ */
+ if (LookupGXact(prepare_data->gid))
+ {
+ /* there is no transaction when ABORT/ROLLBACK PREPARED is called */
+ ensure_transaction();
+ FinishPreparedTransaction(prepare_data->gid, false);
+ CommitTransactionCommand();
+ }
+
+ pgstat_report_stat(false);
+
+ store_flush_position(prepare_data->end_lsn);
+ in_remote_transaction = false;
+
+ /* Process any tables that are being synchronized in parallel. */
+ process_syncing_tables(prepare_data->end_lsn);
+
+ pgstat_report_activity(STATE_IDLE, NULL);
+}
+
+/*
+ * Handle PREPARE message.
+ */
+static void
+apply_handle_prepare(StringInfo s)
+{
+ LogicalRepPrepareData prepare_data;
+
+ logicalrep_read_prepare(s, &prepare_data);
+
+ switch (prepare_data.prepare_type)
+ {
+ case LOGICALREP_IS_PREPARE:
+ apply_handle_prepare_txn(&prepare_data);
+ break;
+
+ case LOGICALREP_IS_COMMIT_PREPARED:
+ apply_handle_commit_prepared_txn(&prepare_data);
+ break;
+
+ case LOGICALREP_IS_ROLLBACK_PREPARED:
+ apply_handle_rollback_prepared_txn(&prepare_data);
+ break;
+
+ default:
+ ereport(ERROR,
+ (errcode(ERRCODE_PROTOCOL_VIOLATION),
+ errmsg("unexpected type of prepare message: %d",
+ prepare_data.prepare_type)));
+ }
+}
+
+/*
+ * Handle STREAM PREPARE message.
+ */
+static void
+apply_handle_stream_prepare(StringInfo s)
+
+{
+ StringInfoData s2;
+ int nchanges;
+ char path[MAXPGPATH];
+ char *buffer = NULL;
+ bool found;
+ StreamXidHash *ent;
+ MemoryContext oldcxt;
+ BufFile *fd;
+ LogicalRepPrepareData prepare_data;
+ TransactionId xid;
+
+ xid = logicalrep_read_stream_prepare(s, &prepare_data);
+
+ /* This should be a PREPARE. COMMIT PREPARED and ROLLBACK PREPARED should
+ * result in non-streaming APIs being called.
+ */
+ Assert(prepare_data.prepare_type == LOGICALREP_IS_PREPARE);
+
+
+#ifdef DEBUG_STREAM_2PC
+ elog(LOG, "worker: apply_handle_stream_prepare");
+#endif
+
+ /*
+ * FIXME - Following condition was in apply_handle_prepare_txn except I found it was ALWAYS IsTransactionState() == false
+ * The synchronization worker runs in single transaction. *
+ if (IsTransactionState() && !am_tablesync_worker())
+ */
+ if (!am_tablesync_worker())
+ {
+ /*
+ * ==================================================================================================
+ * The following chunk of code is largely cut/paste from the existing apply_handle_prepare_commit_txn
+ * which was handling the non-two-phase streaming commit by applying the operations of the spooled file.
+ *
+ * Differences are:
+ * - Here the xid is known already because apply_handle_stream_prepare already called
+ * locicalrep_read_stream_prepare
+ *
+ * TODO - This is possible candidate for refactoring since so much of it is the same.
+ * ==================================================================================================
+ */
+
+ Assert(!in_streamed_transaction);
+
+ elog(DEBUG1, "received prepare for streamed transaction %u", xid);
+
+ ensure_transaction();
+
+ /*
+ * Allocate file handle and memory required to process all the messages in
+ * TopTransactionContext to avoid them getting reset after each message is
+ * processed.
+ */
+ oldcxt = MemoryContextSwitchTo(TopTransactionContext);
+
+ /* open the spool file for the committed transaction */
+ changes_filename(path, MyLogicalRepWorker->subid, xid);
+ elog(DEBUG1, "replaying changes from file \"%s\"", path);
+#ifdef DEBUG_STREAM_2PC
+ elog(LOG, "worker: replaying changes from file \"%s\"", path);
+#endif
+ ent = (StreamXidHash *) hash_search(xidhash,
+ (void *) &xid,
+ HASH_FIND,
+ &found);
+ Assert(found);
+ fd = BufFileOpenShared(ent->stream_fileset, path, O_RDONLY);
+
+ buffer = palloc(BLCKSZ);
+ initStringInfo(&s2);
+
+ MemoryContextSwitchTo(oldcxt);
+
+ remote_final_lsn = prepare_data.prepare_lsn;
+
+ /*
+ * Make sure the handle apply_dispatch methods are aware we're in a remote
+ * transaction.
+ */
+ in_remote_transaction = true;
+ pgstat_report_activity(STATE_RUNNING, NULL);
+
+ /*
+ * Read the entries one by one and pass them through the same logic as in
+ * apply_dispatch.
+ */
+ nchanges = 0;
+ while (true)
+ {
+ int nbytes;
+ int len;
+
+ CHECK_FOR_INTERRUPTS();
+
+ /* read length of the on-disk record */
+ nbytes = BufFileRead(fd, &len, sizeof(len));
+
+ /* have we reached end of the file? */
+ if (nbytes == 0)
+ break;
+
+ /* do we have a correct length? */
+ if (nbytes != sizeof(len))
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not read from streaming transaction's changes file \"%s\": %m",
+ path)));
+
+ Assert(len > 0);
+
+ /* make sure we have sufficiently large buffer */
+ buffer = repalloc(buffer, len);
+
+ /* and finally read the data into the buffer */
+ if (BufFileRead(fd, buffer, len) != len)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not read from streaming transaction's changes file \"%s\": %m",
+ path)));
+
+ /* copy the buffer to the stringinfo and call apply_dispatch */
+ resetStringInfo(&s2);
+ appendBinaryStringInfo(&s2, buffer, len);
+
+ /* Ensure we are reading the data into our memory context. */
+ oldcxt = MemoryContextSwitchTo(ApplyMessageContext);
+
+ apply_dispatch(&s2);
+
+ MemoryContextReset(ApplyMessageContext);
+
+ MemoryContextSwitchTo(oldcxt);
+
+ nchanges++;
+
+ if (nchanges % 1000 == 0)
+ elog(DEBUG1, "replayed %d changes from file '%s'",
+ nchanges, path);
+ }
+
+ BufFileClose(fd);
+
+ pfree(buffer);
+ pfree(s2.data);
+
+ /*
+ * ==================================================================================================
+ * The following chunk of code is cut/paste from the existing apply_handle_prepare_txn
+ * which was handling the two-phase prepare of the non-streamed tx
+ * ==================================================================================================
+ */
+#ifdef DEBUG_STREAM_2PC
+ elog(LOG, "worker: call PrepareTransactionBlock()");
+#endif
+ BeginTransactionBlock();
+ CommitTransactionCommand();
+ StartTransactionCommand();
+
+ /*
+ * Update origin state so we can restart streaming from correct position
+ * in case of crash.
+ */
+ replorigin_session_origin_lsn = prepare_data.end_lsn;
+ replorigin_session_origin_timestamp = prepare_data.preparetime;
+
+ PrepareTransactionBlock(prepare_data.gid);
+ /* End of copied prepare code */
+
+ CommitTransactionCommand();
+ pgstat_report_stat(false);
+
+ store_flush_position(prepare_data.end_lsn);
+
+ elog(DEBUG1, "replayed %d (all) changes from file \"%s\"", nchanges, path);
+#ifdef DEBUG_STREAM_2PC
+ elog(LOG, "worker: replayed %d (all) changes from file \"%s\"", nchanges, path);
+#endif
+
+ /* ============================================================================================== */
+ }
+ else
+ {
+ /* Process any invalidation messages that might have accumulated. */
+ AcceptInvalidationMessages();
+ maybe_reread_subscription();
+ }
+
+ in_remote_transaction = false;
+
+ /* Process any tables that are being synchronized in parallel. */
+ process_syncing_tables(prepare_data.end_lsn);
+
+ /* unlink the files with serialized changes and subxact info */
+ /* FIXME - OK to do this here (outside of if/else). */
+ stream_cleanup_files(MyLogicalRepWorker->subid, xid);
+
+ pgstat_report_activity(STATE_IDLE, NULL);
+}
+
+/*
* Handle ORIGIN message.
*
* TODO, support tracking of multiple origins
@@ -1905,10 +2259,14 @@ apply_dispatch(StringInfo s)
case 'B':
apply_handle_begin(s);
break;
- /* COMMIT */
+ /* COMMIT/ABORT */
case 'C':
apply_handle_commit(s);
break;
+ /* PREPARE and [COMMIT|ROLLBACK] PREPARED */
+ case 'P':
+ apply_handle_prepare(s);
+ break;
/* INSERT */
case 'I':
apply_handle_insert(s);
@@ -1953,6 +2311,10 @@ apply_dispatch(StringInfo s)
case 'c':
apply_handle_stream_commit(s);
break;
+ /* STREAM PREPARE */
+ case 'p':
+ apply_handle_stream_prepare(s);
+ break;
default:
ereport(ERROR,
(errcode(ERRCODE_PROTOCOL_VIOLATION),
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 9c997ae..32fe5c5 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -47,6 +47,12 @@ static void pgoutput_truncate(LogicalDecodingContext *ctx,
ReorderBufferChange *change);
static bool pgoutput_origin_filter(LogicalDecodingContext *ctx,
RepOriginId origin_id);
+static void pgoutput_prepare_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, XLogRecPtr prepare_lsn);
+static void pgoutput_commit_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, XLogRecPtr prepare_lsn);
+static void pgoutput_rollback_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, XLogRecPtr prepare_lsn);
static void pgoutput_stream_start(struct LogicalDecodingContext *ctx,
ReorderBufferTXN *txn);
static void pgoutput_stream_stop(struct LogicalDecodingContext *ctx,
@@ -57,7 +63,8 @@ static void pgoutput_stream_abort(struct LogicalDecodingContext *ctx,
static void pgoutput_stream_commit(struct LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
-
+static void pgoutput_stream_prepare_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, XLogRecPtr prepare_lsn);
static bool publications_valid;
static bool in_streaming;
@@ -143,6 +150,10 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb)
cb->change_cb = pgoutput_change;
cb->truncate_cb = pgoutput_truncate;
cb->commit_cb = pgoutput_commit_txn;
+
+ cb->prepare_cb = pgoutput_prepare_txn;
+ cb->commit_prepared_cb = pgoutput_commit_prepared_txn;
+ cb->rollback_prepared_cb = pgoutput_rollback_prepared_txn;
cb->filter_by_origin_cb = pgoutput_origin_filter;
cb->shutdown_cb = pgoutput_shutdown;
@@ -153,6 +164,8 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb)
cb->stream_commit_cb = pgoutput_stream_commit;
cb->stream_change_cb = pgoutput_change;
cb->stream_truncate_cb = pgoutput_truncate;
+ /* transaction streaming - two-phase commit */
+ cb->stream_prepare_cb = pgoutput_stream_prepare_txn;
}
static void
@@ -378,6 +391,48 @@ pgoutput_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
}
/*
+ * PREPARE callback
+ */
+static void
+pgoutput_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ OutputPluginUpdateProgress(ctx);
+
+ OutputPluginPrepareWrite(ctx, true);
+ logicalrep_write_prepare(ctx->out, txn, prepare_lsn);
+ OutputPluginWrite(ctx, true);
+}
+
+/*
+ * COMMIT PREPARED callback
+ */
+static void
+pgoutput_commit_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ OutputPluginUpdateProgress(ctx);
+
+ OutputPluginPrepareWrite(ctx, true);
+ logicalrep_write_prepare(ctx->out, txn, prepare_lsn);
+ OutputPluginWrite(ctx, true);
+}
+
+/*
+ * ROLLBACK PREPARED callback
+ */
+static void
+pgoutput_rollback_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ OutputPluginUpdateProgress(ctx);
+
+ OutputPluginPrepareWrite(ctx, true);
+ logicalrep_write_prepare(ctx->out, txn, prepare_lsn);
+ OutputPluginWrite(ctx, true);
+}
+
+/*
* Write the current schema of the relation and its ancestor (if any) if not
* done yet.
*/
@@ -857,6 +912,28 @@ pgoutput_stream_commit(struct LogicalDecodingContext *ctx,
}
/*
+ * PREPARE callback (for streaming two-phase commit).
+ *
+ * Notify the downstream to prepare the transaction.
+ */
+static void
+pgoutput_stream_prepare_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+#ifdef DEBUG_STREAM_2PC
+ elog(LOG, "pgoutput: pgoutput_stream_prepare_txn");
+#endif
+
+ Assert(rbtxn_is_streamed(txn));
+
+ OutputPluginUpdateProgress(ctx);
+ OutputPluginPrepareWrite(ctx, true);
+ logicalrep_write_stream_prepare(ctx->out, txn, prepare_lsn);
+ OutputPluginWrite(ctx, true);
+}
+
+/*
* Initialize the relation schema sync cache for a decoding session.
*
* The hash table is destroyed at the end of a decoding session. While
diff --git a/src/include/access/twophase.h b/src/include/access/twophase.h
index 2ca71c3..b2628ea 100644
--- a/src/include/access/twophase.h
+++ b/src/include/access/twophase.h
@@ -44,6 +44,7 @@ extern GlobalTransaction MarkAsPreparing(TransactionId xid, const char *gid,
extern void StartPrepare(GlobalTransaction gxact);
extern void EndPrepare(GlobalTransaction gxact);
extern bool StandbyTransactionIdIsPrepared(TransactionId xid);
+extern bool LookupGXact(const char *gid);
extern TransactionId PrescanPreparedTransactions(TransactionId **xids_p,
int *nxids_p);
diff --git a/src/include/replication/logicalproto.h b/src/include/replication/logicalproto.h
index 0c2cda2..ae59c04 100644
--- a/src/include/replication/logicalproto.h
+++ b/src/include/replication/logicalproto.h
@@ -87,6 +87,7 @@ typedef struct LogicalRepBeginData
TransactionId xid;
} LogicalRepBeginData;
+/* Commit (and abort) information */
typedef struct LogicalRepCommitData
{
XLogRecPtr commit_lsn;
@@ -94,6 +95,28 @@ typedef struct LogicalRepCommitData
TimestampTz committime;
} LogicalRepCommitData;
+
+/* Prepare protocol information */
+typedef struct LogicalRepPrepareData
+{
+ uint8 prepare_type;
+ XLogRecPtr prepare_lsn;
+ XLogRecPtr end_lsn;
+ TimestampTz preparetime;
+ char gid[GIDSIZE];
+} LogicalRepPrepareData;
+
+/* types of the prepare protocol message */
+#define LOGICALREP_IS_PREPARE 0x01
+#define LOGICALREP_IS_COMMIT_PREPARED 0x02
+#define LOGICALREP_IS_ROLLBACK_PREPARED 0x04
+
+/* prepare can be exactly one of PREPARE, [COMMIT|ROLLBACK] PREPARED*/
+#define PrepareFlagsAreValid(flags) \
+ (((flags) == LOGICALREP_IS_PREPARE) || \
+ ((flags) == LOGICALREP_IS_COMMIT_PREPARED) || \
+ ((flags) == LOGICALREP_IS_ROLLBACK_PREPARED))
+
extern void logicalrep_write_begin(StringInfo out, ReorderBufferTXN *txn);
extern void logicalrep_read_begin(StringInfo in,
LogicalRepBeginData *begin_data);
@@ -101,6 +124,10 @@ extern void logicalrep_write_commit(StringInfo out, ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
extern void logicalrep_read_commit(StringInfo in,
LogicalRepCommitData *commit_data);
+extern void logicalrep_write_prepare(StringInfo out, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+extern void logicalrep_read_prepare(StringInfo in,
+ LogicalRepPrepareData * prepare_data);
extern void logicalrep_write_origin(StringInfo out, const char *origin,
XLogRecPtr origin_lsn);
extern char *logicalrep_read_origin(StringInfo in, XLogRecPtr *origin_lsn);
@@ -144,4 +171,16 @@ extern void logicalrep_write_stream_abort(StringInfo out, TransactionId xid,
extern void logicalrep_read_stream_abort(StringInfo in, TransactionId *xid,
TransactionId *subxid);
+/*
+ * FIXME - Uncomment this to see more lgging for streamed two-phase transactions.
+ *
+ * #define DEBUG_STREAM_2PC
+ */
+
+extern void logicalrep_write_stream_prepare(StringInfo out, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+extern TransactionId logicalrep_read_stream_prepare(StringInfo in,
+ LogicalRepPrepareData *prepare_data);
+
+
#endif /* LOGICAL_PROTO_H */
diff --git a/src/test/subscription/t/020_twophase.pl b/src/test/subscription/t/020_twophase.pl
new file mode 100644
index 0000000..3feb2c3
--- /dev/null
+++ b/src/test/subscription/t/020_twophase.pl
@@ -0,0 +1,163 @@
+# logical replication of 2PC test
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 12;
+
+# Initialize publisher node
+my $node_publisher = get_new_node('publisher');
+$node_publisher->init(allows_streaming => 'logical');
+$node_publisher->append_conf(
+ 'postgresql.conf', qq(
+ max_prepared_transactions = 10
+ ));
+$node_publisher->start;
+
+# Create subscriber node
+my $node_subscriber = get_new_node('subscriber');
+$node_subscriber->init(allows_streaming => 'logical');
+$node_subscriber->append_conf(
+ 'postgresql.conf', qq(max_prepared_transactions = 10));
+$node_subscriber->start;
+
+# Create some pre-existing content on publisher
+$node_publisher->safe_psql('postgres', "CREATE TABLE tab_full (a int PRIMARY KEY)");
+$node_publisher->safe_psql('postgres',
+ "INSERT INTO tab_full SELECT generate_series(1,10)");
+$node_publisher->safe_psql('postgres', "CREATE TABLE tab_full2 (x text)");
+$node_publisher->safe_psql('postgres',
+ "INSERT INTO tab_full2 VALUES ('a'), ('b'), ('b')");
+
+# Setup structure on subscriber
+$node_subscriber->safe_psql('postgres', "CREATE TABLE tab_full (a int PRIMARY KEY)");
+$node_subscriber->safe_psql('postgres', "CREATE TABLE tab_full2 (x text)");
+
+# Setup logical replication
+my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
+$node_publisher->safe_psql('postgres', "CREATE PUBLICATION tap_pub");
+$node_publisher->safe_psql('postgres',
+"ALTER PUBLICATION tap_pub ADD TABLE tab_full, tab_full2"
+);
+
+my $appname = 'tap_sub';
+$node_subscriber->safe_psql('postgres',
+"CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr application_name=$appname' PUBLICATION tap_pub"
+);
+
+# Wait for subscriber to finish initialization
+my $caughtup_query =
+"SELECT pg_current_wal_lsn() <= replay_lsn FROM pg_stat_replication WHERE application_name = '$appname';";
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# Also wait for initial table sync to finish
+my $synced_query =
+"SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT IN ('r', 's');";
+$node_subscriber->poll_query_until('postgres', $synced_query)
+ or die "Timed out while waiting for subscriber to synchronize data";
+
+# check that 2PC gets replicated to subscriber
+$node_publisher->safe_psql('postgres',
+ "BEGIN;INSERT INTO tab_full VALUES (11);PREPARE TRANSACTION 'test_prepared_tab_full';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is in prepared state on subscriber
+my $result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab_full';");
+ is($result, qq(1), 'transaction is prepared on subscriber');
+
+# check that 2PC gets committed on subscriber
+$node_publisher->safe_psql('postgres',
+ "COMMIT PREPARED 'test_prepared_tab_full';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is committed on subscriber
+$result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM tab_full where a = 11;");
+ is($result, qq(1), 'Row inserted via 2PC has committed on subscriber');
+$result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab_full';");
+ is($result, qq(0), 'transaction is committed on subscriber');
+
+# check that 2PC gets replicated to subscriber
+$node_publisher->safe_psql('postgres',
+ "BEGIN;INSERT INTO tab_full VALUES (12);PREPARE TRANSACTION 'test_prepared_tab_full';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is in prepared state on subscriber
+$result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab_full';");
+ is($result, qq(1), 'transaction is prepared on subscriber');
+
+# check that 2PC gets aborted on subscriber
+$node_publisher->safe_psql('postgres',
+ "ROLLBACK PREPARED 'test_prepared_tab_full';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is aborted on subscriber
+$result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM tab_full where a = 12;");
+ is($result, qq(0), 'Row inserted via 2PC is not present on subscriber');
+
+$result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab_full';");
+ is($result, qq(0), 'transaction is aborted on subscriber');
+
+# Check that commit prepared is decoded properly on crash restart
+$node_publisher->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab_full VALUES (12);
+ INSERT INTO tab_full VALUES (13);
+ PREPARE TRANSACTION 'test_prepared_tab';");
+$node_subscriber->stop('immediate');
+$node_publisher->stop('immediate');
+$node_publisher->start;
+$node_subscriber->start;
+
+# commit post the restart
+$node_publisher->safe_psql('postgres', "COMMIT PREPARED 'test_prepared_tab';");
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check inserts are visible
+$result = $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM tab_full where a IN (12,13);");
+is($result, qq(2), 'Rows inserted via 2PC are visible on the subscriber');
+
+# TODO add test cases involving DDL. This can be added after we add functionality
+# to replicate DDL changes to subscriber.
+
+# check all the cleanup
+$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_subscription");
+is($result, qq(0), 'check subscription was dropped on subscriber');
+
+$result = $node_publisher->safe_psql('postgres',
+ "SELECT count(*) FROM pg_replication_slots");
+is($result, qq(0), 'check replication slot was dropped on publisher');
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_subscription_rel");
+is($result, qq(0),
+ 'check subscription relation status was dropped on subscriber');
+
+$result = $node_publisher->safe_psql('postgres',
+ "SELECT count(*) FROM pg_replication_slots");
+is($result, qq(0), 'check replication slot was dropped on publisher');
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_replication_origin");
+is($result, qq(0), 'check replication origin was dropped on subscriber');
+
+$node_subscriber->stop('fast');
+$node_publisher->stop('fast');
diff --git a/src/test/subscription/t/021_twophase_streaming.pl b/src/test/subscription/t/021_twophase_streaming.pl
new file mode 100644
index 0000000..391eb17
--- /dev/null
+++ b/src/test/subscription/t/021_twophase_streaming.pl
@@ -0,0 +1,366 @@
+# logical replication of 2PC test
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 23;
+
+# Initialize publisher node
+my $node_publisher = get_new_node('publisher');
+$node_publisher->init(allows_streaming => 'logical');
+$node_publisher->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_publisher->append_conf('postgresql.conf', qq(logical_decoding_work_mem = 64kB));
+$node_publisher->start;
+
+# Create subscriber node
+my $node_subscriber = get_new_node('subscriber');
+$node_subscriber->init(allows_streaming => 'logical');
+$node_subscriber->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_subscriber->start;
+
+# Create some pre-existing content on publisher (uses same DDL as 015_stream test)
+$node_publisher->safe_psql('postgres',
+ "CREATE TABLE test_tab (a int primary key, b varchar)");
+$node_publisher->safe_psql('postgres',
+ "INSERT INTO test_tab VALUES (1, 'foo'), (2, 'bar')");
+
+# Setup structure on subscriber (columns a and b are compatible with same table name on publisher)
+$node_subscriber->safe_psql('postgres',
+ "CREATE TABLE test_tab (a int primary key, b text, c timestamptz DEFAULT now(), d bigint DEFAULT 999)");
+
+# Setup logical replication (streaming = on)
+my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
+$node_publisher->safe_psql('postgres', "CREATE PUBLICATION tap_pub FOR TABLE test_tab");
+
+my $appname = 'tap_sub';
+$node_subscriber->safe_psql('postgres',
+ "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr application_name=$appname' PUBLICATION tap_pub WITH (streaming = on)");
+
+# Wait for subscriber to finish initialization
+my $caughtup_query =
+ "SELECT pg_current_wal_lsn() <= replay_lsn FROM pg_stat_replication WHERE application_name = '$appname';";
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# Also wait for initial table sync to finish
+my $synced_query =
+ "SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT IN ('r', 's');";
+$node_subscriber->poll_query_until('postgres', $synced_query)
+ or die "Timed out while waiting for subscriber to synchronize data";
+
+my $result =
+ $node_subscriber->safe_psql('postgres', "SELECT count(*), count(c), count(d = 999) FROM test_tab");
+is($result, qq(2|2|2), 'check initial data was copied to subscriber');
+
+# --------------------------------------------------------------
+# 2PC PREPARE / COMMIT PREPARED test.
+#
+# Mass data is streamed as a 2PC transaction.
+# Then there is a commit prepared.
+# Expect all data is replicated on subscriber side after the commit.
+# --------------------------------------------------------------
+#
+# check that 2PC gets replicated to subscriber
+# Insert, update and delete enough rows to exceed the 64kB limit.
+$node_publisher->safe_psql('postgres', q{
+ BEGIN;
+ INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(3, 5000) s(i);
+ UPDATE test_tab SET b = md5(b) WHERE mod(a,2) = 0;
+ DELETE FROM test_tab WHERE mod(a,3) = 0;
+ PREPARE TRANSACTION 'test_prepared_tab';});
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is in prepared state on subscriber
+$result =
+ $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab';");
+is($result, qq(1), 'transaction is prepared on subscriber');
+
+# 2PC gets committed
+$node_publisher->safe_psql('postgres',
+ "COMMIT PREPARED 'test_prepared_tab';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is committed on subscriber
+$result =
+ $node_subscriber->safe_psql('postgres',
+ "SELECT count(*), count(c), count(d = 999) FROM test_tab");
+is($result, qq(3334|3334|3334), 'Rows inserted by 2PC have committed on subscriber, and extra columns contain local defaults');
+$result =
+ $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab';");
+is($result, qq(0), 'transaction is committed on subscriber');
+
+# --------------------------------------------------------------
+# 2PC PREPARE / ROLLBACK PREPARED test.
+#
+# Table is deleted back to 2 rows which are replicated on subscriber.
+# Mass data is streamed using 2PC but then there is a rollback prepared.
+# Expect data rolls back leaving only the 2 rows on the subscriber.
+# --------------------------------------------------------------
+#
+# First, delete the data (delete will be replicated)
+# Then insert, update and delete enough rows to exceed the 64kB limit.
+$node_publisher->safe_psql('postgres', q{
+ DELETE FROM test_tab WHERE a > 2;});
+$node_publisher->safe_psql('postgres', q{
+ BEGIN;
+ INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(3, 5000) s(i);
+ UPDATE test_tab SET b = md5(b) WHERE mod(a,2) = 0;
+ DELETE FROM test_tab WHERE mod(a,3) = 0;
+ PREPARE TRANSACTION 'test_prepared_tab';});
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is in prepared state on subscriber
+$result =
+ $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab';");
+is($result, qq(1), 'transaction is prepared on subscriber');
+
+# 2PC gets aborted
+$node_publisher->safe_psql('postgres',
+ "ROLLBACK PREPARED 'test_prepared_tab';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is aborted on subscriber
+$result =
+ $node_subscriber->safe_psql('postgres',
+ "SELECT count(*), count(c), count(d = 999) FROM test_tab");
+is($result, qq(2|2|2), 'check initial data was copied to subscriber');
+
+$result =
+ $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab';");
+is($result, qq(0), 'transaction is aborted on subscriber');
+
+# --------------------------------------------------------------
+# Check that 2PC commit prepared is decoded properly on crash restart.
+#
+# insert, update and delete enough rows to exceed the 64kB limit.
+# Then server crashes before the 2PC transaction is committed.
+# After servers are restarted the pending transaction is committed.
+# Expect all data is replicated on subscriber side after the commit.
+# --------------------------------------------------------------
+$node_publisher->safe_psql('postgres', q{
+ BEGIN;
+ INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(3, 5000) s(i);
+ UPDATE test_tab SET b = md5(b) WHERE mod(a,2) = 0;
+ DELETE FROM test_tab WHERE mod(a,3) = 0;
+ PREPARE TRANSACTION 'test_prepared_tab';});
+$node_subscriber->stop('immediate');
+$node_publisher->stop('immediate');
+$node_publisher->start;
+$node_subscriber->start;
+
+# commit post the restart
+$node_publisher->safe_psql('postgres',
+ "COMMIT PREPARED 'test_prepared_tab';");
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check inserts are visible
+$result =
+ $node_subscriber->safe_psql('postgres',
+ "SELECT count(*), count(c), count(d = 999) FROM test_tab");
+is($result, qq(3334|3334|3334), 'Rows inserted by 2PC have committed on subscriber, and extra columns contain local defaults');
+
+# --------------------------------------------------------------
+# Do INSERT outside of 2PC but before ROLLBACK PREPARED.
+#
+# Table is deleted back to 2 rows which are replicated on subscriber.
+# Mass data is streamed using 2PC.
+# A single row INSERT is done which is outside of the 2PC transaction
+# Then there is a rollback prepared.
+# Expect 2PC data rolls back leaving only 3 rows on the subscriber.
+# --------------------------------------------------------------
+#
+# First, delete the data (delete will be replicated)
+# Then insert, update and delete enough rows to exceed the 64kB limit.
+$node_publisher->safe_psql('postgres', q{
+ DELETE FROM test_tab WHERE a > 2;});
+$node_publisher->safe_psql('postgres', q{
+ BEGIN;
+ INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(3, 5000) s(i);
+ UPDATE test_tab SET b = md5(b) WHERE mod(a,2) = 0;
+ DELETE FROM test_tab WHERE mod(a,3) = 0;
+ PREPARE TRANSACTION 'test_prepared_tab';});
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is in prepared state on subscriber
+$result =
+ $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab';");
+is($result, qq(1), 'transaction is prepared on subscriber');
+
+# Insert a different record (now we are outside of the 2PC tx)
+# FIXME - this works OK but if INSERT overlaps pk of a row participating in the 2PC then it hangs
+$node_publisher->safe_psql('postgres',
+ "INSERT INTO test_tab VALUES (99999, 'foobar')");
+
+# 2PC gets aborted
+$node_publisher->safe_psql('postgres',
+ "ROLLBACK PREPARED 'test_prepared_tab';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is aborted on subscriber
+# but the extra INSERT outside of the 2PC still happened
+$result =
+ $node_subscriber->safe_psql('postgres',
+ "SELECT count(*), count(c), count(d = 999) FROM test_tab");
+is($result, qq(3|3|3), 'check the outside insert was copied to subscriber');
+
+$result =
+ $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab';");
+is($result, qq(0), 'transaction is aborted on subscriber');
+
+# --------------------------------------------------------------
+# Do INSERT outside of 2PC but before COMMIT PREPARED.
+#
+# Table is deleted back to 2 rows which are replicated on subscriber.
+# Mass data is streamed using 2PC.
+# A single row INSERT is done which is outside of the 2PC transaction
+# Then there is a commit prepared.
+# Expect 2PC data + the extra row are on the subscriber.
+# --------------------------------------------------------------
+#
+# First, delete the data (delete will be replicated)
+# Then insert, update and delete enough rows to exceed the 64kB limit.
+$node_publisher->safe_psql('postgres', q{
+ DELETE FROM test_tab WHERE a > 2;});
+$node_publisher->safe_psql('postgres', q{
+ BEGIN;
+ INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(3, 5000) s(i);
+ UPDATE test_tab SET b = md5(b) WHERE mod(a,2) = 0;
+ DELETE FROM test_tab WHERE mod(a,3) = 0;
+ PREPARE TRANSACTION 'test_prepared_tab';});
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is in prepared state on subscriber
+$result =
+ $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab';");
+is($result, qq(1), 'transaction is prepared on subscriber');
+
+# Insert a different record (now we are outside of the 2PC tx)
+# FIXME - this works OK but if INSERT overlaps pk of a row participating in the 2PC then it hangs
+$node_publisher->safe_psql('postgres',
+ "INSERT INTO test_tab VALUES (99999, 'foobar')");
+
+# 2PC gets committed
+$node_publisher->safe_psql('postgres',
+ "COMMIT PREPARED 'test_prepared_tab';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is committed on subscriber
+$result =
+ $node_subscriber->safe_psql('postgres',
+ "SELECT count(*), count(c), count(d = 999) FROM test_tab");
+is($result, qq(3335|3335|3335), 'Rows inserted by 2PC (as well as outside insert) have committed on subscriber, and extra columns contain local defaults');
+$result =
+ $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab';");
+is($result, qq(0), 'transaction is committed on subscriber');
+
+# --------------------------------------------------------------
+# Do DELETE outside of 2PC but before COMMIT PREPARED.
+#
+# Table is deleted back to 2 rows which are replicated on subscriber.
+# Mass data is streamed using 2PC.
+# A single row DELETE is done for one of the records of the 2PC transaction
+# Then there is a commit prepared.
+# Expect all the 2PC data rows on the subscriber (because delete would have failed).
+# --------------------------------------------------------------
+#
+# First, delete the data (delete will be replicated)
+# Then insert, update and delete enough rows to exceed the 64kB limit.
+$node_publisher->safe_psql('postgres', q{
+ DELETE FROM test_tab WHERE a > 2;});
+$node_publisher->safe_psql('postgres', q{
+ BEGIN;
+ INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(3, 5000) s(i);
+ UPDATE test_tab SET b = md5(b) WHERE mod(a,2) = 0;
+ DELETE FROM test_tab WHERE mod(a,3) = 0;
+ PREPARE TRANSACTION 'test_prepared_tab';});
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is in prepared state on subscriber
+$result =
+ $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab';");
+is($result, qq(1), 'transaction is prepared on subscriber');
+
+# DELETE one of the prepared 2PC records before they get committed (we are outside of the 2PC tx)
+$node_publisher->safe_psql('postgres',
+ "DELETE FROM test_tab WHERE a = 5");
+
+# 2PC gets committed
+$node_publisher->safe_psql('postgres',
+ "COMMIT PREPARED 'test_prepared_tab';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is committed on subscriber
+$result =
+ $node_subscriber->safe_psql('postgres',
+ "SELECT count(*), count(c), count(d = 999) FROM test_tab");
+is($result, qq(3334|3334|3334), 'Rows inserted by 2PC have committed on subscriber. Nothing was deleted');
+$result =
+ $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM test_tab WHERE a = 5");
+is($result, qq(1), 'The row we deleted before the commit till exists on subscriber.');
+$result =
+ $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab';");
+is($result, qq(0), 'transaction is committed on subscriber');
+
+# --------------------------------------------------------------
+
+# TODO add test cases involving DDL. This can be added after we add functionality
+# to replicate DDL changes to subscriber.
+
+# check all the cleanup
+$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_subscription");
+is($result, qq(0), 'check subscription was dropped on subscriber');
+
+$result = $node_publisher->safe_psql('postgres',
+ "SELECT count(*) FROM pg_replication_slots");
+is($result, qq(0), 'check replication slot was dropped on publisher');
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_subscription_rel");
+is($result, qq(0),
+ 'check subscription relation status was dropped on subscriber');
+
+$result = $node_publisher->safe_psql('postgres',
+ "SELECT count(*) FROM pg_replication_slots");
+is($result, qq(0), 'check replication slot was dropped on publisher');
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_replication_origin");
+is($result, qq(0), 'check replication origin was dropped on subscriber');
+
+$node_subscriber->stop('fast');
+$node_publisher->stop('fast');
--
1.8.3.1
v11-0001-Support-2PC-txn-base.patchapplication/octet-stream; name=v11-0001-Support-2PC-txn-base.patchDownload
From 79c222f842c3faa23c6d729fc6cb66fcf50af34d Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Fri, 23 Oct 2020 04:29:05 -0400
Subject: [PATCH v11] Support-2PC-txn-base
Until now two-phase transaction commands were translated into regular transactions
on the subscriber, and the GID was not forwarded to it. None of the two-phase semantics
were communicated to the subscriber.
This patch provides infrastructure for logical decoding plugins to be informed of
two-phase commands Like PREPARE TRANSACTION, COMMIT PREPARED
and ROLLBACK PREPARED commands with the corresponding GID.
Include logical decoding plugin API infrastructure changes.
Includes contrib/test_decoding changes.
Includes documentation changes.
---
contrib/test_decoding/test_decoding.c | 192 +++++++++++++++++++++++++
doc/src/sgml/logicaldecoding.sgml | 144 ++++++++++++++++++-
src/backend/replication/logical/logical.c | 228 ++++++++++++++++++++++++++++++
src/include/replication/logical.h | 5 +
src/include/replication/output_plugin.h | 46 ++++++
src/include/replication/reorderbuffer.h | 76 ++++++++++
6 files changed, 684 insertions(+), 7 deletions(-)
diff --git a/contrib/test_decoding/test_decoding.c b/contrib/test_decoding/test_decoding.c
index 8e33614..7eb13ce 100644
--- a/contrib/test_decoding/test_decoding.c
+++ b/contrib/test_decoding/test_decoding.c
@@ -11,12 +11,16 @@
*-------------------------------------------------------------------------
*/
#include "postgres.h"
+#include "miscadmin.h"
+#include "access/transam.h"
#include "catalog/pg_type.h"
#include "replication/logical.h"
#include "replication/origin.h"
+#include "storage/procarray.h"
+
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -36,6 +40,7 @@ typedef struct
bool skip_empty_xacts;
bool xact_wrote_changes;
bool only_local;
+ TransactionId check_xid_aborted; /* track abort of this txid */
} TestDecodingData;
static void pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
@@ -73,6 +78,9 @@ static void pg_decode_stream_stop(LogicalDecodingContext *ctx,
static void pg_decode_stream_abort(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr abort_lsn);
+static void pg_decode_stream_prepare(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
static void pg_decode_stream_commit(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
@@ -88,6 +96,18 @@ static void pg_decode_stream_truncate(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
int nrelations, Relation relations[],
ReorderBufferChange *change);
+static bool pg_decode_filter_prepare(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid);
+static void pg_decode_prepare_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+static void pg_decode_commit_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+static void pg_decode_rollback_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
void
_PG_init(void)
@@ -112,10 +132,15 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb)
cb->stream_start_cb = pg_decode_stream_start;
cb->stream_stop_cb = pg_decode_stream_stop;
cb->stream_abort_cb = pg_decode_stream_abort;
+ cb->stream_prepare_cb = pg_decode_stream_prepare;
cb->stream_commit_cb = pg_decode_stream_commit;
cb->stream_change_cb = pg_decode_stream_change;
cb->stream_message_cb = pg_decode_stream_message;
cb->stream_truncate_cb = pg_decode_stream_truncate;
+ cb->filter_prepare_cb = pg_decode_filter_prepare;
+ cb->prepare_cb = pg_decode_prepare_txn;
+ cb->commit_prepared_cb = pg_decode_commit_prepared_txn;
+ cb->rollback_prepared_cb = pg_decode_rollback_prepared_txn;
}
@@ -127,6 +152,7 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
ListCell *option;
TestDecodingData *data;
bool enable_streaming = false;
+ bool enable_2pc = false;
data = palloc0(sizeof(TestDecodingData));
data->context = AllocSetContextCreate(ctx->context,
@@ -136,6 +162,7 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
data->include_timestamp = false;
data->skip_empty_xacts = false;
data->only_local = false;
+ data->check_xid_aborted = InvalidTransactionId;
ctx->output_plugin_private = data;
@@ -227,6 +254,40 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
errmsg("could not parse value \"%s\" for parameter \"%s\"",
strVal(elem->arg), elem->defname)));
}
+ else if (strcmp(elem->defname, "two-phase-commit") == 0)
+ {
+ if (elem->arg == NULL)
+ continue;
+ else if (!parse_bool(strVal(elem->arg), &enable_2pc))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("could not parse value \"%s\" for parameter \"%s\"",
+ strVal(elem->arg), elem->defname)));
+ }
+ else if (strcmp(elem->defname, "check-xid-aborted") == 0)
+ {
+ if (elem->arg == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("check-xid-aborted needs an input value")));
+ else
+ {
+ long xid;
+
+ errno = 0;
+ xid = strtoul(strVal(elem->arg), NULL, 0);
+ if (xid == 0 || errno != 0)
+ data->check_xid_aborted = InvalidTransactionId;
+ else
+ data->check_xid_aborted = (TransactionId)xid;
+
+ if (!TransactionIdIsValid(data->check_xid_aborted))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("check-xid-aborted is not a valid xid: \"%s\"",
+ strVal(elem->arg))));
+ }
+ }
else
{
ereport(ERROR,
@@ -238,6 +299,7 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
}
ctx->streaming &= enable_streaming;
+ ctx->twophase &= enable_2pc;
}
/* cleanup this plugin's resources */
@@ -297,6 +359,93 @@ pg_decode_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
OutputPluginWrite(ctx, true);
}
+/*
+ * Filter out two-phase transactions.
+ *
+ * Each plugin can implement its own filtering logic. Here
+ * we demonstrate a simple logic by checking the GID. If the
+ * GID contains the "_nodecode" substring, then we filter
+ * it out.
+ */
+static bool
+pg_decode_filter_prepare(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid)
+{
+ if (strstr(gid, "_nodecode") != NULL)
+ return true;
+
+ return false;
+}
+
+/* PREPARE callback */
+static void
+pg_decode_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ if (data->skip_empty_xacts && !data->xact_wrote_changes)
+ return;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ appendStringInfo(ctx->out, "PREPARE TRANSACTION %s",
+ quote_literal_cstr(txn->gid));
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, " %u", txn->xid);
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
+/* COMMIT PREPARED callback */
+static void
+pg_decode_commit_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ appendStringInfo(ctx->out, "COMMIT PREPARED %s",
+ quote_literal_cstr(txn->gid));
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, " %u", txn->xid);
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
+/* ROLLBACK PREPARED callback */
+static void
+pg_decode_rollback_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ appendStringInfo(ctx->out, "ROLLBACK PREPARED %s",
+ quote_literal_cstr(txn->gid));
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, " %u", txn->xid);
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
static bool
pg_decode_filter(LogicalDecodingContext *ctx,
RepOriginId origin_id)
@@ -455,6 +604,25 @@ pg_decode_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
}
data->xact_wrote_changes = true;
+ /* if check_xid_aborted is a valid xid, then it was passed in
+ * as an option to check if the transaction having this xid would be aborted.
+ * This is to test concurrent aborts.
+ */
+ if (TransactionIdIsValid(data->check_xid_aborted))
+ {
+ elog(LOG, "waiting for %u to abort", data->check_xid_aborted);
+ while (TransactionIdIsInProgress(data->check_xid_aborted))
+ {
+ CHECK_FOR_INTERRUPTS();
+ pg_usleep(10000L);
+ }
+ if (!TransactionIdIsInProgress(data->check_xid_aborted) &&
+ !TransactionIdDidCommit(data->check_xid_aborted))
+ elog(LOG, "%u aborted", data->check_xid_aborted);
+
+ Assert(TransactionIdDidAbort(data->check_xid_aborted));
+ }
+
class_form = RelationGetForm(relation);
tupdesc = RelationGetDescr(relation);
@@ -646,6 +814,30 @@ pg_decode_stream_abort(LogicalDecodingContext *ctx,
}
static void
+pg_decode_stream_prepare(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ if (data->skip_empty_xacts && !data->xact_wrote_changes)
+ return;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, "preparing streamed transaction TXN %u", txn->xid);
+ else
+ appendStringInfo(ctx->out, "preparing streamed transaction");
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
+static void
pg_decode_stream_commit(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr commit_lsn)
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index 813a037..cfb1b32 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -387,11 +387,16 @@ typedef struct OutputPluginCallbacks
LogicalDecodeTruncateCB truncate_cb;
LogicalDecodeCommitCB commit_cb;
LogicalDecodeMessageCB message_cb;
+ LogicalDecodeFilterPrepareCB filter_prepare_cb;
+ LogicalDecodePrepareCB prepare_cb;
+ LogicalDecodeCommitPreparedCB commit_prepared_cb;
+ LogicalDecodeRollbackPreparedCB rollback_prepared_cb;
LogicalDecodeFilterByOriginCB filter_by_origin_cb;
LogicalDecodeShutdownCB shutdown_cb;
LogicalDecodeStreamStartCB stream_start_cb;
LogicalDecodeStreamStopCB stream_stop_cb;
LogicalDecodeStreamAbortCB stream_abort_cb;
+ LogicalDecodeStreamPrepareCB stream_prepare_cb;
LogicalDecodeStreamCommitCB stream_commit_cb;
LogicalDecodeStreamChangeCB stream_change_cb;
LogicalDecodeStreamMessageCB stream_message_cb;
@@ -413,10 +418,18 @@ typedef void (*LogicalOutputPluginInit) (struct OutputPluginCallbacks *cb);
An output plugin may also define functions to support streaming of large,
in-progress transactions. The <function>stream_start_cb</function>,
<function>stream_stop_cb</function>, <function>stream_abort_cb</function>,
- <function>stream_commit_cb</function> and <function>stream_change_cb</function>
+ <function>stream_commit_cb</function>, <function>stream_change_cb</function>,
+ and <function>stream_prepare_cb</function>
are required, while <function>stream_message_cb</function> and
<function>stream_truncate_cb</function> are optional.
</para>
+
+ <para>
+ An output plugin may also define functions to support two-phase commits, which are
+ decoded on <command>PREPARE TRANSACTION</command>. The <function>prepare_cb</function>,
+ <function>commit_prepared_cb</function> and <function>rollback_prepared_cb</function>
+ callbacks are required, while <function>filter_prepare_cb</function> is optional.
+ </para>
</sect2>
<sect2 id="logicaldecoding-capabilities">
@@ -477,7 +490,13 @@ CREATE TABLE another_catalog_table(data text) WITH (user_catalog_table = true);
never get
decoded. Successful savepoints are
folded into the transaction containing them in the order they were
- executed within that transaction.
+ executed within that transaction. A transaction that is prepared for
+ a two-phase commit using <command>PREPARE TRANSACTION</command> will
+ also be decoded if the output plugin callbacks needed for decoding
+ them are provided. It is possible that the current transaction which
+ is being decoded is aborted concurrently via a <command>ROLLBACK PREPARED</command>
+ command. In that case, the logical decoding of this transaction will
+ be aborted too.
</para>
<note>
@@ -578,6 +597,55 @@ typedef void (*LogicalDecodeCommitCB) (struct LogicalDecodingContext *ctx,
</para>
</sect3>
+ <sect3 id="logicaldecoding-output-plugin-prepare">
+ <title>Transaction Prepare Callback</title>
+
+ <para>
+ The required <function>prepare_cb</function> callback is called whenever
+ a transaction which is prepared for two-phase commit has been
+ decoded. The <function>change_cb</function> callbacks for all modified
+ rows will have been called before this, if there have been any modified
+ rows.
+<programlisting>
+typedef void (*LogicalDecodePrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
+ <sect3 id="logicaldecoding-output-plugin-commit-prepared">
+ <title>Transaction Commit Prepared Callback</title>
+
+ <para>
+ The required <function>commit_prepared_cb</function> callback is called whenever
+ a transaction commit prepared has been decoded. The <parameter>gid</parameter> field,
+ which is part of the <parameter>txn</parameter> parameter can be used in this
+ callback.
+<programlisting>
+typedef void (*LogicalDecodeCommitPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
+ <sect3 id="logicaldecoding-output-plugin-rollback-prepared">
+ <title>Transaction Rollback Prepared Callback</title>
+
+ <para>
+ The required <function>rollback_prepared_cb</function> callback is called whenever
+ a transaction rollback prepared has been decoded. The <parameter>gid</parameter> field,
+ which is part of the <parameter>txn</parameter> parameter can be used in this
+ callback.
+<programlisting>
+typedef void (*LogicalDecodeRollbackPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr rollback_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
<sect3 id="logicaldecoding-output-plugin-change">
<title>Change Callback</title>
@@ -587,7 +655,13 @@ typedef void (*LogicalDecodeCommitCB) (struct LogicalDecodingContext *ctx,
an <command>INSERT</command>, <command>UPDATE</command>,
or <command>DELETE</command>. Even if the original command modified
several rows at once the callback will be called individually for each
- row.
+ row. The <function>change_cb</function> callback may access system or
+ user catalog tables to aid in the process of outputting the row
+ modification details. In case of decoding a prepared (but yet
+ uncommitted) transaction or decoding of an uncommitted transaction, this
+ change callback might also error out due to simultaneous rollback of
+ this very same transaction. In that case, the logical decoding of this
+ aborted transaction is stopped gracefully.
<programlisting>
typedef void (*LogicalDecodeChangeCB) (struct LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
@@ -664,6 +738,39 @@ typedef bool (*LogicalDecodeFilterByOriginCB) (struct LogicalDecodingContext *ct
</para>
</sect3>
+ <sect3 id="logicaldecoding-output-plugin-filter-prepare">
+ <title>Prepare Filter Callback</title>
+
+ <para>
+ The optional <function>filter_prepare_cb</function> callback
+ is called to determine whether data that is part of the current
+ two-phase commit transaction should be considered for decode
+ at this prepare stage or as a regular one-phase transaction at
+ <command>COMMIT PREPARED</command> time later. To signal that
+ decoding should be skipped, return <literal>true</literal>;
+ <literal>false</literal> otherwise. When the callback is not
+ defined, <literal>false</literal> is assumed (i.e. nothing is
+ filtered).
+<programlisting>
+typedef bool (*LogicalDecodeFilterPrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ TransactionId xid,
+ const char *gid);
+</programlisting>
+ The <parameter>ctx</parameter> parameter has the same contents
+ as for the other callbacks. The <parameter>txn</parameter> parameter
+ contains meta information about the transaction. The <parameter>xid</parameter>
+ contains the XID because <parameter>txn</parameter> can be NULL in some cases.
+ The <parameter>gid</parameter> is the identifier that later identifies this
+ transaction for <command>COMMIT PREPARED</command> or <command>ROLLBACK PREPARED</command>.
+ </para>
+ <para>
+ The callback has to provide the same static answer for a given combination of
+ <parameter>xid</parameter> and <parameter>gid</parameter> every time it is
+ called.
+ </para>
+ </sect3>
+
<sect3 id="logicaldecoding-output-plugin-message">
<title>Generic Message Callback</title>
@@ -685,7 +792,13 @@ typedef void (*LogicalDecodeMessageCB) (struct LogicalDecodingContext *ctx,
non-transactional and the XID was not assigned yet in the transaction
which logged the message. The <parameter>lsn</parameter> has WAL
location of the message. The <parameter>transactional</parameter> says
- if the message was sent as transactional or not.
+ if the message was sent as transactional or not. Similar to the change
+ callback, in case of decoding a prepared (but yet uncommitted)
+ transaction or decoding of an uncommitted transaction, this message
+ callback might also error out due to simultaneous rollback of
+ this very same transaction. In that case, the logical decoding of this
+ aborted transaction is stopped gracefully.
+
The <parameter>prefix</parameter> is arbitrary null-terminated prefix
which can be used for identifying interesting messages for the current
plugin. And finally the <parameter>message</parameter> parameter holds
@@ -735,6 +848,19 @@ typedef void (*LogicalDecodeStreamAbortCB) (struct LogicalDecodingContext *ctx,
</para>
</sect3>
+ <sect3 id="logicaldecoding-output-plugin-stream-prepare">
+ <title>Stream Prepare Callback</title>
+ <para>
+ The <function>stream_prepare_cb</function> callback is called to prepare
+ a previously streamed transaction as part of a two-phase commit.
+<programlisting>
+typedef void (*LogicalDecodeStreamPrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
<sect3 id="logicaldecoding-output-plugin-stream-commit">
<title>Stream Commit Callback</title>
<para>
@@ -913,9 +1039,13 @@ OutputPluginWrite(ctx, true);
When streaming an in-progress transaction, the changes (and messages) are
streamed in blocks demarcated by <function>stream_start_cb</function>
and <function>stream_stop_cb</function> callbacks. Once all the decoded
- changes are transmitted, the transaction is committed using the
- <function>stream_commit_cb</function> callback (or possibly aborted using
- the <function>stream_abort_cb</function> callback).
+ changes are transmitted, the transaction can be committed using the
+ the <function>stream_commit_cb</function> callback
+ (or possibly aborted using the <function>stream_abort_cb</function> callback).
+ If two-phase commits are supported, the transaction can be prepared using the
+ <function>stream_prepare_cb</function> callback, commit prepared using the
+ <function>commit_prepared_cb</function> callback or aborted using the
+ <function>rollback_prepared_cb</function>.
</para>
<para>
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 8675832..0f605ef 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -59,6 +59,14 @@ static void shutdown_cb_wrapper(LogicalDecodingContext *ctx);
static void begin_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn);
static void commit_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
+static bool filter_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid);
+static void prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+static void commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+static void rollback_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
static void change_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
Relation relation, ReorderBufferChange *change);
static void truncate_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
@@ -74,6 +82,8 @@ static void stream_stop_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr last_lsn);
static void stream_abort_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr abort_lsn);
+static void stream_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
static void stream_commit_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
static void stream_change_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
@@ -207,6 +217,10 @@ StartupDecodingContext(List *output_plugin_options,
ctx->reorder->apply_change = change_cb_wrapper;
ctx->reorder->apply_truncate = truncate_cb_wrapper;
ctx->reorder->commit = commit_cb_wrapper;
+ ctx->reorder->filter_prepare = filter_prepare_cb_wrapper;
+ ctx->reorder->prepare = prepare_cb_wrapper;
+ ctx->reorder->commit_prepared = commit_prepared_cb_wrapper;
+ ctx->reorder->rollback_prepared = rollback_prepared_cb_wrapper;
ctx->reorder->message = message_cb_wrapper;
/*
@@ -221,12 +235,26 @@ StartupDecodingContext(List *output_plugin_options,
ctx->streaming = (ctx->callbacks.stream_start_cb != NULL) ||
(ctx->callbacks.stream_stop_cb != NULL) ||
(ctx->callbacks.stream_abort_cb != NULL) ||
+ (ctx->callbacks.stream_prepare_cb != NULL) ||
(ctx->callbacks.stream_commit_cb != NULL) ||
(ctx->callbacks.stream_change_cb != NULL) ||
(ctx->callbacks.stream_message_cb != NULL) ||
(ctx->callbacks.stream_truncate_cb != NULL);
/*
+ * To support two-phase logical decoding, we require prepare/commit-prepare/abort-prepare
+ * callbacks. The filter-prepare callback is optional. We however enable two-phase logical
+ * decoding when at least one of the methods is enabled so that we can easily identify
+ * missing methods.
+ *
+ * We decide it here, but only check it later in the wrappers.
+ */
+ ctx->twophase = (ctx->callbacks.prepare_cb != NULL) ||
+ (ctx->callbacks.commit_prepared_cb != NULL) ||
+ (ctx->callbacks.rollback_prepared_cb != NULL) ||
+ (ctx->callbacks.filter_prepare_cb != NULL);
+
+ /*
* streaming callbacks
*
* stream_message and stream_truncate callbacks are optional, so we do not
@@ -237,6 +265,7 @@ StartupDecodingContext(List *output_plugin_options,
ctx->reorder->stream_start = stream_start_cb_wrapper;
ctx->reorder->stream_stop = stream_stop_cb_wrapper;
ctx->reorder->stream_abort = stream_abort_cb_wrapper;
+ ctx->reorder->stream_prepare = stream_prepare_cb_wrapper;
ctx->reorder->stream_commit = stream_commit_cb_wrapper;
ctx->reorder->stream_change = stream_change_cb_wrapper;
ctx->reorder->stream_message = stream_message_cb_wrapper;
@@ -783,6 +812,120 @@ commit_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
}
static void
+prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ /* We're only supposed to call this when two-phase commits are supported */
+ Assert(ctx->twophase);
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "prepare";
+ state.report_location = txn->final_lsn; /* beginning of commit record */
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn; /* points to the end of the record */
+
+ /* If the plugin supports two-phase commits then prepare callback is mandatory */
+ if (ctx->callbacks.prepare_cb == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("Output plugin did not register prepare_cb callback")));
+
+ /* do the actual work: call callback */
+ ctx->callbacks.prepare_cb(ctx, txn, prepare_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
+commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ /* We're only supposed to call this when two-phase commits are supported */
+ Assert(ctx->twophase);
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "commit_prepared";
+ state.report_location = txn->final_lsn; /* beginning of commit record */
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn; /* points to the end of the record */
+
+ /* If the plugin support two-phase commits then commit prepared callback is mandatory */
+ if (ctx->callbacks.commit_prepared_cb == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("Output plugin did not register commit_prepared_cb callback")));
+
+ /* do the actual work: call callback */
+ ctx->callbacks.commit_prepared_cb(ctx, txn, commit_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
+rollback_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ /* We're only supposed to call this when two-phase commits are supported */
+ Assert(ctx->twophase);
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "rollback_prepared";
+ state.report_location = txn->final_lsn; /* beginning of commit record */
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn; /* points to the end of the record */
+
+ /* If the plugin support two-phase commits then abort prepared callback is mandatory */
+ if (ctx->callbacks.rollback_prepared_cb == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("Output plugin did not register rollback_prepared_cb callback")));
+
+ /* do the actual work: call callback */
+ ctx->callbacks.rollback_prepared_cb(ctx, txn, abort_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
change_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
Relation relation, ReorderBufferChange *change)
{
@@ -859,6 +1002,51 @@ truncate_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
error_context_stack = errcallback.previous;
}
+static bool
+filter_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+ bool ret;
+
+ /*
+ * Skip if decoding of two-phase transactions at PREPARE time is not enabled. In that
+ * case all two-phase transactions are considered filtered out and will be
+ * applied as regular transactions at COMMIT PREPARED.
+ */
+ if (!ctx->twophase)
+ return true;
+
+ /*
+ * The filter_prepare callback is optional. When not supplied, all
+ * prepared transactions should go through.
+ */
+ if (!ctx->callbacks.filter_prepare_cb)
+ return false;
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "filter_prepare";
+ state.report_location = InvalidXLogRecPtr;
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = false;
+
+ /* do the actual work: call callback */
+ ret = ctx->callbacks.filter_prepare_cb(ctx, txn, xid, gid);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+
+ return ret;
+}
+
bool
filter_by_origin_cb_wrapper(LogicalDecodingContext *ctx, RepOriginId origin_id)
{
@@ -1057,6 +1245,46 @@ stream_abort_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
}
static void
+stream_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ Assert(!ctx->fast_forward);
+
+ /* We're only supposed to call this when streaming and two-phase commits are supported. */
+ Assert(ctx->streaming);
+ Assert(ctx->twophase);
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "stream_prepare";
+ state.report_location = txn->final_lsn;
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn;
+
+ /* in streaming mode with two-phase commits, stream_prepare_cb is required */
+ if (ctx->callbacks.stream_prepare_cb == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical streaming commits requires a stream_prepare_cb callback")));
+
+ ctx->callbacks.stream_prepare_cb(ctx, txn, prepare_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
stream_commit_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr commit_lsn)
{
diff --git a/src/include/replication/logical.h b/src/include/replication/logical.h
index 40bab7e..7f4384b 100644
--- a/src/include/replication/logical.h
+++ b/src/include/replication/logical.h
@@ -85,6 +85,11 @@ typedef struct LogicalDecodingContext
bool streaming;
/*
+ * Does the output plugin support two-phase decoding, and is it enabled?
+ */
+ bool twophase;
+
+ /*
* State for writing output.
*/
bool accept_writes;
diff --git a/src/include/replication/output_plugin.h b/src/include/replication/output_plugin.h
index b78c796..4c1341f 100644
--- a/src/include/replication/output_plugin.h
+++ b/src/include/replication/output_plugin.h
@@ -77,6 +77,39 @@ typedef void (*LogicalDecodeCommitCB) (struct LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
+ /*
+ * Called before decoding of PREPARE record to decide whether this
+ * transaction should be decoded with separate calls to prepare and
+ * commit_prepared/rollback_prepared callbacks or wait till COMMIT PREPARED and
+ * sent as usual transaction.
+ */
+typedef bool (*LogicalDecodeFilterPrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ TransactionId xid,
+ const char *gid);
+
+/*
+ * Called for PREPARE record unless it was filtered by filter_prepare()
+ * callback.
+ */
+typedef void (*LogicalDecodePrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+
+/*
+ * Called for COMMIT PREPARED.
+ */
+typedef void (*LogicalDecodeCommitPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+/*
+ * Called for ROLLBACK PREPARED.
+ */
+typedef void (*LogicalDecodeRollbackPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+
/*
* Called for the generic logical decoding messages.
*/
@@ -124,6 +157,14 @@ typedef void (*LogicalDecodeStreamAbortCB) (struct LogicalDecodingContext *ctx,
XLogRecPtr abort_lsn);
/*
+ * Called to prepare changes streamed to remote node from in-progress
+ * transaction. This is called as part of a two-phase commit.
+ */
+typedef void (*LogicalDecodeStreamPrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+
+/*
* Called to apply changes streamed to remote node from in-progress
* transaction.
*/
@@ -171,12 +212,17 @@ typedef struct OutputPluginCallbacks
LogicalDecodeTruncateCB truncate_cb;
LogicalDecodeCommitCB commit_cb;
LogicalDecodeMessageCB message_cb;
+ LogicalDecodeFilterPrepareCB filter_prepare_cb;
+ LogicalDecodePrepareCB prepare_cb;
+ LogicalDecodeCommitPreparedCB commit_prepared_cb;
+ LogicalDecodeRollbackPreparedCB rollback_prepared_cb;
LogicalDecodeFilterByOriginCB filter_by_origin_cb;
LogicalDecodeShutdownCB shutdown_cb;
/* streaming of changes */
LogicalDecodeStreamStartCB stream_start_cb;
LogicalDecodeStreamStopCB stream_stop_cb;
LogicalDecodeStreamAbortCB stream_abort_cb;
+ LogicalDecodeStreamPrepareCB stream_prepare_cb;
LogicalDecodeStreamCommitCB stream_commit_cb;
LogicalDecodeStreamChangeCB stream_change_cb;
LogicalDecodeStreamMessageCB stream_message_cb;
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 1c77819..a323dfb 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -10,6 +10,7 @@
#define REORDERBUFFER_H
#include "access/htup_details.h"
+#include "access/twophase.h"
#include "lib/ilist.h"
#include "storage/sinval.h"
#include "utils/hsearch.h"
@@ -174,6 +175,9 @@ typedef struct ReorderBufferChange
#define RBTXN_IS_STREAMED 0x0010
#define RBTXN_HAS_TOAST_INSERT 0x0020
#define RBTXN_HAS_SPEC_INSERT 0x0040
+#define RBTXN_PREPARE 0x0080
+#define RBTXN_COMMIT_PREPARED 0x0100
+#define RBTXN_ROLLBACK_PREPARED 0x0200
/* Does the transaction have catalog changes? */
#define rbtxn_has_catalog_changes(txn) \
@@ -233,6 +237,24 @@ typedef struct ReorderBufferChange
((txn)->txn_flags & RBTXN_IS_STREAMED) != 0 \
)
+/* Has this transaction been prepared? */
+#define rbtxn_prepared(txn) \
+( \
+ ((txn)->txn_flags & RBTXN_PREPARE) != 0 \
+)
+
+/* Has this prepared transaction been committed? */
+#define rbtxn_commit_prepared(txn) \
+( \
+ ((txn)->txn_flags & RBTXN_COMMIT_PREPARED) != 0 \
+)
+
+/* Has this prepared transaction been rollbacked? */
+#define rbtxn_rollback_prepared(txn) \
+( \
+ ((txn)->txn_flags & RBTXN_ROLLBACK_PREPARED) != 0 \
+)
+
typedef struct ReorderBufferTXN
{
/* See above */
@@ -244,6 +266,9 @@ typedef struct ReorderBufferTXN
/* Xid of top-level transaction, if known */
TransactionId toplevel_xid;
+ /* In case of two-phase commit we need to pass GID to output plugin */
+ char *gid;
+
/*
* LSN of the first data carrying, WAL record with knowledge about this
* xid. This is allowed to *not* be first record adorned with this xid, if
@@ -405,6 +430,31 @@ typedef void (*ReorderBufferCommitCB) (ReorderBuffer *rb,
ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
+/* abort callback signature */
+typedef void (*ReorderBufferAbortCB) (ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+
+typedef bool (*ReorderBufferFilterPrepareCB) (ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ TransactionId xid,
+ const char *gid);
+
+/* prepare callback signature */
+typedef void (*ReorderBufferPrepareCB) (ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+
+/* commit prepared callback signature */
+typedef void (*ReorderBufferCommitPreparedCB) (ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+/* rollback prepared callback signature */
+typedef void (*ReorderBufferRollbackPreparedCB) (ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+
/* message callback signature */
typedef void (*ReorderBufferMessageCB) (ReorderBuffer *rb,
ReorderBufferTXN *txn,
@@ -431,6 +481,12 @@ typedef void (*ReorderBufferStreamAbortCB) (
ReorderBufferTXN *txn,
XLogRecPtr abort_lsn);
+/* prepare streamed transaction callback signature */
+typedef void (*ReorderBufferStreamPrepareCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+
/* commit streamed transaction callback signature */
typedef void (*ReorderBufferStreamCommitCB) (
ReorderBuffer *rb,
@@ -497,6 +553,11 @@ struct ReorderBuffer
ReorderBufferApplyChangeCB apply_change;
ReorderBufferApplyTruncateCB apply_truncate;
ReorderBufferCommitCB commit;
+ ReorderBufferAbortCB abort;
+ ReorderBufferFilterPrepareCB filter_prepare;
+ ReorderBufferPrepareCB prepare;
+ ReorderBufferCommitPreparedCB commit_prepared;
+ ReorderBufferRollbackPreparedCB rollback_prepared;
ReorderBufferMessageCB message;
/*
@@ -505,6 +566,7 @@ struct ReorderBuffer
ReorderBufferStreamStartCB stream_start;
ReorderBufferStreamStopCB stream_stop;
ReorderBufferStreamAbortCB stream_abort;
+ ReorderBufferStreamPrepareCB stream_prepare;
ReorderBufferStreamCommitCB stream_commit;
ReorderBufferStreamChangeCB stream_change;
ReorderBufferStreamMessageCB stream_message;
@@ -574,6 +636,11 @@ void ReorderBufferQueueMessage(ReorderBuffer *, TransactionId, Snapshot snapsho
void ReorderBufferCommit(ReorderBuffer *, TransactionId,
XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
TimestampTz commit_time, RepOriginId origin_id, XLogRecPtr origin_lsn);
+void ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid, bool is_commit);
void ReorderBufferAssignChild(ReorderBuffer *, TransactionId, TransactionId, XLogRecPtr commit_lsn);
void ReorderBufferCommitChild(ReorderBuffer *, TransactionId, TransactionId,
XLogRecPtr commit_lsn, XLogRecPtr end_lsn);
@@ -597,6 +664,15 @@ void ReorderBufferXidSetCatalogChanges(ReorderBuffer *, TransactionId xid, XLog
bool ReorderBufferXidHasCatalogChanges(ReorderBuffer *, TransactionId xid);
bool ReorderBufferXidHasBaseSnapshot(ReorderBuffer *, TransactionId xid);
+bool ReorderBufferPrepareNeedSkip(ReorderBuffer *rb, TransactionId xid,
+ const char *gid);
+bool ReorderBufferTxnIsPrepared(ReorderBuffer *rb, TransactionId xid,
+ const char *gid);
+void ReorderBufferPrepare(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid);
ReorderBufferTXN *ReorderBufferGetOldestTXN(ReorderBuffer *);
TransactionId ReorderBufferGetOldestXmin(ReorderBuffer *rb);
--
1.8.3.1
v11-0002-Support-2PC-txn-backend-and-tests.patchapplication/octet-stream; name=v11-0002-Support-2PC-txn-backend-and-tests.patchDownload
From 2e50cbdab0e8c06f76741444fd65f32f8875c3da Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Fri, 23 Oct 2020 04:44:11 -0400
Subject: [PATCH v11] Support 2PC txn backend and tests
Until now two-phase transactions were decoded at COMMIT, just like
regular transaction. During replay, two-phase transactions were
translated into regular transactions on the subscriber, and the GID
was not forwarded to it.
This patch allows PREPARE-time decoding two-phase transactions (if
the output plugin supports this capability), in which case the
transactions are replayed at PREPARE and then committed later when
COMMIT PREPARED arrives.
Includes backend changes to support decoding of PREPARE TRANSACTION,
COMMIT PREPARED and ROLLBACK PREPARED.
Includes two-phase commit test code (for test_decoding).
---
contrib/test_decoding/Makefile | 4 +-
contrib/test_decoding/expected/two_phase.out | 228 ++++++++++++++++++
contrib/test_decoding/sql/two_phase.sql | 119 ++++++++++
contrib/test_decoding/t/001_twophase.pl | 121 ++++++++++
src/backend/replication/logical/decode.c | 250 +++++++++++++++++++-
src/backend/replication/logical/reorderbuffer.c | 302 +++++++++++++++++++++---
6 files changed, 972 insertions(+), 52 deletions(-)
create mode 100644 contrib/test_decoding/expected/two_phase.out
create mode 100644 contrib/test_decoding/sql/two_phase.sql
create mode 100644 contrib/test_decoding/t/001_twophase.pl
diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index 9a4c76f..2abf3ce 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -4,11 +4,13 @@ MODULES = test_decoding
PGFILEDESC = "test_decoding - example of a logical decoding output plugin"
REGRESS = ddl xact rewrite toast permissions decoding_in_xact \
- decoding_into_rel binary prepared replorigin time messages \
+ decoding_into_rel binary prepared replorigin time two_phase messages \
spill slot truncate stream stats
ISOLATION = mxact delayed_startup ondisk_startup concurrent_ddl_dml \
oldest_xmin snapshot_transfer subxact_without_top concurrent_stream
+TAP_TESTS = 1
+
REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
ISOLATION_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
diff --git a/contrib/test_decoding/expected/two_phase.out b/contrib/test_decoding/expected/two_phase.out
new file mode 100644
index 0000000..da6e6e6
--- /dev/null
+++ b/contrib/test_decoding/expected/two_phase.out
@@ -0,0 +1,228 @@
+-- Test two-phased transactions. When two-phase-commit is enabled, transactions are
+-- decoded at PREPARE time rather than at COMMIT PREPARED time.
+SET synchronous_commit = on;
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+ ?column?
+----------
+ init
+(1 row)
+
+CREATE TABLE test_prepared1(id integer primary key);
+CREATE TABLE test_prepared2(id integer primary key);
+-- Test 1:
+-- Test that commands in a two phase xact are only decoded at PREPARE.
+-- Decoding after COMMIT PREPARED should only have the COMMIT PREPARED command and not the
+-- rest of the commands in the transaction.
+BEGIN;
+INSERT INTO test_prepared1 VALUES (1);
+INSERT INTO test_prepared1 VALUES (2);
+-- should show nothing because the xact has not been prepared yet.
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+------
+(0 rows)
+
+PREPARE TRANSACTION 'test_prepared#1';
+--
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+----------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:1
+ table public.test_prepared1: INSERT: id[integer]:2
+ PREPARE TRANSACTION 'test_prepared#1'
+(4 rows)
+
+COMMIT PREPARED 'test_prepared#1';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-----------------------------------
+ COMMIT PREPARED 'test_prepared#1'
+(1 row)
+
+-- Test 2:
+-- Test that rollback of a prepared xact is decoded.
+BEGIN;
+INSERT INTO test_prepared1 VALUES (3);
+PREPARE TRANSACTION 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+----------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:3
+ PREPARE TRANSACTION 'test_prepared#2'
+(3 rows)
+
+ROLLBACK PREPARED 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-------------------------------------
+ ROLLBACK PREPARED 'test_prepared#2'
+(1 row)
+
+-- Test 3:
+-- Test prepare of a xact containing ddl. Leaving xact uncommitted for next test.
+BEGIN;
+ALTER TABLE test_prepared1 ADD COLUMN data text;
+INSERT INTO test_prepared1 VALUES (4, 'frakbar');
+PREPARE TRANSACTION 'test_prepared#3';
+-- confirm that exclusive lock from the ALTER command is held on test_prepared1 table
+SELECT 'test_prepared_1' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'test_prepared1'::regclass;
+ relation | locktype | mode
+-----------------+----------+---------------------
+ test_prepared_1 | relation | RowExclusiveLock
+ test_prepared_1 | relation | AccessExclusiveLock
+(2 rows)
+
+-- The insert should show the newly altered column.
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-------------------------------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:4 data[text]:'frakbar'
+ PREPARE TRANSACTION 'test_prepared#3'
+(3 rows)
+
+-- Test 4:
+-- Test that we decode correctly while an uncommitted prepared xact
+-- with ddl exists.
+--
+-- Use a separate table for the concurrent transaction because the lock from
+-- the ALTER will stop us inserting into the other one.
+--
+INSERT INTO test_prepared2 VALUES (5);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+----------------------------------------------------
+ BEGIN
+ table public.test_prepared2: INSERT: id[integer]:5
+ COMMIT
+(3 rows)
+
+COMMIT PREPARED 'test_prepared#3';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-----------------------------------
+ COMMIT PREPARED 'test_prepared#3'
+(1 row)
+
+-- make sure stuff still works
+INSERT INTO test_prepared1 VALUES (6);
+INSERT INTO test_prepared2 VALUES (7);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+--------------------------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:6 data[text]:null
+ COMMIT
+ BEGIN
+ table public.test_prepared2: INSERT: id[integer]:7
+ COMMIT
+(6 rows)
+
+-- Test 5:
+-- Check `CLUSTER` (as operation that hold exclusive lock) doesn't block
+-- logical decoding.
+BEGIN;
+INSERT INTO test_prepared1 VALUES (8, 'othercol');
+CLUSTER test_prepared1 USING test_prepared1_pkey;
+INSERT INTO test_prepared1 VALUES (9, 'othercol2');
+PREPARE TRANSACTION 'test_prepared_lock';
+SELECT 'test_prepared1' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'test_prepared1'::regclass;
+ relation | locktype | mode
+----------------+----------+---------------------
+ test_prepared1 | relation | RowExclusiveLock
+ test_prepared1 | relation | ShareLock
+ test_prepared1 | relation | AccessExclusiveLock
+(3 rows)
+
+-- The above CLUSTER command shouldn't cause a timeout on 2pc decoding. The call should return
+-- within a second.
+SET statement_timeout = '1s';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+---------------------------------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:8 data[text]:'othercol'
+ table public.test_prepared1: INSERT: id[integer]:9 data[text]:'othercol2'
+ PREPARE TRANSACTION 'test_prepared_lock'
+(4 rows)
+
+RESET statement_timeout;
+COMMIT PREPARED 'test_prepared_lock';
+-- consume the commit
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+--------------------------------------
+ COMMIT PREPARED 'test_prepared_lock'
+(1 row)
+
+-- Test 6:
+-- Test savepoints and sub-xacts. Creating savepoints will create sub-xacts implicitly.
+BEGIN;
+CREATE TABLE test_prepared_savepoint (a int);
+INSERT INTO test_prepared_savepoint VALUES (1);
+SAVEPOINT test_savepoint;
+INSERT INTO test_prepared_savepoint VALUES (2);
+ROLLBACK TO SAVEPOINT test_savepoint;
+PREPARE TRANSACTION 'test_prepared_savepoint';
+-- should show only 1, not 2
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+------------------------------------------------------------
+ BEGIN
+ table public.test_prepared_savepoint: INSERT: a[integer]:1
+ PREPARE TRANSACTION 'test_prepared_savepoint'
+(3 rows)
+
+COMMIT PREPARED 'test_prepared_savepoint';
+-- consume the commit
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-------------------------------------------
+ COMMIT PREPARED 'test_prepared_savepoint'
+(1 row)
+
+-- Test 7:
+-- test that a GID containing "_nodecode" gets decoded at commit prepared time
+BEGIN;
+INSERT INTO test_prepared1 VALUES (20);
+PREPARE TRANSACTION 'test_prepared_nodecode';
+-- should show nothing
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+------
+(0 rows)
+
+COMMIT PREPARED 'test_prepared_nodecode';
+-- should be decoded now
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+---------------------------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:20 data[text]:null
+ COMMIT
+(3 rows)
+
+-- Test 8:
+-- cleanup and make sure results are also empty
+DROP TABLE test_prepared1;
+DROP TABLE test_prepared2;
+-- show results. There should be nothing to show
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+------
+(0 rows)
+
+SELECT pg_drop_replication_slot('regression_slot');
+ pg_drop_replication_slot
+--------------------------
+
+(1 row)
+
diff --git a/contrib/test_decoding/sql/two_phase.sql b/contrib/test_decoding/sql/two_phase.sql
new file mode 100644
index 0000000..a7b23e6
--- /dev/null
+++ b/contrib/test_decoding/sql/two_phase.sql
@@ -0,0 +1,119 @@
+-- Test two-phased transactions. When two-phase-commit is enabled, transactions are
+-- decoded at PREPARE time rather than at COMMIT PREPARED time.
+SET synchronous_commit = on;
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+
+CREATE TABLE test_prepared1(id integer primary key);
+CREATE TABLE test_prepared2(id integer primary key);
+
+-- Test 1:
+-- Test that commands in a two phase xact are only decoded at PREPARE.
+-- Decoding after COMMIT PREPARED should only have the COMMIT PREPARED command and not the
+-- rest of the commands in the transaction.
+BEGIN;
+INSERT INTO test_prepared1 VALUES (1);
+INSERT INTO test_prepared1 VALUES (2);
+-- should show nothing because the xact has not been prepared yet.
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+PREPARE TRANSACTION 'test_prepared#1';
+--
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+COMMIT PREPARED 'test_prepared#1';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- Test 2:
+-- Test that rollback of a prepared xact is decoded.
+BEGIN;
+INSERT INTO test_prepared1 VALUES (3);
+PREPARE TRANSACTION 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ROLLBACK PREPARED 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- Test 3:
+-- Test prepare of a xact containing ddl. Leaving xact uncommitted for next test.
+BEGIN;
+ALTER TABLE test_prepared1 ADD COLUMN data text;
+INSERT INTO test_prepared1 VALUES (4, 'frakbar');
+PREPARE TRANSACTION 'test_prepared#3';
+-- confirm that exclusive lock from the ALTER command is held on test_prepared1 table
+SELECT 'test_prepared_1' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'test_prepared1'::regclass;
+-- The insert should show the newly altered column.
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- Test 4:
+-- Test that we decode correctly while an uncommitted prepared xact
+-- with ddl exists.
+--
+-- Use a separate table for the concurrent transaction because the lock from
+-- the ALTER will stop us inserting into the other one.
+--
+INSERT INTO test_prepared2 VALUES (5);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+COMMIT PREPARED 'test_prepared#3';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+-- make sure stuff still works
+INSERT INTO test_prepared1 VALUES (6);
+INSERT INTO test_prepared2 VALUES (7);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- Test 5:
+-- Check `CLUSTER` (as operation that hold exclusive lock) doesn't block
+-- logical decoding.
+BEGIN;
+INSERT INTO test_prepared1 VALUES (8, 'othercol');
+CLUSTER test_prepared1 USING test_prepared1_pkey;
+INSERT INTO test_prepared1 VALUES (9, 'othercol2');
+PREPARE TRANSACTION 'test_prepared_lock';
+
+SELECT 'test_prepared1' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'test_prepared1'::regclass;
+-- The above CLUSTER command shouldn't cause a timeout on 2pc decoding. The call should return
+-- within a second.
+SET statement_timeout = '1s';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+RESET statement_timeout;
+COMMIT PREPARED 'test_prepared_lock';
+-- consume the commit
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- Test 6:
+-- Test savepoints and sub-xacts. Creating savepoints will create sub-xacts implicitly.
+BEGIN;
+CREATE TABLE test_prepared_savepoint (a int);
+INSERT INTO test_prepared_savepoint VALUES (1);
+SAVEPOINT test_savepoint;
+INSERT INTO test_prepared_savepoint VALUES (2);
+ROLLBACK TO SAVEPOINT test_savepoint;
+PREPARE TRANSACTION 'test_prepared_savepoint';
+-- should show only 1, not 2
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+COMMIT PREPARED 'test_prepared_savepoint';
+-- consume the commit
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- Test 7:
+-- test that a GID containing "_nodecode" gets decoded at commit prepared time
+BEGIN;
+INSERT INTO test_prepared1 VALUES (20);
+PREPARE TRANSACTION 'test_prepared_nodecode';
+-- should show nothing
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+COMMIT PREPARED 'test_prepared_nodecode';
+-- should be decoded now
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- Test 8:
+-- cleanup and make sure results are also empty
+DROP TABLE test_prepared1;
+DROP TABLE test_prepared2;
+-- show results. There should be nothing to show
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+SELECT pg_drop_replication_slot('regression_slot');
diff --git a/contrib/test_decoding/t/001_twophase.pl b/contrib/test_decoding/t/001_twophase.pl
new file mode 100644
index 0000000..1555582
--- /dev/null
+++ b/contrib/test_decoding/t/001_twophase.pl
@@ -0,0 +1,121 @@
+# logical replication of 2PC test
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 2;
+use Time::HiRes qw(usleep);
+use Scalar::Util qw(looks_like_number);
+
+# Initialize node
+my $node_logical = get_new_node('logical');
+$node_logical->init(allows_streaming => 'logical');
+$node_logical->append_conf(
+ 'postgresql.conf', qq(
+ max_prepared_transactions = 10
+));
+$node_logical->start;
+
+# Create some pre-existing content on logical
+$node_logical->safe_psql('postgres', "CREATE TABLE tab (a int PRIMARY KEY)");
+$node_logical->safe_psql('postgres',
+ "INSERT INTO tab SELECT generate_series(1,10)");
+$node_logical->safe_psql('postgres',
+ "SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');");
+
+#Test 1:
+# This test is specifically for testing concurrent abort while logical decode
+# is ongoing. We will pass in the xid of the 2PC to the plugin as an option.
+# On the receipt of a valid "check-xid-aborted", the change API in the test decoding
+# plugin will wait for it to be aborted.
+#
+# We will fire off a ROLLBACK from another session when this decode
+# is waiting.
+#
+# The status of "check-xid-aborted" will change from in-progress to not-committed
+# (hence aborted) and we will stop decoding because the subsequent
+# system catalog scan will error out.
+
+$node_logical->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab VALUES (11);
+ INSERT INTO tab VALUES (12);
+ ALTER TABLE tab ADD COLUMN b INT;
+ INSERT INTO tab VALUES (13,14);
+ PREPARE TRANSACTION 'test_prepared_tab';");
+# get XID of the above two-phase transaction
+my $xid2pc = $node_logical->safe_psql('postgres', "SELECT transaction FROM pg_prepared_xacts WHERE gid = 'test_prepared_tab'");
+is(looks_like_number($xid2pc), qq(1), 'Got a valid two-phase XID');
+
+# start decoding the above by passing the "check-xid-aborted"
+my $logical_connstr = $node_logical->connstr . ' dbname=postgres';
+
+# decode now, it should include an ABORT entry because of the ROLLBACK below
+system_log("psql -d \"$logical_connstr\" -c \"SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'check-xid-aborted', '$xid2pc');\" \&");
+
+# check that decode starts waiting for this $xid2pc
+poll_output_until("waiting for $xid2pc to abort")
+ or die "no wait happened for the abort";
+
+# rollback the prepared transaction
+$node_logical->safe_psql('postgres', "ROLLBACK PREPARED 'test_prepared_tab';");
+
+# check for occurrence of the log about stopping this decoding
+poll_output_until("stopping decoding of test_prepared_tab")
+ or die "no decoding stop for the rollback";
+
+# consume any remaining changes
+$node_logical->safe_psql('postgres', "SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');");
+
+# Test 2:
+# Check that commit prepared is decoded properly on immediate restart
+$node_logical->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab VALUES (11);
+ INSERT INTO tab VALUES (12);
+ ALTER TABLE tab ADD COLUMN b INT;
+ INSERT INTO tab VALUES (13, 11);
+ PREPARE TRANSACTION 'test_prepared_tab';");
+# consume changes
+$node_logical->safe_psql('postgres', "SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');");
+$node_logical->stop('immediate');
+$node_logical->start;
+
+# commit post the restart
+$node_logical->safe_psql('postgres', "COMMIT PREPARED 'test_prepared_tab';");
+$node_logical->safe_psql('postgres', "SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');");
+
+# check inserts are visible
+my $result = $node_logical->safe_psql('postgres', "SELECT count(*) FROM tab where a IN (11,12) OR b IN (11);");
+is($result, qq(3), 'Rows inserted via 2PC are visible on restart');
+
+$node_logical->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot');");
+$node_logical->stop('fast');
+
+sub poll_output_until
+{
+ my ($expected) = @_;
+
+ $expected = 'xxxxxx' unless defined($expected); # default junk value
+
+ my $max_attempts = 180 * 10;
+ my $attempts = 0;
+
+ my $output_file = '';
+ while ($attempts < $max_attempts)
+ {
+ $output_file = slurp_file($node_logical->logfile());
+
+ if ($output_file =~ $expected)
+ {
+ return 1;
+ }
+
+ # Wait 0.1 second before retrying.
+ usleep(100_000);
+ $attempts++;
+ }
+
+ # The output result didn't change in 180 seconds. Give up
+ return 0;
+}
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 3f84ee9..fd961d4 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -68,8 +68,15 @@ static void DecodeSpecConfirm(LogicalDecodingContext *ctx, XLogRecordBuffer *buf
static void DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
xl_xact_parsed_commit *parsed, TransactionId xid);
+static void DecodeCommitPrepared(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_commit *parsed, TransactionId xid);
static void DecodeAbort(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
xl_xact_parsed_abort *parsed, TransactionId xid);
+static void DecodeAbortPrepared(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_abort *parsed, TransactionId xid);
+static void DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_prepare * parsed);
+
/* common function to decode tuples */
static void DecodeXLogTuple(char *data, Size len, ReorderBufferTupleBuf *tup);
@@ -239,7 +246,6 @@ DecodeXactOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
switch (info)
{
case XLOG_XACT_COMMIT:
- case XLOG_XACT_COMMIT_PREPARED:
{
xl_xact_commit *xlrec;
xl_xact_parsed_commit parsed;
@@ -256,8 +262,24 @@ DecodeXactOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
DecodeCommit(ctx, buf, &parsed, xid);
break;
}
+ case XLOG_XACT_COMMIT_PREPARED:
+ {
+ xl_xact_commit *xlrec;
+ xl_xact_parsed_commit parsed;
+ TransactionId xid;
+
+ xlrec = (xl_xact_commit *) XLogRecGetData(r);
+ ParseCommitRecord(XLogRecGetInfo(buf->record), xlrec, &parsed);
+
+ if (!TransactionIdIsValid(parsed.twophase_xid))
+ xid = XLogRecGetXid(r);
+ else
+ xid = parsed.twophase_xid;
+
+ DecodeCommitPrepared(ctx, buf, &parsed, xid);
+ break;
+ }
case XLOG_XACT_ABORT:
- case XLOG_XACT_ABORT_PREPARED:
{
xl_xact_abort *xlrec;
xl_xact_parsed_abort parsed;
@@ -274,6 +296,23 @@ DecodeXactOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
DecodeAbort(ctx, buf, &parsed, xid);
break;
}
+ case XLOG_XACT_ABORT_PREPARED:
+ {
+ xl_xact_abort *xlrec;
+ xl_xact_parsed_abort parsed;
+ TransactionId xid;
+
+ xlrec = (xl_xact_abort *) XLogRecGetData(r);
+ ParseAbortRecord(XLogRecGetInfo(buf->record), xlrec, &parsed);
+
+ if (!TransactionIdIsValid(parsed.twophase_xid))
+ xid = XLogRecGetXid(r);
+ else
+ xid = parsed.twophase_xid;
+
+ DecodeAbortPrepared(ctx, buf, &parsed, xid);
+ break;
+ }
case XLOG_XACT_ASSIGNMENT:
/*
@@ -312,17 +351,35 @@ DecodeXactOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
}
break;
case XLOG_XACT_PREPARE:
+ {
+ xl_xact_parsed_prepare parsed;
+ xl_xact_prepare *xlrec;
- /*
- * Currently decoding ignores PREPARE TRANSACTION and will just
- * decode the transaction when the COMMIT PREPARED is sent or
- * throw away the transaction's contents when a ROLLBACK PREPARED
- * is received. In the future we could add code to expose prepared
- * transactions in the changestream allowing for a kind of
- * distributed 2PC.
- */
- ReorderBufferProcessXid(reorder, XLogRecGetXid(r), buf->origptr);
- break;
+ /* check that output plugin is capable of two-phase decoding */
+ if (!ctx->twophase)
+ {
+ ReorderBufferProcessXid(reorder, XLogRecGetXid(r), buf->origptr);
+ break;
+ }
+
+ /* ok, parse it */
+ xlrec = (xl_xact_prepare *)XLogRecGetData(r);
+ ParsePrepareRecord(XLogRecGetInfo(buf->record),
+ xlrec, &parsed);
+
+ /* does output plugin want this particular transaction? */
+ if (ctx->callbacks.filter_prepare_cb &&
+ ReorderBufferPrepareNeedSkip(reorder, parsed.twophase_xid,
+ parsed.twophase_gid))
+ {
+ ReorderBufferProcessXid(reorder, parsed.twophase_xid,
+ buf->origptr);
+ break;
+ }
+
+ DecodePrepare(ctx, buf, &parsed);
+ break;
+ }
default:
elog(ERROR, "unexpected RM_XACT_ID record type: %u", info);
}
@@ -659,6 +716,131 @@ DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
}
/*
+ * Consolidated commit record handling between the different form of commit
+ * records.
+ */
+static void
+DecodeCommitPrepared(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_commit *parsed, TransactionId xid)
+{
+ XLogRecPtr origin_lsn = InvalidXLogRecPtr;
+ TimestampTz commit_time = parsed->xact_time;
+ RepOriginId origin_id = XLogRecGetOrigin(buf->record);
+ int i;
+
+ if (parsed->xinfo & XACT_XINFO_HAS_ORIGIN)
+ {
+ origin_lsn = parsed->origin_lsn;
+ commit_time = parsed->origin_timestamp;
+ }
+
+ SnapBuildCommitTxn(ctx->snapshot_builder, buf->origptr, xid,
+ parsed->nsubxacts, parsed->subxacts);
+
+ /* ----
+ * Check whether we are interested in this specific transaction, and tell
+ * the reorderbuffer to forget the content of the (sub-)transactions
+ * if not.
+ *
+ * There can be several reasons we might not be interested in this
+ * transaction:
+ * 1) We might not be interested in decoding transactions up to this
+ * LSN. This can happen because we previously decoded it and now just
+ * are restarting or if we haven't assembled a consistent snapshot yet.
+ * 2) The transaction happened in another database.
+ * 3) The output plugin is not interested in the origin.
+ * 4) We are doing fast-forwarding
+ *
+ * We can't just use ReorderBufferAbort() here, because we need to execute
+ * the transaction's invalidations. This currently won't be needed if
+ * we're just skipping over the transaction because currently we only do
+ * so during startup, to get to the first transaction the client needs. As
+ * we have reset the catalog caches before starting to read WAL, and we
+ * haven't yet touched any catalogs, there can't be anything to invalidate.
+ * But if we're "forgetting" this commit because it's it happened in
+ * another database, the invalidations might be important, because they
+ * could be for shared catalogs and we might have loaded data into the
+ * relevant syscaches.
+ * ---
+ */
+ if (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
+ (parsed->dbId != InvalidOid && parsed->dbId != ctx->slot->data.database) ||
+ ctx->fast_forward || FilterByOrigin(ctx, origin_id))
+ {
+ for (i = 0; i < parsed->nsubxacts; i++)
+ {
+ ReorderBufferForget(ctx->reorder, parsed->subxacts[i], buf->origptr);
+ }
+ ReorderBufferForget(ctx->reorder, xid, buf->origptr);
+
+ return;
+ }
+
+ /* tell the reorderbuffer about the surviving subtransactions */
+ for (i = 0; i < parsed->nsubxacts; i++)
+ {
+ ReorderBufferCommitChild(ctx->reorder, xid, parsed->subxacts[i],
+ buf->origptr, buf->endptr);
+ }
+
+ /*
+ * For COMMIT PREPARED, the changes have already been replayed at
+ * PREPARE time, so we only need to notify the subscriber that the GID
+ * finally committed.
+ * If filter check present and this needs to be skipped, do a regular commit.
+ */
+ if (ctx->callbacks.filter_prepare_cb &&
+ ReorderBufferPrepareNeedSkip(ctx->reorder, xid, parsed->twophase_gid))
+ {
+ ReorderBufferCommit(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn);
+ }
+ else
+ {
+ ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn,
+ parsed->twophase_gid, true);
+ }
+
+}
+
+/*
+ * Decode PREPARE record. Similar logic as in COMMIT
+ */
+static void
+DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_prepare * parsed)
+{
+ XLogRecPtr origin_lsn = parsed->origin_lsn;
+ TimestampTz commit_time = parsed->origin_timestamp;
+ XLogRecPtr origin_id = XLogRecGetOrigin(buf->record);
+ int i;
+ TransactionId xid = parsed->twophase_xid;
+
+ if (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
+ (parsed->dbId != InvalidOid && parsed->dbId != ctx->slot->data.database) ||
+ ctx->fast_forward || FilterByOrigin(ctx, origin_id))
+ return;
+
+ /*
+ * Tell the reorderbuffer about the surviving subtransactions. We need to
+ * do this because the main transaction itself has not committed since we
+ * are in the prepare phase right now. So we need to be sure the snapshot
+ * is setup correctly for the main transaction in case all changes
+ * happened in subtransanctions
+ */
+ for (i = 0; i < parsed->nsubxacts; i++)
+ {
+ ReorderBufferCommitChild(ctx->reorder, xid, parsed->subxacts[i],
+ buf->origptr, buf->endptr);
+ }
+
+ /* replay actions of all transaction + subtransactions in order */
+ ReorderBufferPrepare(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn, parsed->twophase_gid);
+}
+
+/*
* Get the data from the various forms of abort records and pass it on to
* snapbuild.c and reorderbuffer.c
*/
@@ -681,6 +863,50 @@ DecodeAbort(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
}
/*
+ * Get the data from the various forms of abort records and pass it on to
+ * snapbuild.c and reorderbuffer.c
+ */
+static void
+DecodeAbortPrepared(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_abort *parsed, TransactionId xid)
+{
+ int i;
+ XLogRecPtr origin_lsn = InvalidXLogRecPtr;
+ TimestampTz commit_time = 0;
+ XLogRecPtr origin_id = XLogRecGetOrigin(buf->record);
+
+ if (parsed->xinfo & XACT_XINFO_HAS_ORIGIN)
+ {
+ origin_lsn = parsed->origin_lsn;
+ commit_time = parsed->origin_timestamp;
+ }
+
+ /*
+ * If it passes through the filters handle the ROLLBACK via callbacks
+ */
+ if(!FilterByOrigin(ctx, origin_id) &&
+ !SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) &&
+ !ReorderBufferPrepareNeedSkip(ctx->reorder, xid, parsed->twophase_gid))
+ {
+ Assert(TransactionIdIsValid(xid));
+ Assert(parsed->dbId == ctx->slot->data.database);
+
+ ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn,
+ parsed->twophase_gid, false);
+ return;
+ }
+
+ for (i = 0; i < parsed->nsubxacts; i++)
+ {
+ ReorderBufferAbort(ctx->reorder, parsed->subxacts[i],
+ buf->record->EndRecPtr);
+ }
+
+ ReorderBufferAbort(ctx->reorder, xid, buf->record->EndRecPtr);
+}
+
+/*
* Parse XLOG_HEAP_INSERT (not MULTI_INSERT!) records into tuplebufs.
*
* Deletes can contain the new tuple.
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 7a8bf76..0a5c158 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -251,7 +251,8 @@ static Size ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn
static void ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
char *change);
static void ReorderBufferRestoreCleanup(ReorderBuffer *rb, ReorderBufferTXN *txn);
-static void ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn);
+static void ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
+ bool txn_prepared);
static void ReorderBufferCleanupSerializedTXNs(const char *slotname);
static void ReorderBufferSerializedPath(char *path, ReplicationSlot *slot,
TransactionId xid, XLogSegNo segno);
@@ -418,6 +419,12 @@ ReorderBufferReturnTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
/* free data that's contained */
+ if (txn->gid != NULL)
+ {
+ pfree(txn->gid);
+ txn->gid = NULL;
+ }
+
if (txn->tuplecid_hash != NULL)
{
hash_destroy(txn->tuplecid_hash);
@@ -1511,12 +1518,14 @@ ReorderBufferCleanupTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
}
/*
- * Discard changes from a transaction (and subtransactions), after streaming
- * them. Keep the remaining info - transactions, tuplecids, invalidations and
- * snapshots.
+ * Discard changes from a transaction (and subtransactions), either after streaming or
+ * after a PREPARE.
+ * The flag txn_prepared indicates if this is called after a PREPARE.
+ * If streaming, keep the remaining info - transactions, tuplecids, invalidations and
+ * snapshots. If after a PREPARE, keep only the invalidations and snapshots.
*/
static void
-ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
+ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, bool txn_prepared)
{
dlist_mutable_iter iter;
@@ -1535,7 +1544,7 @@ ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
Assert(rbtxn_is_known_subxact(subtxn));
Assert(subtxn->nsubtxns == 0);
- ReorderBufferTruncateTXN(rb, subtxn);
+ ReorderBufferTruncateTXN(rb, subtxn, txn_prepared);
}
/* cleanup changes in the txn */
@@ -1569,9 +1578,33 @@ ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
* about the toplevel xact (we send the XID in all messages), but we never
* stream XIDs of empty subxacts.
*/
- if ((!txn->toptxn) || (txn->nentries_mem != 0))
+ if ((!txn_prepared) && ((!txn->toptxn) || (txn->nentries_mem != 0)))
txn->txn_flags |= RBTXN_IS_STREAMED;
+ if (txn_prepared)
+ {
+ /*
+ * If this is a prepared txn, cleanup the tuplecids we stored for decoding
+ * catalog snapshot access.
+ * They are always stored in the toplevel transaction.
+ */
+ dlist_foreach_modify(iter, &txn->tuplecids)
+ {
+ ReorderBufferChange *change;
+
+ change = dlist_container(ReorderBufferChange, node, iter.cur);
+
+ /* Check we're not mixing changes from different transactions. */
+ Assert(change->txn == txn);
+ Assert(change->action == REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID);
+
+ /* Remove the change from its containing list. */
+ dlist_delete(&change->node);
+
+ ReorderBufferReturnChange(rb, change, true);
+ }
+ }
+
/*
* Destroy the (relfilenode, ctid) hashtable, so that we don't leak any
* memory. We could also keep the hash table and update it with new ctid
@@ -1765,9 +1798,22 @@ ReorderBufferStreamCommit(ReorderBuffer *rb, ReorderBufferTXN *txn)
ReorderBufferStreamTXN(rb, txn);
- rb->stream_commit(rb, txn, txn->final_lsn);
-
- ReorderBufferCleanupTXN(rb, txn);
+ if (rbtxn_prepared(txn))
+ {
+ rb->stream_prepare(rb, txn, txn->final_lsn);
+ /* Here we are streaming and part of the PREPARE of a two-phase commit
+ * The full cleanup will happen as part of the COMMIT PREPAREDs, so now
+ * just truncate txn by removing changes and tuple_cids
+ */
+ ReorderBufferTruncateTXN(rb, txn, true);
+ /* Reset the CheckXidAlive */
+ CheckXidAlive = InvalidTransactionId;
+ }
+ else
+ {
+ rb->stream_commit(rb, txn, txn->final_lsn);
+ ReorderBufferCleanupTXN(rb, txn);
+ }
}
/*
@@ -1895,8 +1941,11 @@ ReorderBufferResetTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
XLogRecPtr last_lsn,
ReorderBufferChange *specinsert)
{
- /* Discard the changes that we just streamed */
- ReorderBufferTruncateTXN(rb, txn);
+ /* Discard the changes that we just streamed.
+ * This can only be called if streaming and not part of a PREPARE in
+ * a two-phase commit, so set prepared flag as false.
+ */
+ ReorderBufferTruncateTXN(rb, txn, false);
/* Free all resources allocated for toast reconstruction */
ReorderBufferToastReset(rb, txn);
@@ -1918,6 +1967,11 @@ ReorderBufferResetTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
/*
* Helper function for ReorderBufferCommit and ReorderBufferStreamTXN.
*
+ * We are here due to one of the 3 scenarios:
+ * 1. As part of streaming an in-progress transactions
+ * 2. Prepare of a two-phase commit
+ * 3. Commit of a transaction.
+ *
* Send data of a transaction (and its subtransactions) to the
* output plugin. We iterate over the top and subtransactions (using a k-way
* merge) and replay the changes in lsn order.
@@ -2003,7 +2057,7 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
prev_lsn = change->lsn;
/* Set the current xid to detect concurrent aborts. */
- if (streaming)
+ if (streaming || rbtxn_prepared(change->txn))
{
curtxn = change->txn;
SetupCheckXidLive(curtxn->xid);
@@ -2294,7 +2348,16 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
}
}
else
- rb->commit(rb, txn, commit_lsn);
+ {
+ /*
+ * Call either PREPARE (for two-phase transactions) or COMMIT
+ * (for regular ones).
+ */
+ if (rbtxn_prepared(txn))
+ rb->prepare(rb, txn, commit_lsn);
+ else
+ rb->commit(rb, txn, commit_lsn);
+ }
/* this is just a sanity check against bad output plugin behaviour */
if (GetCurrentTransactionIdIfAny() != InvalidTransactionId)
@@ -2328,18 +2391,32 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
RollbackAndReleaseCurrentSubTransaction();
/*
+ * We are here due to one of the 3 scenarios:
+ * 1. As part of streaming in-progress transactions
+ * 2. Prepare of a two-phase commit
+ * 3. Commit of a transaction.
+ *
* If we are streaming the in-progress transaction then discard the
* changes that we just streamed, and mark the transactions as
- * streamed (if they contained changes). Otherwise, remove all the
+ * streamed (if they contained changes), set prepared flag as false.
+ * If part of a prepare of a two-phase commit set the prepared flag
+ * as true so that we can discard changes and cleanup tuplecids.
+ * Otherwise, remove all the
* changes and deallocate the ReorderBufferTXN.
*/
if (streaming)
{
- ReorderBufferTruncateTXN(rb, txn);
+ ReorderBufferTruncateTXN(rb, txn, false);
/* Reset the CheckXidAlive */
CheckXidAlive = InvalidTransactionId;
}
+ else if (rbtxn_prepared(txn))
+ {
+ ReorderBufferTruncateTXN(rb, txn, true);
+ /* Reset the CheckXidAlive */
+ CheckXidAlive = InvalidTransactionId;
+ }
else
ReorderBufferCleanupTXN(rb, txn);
}
@@ -2369,17 +2446,20 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
/*
* The error code ERRCODE_TRANSACTION_ROLLBACK indicates a concurrent
- * abort of the (sub)transaction we are streaming. We need to do the
+ * abort of the (sub)transaction we are streaming or preparing. We need to do the
* cleanup and return gracefully on this error, see SetupCheckXidLive.
*/
if (errdata->sqlerrcode == ERRCODE_TRANSACTION_ROLLBACK)
{
/*
- * This error can only occur when we are sending the data in
- * streaming mode and the streaming is not finished yet.
+ * This error can only occur either when we are sending the data in
+ * streaming mode and the streaming is not finished yet or when we are
+ * sending the data out on a PREPARE during a two-phase commit.
+ * Both conditions can't be true either, it should be one of them.
*/
- Assert(streaming);
- Assert(stream_started);
+ Assert(streaming || rbtxn_prepared(txn));
+ Assert(stream_started || rbtxn_prepared(txn));
+ Assert(!(streaming && rbtxn_prepared(txn)));
/* Cleanup the temporary error state. */
FlushErrorState();
@@ -2387,10 +2467,21 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
errdata = NULL;
curtxn->concurrent_abort = true;
- /* Reset the TXN so that it is allowed to stream remaining data. */
- ReorderBufferResetTXN(rb, txn, snapshot_now,
- command_id, prev_lsn,
- specinsert);
+ /*
+ * If streaming, reset the TXN so that it is allowed to stream remaining data.
+ */
+ if (streaming)
+ {
+ ReorderBufferResetTXN(rb, txn, snapshot_now,
+ command_id, prev_lsn,
+ specinsert);
+ }
+ else
+ {
+ elog(LOG, "stopping decoding of %s (%u)",
+ txn->gid[0] != '\0'? txn->gid:"", txn->xid);
+ ReorderBufferTruncateTXN(rb, txn, true);
+ }
}
else
{
@@ -2412,23 +2503,16 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
* This interface is called once a toplevel commit is read for both streamed
* as well as non-streamed transactions.
*/
-void
-ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
- XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
- TimestampTz commit_time,
- RepOriginId origin_id, XLogRecPtr origin_lsn)
+static void
+ReorderBufferCommitInternal(ReorderBufferTXN *txn,
+ ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn)
{
- ReorderBufferTXN *txn;
Snapshot snapshot_now;
CommandId command_id = FirstCommandId;
- txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
- false);
-
- /* unknown transaction, nothing to replay */
- if (txn == NULL)
- return;
-
txn->final_lsn = commit_lsn;
txn->end_lsn = end_lsn;
txn->commit_time = commit_time;
@@ -2470,6 +2554,141 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
}
/*
+ * Ask output plugin whether we want to skip this PREPARE and send
+ * this transaction as a regular commit later.
+ */
+bool
+ReorderBufferPrepareNeedSkip(ReorderBuffer *rb, TransactionId xid, const char *gid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr, false);
+
+ return rb->filter_prepare(rb, txn, xid, gid);
+}
+
+
+/*
+ * Commit a transaction.
+ *
+ * See comments for ReorderBufferCommitInternal()
+ */
+void
+ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+
+ /* unknown transaction, nothing to replay */
+ if (txn == NULL)
+ return;
+
+ ReorderBufferCommitInternal(txn, rb, xid, commit_lsn, end_lsn,
+ commit_time, origin_id, origin_lsn);
+}
+
+/*
+ * Prepare a two-phase transaction. It calls ReorderBufferCommitInternal()
+ * since all prepared transactions need to be decoded at PREPARE time.
+ */
+void
+ReorderBufferPrepare(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+
+ /* unknown transaction, nothing to replay */
+ if (txn == NULL)
+ return;
+
+ txn->txn_flags |= RBTXN_PREPARE;
+ txn->gid = palloc(strlen(gid) + 1); /* trailing '\0' */
+ strcpy(txn->gid, gid);
+
+ ReorderBufferCommitInternal(txn, rb, xid, commit_lsn, end_lsn,
+ commit_time, origin_id, origin_lsn);
+}
+
+/*
+ * Check whether this transaction was sent as prepared to subscribers.
+ * Called while handling commit|abort prepared.
+ */
+bool
+ReorderBufferTxnIsPrepared(ReorderBuffer *rb, TransactionId xid,
+ const char *gid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+
+ /*
+ * Always call the prepare filter. It's the job of the prepare filter to
+ * give us the *same* response for a given xid across multiple calls
+ * (including ones on restart)
+ */
+ return !(rb->filter_prepare(rb, txn, xid, gid));
+}
+
+/*
+ * Send standalone xact event. This is used to handle COMMIT/ROLLBACK PREPARED.
+ */
+void
+ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid, bool is_commit)
+{
+ ReorderBufferTXN *txn;
+
+ /*
+ * The transaction may or may not exist (during restarts for example).
+ * Anyway, two-phase transactions do not contain any reorderbuffers. So allow
+ * it to be created below.
+ */
+ txn = ReorderBufferTXNByXid(rb, xid, true, NULL, commit_lsn,
+ true);
+
+ txn->final_lsn = commit_lsn;
+ txn->end_lsn = end_lsn;
+ txn->commit_time = commit_time;
+ txn->origin_id = origin_id;
+ txn->origin_lsn = origin_lsn;
+ /* this txn is obviously prepared */
+ txn->txn_flags |= RBTXN_PREPARE;
+ txn->gid = palloc(strlen(gid) + 1); /* trailing '\0' */
+ strcpy(txn->gid, gid);
+
+ if (is_commit)
+ txn->txn_flags |= RBTXN_COMMIT_PREPARED;
+ else
+ txn->txn_flags |= RBTXN_ROLLBACK_PREPARED;
+
+ if (rbtxn_commit_prepared(txn))
+ rb->commit_prepared(rb, txn, commit_lsn);
+ else if (rbtxn_rollback_prepared(txn))
+ rb->rollback_prepared(rb, txn, commit_lsn);
+
+
+ /* cleanup: make sure there's no cache pollution */
+ ReorderBufferExecuteInvalidations(txn->ninvalidations,
+ txn->invalidations);
+ ReorderBufferCleanupTXN(rb, txn);
+}
+
+/*
* Abort a transaction that possibly has previous changes. Needs to be first
* called for subtransactions and then for the toplevel xid.
*
@@ -2512,7 +2731,12 @@ ReorderBufferAbort(ReorderBuffer *rb, TransactionId xid, XLogRecPtr lsn)
/* cosmetic... */
txn->final_lsn = lsn;
- /* remove potential on-disk data, and deallocate */
+ /*
+ * remove potential on-disk data, and deallocate.
+ *
+ * We remove it even for prepared transactions (GID is enough to
+ * commit/abort those later).
+ */
ReorderBufferCleanupTXN(rb, txn);
}
--
1.8.3.1
On Fri, Oct 23, 2020 at 3:41 PM Ajin Cherian <itsajin@gmail.com> wrote:
Amit,
I have also modified the stream callback APIs to not include
stream_commit_prpeared and stream_rollback_prepared, instead use the
non-stream APIs for the same functionality.
I have also updated the test_decoding and pgoutput plugins accordingly.
Thanks, I think you forgot to address one of my comments in the
previous email[1]/messages/by-id/CAA4eK1JzRvUX2XLEKo2f74Vjecnt6wq-kkk1OiyMJ5XjJN+GvQ@mail.gmail.com (See "One minor comment .."). You have not even
responded to it.
[1]: /messages/by-id/CAA4eK1JzRvUX2XLEKo2f74Vjecnt6wq-kkk1OiyMJ5XjJN+GvQ@mail.gmail.com
--
With Regards,
Amit Kapila.
Hi Ajin.
I've addressed your review comments (details below) and bumped the
patch set to v12 attached.
I also added more test cases.
On Tue, Oct 20, 2020 at 10:02 PM Ajin Cherian <itsajin@gmail.com> wrote:
Thanks for your patch. Some comments for your patch:
Comments:
src/backend/replication/logical/worker.c @@ -888,6 +888,319 @@ apply_handle_prepare(StringInfo s) + /* + * FIXME - Following condition was in apply_handle_prepare_txn except I found it was ALWAYS IsTransactionState() == false + * The synchronization worker runs in single transaction. * + if (IsTransactionState() && !am_tablesync_worker()) + */ + if (!am_tablesync_worker())Comment: I dont think a tablesync worker will use streaming, none of
the other stream APIs check this, this might not be relevant for
stream_prepare either.
Updated
+ /* + * ================================================================================================== + * The following chunk of code is largely cut/paste from the existing apply_handle_prepare_commit_txnComment: Here, I think you meant apply_handle_stream_commit.
Updated.
Also
rather than duplicating this chunk of code, you could put it in a new
function.
Code is refactored to share a common function for the spool file processing.
+ else + { + /* Process any invalidation messages that might have accumulated. */ + AcceptInvalidationMessages(); + maybe_reread_subscription(); + }Comment: This else block might not be necessary as a tablesync worker
will not initiate the streaming APIs.
Updated
~
Kind Regards,
Peter Smith
Fujitsu Australia
Attachments:
v12-0001-Support-2PC-txn-base.patchapplication/octet-stream; name=v12-0001-Support-2PC-txn-base.patchDownload
From 5fa5544624609d841558ef37e4dc4b2995519330 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Mon, 26 Oct 2020 11:20:49 +1100
Subject: [PATCH v12] Support-2PC-txn-base.
Until now two-phase transaction commands were translated into regular transactions
on the subscriber, and the GID was not forwarded to it. None of the two-phase semantics
were communicated to the subscriber.
This patch provides infrastructure for logical decoding plugins to be informed of
two-phase commands Like PREPARE TRANSACTION, COMMIT PREPARED
and ROLLBACK PREPARED commands with the corresponding GID.
Include logical decoding plugin API infrastructure changes.
Includes contrib/test_decoding changes.
Includes documentation changes.
---
contrib/test_decoding/test_decoding.c | 192 +++++++++++++++++++++++++
doc/src/sgml/logicaldecoding.sgml | 144 ++++++++++++++++++-
src/backend/replication/logical/logical.c | 228 ++++++++++++++++++++++++++++++
src/include/replication/logical.h | 5 +
src/include/replication/output_plugin.h | 46 ++++++
src/include/replication/reorderbuffer.h | 76 ++++++++++
6 files changed, 684 insertions(+), 7 deletions(-)
diff --git a/contrib/test_decoding/test_decoding.c b/contrib/test_decoding/test_decoding.c
index 8e33614..7eb13ce 100644
--- a/contrib/test_decoding/test_decoding.c
+++ b/contrib/test_decoding/test_decoding.c
@@ -11,12 +11,16 @@
*-------------------------------------------------------------------------
*/
#include "postgres.h"
+#include "miscadmin.h"
+#include "access/transam.h"
#include "catalog/pg_type.h"
#include "replication/logical.h"
#include "replication/origin.h"
+#include "storage/procarray.h"
+
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -36,6 +40,7 @@ typedef struct
bool skip_empty_xacts;
bool xact_wrote_changes;
bool only_local;
+ TransactionId check_xid_aborted; /* track abort of this txid */
} TestDecodingData;
static void pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
@@ -73,6 +78,9 @@ static void pg_decode_stream_stop(LogicalDecodingContext *ctx,
static void pg_decode_stream_abort(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr abort_lsn);
+static void pg_decode_stream_prepare(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
static void pg_decode_stream_commit(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
@@ -88,6 +96,18 @@ static void pg_decode_stream_truncate(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
int nrelations, Relation relations[],
ReorderBufferChange *change);
+static bool pg_decode_filter_prepare(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid);
+static void pg_decode_prepare_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+static void pg_decode_commit_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+static void pg_decode_rollback_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
void
_PG_init(void)
@@ -112,10 +132,15 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb)
cb->stream_start_cb = pg_decode_stream_start;
cb->stream_stop_cb = pg_decode_stream_stop;
cb->stream_abort_cb = pg_decode_stream_abort;
+ cb->stream_prepare_cb = pg_decode_stream_prepare;
cb->stream_commit_cb = pg_decode_stream_commit;
cb->stream_change_cb = pg_decode_stream_change;
cb->stream_message_cb = pg_decode_stream_message;
cb->stream_truncate_cb = pg_decode_stream_truncate;
+ cb->filter_prepare_cb = pg_decode_filter_prepare;
+ cb->prepare_cb = pg_decode_prepare_txn;
+ cb->commit_prepared_cb = pg_decode_commit_prepared_txn;
+ cb->rollback_prepared_cb = pg_decode_rollback_prepared_txn;
}
@@ -127,6 +152,7 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
ListCell *option;
TestDecodingData *data;
bool enable_streaming = false;
+ bool enable_2pc = false;
data = palloc0(sizeof(TestDecodingData));
data->context = AllocSetContextCreate(ctx->context,
@@ -136,6 +162,7 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
data->include_timestamp = false;
data->skip_empty_xacts = false;
data->only_local = false;
+ data->check_xid_aborted = InvalidTransactionId;
ctx->output_plugin_private = data;
@@ -227,6 +254,40 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
errmsg("could not parse value \"%s\" for parameter \"%s\"",
strVal(elem->arg), elem->defname)));
}
+ else if (strcmp(elem->defname, "two-phase-commit") == 0)
+ {
+ if (elem->arg == NULL)
+ continue;
+ else if (!parse_bool(strVal(elem->arg), &enable_2pc))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("could not parse value \"%s\" for parameter \"%s\"",
+ strVal(elem->arg), elem->defname)));
+ }
+ else if (strcmp(elem->defname, "check-xid-aborted") == 0)
+ {
+ if (elem->arg == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("check-xid-aborted needs an input value")));
+ else
+ {
+ long xid;
+
+ errno = 0;
+ xid = strtoul(strVal(elem->arg), NULL, 0);
+ if (xid == 0 || errno != 0)
+ data->check_xid_aborted = InvalidTransactionId;
+ else
+ data->check_xid_aborted = (TransactionId)xid;
+
+ if (!TransactionIdIsValid(data->check_xid_aborted))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("check-xid-aborted is not a valid xid: \"%s\"",
+ strVal(elem->arg))));
+ }
+ }
else
{
ereport(ERROR,
@@ -238,6 +299,7 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
}
ctx->streaming &= enable_streaming;
+ ctx->twophase &= enable_2pc;
}
/* cleanup this plugin's resources */
@@ -297,6 +359,93 @@ pg_decode_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
OutputPluginWrite(ctx, true);
}
+/*
+ * Filter out two-phase transactions.
+ *
+ * Each plugin can implement its own filtering logic. Here
+ * we demonstrate a simple logic by checking the GID. If the
+ * GID contains the "_nodecode" substring, then we filter
+ * it out.
+ */
+static bool
+pg_decode_filter_prepare(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid)
+{
+ if (strstr(gid, "_nodecode") != NULL)
+ return true;
+
+ return false;
+}
+
+/* PREPARE callback */
+static void
+pg_decode_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ if (data->skip_empty_xacts && !data->xact_wrote_changes)
+ return;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ appendStringInfo(ctx->out, "PREPARE TRANSACTION %s",
+ quote_literal_cstr(txn->gid));
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, " %u", txn->xid);
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
+/* COMMIT PREPARED callback */
+static void
+pg_decode_commit_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ appendStringInfo(ctx->out, "COMMIT PREPARED %s",
+ quote_literal_cstr(txn->gid));
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, " %u", txn->xid);
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
+/* ROLLBACK PREPARED callback */
+static void
+pg_decode_rollback_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ appendStringInfo(ctx->out, "ROLLBACK PREPARED %s",
+ quote_literal_cstr(txn->gid));
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, " %u", txn->xid);
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
static bool
pg_decode_filter(LogicalDecodingContext *ctx,
RepOriginId origin_id)
@@ -455,6 +604,25 @@ pg_decode_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
}
data->xact_wrote_changes = true;
+ /* if check_xid_aborted is a valid xid, then it was passed in
+ * as an option to check if the transaction having this xid would be aborted.
+ * This is to test concurrent aborts.
+ */
+ if (TransactionIdIsValid(data->check_xid_aborted))
+ {
+ elog(LOG, "waiting for %u to abort", data->check_xid_aborted);
+ while (TransactionIdIsInProgress(data->check_xid_aborted))
+ {
+ CHECK_FOR_INTERRUPTS();
+ pg_usleep(10000L);
+ }
+ if (!TransactionIdIsInProgress(data->check_xid_aborted) &&
+ !TransactionIdDidCommit(data->check_xid_aborted))
+ elog(LOG, "%u aborted", data->check_xid_aborted);
+
+ Assert(TransactionIdDidAbort(data->check_xid_aborted));
+ }
+
class_form = RelationGetForm(relation);
tupdesc = RelationGetDescr(relation);
@@ -646,6 +814,30 @@ pg_decode_stream_abort(LogicalDecodingContext *ctx,
}
static void
+pg_decode_stream_prepare(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ if (data->skip_empty_xacts && !data->xact_wrote_changes)
+ return;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, "preparing streamed transaction TXN %u", txn->xid);
+ else
+ appendStringInfo(ctx->out, "preparing streamed transaction");
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
+static void
pg_decode_stream_commit(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr commit_lsn)
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index 813a037..cfb1b32 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -387,11 +387,16 @@ typedef struct OutputPluginCallbacks
LogicalDecodeTruncateCB truncate_cb;
LogicalDecodeCommitCB commit_cb;
LogicalDecodeMessageCB message_cb;
+ LogicalDecodeFilterPrepareCB filter_prepare_cb;
+ LogicalDecodePrepareCB prepare_cb;
+ LogicalDecodeCommitPreparedCB commit_prepared_cb;
+ LogicalDecodeRollbackPreparedCB rollback_prepared_cb;
LogicalDecodeFilterByOriginCB filter_by_origin_cb;
LogicalDecodeShutdownCB shutdown_cb;
LogicalDecodeStreamStartCB stream_start_cb;
LogicalDecodeStreamStopCB stream_stop_cb;
LogicalDecodeStreamAbortCB stream_abort_cb;
+ LogicalDecodeStreamPrepareCB stream_prepare_cb;
LogicalDecodeStreamCommitCB stream_commit_cb;
LogicalDecodeStreamChangeCB stream_change_cb;
LogicalDecodeStreamMessageCB stream_message_cb;
@@ -413,10 +418,18 @@ typedef void (*LogicalOutputPluginInit) (struct OutputPluginCallbacks *cb);
An output plugin may also define functions to support streaming of large,
in-progress transactions. The <function>stream_start_cb</function>,
<function>stream_stop_cb</function>, <function>stream_abort_cb</function>,
- <function>stream_commit_cb</function> and <function>stream_change_cb</function>
+ <function>stream_commit_cb</function>, <function>stream_change_cb</function>,
+ and <function>stream_prepare_cb</function>
are required, while <function>stream_message_cb</function> and
<function>stream_truncate_cb</function> are optional.
</para>
+
+ <para>
+ An output plugin may also define functions to support two-phase commits, which are
+ decoded on <command>PREPARE TRANSACTION</command>. The <function>prepare_cb</function>,
+ <function>commit_prepared_cb</function> and <function>rollback_prepared_cb</function>
+ callbacks are required, while <function>filter_prepare_cb</function> is optional.
+ </para>
</sect2>
<sect2 id="logicaldecoding-capabilities">
@@ -477,7 +490,13 @@ CREATE TABLE another_catalog_table(data text) WITH (user_catalog_table = true);
never get
decoded. Successful savepoints are
folded into the transaction containing them in the order they were
- executed within that transaction.
+ executed within that transaction. A transaction that is prepared for
+ a two-phase commit using <command>PREPARE TRANSACTION</command> will
+ also be decoded if the output plugin callbacks needed for decoding
+ them are provided. It is possible that the current transaction which
+ is being decoded is aborted concurrently via a <command>ROLLBACK PREPARED</command>
+ command. In that case, the logical decoding of this transaction will
+ be aborted too.
</para>
<note>
@@ -578,6 +597,55 @@ typedef void (*LogicalDecodeCommitCB) (struct LogicalDecodingContext *ctx,
</para>
</sect3>
+ <sect3 id="logicaldecoding-output-plugin-prepare">
+ <title>Transaction Prepare Callback</title>
+
+ <para>
+ The required <function>prepare_cb</function> callback is called whenever
+ a transaction which is prepared for two-phase commit has been
+ decoded. The <function>change_cb</function> callbacks for all modified
+ rows will have been called before this, if there have been any modified
+ rows.
+<programlisting>
+typedef void (*LogicalDecodePrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
+ <sect3 id="logicaldecoding-output-plugin-commit-prepared">
+ <title>Transaction Commit Prepared Callback</title>
+
+ <para>
+ The required <function>commit_prepared_cb</function> callback is called whenever
+ a transaction commit prepared has been decoded. The <parameter>gid</parameter> field,
+ which is part of the <parameter>txn</parameter> parameter can be used in this
+ callback.
+<programlisting>
+typedef void (*LogicalDecodeCommitPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
+ <sect3 id="logicaldecoding-output-plugin-rollback-prepared">
+ <title>Transaction Rollback Prepared Callback</title>
+
+ <para>
+ The required <function>rollback_prepared_cb</function> callback is called whenever
+ a transaction rollback prepared has been decoded. The <parameter>gid</parameter> field,
+ which is part of the <parameter>txn</parameter> parameter can be used in this
+ callback.
+<programlisting>
+typedef void (*LogicalDecodeRollbackPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr rollback_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
<sect3 id="logicaldecoding-output-plugin-change">
<title>Change Callback</title>
@@ -587,7 +655,13 @@ typedef void (*LogicalDecodeCommitCB) (struct LogicalDecodingContext *ctx,
an <command>INSERT</command>, <command>UPDATE</command>,
or <command>DELETE</command>. Even if the original command modified
several rows at once the callback will be called individually for each
- row.
+ row. The <function>change_cb</function> callback may access system or
+ user catalog tables to aid in the process of outputting the row
+ modification details. In case of decoding a prepared (but yet
+ uncommitted) transaction or decoding of an uncommitted transaction, this
+ change callback might also error out due to simultaneous rollback of
+ this very same transaction. In that case, the logical decoding of this
+ aborted transaction is stopped gracefully.
<programlisting>
typedef void (*LogicalDecodeChangeCB) (struct LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
@@ -664,6 +738,39 @@ typedef bool (*LogicalDecodeFilterByOriginCB) (struct LogicalDecodingContext *ct
</para>
</sect3>
+ <sect3 id="logicaldecoding-output-plugin-filter-prepare">
+ <title>Prepare Filter Callback</title>
+
+ <para>
+ The optional <function>filter_prepare_cb</function> callback
+ is called to determine whether data that is part of the current
+ two-phase commit transaction should be considered for decode
+ at this prepare stage or as a regular one-phase transaction at
+ <command>COMMIT PREPARED</command> time later. To signal that
+ decoding should be skipped, return <literal>true</literal>;
+ <literal>false</literal> otherwise. When the callback is not
+ defined, <literal>false</literal> is assumed (i.e. nothing is
+ filtered).
+<programlisting>
+typedef bool (*LogicalDecodeFilterPrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ TransactionId xid,
+ const char *gid);
+</programlisting>
+ The <parameter>ctx</parameter> parameter has the same contents
+ as for the other callbacks. The <parameter>txn</parameter> parameter
+ contains meta information about the transaction. The <parameter>xid</parameter>
+ contains the XID because <parameter>txn</parameter> can be NULL in some cases.
+ The <parameter>gid</parameter> is the identifier that later identifies this
+ transaction for <command>COMMIT PREPARED</command> or <command>ROLLBACK PREPARED</command>.
+ </para>
+ <para>
+ The callback has to provide the same static answer for a given combination of
+ <parameter>xid</parameter> and <parameter>gid</parameter> every time it is
+ called.
+ </para>
+ </sect3>
+
<sect3 id="logicaldecoding-output-plugin-message">
<title>Generic Message Callback</title>
@@ -685,7 +792,13 @@ typedef void (*LogicalDecodeMessageCB) (struct LogicalDecodingContext *ctx,
non-transactional and the XID was not assigned yet in the transaction
which logged the message. The <parameter>lsn</parameter> has WAL
location of the message. The <parameter>transactional</parameter> says
- if the message was sent as transactional or not.
+ if the message was sent as transactional or not. Similar to the change
+ callback, in case of decoding a prepared (but yet uncommitted)
+ transaction or decoding of an uncommitted transaction, this message
+ callback might also error out due to simultaneous rollback of
+ this very same transaction. In that case, the logical decoding of this
+ aborted transaction is stopped gracefully.
+
The <parameter>prefix</parameter> is arbitrary null-terminated prefix
which can be used for identifying interesting messages for the current
plugin. And finally the <parameter>message</parameter> parameter holds
@@ -735,6 +848,19 @@ typedef void (*LogicalDecodeStreamAbortCB) (struct LogicalDecodingContext *ctx,
</para>
</sect3>
+ <sect3 id="logicaldecoding-output-plugin-stream-prepare">
+ <title>Stream Prepare Callback</title>
+ <para>
+ The <function>stream_prepare_cb</function> callback is called to prepare
+ a previously streamed transaction as part of a two-phase commit.
+<programlisting>
+typedef void (*LogicalDecodeStreamPrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
<sect3 id="logicaldecoding-output-plugin-stream-commit">
<title>Stream Commit Callback</title>
<para>
@@ -913,9 +1039,13 @@ OutputPluginWrite(ctx, true);
When streaming an in-progress transaction, the changes (and messages) are
streamed in blocks demarcated by <function>stream_start_cb</function>
and <function>stream_stop_cb</function> callbacks. Once all the decoded
- changes are transmitted, the transaction is committed using the
- <function>stream_commit_cb</function> callback (or possibly aborted using
- the <function>stream_abort_cb</function> callback).
+ changes are transmitted, the transaction can be committed using the
+ the <function>stream_commit_cb</function> callback
+ (or possibly aborted using the <function>stream_abort_cb</function> callback).
+ If two-phase commits are supported, the transaction can be prepared using the
+ <function>stream_prepare_cb</function> callback, commit prepared using the
+ <function>commit_prepared_cb</function> callback or aborted using the
+ <function>rollback_prepared_cb</function>.
</para>
<para>
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 8675832..0f605ef 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -59,6 +59,14 @@ static void shutdown_cb_wrapper(LogicalDecodingContext *ctx);
static void begin_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn);
static void commit_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
+static bool filter_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid);
+static void prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+static void commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+static void rollback_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
static void change_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
Relation relation, ReorderBufferChange *change);
static void truncate_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
@@ -74,6 +82,8 @@ static void stream_stop_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr last_lsn);
static void stream_abort_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr abort_lsn);
+static void stream_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
static void stream_commit_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
static void stream_change_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
@@ -207,6 +217,10 @@ StartupDecodingContext(List *output_plugin_options,
ctx->reorder->apply_change = change_cb_wrapper;
ctx->reorder->apply_truncate = truncate_cb_wrapper;
ctx->reorder->commit = commit_cb_wrapper;
+ ctx->reorder->filter_prepare = filter_prepare_cb_wrapper;
+ ctx->reorder->prepare = prepare_cb_wrapper;
+ ctx->reorder->commit_prepared = commit_prepared_cb_wrapper;
+ ctx->reorder->rollback_prepared = rollback_prepared_cb_wrapper;
ctx->reorder->message = message_cb_wrapper;
/*
@@ -221,12 +235,26 @@ StartupDecodingContext(List *output_plugin_options,
ctx->streaming = (ctx->callbacks.stream_start_cb != NULL) ||
(ctx->callbacks.stream_stop_cb != NULL) ||
(ctx->callbacks.stream_abort_cb != NULL) ||
+ (ctx->callbacks.stream_prepare_cb != NULL) ||
(ctx->callbacks.stream_commit_cb != NULL) ||
(ctx->callbacks.stream_change_cb != NULL) ||
(ctx->callbacks.stream_message_cb != NULL) ||
(ctx->callbacks.stream_truncate_cb != NULL);
/*
+ * To support two-phase logical decoding, we require prepare/commit-prepare/abort-prepare
+ * callbacks. The filter-prepare callback is optional. We however enable two-phase logical
+ * decoding when at least one of the methods is enabled so that we can easily identify
+ * missing methods.
+ *
+ * We decide it here, but only check it later in the wrappers.
+ */
+ ctx->twophase = (ctx->callbacks.prepare_cb != NULL) ||
+ (ctx->callbacks.commit_prepared_cb != NULL) ||
+ (ctx->callbacks.rollback_prepared_cb != NULL) ||
+ (ctx->callbacks.filter_prepare_cb != NULL);
+
+ /*
* streaming callbacks
*
* stream_message and stream_truncate callbacks are optional, so we do not
@@ -237,6 +265,7 @@ StartupDecodingContext(List *output_plugin_options,
ctx->reorder->stream_start = stream_start_cb_wrapper;
ctx->reorder->stream_stop = stream_stop_cb_wrapper;
ctx->reorder->stream_abort = stream_abort_cb_wrapper;
+ ctx->reorder->stream_prepare = stream_prepare_cb_wrapper;
ctx->reorder->stream_commit = stream_commit_cb_wrapper;
ctx->reorder->stream_change = stream_change_cb_wrapper;
ctx->reorder->stream_message = stream_message_cb_wrapper;
@@ -783,6 +812,120 @@ commit_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
}
static void
+prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ /* We're only supposed to call this when two-phase commits are supported */
+ Assert(ctx->twophase);
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "prepare";
+ state.report_location = txn->final_lsn; /* beginning of commit record */
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn; /* points to the end of the record */
+
+ /* If the plugin supports two-phase commits then prepare callback is mandatory */
+ if (ctx->callbacks.prepare_cb == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("Output plugin did not register prepare_cb callback")));
+
+ /* do the actual work: call callback */
+ ctx->callbacks.prepare_cb(ctx, txn, prepare_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
+commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ /* We're only supposed to call this when two-phase commits are supported */
+ Assert(ctx->twophase);
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "commit_prepared";
+ state.report_location = txn->final_lsn; /* beginning of commit record */
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn; /* points to the end of the record */
+
+ /* If the plugin support two-phase commits then commit prepared callback is mandatory */
+ if (ctx->callbacks.commit_prepared_cb == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("Output plugin did not register commit_prepared_cb callback")));
+
+ /* do the actual work: call callback */
+ ctx->callbacks.commit_prepared_cb(ctx, txn, commit_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
+rollback_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ /* We're only supposed to call this when two-phase commits are supported */
+ Assert(ctx->twophase);
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "rollback_prepared";
+ state.report_location = txn->final_lsn; /* beginning of commit record */
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn; /* points to the end of the record */
+
+ /* If the plugin support two-phase commits then abort prepared callback is mandatory */
+ if (ctx->callbacks.rollback_prepared_cb == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("Output plugin did not register rollback_prepared_cb callback")));
+
+ /* do the actual work: call callback */
+ ctx->callbacks.rollback_prepared_cb(ctx, txn, abort_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
change_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
Relation relation, ReorderBufferChange *change)
{
@@ -859,6 +1002,51 @@ truncate_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
error_context_stack = errcallback.previous;
}
+static bool
+filter_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+ bool ret;
+
+ /*
+ * Skip if decoding of two-phase transactions at PREPARE time is not enabled. In that
+ * case all two-phase transactions are considered filtered out and will be
+ * applied as regular transactions at COMMIT PREPARED.
+ */
+ if (!ctx->twophase)
+ return true;
+
+ /*
+ * The filter_prepare callback is optional. When not supplied, all
+ * prepared transactions should go through.
+ */
+ if (!ctx->callbacks.filter_prepare_cb)
+ return false;
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "filter_prepare";
+ state.report_location = InvalidXLogRecPtr;
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = false;
+
+ /* do the actual work: call callback */
+ ret = ctx->callbacks.filter_prepare_cb(ctx, txn, xid, gid);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+
+ return ret;
+}
+
bool
filter_by_origin_cb_wrapper(LogicalDecodingContext *ctx, RepOriginId origin_id)
{
@@ -1057,6 +1245,46 @@ stream_abort_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
}
static void
+stream_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ Assert(!ctx->fast_forward);
+
+ /* We're only supposed to call this when streaming and two-phase commits are supported. */
+ Assert(ctx->streaming);
+ Assert(ctx->twophase);
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "stream_prepare";
+ state.report_location = txn->final_lsn;
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn;
+
+ /* in streaming mode with two-phase commits, stream_prepare_cb is required */
+ if (ctx->callbacks.stream_prepare_cb == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical streaming commits requires a stream_prepare_cb callback")));
+
+ ctx->callbacks.stream_prepare_cb(ctx, txn, prepare_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
stream_commit_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr commit_lsn)
{
diff --git a/src/include/replication/logical.h b/src/include/replication/logical.h
index 40bab7e..7f4384b 100644
--- a/src/include/replication/logical.h
+++ b/src/include/replication/logical.h
@@ -85,6 +85,11 @@ typedef struct LogicalDecodingContext
bool streaming;
/*
+ * Does the output plugin support two-phase decoding, and is it enabled?
+ */
+ bool twophase;
+
+ /*
* State for writing output.
*/
bool accept_writes;
diff --git a/src/include/replication/output_plugin.h b/src/include/replication/output_plugin.h
index b78c796..4c1341f 100644
--- a/src/include/replication/output_plugin.h
+++ b/src/include/replication/output_plugin.h
@@ -77,6 +77,39 @@ typedef void (*LogicalDecodeCommitCB) (struct LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
+ /*
+ * Called before decoding of PREPARE record to decide whether this
+ * transaction should be decoded with separate calls to prepare and
+ * commit_prepared/rollback_prepared callbacks or wait till COMMIT PREPARED and
+ * sent as usual transaction.
+ */
+typedef bool (*LogicalDecodeFilterPrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ TransactionId xid,
+ const char *gid);
+
+/*
+ * Called for PREPARE record unless it was filtered by filter_prepare()
+ * callback.
+ */
+typedef void (*LogicalDecodePrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+
+/*
+ * Called for COMMIT PREPARED.
+ */
+typedef void (*LogicalDecodeCommitPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+/*
+ * Called for ROLLBACK PREPARED.
+ */
+typedef void (*LogicalDecodeRollbackPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+
/*
* Called for the generic logical decoding messages.
*/
@@ -124,6 +157,14 @@ typedef void (*LogicalDecodeStreamAbortCB) (struct LogicalDecodingContext *ctx,
XLogRecPtr abort_lsn);
/*
+ * Called to prepare changes streamed to remote node from in-progress
+ * transaction. This is called as part of a two-phase commit.
+ */
+typedef void (*LogicalDecodeStreamPrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+
+/*
* Called to apply changes streamed to remote node from in-progress
* transaction.
*/
@@ -171,12 +212,17 @@ typedef struct OutputPluginCallbacks
LogicalDecodeTruncateCB truncate_cb;
LogicalDecodeCommitCB commit_cb;
LogicalDecodeMessageCB message_cb;
+ LogicalDecodeFilterPrepareCB filter_prepare_cb;
+ LogicalDecodePrepareCB prepare_cb;
+ LogicalDecodeCommitPreparedCB commit_prepared_cb;
+ LogicalDecodeRollbackPreparedCB rollback_prepared_cb;
LogicalDecodeFilterByOriginCB filter_by_origin_cb;
LogicalDecodeShutdownCB shutdown_cb;
/* streaming of changes */
LogicalDecodeStreamStartCB stream_start_cb;
LogicalDecodeStreamStopCB stream_stop_cb;
LogicalDecodeStreamAbortCB stream_abort_cb;
+ LogicalDecodeStreamPrepareCB stream_prepare_cb;
LogicalDecodeStreamCommitCB stream_commit_cb;
LogicalDecodeStreamChangeCB stream_change_cb;
LogicalDecodeStreamMessageCB stream_message_cb;
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 1c77819..a323dfb 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -10,6 +10,7 @@
#define REORDERBUFFER_H
#include "access/htup_details.h"
+#include "access/twophase.h"
#include "lib/ilist.h"
#include "storage/sinval.h"
#include "utils/hsearch.h"
@@ -174,6 +175,9 @@ typedef struct ReorderBufferChange
#define RBTXN_IS_STREAMED 0x0010
#define RBTXN_HAS_TOAST_INSERT 0x0020
#define RBTXN_HAS_SPEC_INSERT 0x0040
+#define RBTXN_PREPARE 0x0080
+#define RBTXN_COMMIT_PREPARED 0x0100
+#define RBTXN_ROLLBACK_PREPARED 0x0200
/* Does the transaction have catalog changes? */
#define rbtxn_has_catalog_changes(txn) \
@@ -233,6 +237,24 @@ typedef struct ReorderBufferChange
((txn)->txn_flags & RBTXN_IS_STREAMED) != 0 \
)
+/* Has this transaction been prepared? */
+#define rbtxn_prepared(txn) \
+( \
+ ((txn)->txn_flags & RBTXN_PREPARE) != 0 \
+)
+
+/* Has this prepared transaction been committed? */
+#define rbtxn_commit_prepared(txn) \
+( \
+ ((txn)->txn_flags & RBTXN_COMMIT_PREPARED) != 0 \
+)
+
+/* Has this prepared transaction been rollbacked? */
+#define rbtxn_rollback_prepared(txn) \
+( \
+ ((txn)->txn_flags & RBTXN_ROLLBACK_PREPARED) != 0 \
+)
+
typedef struct ReorderBufferTXN
{
/* See above */
@@ -244,6 +266,9 @@ typedef struct ReorderBufferTXN
/* Xid of top-level transaction, if known */
TransactionId toplevel_xid;
+ /* In case of two-phase commit we need to pass GID to output plugin */
+ char *gid;
+
/*
* LSN of the first data carrying, WAL record with knowledge about this
* xid. This is allowed to *not* be first record adorned with this xid, if
@@ -405,6 +430,31 @@ typedef void (*ReorderBufferCommitCB) (ReorderBuffer *rb,
ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
+/* abort callback signature */
+typedef void (*ReorderBufferAbortCB) (ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+
+typedef bool (*ReorderBufferFilterPrepareCB) (ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ TransactionId xid,
+ const char *gid);
+
+/* prepare callback signature */
+typedef void (*ReorderBufferPrepareCB) (ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+
+/* commit prepared callback signature */
+typedef void (*ReorderBufferCommitPreparedCB) (ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+/* rollback prepared callback signature */
+typedef void (*ReorderBufferRollbackPreparedCB) (ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+
/* message callback signature */
typedef void (*ReorderBufferMessageCB) (ReorderBuffer *rb,
ReorderBufferTXN *txn,
@@ -431,6 +481,12 @@ typedef void (*ReorderBufferStreamAbortCB) (
ReorderBufferTXN *txn,
XLogRecPtr abort_lsn);
+/* prepare streamed transaction callback signature */
+typedef void (*ReorderBufferStreamPrepareCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+
/* commit streamed transaction callback signature */
typedef void (*ReorderBufferStreamCommitCB) (
ReorderBuffer *rb,
@@ -497,6 +553,11 @@ struct ReorderBuffer
ReorderBufferApplyChangeCB apply_change;
ReorderBufferApplyTruncateCB apply_truncate;
ReorderBufferCommitCB commit;
+ ReorderBufferAbortCB abort;
+ ReorderBufferFilterPrepareCB filter_prepare;
+ ReorderBufferPrepareCB prepare;
+ ReorderBufferCommitPreparedCB commit_prepared;
+ ReorderBufferRollbackPreparedCB rollback_prepared;
ReorderBufferMessageCB message;
/*
@@ -505,6 +566,7 @@ struct ReorderBuffer
ReorderBufferStreamStartCB stream_start;
ReorderBufferStreamStopCB stream_stop;
ReorderBufferStreamAbortCB stream_abort;
+ ReorderBufferStreamPrepareCB stream_prepare;
ReorderBufferStreamCommitCB stream_commit;
ReorderBufferStreamChangeCB stream_change;
ReorderBufferStreamMessageCB stream_message;
@@ -574,6 +636,11 @@ void ReorderBufferQueueMessage(ReorderBuffer *, TransactionId, Snapshot snapsho
void ReorderBufferCommit(ReorderBuffer *, TransactionId,
XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
TimestampTz commit_time, RepOriginId origin_id, XLogRecPtr origin_lsn);
+void ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid, bool is_commit);
void ReorderBufferAssignChild(ReorderBuffer *, TransactionId, TransactionId, XLogRecPtr commit_lsn);
void ReorderBufferCommitChild(ReorderBuffer *, TransactionId, TransactionId,
XLogRecPtr commit_lsn, XLogRecPtr end_lsn);
@@ -597,6 +664,15 @@ void ReorderBufferXidSetCatalogChanges(ReorderBuffer *, TransactionId xid, XLog
bool ReorderBufferXidHasCatalogChanges(ReorderBuffer *, TransactionId xid);
bool ReorderBufferXidHasBaseSnapshot(ReorderBuffer *, TransactionId xid);
+bool ReorderBufferPrepareNeedSkip(ReorderBuffer *rb, TransactionId xid,
+ const char *gid);
+bool ReorderBufferTxnIsPrepared(ReorderBuffer *rb, TransactionId xid,
+ const char *gid);
+void ReorderBufferPrepare(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid);
ReorderBufferTXN *ReorderBufferGetOldestTXN(ReorderBuffer *);
TransactionId ReorderBufferGetOldestXmin(ReorderBuffer *rb);
--
1.8.3.1
v12-0002-Support-2PC-txn-backend-and-tests.patchapplication/octet-stream; name=v12-0002-Support-2PC-txn-backend-and-tests.patchDownload
From 585e672158366f5a3b2a7d9330f5e0a0404d6f54 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Mon, 26 Oct 2020 11:25:57 +1100
Subject: [PATCH v12] Support 2PC txn backend and tests.
Until now two-phase transactions were decoded at COMMIT, just like
regular transaction. During replay, two-phase transactions were
translated into regular transactions on the subscriber, and the GID
was not forwarded to it.
This patch allows PREPARE-time decoding two-phase transactions (if
the output plugin supports this capability), in which case the
transactions are replayed at PREPARE and then committed later when
COMMIT PREPARED arrives.
Includes backend changes to support decoding of PREPARE TRANSACTION,
COMMIT PREPARED and ROLLBACK PREPARED.
Includes two-phase commit test code (for test_decoding).
---
contrib/test_decoding/Makefile | 4 +-
contrib/test_decoding/expected/two_phase.out | 228 ++++++++++++++++++
contrib/test_decoding/sql/two_phase.sql | 119 ++++++++++
contrib/test_decoding/t/001_twophase.pl | 121 ++++++++++
src/backend/replication/logical/decode.c | 250 +++++++++++++++++++-
src/backend/replication/logical/reorderbuffer.c | 302 +++++++++++++++++++++---
6 files changed, 972 insertions(+), 52 deletions(-)
create mode 100644 contrib/test_decoding/expected/two_phase.out
create mode 100644 contrib/test_decoding/sql/two_phase.sql
create mode 100644 contrib/test_decoding/t/001_twophase.pl
diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index 9a4c76f..2abf3ce 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -4,11 +4,13 @@ MODULES = test_decoding
PGFILEDESC = "test_decoding - example of a logical decoding output plugin"
REGRESS = ddl xact rewrite toast permissions decoding_in_xact \
- decoding_into_rel binary prepared replorigin time messages \
+ decoding_into_rel binary prepared replorigin time two_phase messages \
spill slot truncate stream stats
ISOLATION = mxact delayed_startup ondisk_startup concurrent_ddl_dml \
oldest_xmin snapshot_transfer subxact_without_top concurrent_stream
+TAP_TESTS = 1
+
REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
ISOLATION_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
diff --git a/contrib/test_decoding/expected/two_phase.out b/contrib/test_decoding/expected/two_phase.out
new file mode 100644
index 0000000..da6e6e6
--- /dev/null
+++ b/contrib/test_decoding/expected/two_phase.out
@@ -0,0 +1,228 @@
+-- Test two-phased transactions. When two-phase-commit is enabled, transactions are
+-- decoded at PREPARE time rather than at COMMIT PREPARED time.
+SET synchronous_commit = on;
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+ ?column?
+----------
+ init
+(1 row)
+
+CREATE TABLE test_prepared1(id integer primary key);
+CREATE TABLE test_prepared2(id integer primary key);
+-- Test 1:
+-- Test that commands in a two phase xact are only decoded at PREPARE.
+-- Decoding after COMMIT PREPARED should only have the COMMIT PREPARED command and not the
+-- rest of the commands in the transaction.
+BEGIN;
+INSERT INTO test_prepared1 VALUES (1);
+INSERT INTO test_prepared1 VALUES (2);
+-- should show nothing because the xact has not been prepared yet.
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+------
+(0 rows)
+
+PREPARE TRANSACTION 'test_prepared#1';
+--
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+----------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:1
+ table public.test_prepared1: INSERT: id[integer]:2
+ PREPARE TRANSACTION 'test_prepared#1'
+(4 rows)
+
+COMMIT PREPARED 'test_prepared#1';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-----------------------------------
+ COMMIT PREPARED 'test_prepared#1'
+(1 row)
+
+-- Test 2:
+-- Test that rollback of a prepared xact is decoded.
+BEGIN;
+INSERT INTO test_prepared1 VALUES (3);
+PREPARE TRANSACTION 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+----------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:3
+ PREPARE TRANSACTION 'test_prepared#2'
+(3 rows)
+
+ROLLBACK PREPARED 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-------------------------------------
+ ROLLBACK PREPARED 'test_prepared#2'
+(1 row)
+
+-- Test 3:
+-- Test prepare of a xact containing ddl. Leaving xact uncommitted for next test.
+BEGIN;
+ALTER TABLE test_prepared1 ADD COLUMN data text;
+INSERT INTO test_prepared1 VALUES (4, 'frakbar');
+PREPARE TRANSACTION 'test_prepared#3';
+-- confirm that exclusive lock from the ALTER command is held on test_prepared1 table
+SELECT 'test_prepared_1' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'test_prepared1'::regclass;
+ relation | locktype | mode
+-----------------+----------+---------------------
+ test_prepared_1 | relation | RowExclusiveLock
+ test_prepared_1 | relation | AccessExclusiveLock
+(2 rows)
+
+-- The insert should show the newly altered column.
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-------------------------------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:4 data[text]:'frakbar'
+ PREPARE TRANSACTION 'test_prepared#3'
+(3 rows)
+
+-- Test 4:
+-- Test that we decode correctly while an uncommitted prepared xact
+-- with ddl exists.
+--
+-- Use a separate table for the concurrent transaction because the lock from
+-- the ALTER will stop us inserting into the other one.
+--
+INSERT INTO test_prepared2 VALUES (5);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+----------------------------------------------------
+ BEGIN
+ table public.test_prepared2: INSERT: id[integer]:5
+ COMMIT
+(3 rows)
+
+COMMIT PREPARED 'test_prepared#3';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-----------------------------------
+ COMMIT PREPARED 'test_prepared#3'
+(1 row)
+
+-- make sure stuff still works
+INSERT INTO test_prepared1 VALUES (6);
+INSERT INTO test_prepared2 VALUES (7);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+--------------------------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:6 data[text]:null
+ COMMIT
+ BEGIN
+ table public.test_prepared2: INSERT: id[integer]:7
+ COMMIT
+(6 rows)
+
+-- Test 5:
+-- Check `CLUSTER` (as operation that hold exclusive lock) doesn't block
+-- logical decoding.
+BEGIN;
+INSERT INTO test_prepared1 VALUES (8, 'othercol');
+CLUSTER test_prepared1 USING test_prepared1_pkey;
+INSERT INTO test_prepared1 VALUES (9, 'othercol2');
+PREPARE TRANSACTION 'test_prepared_lock';
+SELECT 'test_prepared1' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'test_prepared1'::regclass;
+ relation | locktype | mode
+----------------+----------+---------------------
+ test_prepared1 | relation | RowExclusiveLock
+ test_prepared1 | relation | ShareLock
+ test_prepared1 | relation | AccessExclusiveLock
+(3 rows)
+
+-- The above CLUSTER command shouldn't cause a timeout on 2pc decoding. The call should return
+-- within a second.
+SET statement_timeout = '1s';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+---------------------------------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:8 data[text]:'othercol'
+ table public.test_prepared1: INSERT: id[integer]:9 data[text]:'othercol2'
+ PREPARE TRANSACTION 'test_prepared_lock'
+(4 rows)
+
+RESET statement_timeout;
+COMMIT PREPARED 'test_prepared_lock';
+-- consume the commit
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+--------------------------------------
+ COMMIT PREPARED 'test_prepared_lock'
+(1 row)
+
+-- Test 6:
+-- Test savepoints and sub-xacts. Creating savepoints will create sub-xacts implicitly.
+BEGIN;
+CREATE TABLE test_prepared_savepoint (a int);
+INSERT INTO test_prepared_savepoint VALUES (1);
+SAVEPOINT test_savepoint;
+INSERT INTO test_prepared_savepoint VALUES (2);
+ROLLBACK TO SAVEPOINT test_savepoint;
+PREPARE TRANSACTION 'test_prepared_savepoint';
+-- should show only 1, not 2
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+------------------------------------------------------------
+ BEGIN
+ table public.test_prepared_savepoint: INSERT: a[integer]:1
+ PREPARE TRANSACTION 'test_prepared_savepoint'
+(3 rows)
+
+COMMIT PREPARED 'test_prepared_savepoint';
+-- consume the commit
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-------------------------------------------
+ COMMIT PREPARED 'test_prepared_savepoint'
+(1 row)
+
+-- Test 7:
+-- test that a GID containing "_nodecode" gets decoded at commit prepared time
+BEGIN;
+INSERT INTO test_prepared1 VALUES (20);
+PREPARE TRANSACTION 'test_prepared_nodecode';
+-- should show nothing
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+------
+(0 rows)
+
+COMMIT PREPARED 'test_prepared_nodecode';
+-- should be decoded now
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+---------------------------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:20 data[text]:null
+ COMMIT
+(3 rows)
+
+-- Test 8:
+-- cleanup and make sure results are also empty
+DROP TABLE test_prepared1;
+DROP TABLE test_prepared2;
+-- show results. There should be nothing to show
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+------
+(0 rows)
+
+SELECT pg_drop_replication_slot('regression_slot');
+ pg_drop_replication_slot
+--------------------------
+
+(1 row)
+
diff --git a/contrib/test_decoding/sql/two_phase.sql b/contrib/test_decoding/sql/two_phase.sql
new file mode 100644
index 0000000..a7b23e6
--- /dev/null
+++ b/contrib/test_decoding/sql/two_phase.sql
@@ -0,0 +1,119 @@
+-- Test two-phased transactions. When two-phase-commit is enabled, transactions are
+-- decoded at PREPARE time rather than at COMMIT PREPARED time.
+SET synchronous_commit = on;
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+
+CREATE TABLE test_prepared1(id integer primary key);
+CREATE TABLE test_prepared2(id integer primary key);
+
+-- Test 1:
+-- Test that commands in a two phase xact are only decoded at PREPARE.
+-- Decoding after COMMIT PREPARED should only have the COMMIT PREPARED command and not the
+-- rest of the commands in the transaction.
+BEGIN;
+INSERT INTO test_prepared1 VALUES (1);
+INSERT INTO test_prepared1 VALUES (2);
+-- should show nothing because the xact has not been prepared yet.
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+PREPARE TRANSACTION 'test_prepared#1';
+--
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+COMMIT PREPARED 'test_prepared#1';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- Test 2:
+-- Test that rollback of a prepared xact is decoded.
+BEGIN;
+INSERT INTO test_prepared1 VALUES (3);
+PREPARE TRANSACTION 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ROLLBACK PREPARED 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- Test 3:
+-- Test prepare of a xact containing ddl. Leaving xact uncommitted for next test.
+BEGIN;
+ALTER TABLE test_prepared1 ADD COLUMN data text;
+INSERT INTO test_prepared1 VALUES (4, 'frakbar');
+PREPARE TRANSACTION 'test_prepared#3';
+-- confirm that exclusive lock from the ALTER command is held on test_prepared1 table
+SELECT 'test_prepared_1' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'test_prepared1'::regclass;
+-- The insert should show the newly altered column.
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- Test 4:
+-- Test that we decode correctly while an uncommitted prepared xact
+-- with ddl exists.
+--
+-- Use a separate table for the concurrent transaction because the lock from
+-- the ALTER will stop us inserting into the other one.
+--
+INSERT INTO test_prepared2 VALUES (5);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+COMMIT PREPARED 'test_prepared#3';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+-- make sure stuff still works
+INSERT INTO test_prepared1 VALUES (6);
+INSERT INTO test_prepared2 VALUES (7);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- Test 5:
+-- Check `CLUSTER` (as operation that hold exclusive lock) doesn't block
+-- logical decoding.
+BEGIN;
+INSERT INTO test_prepared1 VALUES (8, 'othercol');
+CLUSTER test_prepared1 USING test_prepared1_pkey;
+INSERT INTO test_prepared1 VALUES (9, 'othercol2');
+PREPARE TRANSACTION 'test_prepared_lock';
+
+SELECT 'test_prepared1' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'test_prepared1'::regclass;
+-- The above CLUSTER command shouldn't cause a timeout on 2pc decoding. The call should return
+-- within a second.
+SET statement_timeout = '1s';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+RESET statement_timeout;
+COMMIT PREPARED 'test_prepared_lock';
+-- consume the commit
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- Test 6:
+-- Test savepoints and sub-xacts. Creating savepoints will create sub-xacts implicitly.
+BEGIN;
+CREATE TABLE test_prepared_savepoint (a int);
+INSERT INTO test_prepared_savepoint VALUES (1);
+SAVEPOINT test_savepoint;
+INSERT INTO test_prepared_savepoint VALUES (2);
+ROLLBACK TO SAVEPOINT test_savepoint;
+PREPARE TRANSACTION 'test_prepared_savepoint';
+-- should show only 1, not 2
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+COMMIT PREPARED 'test_prepared_savepoint';
+-- consume the commit
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- Test 7:
+-- test that a GID containing "_nodecode" gets decoded at commit prepared time
+BEGIN;
+INSERT INTO test_prepared1 VALUES (20);
+PREPARE TRANSACTION 'test_prepared_nodecode';
+-- should show nothing
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+COMMIT PREPARED 'test_prepared_nodecode';
+-- should be decoded now
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- Test 8:
+-- cleanup and make sure results are also empty
+DROP TABLE test_prepared1;
+DROP TABLE test_prepared2;
+-- show results. There should be nothing to show
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+SELECT pg_drop_replication_slot('regression_slot');
diff --git a/contrib/test_decoding/t/001_twophase.pl b/contrib/test_decoding/t/001_twophase.pl
new file mode 100644
index 0000000..1555582
--- /dev/null
+++ b/contrib/test_decoding/t/001_twophase.pl
@@ -0,0 +1,121 @@
+# logical replication of 2PC test
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 2;
+use Time::HiRes qw(usleep);
+use Scalar::Util qw(looks_like_number);
+
+# Initialize node
+my $node_logical = get_new_node('logical');
+$node_logical->init(allows_streaming => 'logical');
+$node_logical->append_conf(
+ 'postgresql.conf', qq(
+ max_prepared_transactions = 10
+));
+$node_logical->start;
+
+# Create some pre-existing content on logical
+$node_logical->safe_psql('postgres', "CREATE TABLE tab (a int PRIMARY KEY)");
+$node_logical->safe_psql('postgres',
+ "INSERT INTO tab SELECT generate_series(1,10)");
+$node_logical->safe_psql('postgres',
+ "SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');");
+
+#Test 1:
+# This test is specifically for testing concurrent abort while logical decode
+# is ongoing. We will pass in the xid of the 2PC to the plugin as an option.
+# On the receipt of a valid "check-xid-aborted", the change API in the test decoding
+# plugin will wait for it to be aborted.
+#
+# We will fire off a ROLLBACK from another session when this decode
+# is waiting.
+#
+# The status of "check-xid-aborted" will change from in-progress to not-committed
+# (hence aborted) and we will stop decoding because the subsequent
+# system catalog scan will error out.
+
+$node_logical->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab VALUES (11);
+ INSERT INTO tab VALUES (12);
+ ALTER TABLE tab ADD COLUMN b INT;
+ INSERT INTO tab VALUES (13,14);
+ PREPARE TRANSACTION 'test_prepared_tab';");
+# get XID of the above two-phase transaction
+my $xid2pc = $node_logical->safe_psql('postgres', "SELECT transaction FROM pg_prepared_xacts WHERE gid = 'test_prepared_tab'");
+is(looks_like_number($xid2pc), qq(1), 'Got a valid two-phase XID');
+
+# start decoding the above by passing the "check-xid-aborted"
+my $logical_connstr = $node_logical->connstr . ' dbname=postgres';
+
+# decode now, it should include an ABORT entry because of the ROLLBACK below
+system_log("psql -d \"$logical_connstr\" -c \"SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'check-xid-aborted', '$xid2pc');\" \&");
+
+# check that decode starts waiting for this $xid2pc
+poll_output_until("waiting for $xid2pc to abort")
+ or die "no wait happened for the abort";
+
+# rollback the prepared transaction
+$node_logical->safe_psql('postgres', "ROLLBACK PREPARED 'test_prepared_tab';");
+
+# check for occurrence of the log about stopping this decoding
+poll_output_until("stopping decoding of test_prepared_tab")
+ or die "no decoding stop for the rollback";
+
+# consume any remaining changes
+$node_logical->safe_psql('postgres', "SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');");
+
+# Test 2:
+# Check that commit prepared is decoded properly on immediate restart
+$node_logical->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab VALUES (11);
+ INSERT INTO tab VALUES (12);
+ ALTER TABLE tab ADD COLUMN b INT;
+ INSERT INTO tab VALUES (13, 11);
+ PREPARE TRANSACTION 'test_prepared_tab';");
+# consume changes
+$node_logical->safe_psql('postgres', "SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');");
+$node_logical->stop('immediate');
+$node_logical->start;
+
+# commit post the restart
+$node_logical->safe_psql('postgres', "COMMIT PREPARED 'test_prepared_tab';");
+$node_logical->safe_psql('postgres', "SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');");
+
+# check inserts are visible
+my $result = $node_logical->safe_psql('postgres', "SELECT count(*) FROM tab where a IN (11,12) OR b IN (11);");
+is($result, qq(3), 'Rows inserted via 2PC are visible on restart');
+
+$node_logical->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot');");
+$node_logical->stop('fast');
+
+sub poll_output_until
+{
+ my ($expected) = @_;
+
+ $expected = 'xxxxxx' unless defined($expected); # default junk value
+
+ my $max_attempts = 180 * 10;
+ my $attempts = 0;
+
+ my $output_file = '';
+ while ($attempts < $max_attempts)
+ {
+ $output_file = slurp_file($node_logical->logfile());
+
+ if ($output_file =~ $expected)
+ {
+ return 1;
+ }
+
+ # Wait 0.1 second before retrying.
+ usleep(100_000);
+ $attempts++;
+ }
+
+ # The output result didn't change in 180 seconds. Give up
+ return 0;
+}
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 3f84ee9..fd961d4 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -68,8 +68,15 @@ static void DecodeSpecConfirm(LogicalDecodingContext *ctx, XLogRecordBuffer *buf
static void DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
xl_xact_parsed_commit *parsed, TransactionId xid);
+static void DecodeCommitPrepared(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_commit *parsed, TransactionId xid);
static void DecodeAbort(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
xl_xact_parsed_abort *parsed, TransactionId xid);
+static void DecodeAbortPrepared(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_abort *parsed, TransactionId xid);
+static void DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_prepare * parsed);
+
/* common function to decode tuples */
static void DecodeXLogTuple(char *data, Size len, ReorderBufferTupleBuf *tup);
@@ -239,7 +246,6 @@ DecodeXactOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
switch (info)
{
case XLOG_XACT_COMMIT:
- case XLOG_XACT_COMMIT_PREPARED:
{
xl_xact_commit *xlrec;
xl_xact_parsed_commit parsed;
@@ -256,8 +262,24 @@ DecodeXactOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
DecodeCommit(ctx, buf, &parsed, xid);
break;
}
+ case XLOG_XACT_COMMIT_PREPARED:
+ {
+ xl_xact_commit *xlrec;
+ xl_xact_parsed_commit parsed;
+ TransactionId xid;
+
+ xlrec = (xl_xact_commit *) XLogRecGetData(r);
+ ParseCommitRecord(XLogRecGetInfo(buf->record), xlrec, &parsed);
+
+ if (!TransactionIdIsValid(parsed.twophase_xid))
+ xid = XLogRecGetXid(r);
+ else
+ xid = parsed.twophase_xid;
+
+ DecodeCommitPrepared(ctx, buf, &parsed, xid);
+ break;
+ }
case XLOG_XACT_ABORT:
- case XLOG_XACT_ABORT_PREPARED:
{
xl_xact_abort *xlrec;
xl_xact_parsed_abort parsed;
@@ -274,6 +296,23 @@ DecodeXactOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
DecodeAbort(ctx, buf, &parsed, xid);
break;
}
+ case XLOG_XACT_ABORT_PREPARED:
+ {
+ xl_xact_abort *xlrec;
+ xl_xact_parsed_abort parsed;
+ TransactionId xid;
+
+ xlrec = (xl_xact_abort *) XLogRecGetData(r);
+ ParseAbortRecord(XLogRecGetInfo(buf->record), xlrec, &parsed);
+
+ if (!TransactionIdIsValid(parsed.twophase_xid))
+ xid = XLogRecGetXid(r);
+ else
+ xid = parsed.twophase_xid;
+
+ DecodeAbortPrepared(ctx, buf, &parsed, xid);
+ break;
+ }
case XLOG_XACT_ASSIGNMENT:
/*
@@ -312,17 +351,35 @@ DecodeXactOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
}
break;
case XLOG_XACT_PREPARE:
+ {
+ xl_xact_parsed_prepare parsed;
+ xl_xact_prepare *xlrec;
- /*
- * Currently decoding ignores PREPARE TRANSACTION and will just
- * decode the transaction when the COMMIT PREPARED is sent or
- * throw away the transaction's contents when a ROLLBACK PREPARED
- * is received. In the future we could add code to expose prepared
- * transactions in the changestream allowing for a kind of
- * distributed 2PC.
- */
- ReorderBufferProcessXid(reorder, XLogRecGetXid(r), buf->origptr);
- break;
+ /* check that output plugin is capable of two-phase decoding */
+ if (!ctx->twophase)
+ {
+ ReorderBufferProcessXid(reorder, XLogRecGetXid(r), buf->origptr);
+ break;
+ }
+
+ /* ok, parse it */
+ xlrec = (xl_xact_prepare *)XLogRecGetData(r);
+ ParsePrepareRecord(XLogRecGetInfo(buf->record),
+ xlrec, &parsed);
+
+ /* does output plugin want this particular transaction? */
+ if (ctx->callbacks.filter_prepare_cb &&
+ ReorderBufferPrepareNeedSkip(reorder, parsed.twophase_xid,
+ parsed.twophase_gid))
+ {
+ ReorderBufferProcessXid(reorder, parsed.twophase_xid,
+ buf->origptr);
+ break;
+ }
+
+ DecodePrepare(ctx, buf, &parsed);
+ break;
+ }
default:
elog(ERROR, "unexpected RM_XACT_ID record type: %u", info);
}
@@ -659,6 +716,131 @@ DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
}
/*
+ * Consolidated commit record handling between the different form of commit
+ * records.
+ */
+static void
+DecodeCommitPrepared(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_commit *parsed, TransactionId xid)
+{
+ XLogRecPtr origin_lsn = InvalidXLogRecPtr;
+ TimestampTz commit_time = parsed->xact_time;
+ RepOriginId origin_id = XLogRecGetOrigin(buf->record);
+ int i;
+
+ if (parsed->xinfo & XACT_XINFO_HAS_ORIGIN)
+ {
+ origin_lsn = parsed->origin_lsn;
+ commit_time = parsed->origin_timestamp;
+ }
+
+ SnapBuildCommitTxn(ctx->snapshot_builder, buf->origptr, xid,
+ parsed->nsubxacts, parsed->subxacts);
+
+ /* ----
+ * Check whether we are interested in this specific transaction, and tell
+ * the reorderbuffer to forget the content of the (sub-)transactions
+ * if not.
+ *
+ * There can be several reasons we might not be interested in this
+ * transaction:
+ * 1) We might not be interested in decoding transactions up to this
+ * LSN. This can happen because we previously decoded it and now just
+ * are restarting or if we haven't assembled a consistent snapshot yet.
+ * 2) The transaction happened in another database.
+ * 3) The output plugin is not interested in the origin.
+ * 4) We are doing fast-forwarding
+ *
+ * We can't just use ReorderBufferAbort() here, because we need to execute
+ * the transaction's invalidations. This currently won't be needed if
+ * we're just skipping over the transaction because currently we only do
+ * so during startup, to get to the first transaction the client needs. As
+ * we have reset the catalog caches before starting to read WAL, and we
+ * haven't yet touched any catalogs, there can't be anything to invalidate.
+ * But if we're "forgetting" this commit because it's it happened in
+ * another database, the invalidations might be important, because they
+ * could be for shared catalogs and we might have loaded data into the
+ * relevant syscaches.
+ * ---
+ */
+ if (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
+ (parsed->dbId != InvalidOid && parsed->dbId != ctx->slot->data.database) ||
+ ctx->fast_forward || FilterByOrigin(ctx, origin_id))
+ {
+ for (i = 0; i < parsed->nsubxacts; i++)
+ {
+ ReorderBufferForget(ctx->reorder, parsed->subxacts[i], buf->origptr);
+ }
+ ReorderBufferForget(ctx->reorder, xid, buf->origptr);
+
+ return;
+ }
+
+ /* tell the reorderbuffer about the surviving subtransactions */
+ for (i = 0; i < parsed->nsubxacts; i++)
+ {
+ ReorderBufferCommitChild(ctx->reorder, xid, parsed->subxacts[i],
+ buf->origptr, buf->endptr);
+ }
+
+ /*
+ * For COMMIT PREPARED, the changes have already been replayed at
+ * PREPARE time, so we only need to notify the subscriber that the GID
+ * finally committed.
+ * If filter check present and this needs to be skipped, do a regular commit.
+ */
+ if (ctx->callbacks.filter_prepare_cb &&
+ ReorderBufferPrepareNeedSkip(ctx->reorder, xid, parsed->twophase_gid))
+ {
+ ReorderBufferCommit(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn);
+ }
+ else
+ {
+ ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn,
+ parsed->twophase_gid, true);
+ }
+
+}
+
+/*
+ * Decode PREPARE record. Similar logic as in COMMIT
+ */
+static void
+DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_prepare * parsed)
+{
+ XLogRecPtr origin_lsn = parsed->origin_lsn;
+ TimestampTz commit_time = parsed->origin_timestamp;
+ XLogRecPtr origin_id = XLogRecGetOrigin(buf->record);
+ int i;
+ TransactionId xid = parsed->twophase_xid;
+
+ if (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
+ (parsed->dbId != InvalidOid && parsed->dbId != ctx->slot->data.database) ||
+ ctx->fast_forward || FilterByOrigin(ctx, origin_id))
+ return;
+
+ /*
+ * Tell the reorderbuffer about the surviving subtransactions. We need to
+ * do this because the main transaction itself has not committed since we
+ * are in the prepare phase right now. So we need to be sure the snapshot
+ * is setup correctly for the main transaction in case all changes
+ * happened in subtransanctions
+ */
+ for (i = 0; i < parsed->nsubxacts; i++)
+ {
+ ReorderBufferCommitChild(ctx->reorder, xid, parsed->subxacts[i],
+ buf->origptr, buf->endptr);
+ }
+
+ /* replay actions of all transaction + subtransactions in order */
+ ReorderBufferPrepare(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn, parsed->twophase_gid);
+}
+
+/*
* Get the data from the various forms of abort records and pass it on to
* snapbuild.c and reorderbuffer.c
*/
@@ -681,6 +863,50 @@ DecodeAbort(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
}
/*
+ * Get the data from the various forms of abort records and pass it on to
+ * snapbuild.c and reorderbuffer.c
+ */
+static void
+DecodeAbortPrepared(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_abort *parsed, TransactionId xid)
+{
+ int i;
+ XLogRecPtr origin_lsn = InvalidXLogRecPtr;
+ TimestampTz commit_time = 0;
+ XLogRecPtr origin_id = XLogRecGetOrigin(buf->record);
+
+ if (parsed->xinfo & XACT_XINFO_HAS_ORIGIN)
+ {
+ origin_lsn = parsed->origin_lsn;
+ commit_time = parsed->origin_timestamp;
+ }
+
+ /*
+ * If it passes through the filters handle the ROLLBACK via callbacks
+ */
+ if(!FilterByOrigin(ctx, origin_id) &&
+ !SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) &&
+ !ReorderBufferPrepareNeedSkip(ctx->reorder, xid, parsed->twophase_gid))
+ {
+ Assert(TransactionIdIsValid(xid));
+ Assert(parsed->dbId == ctx->slot->data.database);
+
+ ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn,
+ parsed->twophase_gid, false);
+ return;
+ }
+
+ for (i = 0; i < parsed->nsubxacts; i++)
+ {
+ ReorderBufferAbort(ctx->reorder, parsed->subxacts[i],
+ buf->record->EndRecPtr);
+ }
+
+ ReorderBufferAbort(ctx->reorder, xid, buf->record->EndRecPtr);
+}
+
+/*
* Parse XLOG_HEAP_INSERT (not MULTI_INSERT!) records into tuplebufs.
*
* Deletes can contain the new tuple.
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 7a8bf76..0a5c158 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -251,7 +251,8 @@ static Size ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn
static void ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
char *change);
static void ReorderBufferRestoreCleanup(ReorderBuffer *rb, ReorderBufferTXN *txn);
-static void ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn);
+static void ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
+ bool txn_prepared);
static void ReorderBufferCleanupSerializedTXNs(const char *slotname);
static void ReorderBufferSerializedPath(char *path, ReplicationSlot *slot,
TransactionId xid, XLogSegNo segno);
@@ -418,6 +419,12 @@ ReorderBufferReturnTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
/* free data that's contained */
+ if (txn->gid != NULL)
+ {
+ pfree(txn->gid);
+ txn->gid = NULL;
+ }
+
if (txn->tuplecid_hash != NULL)
{
hash_destroy(txn->tuplecid_hash);
@@ -1511,12 +1518,14 @@ ReorderBufferCleanupTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
}
/*
- * Discard changes from a transaction (and subtransactions), after streaming
- * them. Keep the remaining info - transactions, tuplecids, invalidations and
- * snapshots.
+ * Discard changes from a transaction (and subtransactions), either after streaming or
+ * after a PREPARE.
+ * The flag txn_prepared indicates if this is called after a PREPARE.
+ * If streaming, keep the remaining info - transactions, tuplecids, invalidations and
+ * snapshots. If after a PREPARE, keep only the invalidations and snapshots.
*/
static void
-ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
+ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, bool txn_prepared)
{
dlist_mutable_iter iter;
@@ -1535,7 +1544,7 @@ ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
Assert(rbtxn_is_known_subxact(subtxn));
Assert(subtxn->nsubtxns == 0);
- ReorderBufferTruncateTXN(rb, subtxn);
+ ReorderBufferTruncateTXN(rb, subtxn, txn_prepared);
}
/* cleanup changes in the txn */
@@ -1569,9 +1578,33 @@ ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
* about the toplevel xact (we send the XID in all messages), but we never
* stream XIDs of empty subxacts.
*/
- if ((!txn->toptxn) || (txn->nentries_mem != 0))
+ if ((!txn_prepared) && ((!txn->toptxn) || (txn->nentries_mem != 0)))
txn->txn_flags |= RBTXN_IS_STREAMED;
+ if (txn_prepared)
+ {
+ /*
+ * If this is a prepared txn, cleanup the tuplecids we stored for decoding
+ * catalog snapshot access.
+ * They are always stored in the toplevel transaction.
+ */
+ dlist_foreach_modify(iter, &txn->tuplecids)
+ {
+ ReorderBufferChange *change;
+
+ change = dlist_container(ReorderBufferChange, node, iter.cur);
+
+ /* Check we're not mixing changes from different transactions. */
+ Assert(change->txn == txn);
+ Assert(change->action == REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID);
+
+ /* Remove the change from its containing list. */
+ dlist_delete(&change->node);
+
+ ReorderBufferReturnChange(rb, change, true);
+ }
+ }
+
/*
* Destroy the (relfilenode, ctid) hashtable, so that we don't leak any
* memory. We could also keep the hash table and update it with new ctid
@@ -1765,9 +1798,22 @@ ReorderBufferStreamCommit(ReorderBuffer *rb, ReorderBufferTXN *txn)
ReorderBufferStreamTXN(rb, txn);
- rb->stream_commit(rb, txn, txn->final_lsn);
-
- ReorderBufferCleanupTXN(rb, txn);
+ if (rbtxn_prepared(txn))
+ {
+ rb->stream_prepare(rb, txn, txn->final_lsn);
+ /* Here we are streaming and part of the PREPARE of a two-phase commit
+ * The full cleanup will happen as part of the COMMIT PREPAREDs, so now
+ * just truncate txn by removing changes and tuple_cids
+ */
+ ReorderBufferTruncateTXN(rb, txn, true);
+ /* Reset the CheckXidAlive */
+ CheckXidAlive = InvalidTransactionId;
+ }
+ else
+ {
+ rb->stream_commit(rb, txn, txn->final_lsn);
+ ReorderBufferCleanupTXN(rb, txn);
+ }
}
/*
@@ -1895,8 +1941,11 @@ ReorderBufferResetTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
XLogRecPtr last_lsn,
ReorderBufferChange *specinsert)
{
- /* Discard the changes that we just streamed */
- ReorderBufferTruncateTXN(rb, txn);
+ /* Discard the changes that we just streamed.
+ * This can only be called if streaming and not part of a PREPARE in
+ * a two-phase commit, so set prepared flag as false.
+ */
+ ReorderBufferTruncateTXN(rb, txn, false);
/* Free all resources allocated for toast reconstruction */
ReorderBufferToastReset(rb, txn);
@@ -1918,6 +1967,11 @@ ReorderBufferResetTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
/*
* Helper function for ReorderBufferCommit and ReorderBufferStreamTXN.
*
+ * We are here due to one of the 3 scenarios:
+ * 1. As part of streaming an in-progress transactions
+ * 2. Prepare of a two-phase commit
+ * 3. Commit of a transaction.
+ *
* Send data of a transaction (and its subtransactions) to the
* output plugin. We iterate over the top and subtransactions (using a k-way
* merge) and replay the changes in lsn order.
@@ -2003,7 +2057,7 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
prev_lsn = change->lsn;
/* Set the current xid to detect concurrent aborts. */
- if (streaming)
+ if (streaming || rbtxn_prepared(change->txn))
{
curtxn = change->txn;
SetupCheckXidLive(curtxn->xid);
@@ -2294,7 +2348,16 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
}
}
else
- rb->commit(rb, txn, commit_lsn);
+ {
+ /*
+ * Call either PREPARE (for two-phase transactions) or COMMIT
+ * (for regular ones).
+ */
+ if (rbtxn_prepared(txn))
+ rb->prepare(rb, txn, commit_lsn);
+ else
+ rb->commit(rb, txn, commit_lsn);
+ }
/* this is just a sanity check against bad output plugin behaviour */
if (GetCurrentTransactionIdIfAny() != InvalidTransactionId)
@@ -2328,18 +2391,32 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
RollbackAndReleaseCurrentSubTransaction();
/*
+ * We are here due to one of the 3 scenarios:
+ * 1. As part of streaming in-progress transactions
+ * 2. Prepare of a two-phase commit
+ * 3. Commit of a transaction.
+ *
* If we are streaming the in-progress transaction then discard the
* changes that we just streamed, and mark the transactions as
- * streamed (if they contained changes). Otherwise, remove all the
+ * streamed (if they contained changes), set prepared flag as false.
+ * If part of a prepare of a two-phase commit set the prepared flag
+ * as true so that we can discard changes and cleanup tuplecids.
+ * Otherwise, remove all the
* changes and deallocate the ReorderBufferTXN.
*/
if (streaming)
{
- ReorderBufferTruncateTXN(rb, txn);
+ ReorderBufferTruncateTXN(rb, txn, false);
/* Reset the CheckXidAlive */
CheckXidAlive = InvalidTransactionId;
}
+ else if (rbtxn_prepared(txn))
+ {
+ ReorderBufferTruncateTXN(rb, txn, true);
+ /* Reset the CheckXidAlive */
+ CheckXidAlive = InvalidTransactionId;
+ }
else
ReorderBufferCleanupTXN(rb, txn);
}
@@ -2369,17 +2446,20 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
/*
* The error code ERRCODE_TRANSACTION_ROLLBACK indicates a concurrent
- * abort of the (sub)transaction we are streaming. We need to do the
+ * abort of the (sub)transaction we are streaming or preparing. We need to do the
* cleanup and return gracefully on this error, see SetupCheckXidLive.
*/
if (errdata->sqlerrcode == ERRCODE_TRANSACTION_ROLLBACK)
{
/*
- * This error can only occur when we are sending the data in
- * streaming mode and the streaming is not finished yet.
+ * This error can only occur either when we are sending the data in
+ * streaming mode and the streaming is not finished yet or when we are
+ * sending the data out on a PREPARE during a two-phase commit.
+ * Both conditions can't be true either, it should be one of them.
*/
- Assert(streaming);
- Assert(stream_started);
+ Assert(streaming || rbtxn_prepared(txn));
+ Assert(stream_started || rbtxn_prepared(txn));
+ Assert(!(streaming && rbtxn_prepared(txn)));
/* Cleanup the temporary error state. */
FlushErrorState();
@@ -2387,10 +2467,21 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
errdata = NULL;
curtxn->concurrent_abort = true;
- /* Reset the TXN so that it is allowed to stream remaining data. */
- ReorderBufferResetTXN(rb, txn, snapshot_now,
- command_id, prev_lsn,
- specinsert);
+ /*
+ * If streaming, reset the TXN so that it is allowed to stream remaining data.
+ */
+ if (streaming)
+ {
+ ReorderBufferResetTXN(rb, txn, snapshot_now,
+ command_id, prev_lsn,
+ specinsert);
+ }
+ else
+ {
+ elog(LOG, "stopping decoding of %s (%u)",
+ txn->gid[0] != '\0'? txn->gid:"", txn->xid);
+ ReorderBufferTruncateTXN(rb, txn, true);
+ }
}
else
{
@@ -2412,23 +2503,16 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
* This interface is called once a toplevel commit is read for both streamed
* as well as non-streamed transactions.
*/
-void
-ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
- XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
- TimestampTz commit_time,
- RepOriginId origin_id, XLogRecPtr origin_lsn)
+static void
+ReorderBufferCommitInternal(ReorderBufferTXN *txn,
+ ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn)
{
- ReorderBufferTXN *txn;
Snapshot snapshot_now;
CommandId command_id = FirstCommandId;
- txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
- false);
-
- /* unknown transaction, nothing to replay */
- if (txn == NULL)
- return;
-
txn->final_lsn = commit_lsn;
txn->end_lsn = end_lsn;
txn->commit_time = commit_time;
@@ -2470,6 +2554,141 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
}
/*
+ * Ask output plugin whether we want to skip this PREPARE and send
+ * this transaction as a regular commit later.
+ */
+bool
+ReorderBufferPrepareNeedSkip(ReorderBuffer *rb, TransactionId xid, const char *gid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr, false);
+
+ return rb->filter_prepare(rb, txn, xid, gid);
+}
+
+
+/*
+ * Commit a transaction.
+ *
+ * See comments for ReorderBufferCommitInternal()
+ */
+void
+ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+
+ /* unknown transaction, nothing to replay */
+ if (txn == NULL)
+ return;
+
+ ReorderBufferCommitInternal(txn, rb, xid, commit_lsn, end_lsn,
+ commit_time, origin_id, origin_lsn);
+}
+
+/*
+ * Prepare a two-phase transaction. It calls ReorderBufferCommitInternal()
+ * since all prepared transactions need to be decoded at PREPARE time.
+ */
+void
+ReorderBufferPrepare(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+
+ /* unknown transaction, nothing to replay */
+ if (txn == NULL)
+ return;
+
+ txn->txn_flags |= RBTXN_PREPARE;
+ txn->gid = palloc(strlen(gid) + 1); /* trailing '\0' */
+ strcpy(txn->gid, gid);
+
+ ReorderBufferCommitInternal(txn, rb, xid, commit_lsn, end_lsn,
+ commit_time, origin_id, origin_lsn);
+}
+
+/*
+ * Check whether this transaction was sent as prepared to subscribers.
+ * Called while handling commit|abort prepared.
+ */
+bool
+ReorderBufferTxnIsPrepared(ReorderBuffer *rb, TransactionId xid,
+ const char *gid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+
+ /*
+ * Always call the prepare filter. It's the job of the prepare filter to
+ * give us the *same* response for a given xid across multiple calls
+ * (including ones on restart)
+ */
+ return !(rb->filter_prepare(rb, txn, xid, gid));
+}
+
+/*
+ * Send standalone xact event. This is used to handle COMMIT/ROLLBACK PREPARED.
+ */
+void
+ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid, bool is_commit)
+{
+ ReorderBufferTXN *txn;
+
+ /*
+ * The transaction may or may not exist (during restarts for example).
+ * Anyway, two-phase transactions do not contain any reorderbuffers. So allow
+ * it to be created below.
+ */
+ txn = ReorderBufferTXNByXid(rb, xid, true, NULL, commit_lsn,
+ true);
+
+ txn->final_lsn = commit_lsn;
+ txn->end_lsn = end_lsn;
+ txn->commit_time = commit_time;
+ txn->origin_id = origin_id;
+ txn->origin_lsn = origin_lsn;
+ /* this txn is obviously prepared */
+ txn->txn_flags |= RBTXN_PREPARE;
+ txn->gid = palloc(strlen(gid) + 1); /* trailing '\0' */
+ strcpy(txn->gid, gid);
+
+ if (is_commit)
+ txn->txn_flags |= RBTXN_COMMIT_PREPARED;
+ else
+ txn->txn_flags |= RBTXN_ROLLBACK_PREPARED;
+
+ if (rbtxn_commit_prepared(txn))
+ rb->commit_prepared(rb, txn, commit_lsn);
+ else if (rbtxn_rollback_prepared(txn))
+ rb->rollback_prepared(rb, txn, commit_lsn);
+
+
+ /* cleanup: make sure there's no cache pollution */
+ ReorderBufferExecuteInvalidations(txn->ninvalidations,
+ txn->invalidations);
+ ReorderBufferCleanupTXN(rb, txn);
+}
+
+/*
* Abort a transaction that possibly has previous changes. Needs to be first
* called for subtransactions and then for the toplevel xid.
*
@@ -2512,7 +2731,12 @@ ReorderBufferAbort(ReorderBuffer *rb, TransactionId xid, XLogRecPtr lsn)
/* cosmetic... */
txn->final_lsn = lsn;
- /* remove potential on-disk data, and deallocate */
+ /*
+ * remove potential on-disk data, and deallocate.
+ *
+ * We remove it even for prepared transactions (GID is enough to
+ * commit/abort those later).
+ */
ReorderBufferCleanupTXN(rb, txn);
}
--
1.8.3.1
v12-0003-Support-2PC-txn-pgoutput.patchapplication/octet-stream; name=v12-0003-Support-2PC-txn-pgoutput.patchDownload
From 73142a5bb0ead534cddbdbdb9fb4d4b2bdf88b28 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Mon, 26 Oct 2020 12:59:59 +1100
Subject: [PATCH v12] Support 2PC txn - pgoutput.
This patch adds support in the pgoutput plugin and subscriber for handling
of two-phase commits.
Includes pgoutput changes.
Includes subscriber changes.
Includes two-phase commit test code (streaming and not streaming).
---
src/backend/access/transam/twophase.c | 27 ++
src/backend/replication/logical/proto.c | 148 +++++-
src/backend/replication/logical/worker.c | 286 +++++++++++-
src/backend/replication/pgoutput/pgoutput.c | 75 +++-
src/include/access/twophase.h | 1 +
src/include/replication/logicalproto.h | 33 ++
src/test/subscription/t/020_twophase.pl | 345 ++++++++++++++
src/test/subscription/t/021_twophase_streaming.pl | 521 ++++++++++++++++++++++
8 files changed, 1414 insertions(+), 22 deletions(-)
create mode 100644 src/test/subscription/t/020_twophase.pl
create mode 100644 src/test/subscription/t/021_twophase_streaming.pl
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 7940060..2e0a408 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -548,6 +548,33 @@ MarkAsPrepared(GlobalTransaction gxact, bool lock_held)
}
/*
+ * LookupGXact
+ * Check if the prepared transaction with the given GID is around
+ */
+bool
+LookupGXact(const char *gid)
+{
+ int i;
+ bool found = false;
+
+ LWLockAcquire(TwoPhaseStateLock, LW_SHARED);
+ for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
+ {
+ GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
+
+ /* Ignore not-yet-valid GIDs */
+ if (gxact->valid && strcmp(gxact->gid, gid) == 0)
+ {
+ found = true;
+ break;
+ }
+
+ }
+ LWLockRelease(TwoPhaseStateLock);
+ return found;
+}
+
+/*
* LockGXact
* Locate the prepared transaction and mark it busy for COMMIT or PREPARE.
*/
diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index eb19142..ecd734b 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -78,7 +78,7 @@ logicalrep_write_commit(StringInfo out, ReorderBufferTXN *txn,
pq_sendbyte(out, 'C'); /* sending COMMIT */
- /* send the flags field (unused for now) */
+ /* send the flags field */
pq_sendbyte(out, flags);
/* send fields */
@@ -99,6 +99,7 @@ logicalrep_read_commit(StringInfo in, LogicalRepCommitData *commit_data)
if (flags != 0)
elog(ERROR, "unrecognized flags %u in commit message", flags);
+
/* read fields */
commit_data->commit_lsn = pq_getmsgint64(in);
commit_data->end_lsn = pq_getmsgint64(in);
@@ -106,6 +107,151 @@ logicalrep_read_commit(StringInfo in, LogicalRepCommitData *commit_data)
}
/*
+ * Write PREPARE to the output stream.
+ */
+void
+logicalrep_write_prepare(StringInfo out, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ uint8 flags = 0;
+
+ pq_sendbyte(out, 'P'); /* sending PREPARE protocol */
+
+ /*
+ * This should only ever happen for two-phase commit transactions. In which case we
+ * expect to have a valid GID.
+ */
+ Assert(rbtxn_prepared(txn));
+ Assert(txn->gid != NULL);
+
+ /*
+ * Flags are determined from the state of the transaction. We know we
+ * always get PREPARE first and then [COMMIT|ROLLBACK] PREPARED, so if
+ * it's already marked as committed then it has to be COMMIT PREPARED (and
+ * likewise for abort / ROLLBACK PREPARED).
+ */
+ if (rbtxn_commit_prepared(txn))
+ flags = LOGICALREP_IS_COMMIT_PREPARED;
+ else if (rbtxn_rollback_prepared(txn))
+ flags = LOGICALREP_IS_ROLLBACK_PREPARED;
+ else
+ flags = LOGICALREP_IS_PREPARE;
+
+ /* Make sure exactly one of the expected flags is set. */
+ if (!PrepareFlagsAreValid(flags))
+ elog(ERROR, "unrecognized flags %u in prepare message", flags);
+
+ /* send the flags field */
+ pq_sendbyte(out, flags);
+
+ /* send fields */
+ pq_sendint64(out, prepare_lsn);
+ pq_sendint64(out, txn->end_lsn);
+ pq_sendint64(out, txn->commit_time);
+
+ /* send gid */
+ pq_sendstring(out, txn->gid);
+}
+
+/*
+ * Read transaction PREPARE from the stream.
+ */
+void
+logicalrep_read_prepare(StringInfo in, LogicalRepPrepareData * prepare_data)
+{
+ /* read flags */
+ uint8 flags = pq_getmsgbyte(in);
+
+ if (!PrepareFlagsAreValid(flags))
+ elog(ERROR, "unrecognized flags %u in prepare message", flags);
+
+ /* set the action (reuse the constants used for the flags) */
+ prepare_data->prepare_type = flags;
+
+ /* read fields */
+ prepare_data->prepare_lsn = pq_getmsgint64(in);
+ prepare_data->end_lsn = pq_getmsgint64(in);
+ prepare_data->preparetime = pq_getmsgint64(in);
+
+ /* read gid (copy it into a pre-allocated buffer) */
+ strcpy(prepare_data->gid, pq_getmsgstring(in));
+}
+
+/*
+ * Write STREAM PREPARE to the output stream.
+ * (For stream PREPARE, stream COMMIT PREPARED, stream ROLLBACK PREPARED)
+ */
+void
+logicalrep_write_stream_prepare(StringInfo out,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ uint8 flags = 0;
+
+ pq_sendbyte(out, 'p'); /* sending STREAM PREPARE protocol */
+
+ /*
+ * This should only ever happen for two-phase transactions. In which case we
+ * expect to have a valid GID.
+ */
+ Assert(rbtxn_prepared(txn));
+ Assert(txn->gid != NULL);
+
+ /*
+ * For streaming APIs only PREPARE is supported. [COMMIT|ROLLBACK] PREPARED
+ * uses non-streaming APIs
+ */
+ flags = LOGICALREP_IS_PREPARE;
+
+ /* transaction ID */
+ Assert(TransactionIdIsValid(txn->xid));
+ pq_sendint32(out, txn->xid);
+
+ /* send the flags field */
+ pq_sendbyte(out, flags);
+
+ /* send fields */
+ pq_sendint64(out, prepare_lsn);
+ pq_sendint64(out, txn->end_lsn);
+ pq_sendint64(out, txn->commit_time);
+
+ /* send gid */
+ pq_sendstring(out, txn->gid);
+}
+
+/*
+ * Read STREAM PREPARE from the output stream.
+ * (For stream PREPARE, stream COMMIT PREPARED, stream ROLLBACK PREPARED)
+ */
+TransactionId
+logicalrep_read_stream_prepare(StringInfo in, LogicalRepPrepareData *prepare_data)
+{
+ TransactionId xid;
+ uint8 flags;
+
+ xid = pq_getmsgint(in, 4);
+
+ /* read flags */
+ flags = pq_getmsgbyte(in);
+
+ if (!PrepareFlagsAreValid(flags))
+ elog(ERROR, "unrecognized flags %u in prepare message", flags);
+
+ /* set the action (reuse the constants used for the flags) */
+ prepare_data->prepare_type = flags;
+
+ /* read fields */
+ prepare_data->prepare_lsn = pq_getmsgint64(in);
+ prepare_data->end_lsn = pq_getmsgint64(in);
+ prepare_data->preparetime = pq_getmsgint64(in);
+
+ /* read gid (copy it into a pre-allocated buffer) */
+ strcpy(prepare_data->gid, pq_getmsgstring(in));
+
+ return xid;
+}
+
+/*
* Write ORIGIN to the output stream.
*/
void
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 3a5b733..2f6514e 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -246,6 +246,8 @@ static void apply_handle_tuple_routing(ResultRelInfo *relinfo,
LogicalRepRelMapEntry *relmapentry,
CmdType operation);
+static int apply_spooled_messages(TransactionId xid, XLogRecPtr lsn);
+
/*
* Should this worker apply changes for given relation.
*
@@ -722,6 +724,7 @@ apply_handle_commit(StringInfo s)
replorigin_session_origin_timestamp = commit_data.committime;
CommitTransactionCommand();
+
pgstat_report_stat(false);
store_flush_position(commit_data.end_lsn);
@@ -742,6 +745,225 @@ apply_handle_commit(StringInfo s)
}
/*
+ * Called from apply_handle_prepare to handle a PREPARE TRANSACTION.
+ */
+static void
+apply_handle_prepare_txn(LogicalRepPrepareData * prepare_data)
+{
+ Assert(prepare_data->prepare_lsn == remote_final_lsn);
+
+ /* The synchronization worker runs in single transaction. */
+ if (IsTransactionState() && !am_tablesync_worker())
+ {
+ /* End the earlier transaction and start a new one */
+ BeginTransactionBlock();
+ CommitTransactionCommand();
+ StartTransactionCommand();
+
+ /*
+ * Update origin state so we can restart streaming from correct
+ * position in case of crash.
+ */
+ replorigin_session_origin_lsn = prepare_data->end_lsn;
+ replorigin_session_origin_timestamp = prepare_data->preparetime;
+
+ PrepareTransactionBlock(prepare_data->gid);
+ CommitTransactionCommand();
+ pgstat_report_stat(false);
+
+ store_flush_position(prepare_data->end_lsn);
+ }
+ else
+ {
+ /* Process any invalidation messages that might have accumulated. */
+ AcceptInvalidationMessages();
+ maybe_reread_subscription();
+ }
+
+ in_remote_transaction = false;
+
+ /* Process any tables that are being synchronized in parallel. */
+ process_syncing_tables(prepare_data->end_lsn);
+
+ pgstat_report_activity(STATE_IDLE, NULL);
+}
+
+/*
+ * Called from apply_handle_prepare to handle a COMMIT PREPARED of a previously
+ * PREPARED transaction.
+ */
+static void
+apply_handle_commit_prepared_txn(LogicalRepPrepareData * prepare_data)
+{
+ /* there is no transaction when COMMIT PREPARED is called */
+ ensure_transaction();
+
+ /*
+ * Update origin state so we can restart streaming from correct position
+ * in case of crash.
+ */
+ replorigin_session_origin_lsn = prepare_data->end_lsn;
+ replorigin_session_origin_timestamp = prepare_data->preparetime;
+
+ FinishPreparedTransaction(prepare_data->gid, true);
+ CommitTransactionCommand();
+ pgstat_report_stat(false);
+
+ store_flush_position(prepare_data->end_lsn);
+ in_remote_transaction = false;
+
+ /* Process any tables that are being synchronized in parallel. */
+ process_syncing_tables(prepare_data->end_lsn);
+
+ pgstat_report_activity(STATE_IDLE, NULL);
+}
+
+/*
+ * Called from apply_handle_prepare to handle a ROLLBACK PREPARED of a previously
+ * PREPARED TRANSACTION.
+ */
+static void
+apply_handle_rollback_prepared_txn(LogicalRepPrepareData * prepare_data)
+{
+ /*
+ * Update origin state so we can restart streaming from correct position
+ * in case of crash.
+ */
+ replorigin_session_origin_lsn = prepare_data->end_lsn;
+ replorigin_session_origin_timestamp = prepare_data->preparetime;
+
+ /*
+ * During logical decoding, on the apply side, it's possible that a
+ * prepared transaction got aborted while decoding. In that case, we stop
+ * the decoding and abort the transaction immediately. However the
+ * ROLLBACK prepared processing still reaches the subscriber. In that case
+ * it's ok to have a missing gid
+ */
+ if (LookupGXact(prepare_data->gid))
+ {
+ /* there is no transaction when ABORT/ROLLBACK PREPARED is called */
+ ensure_transaction();
+ FinishPreparedTransaction(prepare_data->gid, false);
+ CommitTransactionCommand();
+ }
+
+ pgstat_report_stat(false);
+
+ store_flush_position(prepare_data->end_lsn);
+ in_remote_transaction = false;
+
+ /* Process any tables that are being synchronized in parallel. */
+ process_syncing_tables(prepare_data->end_lsn);
+
+ pgstat_report_activity(STATE_IDLE, NULL);
+}
+
+/*
+ * Handle PREPARE message.
+ */
+static void
+apply_handle_prepare(StringInfo s)
+{
+ LogicalRepPrepareData prepare_data;
+
+ logicalrep_read_prepare(s, &prepare_data);
+
+ switch (prepare_data.prepare_type)
+ {
+ case LOGICALREP_IS_PREPARE:
+ apply_handle_prepare_txn(&prepare_data);
+ break;
+
+ case LOGICALREP_IS_COMMIT_PREPARED:
+ apply_handle_commit_prepared_txn(&prepare_data);
+ break;
+
+ case LOGICALREP_IS_ROLLBACK_PREPARED:
+ apply_handle_rollback_prepared_txn(&prepare_data);
+ break;
+
+ default:
+ ereport(ERROR,
+ (errcode(ERRCODE_PROTOCOL_VIOLATION),
+ errmsg("unexpected type of prepare message: %d",
+ prepare_data.prepare_type)));
+ }
+}
+
+/*
+ * Handle STREAM PREPARE.
+ *
+ * Logic is in two parts:
+ * 1. Replay all the spooled operations
+ * 2. Mark the transaction as prepared
+ */
+static void
+apply_handle_stream_prepare(StringInfo s)
+{
+ int nchanges = 0;
+ LogicalRepPrepareData prepare_data;
+ TransactionId xid;
+
+ Assert(!in_streamed_transaction);
+
+ xid = logicalrep_read_stream_prepare(s, &prepare_data);
+ elog(DEBUG1, "received prepare for streamed transaction %u", xid);
+
+ /*
+ * This should be a PREPARE only. The COMMIT PREPARED and ROLLBACK PREPARED
+ * for streaming are handled by the non-streaming APIs.
+ */
+ Assert(prepare_data.prepare_type == LOGICALREP_IS_PREPARE);
+
+ /*
+ * ========================================
+ * 1. Replay all the spooled operations
+ * - This code is same as what apply_handle_stream_commit does for NON two-phase stream commit
+ * ========================================
+ */
+
+ ensure_transaction();
+
+ nchanges = apply_spooled_messages(xid, prepare_data.prepare_lsn);
+
+ /*
+ * ========================================
+ * 2. Mark the transaction as prepared.
+ * - This code is same as what apply_handle_prepare_txn does for two-phase prepare of the non-streamed tx
+ * ========================================
+ */
+ BeginTransactionBlock();
+ CommitTransactionCommand();
+ StartTransactionCommand();
+
+ /*
+ * Update origin state so we can restart streaming from correct position
+ * in case of crash.
+ */
+ replorigin_session_origin_lsn = prepare_data.end_lsn;
+ replorigin_session_origin_timestamp = prepare_data.preparetime;
+
+ PrepareTransactionBlock(prepare_data.gid);
+ CommitTransactionCommand();
+
+ pgstat_report_stat(false);
+
+ store_flush_position(prepare_data.end_lsn);
+
+ elog(DEBUG1, "apply_handle_stream_prepare_txn: replayed %d (all) changes.", nchanges);
+
+ in_remote_transaction = false;
+
+ /* Process any tables that are being synchronized in parallel. */
+ process_syncing_tables(prepare_data.end_lsn);
+
+ /* unlink the files with serialized changes and subxact info */
+ stream_cleanup_files(MyLogicalRepWorker->subid, xid);
+
+ pgstat_report_activity(STATE_IDLE, NULL);
+}
+
+/*
* Handle ORIGIN message.
*
* TODO, support tracking of multiple origins
@@ -935,30 +1157,21 @@ apply_handle_stream_abort(StringInfo s)
}
/*
- * Handle STREAM COMMIT message.
+ * Common spoolfile processing.
+ * Returns how many changes were applied.
*/
-static void
-apply_handle_stream_commit(StringInfo s)
+static int
+apply_spooled_messages(TransactionId xid, XLogRecPtr lsn)
{
- TransactionId xid;
StringInfoData s2;
int nchanges;
char path[MAXPGPATH];
char *buffer = NULL;
bool found;
- LogicalRepCommitData commit_data;
StreamXidHash *ent;
MemoryContext oldcxt;
BufFile *fd;
- Assert(!in_streamed_transaction);
-
- xid = logicalrep_read_stream_commit(s, &commit_data);
-
- elog(DEBUG1, "received commit for streamed transaction %u", xid);
-
- ensure_transaction();
-
/*
* Allocate file handle and memory required to process all the messages in
* TopTransactionContext to avoid them getting reset after each message is
@@ -981,7 +1194,7 @@ apply_handle_stream_commit(StringInfo s)
MemoryContextSwitchTo(oldcxt);
- remote_final_lsn = commit_data.commit_lsn;
+ remote_final_lsn = lsn;
/*
* Make sure the handle apply_dispatch methods are aware we're in a remote
@@ -1050,6 +1263,35 @@ apply_handle_stream_commit(StringInfo s)
BufFileClose(fd);
+ pfree(buffer);
+ pfree(s2.data);
+
+ elog(DEBUG1, "replayed %d (all) changes from file \"%s\"",
+ nchanges, path);
+
+ return nchanges;
+}
+
+/*
+ * Handle STREAM COMMIT message.
+ */
+static void
+apply_handle_stream_commit(StringInfo s)
+{
+ TransactionId xid;
+ LogicalRepCommitData commit_data;
+ int nchanges = 0;
+
+ Assert(!in_streamed_transaction);
+
+ xid = logicalrep_read_stream_commit(s, &commit_data);
+
+ elog(DEBUG1, "received commit for streamed transaction %u", xid);
+
+ ensure_transaction();
+
+ nchanges = apply_spooled_messages(xid, commit_data.commit_lsn);
+
/*
* Update origin state so we can restart streaming from correct position
* in case of crash.
@@ -1057,16 +1299,12 @@ apply_handle_stream_commit(StringInfo s)
replorigin_session_origin_lsn = commit_data.end_lsn;
replorigin_session_origin_timestamp = commit_data.committime;
- pfree(buffer);
- pfree(s2.data);
-
CommitTransactionCommand();
pgstat_report_stat(false);
store_flush_position(commit_data.end_lsn);
- elog(DEBUG1, "replayed %d (all) changes from file \"%s\"",
- nchanges, path);
+ elog(DEBUG1, "apply_handle_stream_commit: replayed %d (all) changes.", nchanges);
in_remote_transaction = false;
@@ -1905,10 +2143,14 @@ apply_dispatch(StringInfo s)
case 'B':
apply_handle_begin(s);
break;
- /* COMMIT */
+ /* COMMIT/ABORT */
case 'C':
apply_handle_commit(s);
break;
+ /* PREPARE and [COMMIT|ROLLBACK] PREPARED */
+ case 'P':
+ apply_handle_prepare(s);
+ break;
/* INSERT */
case 'I':
apply_handle_insert(s);
@@ -1953,6 +2195,10 @@ apply_dispatch(StringInfo s)
case 'c':
apply_handle_stream_commit(s);
break;
+ /* STREAM PREPARE */
+ case 'p':
+ apply_handle_stream_prepare(s);
+ break;
default:
ereport(ERROR,
(errcode(ERRCODE_PROTOCOL_VIOLATION),
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 9c997ae..b4f2c9a 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -47,6 +47,12 @@ static void pgoutput_truncate(LogicalDecodingContext *ctx,
ReorderBufferChange *change);
static bool pgoutput_origin_filter(LogicalDecodingContext *ctx,
RepOriginId origin_id);
+static void pgoutput_prepare_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, XLogRecPtr prepare_lsn);
+static void pgoutput_commit_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, XLogRecPtr prepare_lsn);
+static void pgoutput_rollback_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, XLogRecPtr prepare_lsn);
static void pgoutput_stream_start(struct LogicalDecodingContext *ctx,
ReorderBufferTXN *txn);
static void pgoutput_stream_stop(struct LogicalDecodingContext *ctx,
@@ -57,7 +63,8 @@ static void pgoutput_stream_abort(struct LogicalDecodingContext *ctx,
static void pgoutput_stream_commit(struct LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
-
+static void pgoutput_stream_prepare_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, XLogRecPtr prepare_lsn);
static bool publications_valid;
static bool in_streaming;
@@ -143,6 +150,10 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb)
cb->change_cb = pgoutput_change;
cb->truncate_cb = pgoutput_truncate;
cb->commit_cb = pgoutput_commit_txn;
+
+ cb->prepare_cb = pgoutput_prepare_txn;
+ cb->commit_prepared_cb = pgoutput_commit_prepared_txn;
+ cb->rollback_prepared_cb = pgoutput_rollback_prepared_txn;
cb->filter_by_origin_cb = pgoutput_origin_filter;
cb->shutdown_cb = pgoutput_shutdown;
@@ -153,6 +164,8 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb)
cb->stream_commit_cb = pgoutput_stream_commit;
cb->stream_change_cb = pgoutput_change;
cb->stream_truncate_cb = pgoutput_truncate;
+ /* transaction streaming - two-phase commit */
+ cb->stream_prepare_cb = pgoutput_stream_prepare_txn;
}
static void
@@ -378,6 +391,48 @@ pgoutput_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
}
/*
+ * PREPARE callback
+ */
+static void
+pgoutput_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ OutputPluginUpdateProgress(ctx);
+
+ OutputPluginPrepareWrite(ctx, true);
+ logicalrep_write_prepare(ctx->out, txn, prepare_lsn);
+ OutputPluginWrite(ctx, true);
+}
+
+/*
+ * COMMIT PREPARED callback
+ */
+static void
+pgoutput_commit_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ OutputPluginUpdateProgress(ctx);
+
+ OutputPluginPrepareWrite(ctx, true);
+ logicalrep_write_prepare(ctx->out, txn, prepare_lsn);
+ OutputPluginWrite(ctx, true);
+}
+
+/*
+ * ROLLBACK PREPARED callback
+ */
+static void
+pgoutput_rollback_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ OutputPluginUpdateProgress(ctx);
+
+ OutputPluginPrepareWrite(ctx, true);
+ logicalrep_write_prepare(ctx->out, txn, prepare_lsn);
+ OutputPluginWrite(ctx, true);
+}
+
+/*
* Write the current schema of the relation and its ancestor (if any) if not
* done yet.
*/
@@ -857,6 +912,24 @@ pgoutput_stream_commit(struct LogicalDecodingContext *ctx,
}
/*
+ * PREPARE callback (for streaming two-phase commit).
+ *
+ * Notify the downstream to prepare the transaction.
+ */
+static void
+pgoutput_stream_prepare_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ Assert(rbtxn_is_streamed(txn));
+
+ OutputPluginUpdateProgress(ctx);
+ OutputPluginPrepareWrite(ctx, true);
+ logicalrep_write_stream_prepare(ctx->out, txn, prepare_lsn);
+ OutputPluginWrite(ctx, true);
+}
+
+/*
* Initialize the relation schema sync cache for a decoding session.
*
* The hash table is destroyed at the end of a decoding session. While
diff --git a/src/include/access/twophase.h b/src/include/access/twophase.h
index 2ca71c3..b2628ea 100644
--- a/src/include/access/twophase.h
+++ b/src/include/access/twophase.h
@@ -44,6 +44,7 @@ extern GlobalTransaction MarkAsPreparing(TransactionId xid, const char *gid,
extern void StartPrepare(GlobalTransaction gxact);
extern void EndPrepare(GlobalTransaction gxact);
extern bool StandbyTransactionIdIsPrepared(TransactionId xid);
+extern bool LookupGXact(const char *gid);
extern TransactionId PrescanPreparedTransactions(TransactionId **xids_p,
int *nxids_p);
diff --git a/src/include/replication/logicalproto.h b/src/include/replication/logicalproto.h
index 0c2cda2..ee38f89 100644
--- a/src/include/replication/logicalproto.h
+++ b/src/include/replication/logicalproto.h
@@ -87,6 +87,7 @@ typedef struct LogicalRepBeginData
TransactionId xid;
} LogicalRepBeginData;
+/* Commit (and abort) information */
typedef struct LogicalRepCommitData
{
XLogRecPtr commit_lsn;
@@ -94,6 +95,28 @@ typedef struct LogicalRepCommitData
TimestampTz committime;
} LogicalRepCommitData;
+
+/* Prepare protocol information */
+typedef struct LogicalRepPrepareData
+{
+ uint8 prepare_type;
+ XLogRecPtr prepare_lsn;
+ XLogRecPtr end_lsn;
+ TimestampTz preparetime;
+ char gid[GIDSIZE];
+} LogicalRepPrepareData;
+
+/* types of the prepare protocol message */
+#define LOGICALREP_IS_PREPARE 0x01
+#define LOGICALREP_IS_COMMIT_PREPARED 0x02
+#define LOGICALREP_IS_ROLLBACK_PREPARED 0x04
+
+/* prepare can be exactly one of PREPARE, [COMMIT|ROLLBACK] PREPARED*/
+#define PrepareFlagsAreValid(flags) \
+ (((flags) == LOGICALREP_IS_PREPARE) || \
+ ((flags) == LOGICALREP_IS_COMMIT_PREPARED) || \
+ ((flags) == LOGICALREP_IS_ROLLBACK_PREPARED))
+
extern void logicalrep_write_begin(StringInfo out, ReorderBufferTXN *txn);
extern void logicalrep_read_begin(StringInfo in,
LogicalRepBeginData *begin_data);
@@ -101,6 +124,10 @@ extern void logicalrep_write_commit(StringInfo out, ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
extern void logicalrep_read_commit(StringInfo in,
LogicalRepCommitData *commit_data);
+extern void logicalrep_write_prepare(StringInfo out, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+extern void logicalrep_read_prepare(StringInfo in,
+ LogicalRepPrepareData * prepare_data);
extern void logicalrep_write_origin(StringInfo out, const char *origin,
XLogRecPtr origin_lsn);
extern char *logicalrep_read_origin(StringInfo in, XLogRecPtr *origin_lsn);
@@ -144,4 +171,10 @@ extern void logicalrep_write_stream_abort(StringInfo out, TransactionId xid,
extern void logicalrep_read_stream_abort(StringInfo in, TransactionId *xid,
TransactionId *subxid);
+extern void logicalrep_write_stream_prepare(StringInfo out, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+extern TransactionId logicalrep_read_stream_prepare(StringInfo in,
+ LogicalRepPrepareData *prepare_data);
+
+
#endif /* LOGICAL_PROTO_H */
diff --git a/src/test/subscription/t/020_twophase.pl b/src/test/subscription/t/020_twophase.pl
new file mode 100644
index 0000000..d0b0a25
--- /dev/null
+++ b/src/test/subscription/t/020_twophase.pl
@@ -0,0 +1,345 @@
+# logical replication of 2PC test
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 21;
+
+###############################
+# Setup
+###############################
+
+# Initialize publisher node
+my $node_publisher = get_new_node('publisher');
+$node_publisher->init(allows_streaming => 'logical');
+$node_publisher->append_conf('postgresql.conf',
+ qq(max_prepared_transactions = 10));
+$node_publisher->start;
+
+# Create subscriber node
+my $node_subscriber = get_new_node('subscriber');
+$node_subscriber->init(allows_streaming => 'logical');
+$node_subscriber->append_conf('postgresql.conf',
+ qq(max_prepared_transactions = 10));
+$node_subscriber->start;
+
+# Create some pre-existing content on publisher
+$node_publisher->safe_psql('postgres',
+ "CREATE TABLE tab_full (a int PRIMARY KEY)");
+$node_publisher->safe_psql('postgres',
+ "INSERT INTO tab_full SELECT generate_series(1,10)");
+$node_publisher->safe_psql('postgres',
+ "CREATE TABLE tab_full2 (x text)");
+$node_publisher->safe_psql('postgres',
+ "INSERT INTO tab_full2 VALUES ('a'), ('b'), ('b')");
+
+# Setup structure on subscriber
+$node_subscriber->safe_psql('postgres',
+ "CREATE TABLE tab_full (a int PRIMARY KEY)");
+$node_subscriber->safe_psql('postgres',
+ "CREATE TABLE tab_full2 (x text)");
+
+# Setup logical replication
+my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
+$node_publisher->safe_psql('postgres',
+ "CREATE PUBLICATION tap_pub");
+$node_publisher->safe_psql('postgres',
+ "ALTER PUBLICATION tap_pub ADD TABLE tab_full, tab_full2");
+
+my $appname = 'tap_sub';
+$node_subscriber->safe_psql('postgres', "
+ CREATE SUBSCRIPTION tap_sub
+ CONNECTION '$publisher_connstr application_name=$appname'
+ PUBLICATION tap_pub");
+
+# Wait for subscriber to finish initialization
+my $caughtup_query =
+ "SELECT pg_current_wal_lsn() <= replay_lsn FROM pg_stat_replication WHERE application_name = '$appname';";
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# Also wait for initial table sync to finish
+my $synced_query =
+ "SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT IN ('r', 's');";
+$node_subscriber->poll_query_until('postgres', $synced_query)
+ or die "Timed out while waiting for subscriber to synchronize data";
+
+###############################
+# check that 2PC gets replicated to subscriber
+# then COMMIT PREPARED
+###############################
+
+$node_publisher->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab_full VALUES (11);
+ PREPARE TRANSACTION 'test_prepared_tab_full';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is in prepared state on subscriber
+my $result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab_full';");
+is($result, qq(1), 'transaction is prepared on subscriber');
+
+# check that 2PC gets committed on subscriber
+$node_publisher->safe_psql('postgres',
+ "COMMIT PREPARED 'test_prepared_tab_full';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is committed on subscriber
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM tab_full where a = 11;");
+is($result, qq(1), 'Row inserted via 2PC has committed on subscriber');
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab_full';");
+is($result, qq(0), 'transaction is committed on subscriber');
+
+###############################
+# check that 2PC gets replicated to subscriber
+# then ROLLBACK PREPARED
+###############################
+
+$node_publisher->safe_psql('postgres',"
+ BEGIN;
+ INSERT INTO tab_full VALUES (12);
+ PREPARE TRANSACTION 'test_prepared_tab_full';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is in prepared state on subscriber
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab_full';");
+is($result, qq(1), 'transaction is prepared on subscriber');
+
+# check that 2PC gets aborted on subscriber
+$node_publisher->safe_psql('postgres',
+ "ROLLBACK PREPARED 'test_prepared_tab_full';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is aborted on subscriber
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM tab_full where a = 12;");
+is($result, qq(0), 'Row inserted via 2PC is not present on subscriber');
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab_full';");
+is($result, qq(0), 'transaction is aborted on subscriber');
+
+###############################
+# Check that ROLLBACK PREPARED is decoded properly on crash restart
+# (publisher and subscriber crash)
+###############################
+
+$node_publisher->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab_full VALUES (12);
+ INSERT INTO tab_full VALUES (13);
+ PREPARE TRANSACTION 'test_prepared_tab';");
+
+$node_subscriber->stop('immediate');
+$node_publisher->stop('immediate');
+
+$node_publisher->start;
+$node_subscriber->start;
+
+# rollback post the restart
+$node_publisher->safe_psql('postgres',
+ "ROLLBACK PREPARED 'test_prepared_tab';");
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check inserts are rolled back
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM tab_full where a IN (12,13);");
+is($result, qq(0), 'Rows inserted via 2PC are visible on the subscriber');
+
+###############################
+# Check that COMMIT PREPARED is decoded properly on crash restart
+# (publisher and subscriber crash)
+###############################
+
+$node_publisher->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab_full VALUES (12);
+ INSERT INTO tab_full VALUES (13);
+ PREPARE TRANSACTION 'test_prepared_tab';");
+
+$node_subscriber->stop('immediate');
+$node_publisher->stop('immediate');
+
+$node_publisher->start;
+$node_subscriber->start;
+
+# commit post the restart
+$node_publisher->safe_psql('postgres',
+ "COMMIT PREPARED 'test_prepared_tab';");
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check inserts are visible
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM tab_full where a IN (12,13);");
+is($result, qq(2), 'Rows inserted via 2PC are visible on the subscriber');
+
+###############################
+# Check that COMMIT PREPARED is decoded properly on crash restart
+# (subscriber only crash)
+###############################
+
+$node_publisher->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab_full VALUES (14);
+ INSERT INTO tab_full VALUES (15);
+ PREPARE TRANSACTION 'test_prepared_tab';");
+
+$node_subscriber->stop('immediate');
+$node_subscriber->start;
+
+# commit post the restart
+$node_publisher->safe_psql('postgres',
+ "COMMIT PREPARED 'test_prepared_tab';");
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check inserts are visible
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM tab_full where a IN (14,15);");
+is($result, qq(2), 'Rows inserted via 2PC are visible on the subscriber');
+
+###############################
+# Check that COMMIT PREPARED is decoded properly on crash restart
+# (publisher only crash)
+###############################
+
+$node_publisher->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab_full VALUES (16);
+ INSERT INTO tab_full VALUES (17);
+ PREPARE TRANSACTION 'test_prepared_tab';");
+
+$node_publisher->stop('immediate');
+$node_publisher->start;
+
+# commit post the restart
+$node_publisher->safe_psql('postgres',
+ "COMMIT PREPARED 'test_prepared_tab';");
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check inserts are visible
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM tab_full where a IN (16,17);");
+is($result, qq(2), 'Rows inserted via 2PC are visible on the subscriber');
+
+###############################
+# Test nested transaction with 2PC
+###############################
+
+# check that 2PC gets replicated to subscriber
+$node_publisher->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab_full VALUES (21);
+ SAVEPOINT sp_inner;
+ INSERT INTO tab_full VALUES (22);
+ ROLLBACK TO SAVEPOINT sp_inner;
+ PREPARE TRANSACTION 'outer';
+ ");
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is in prepared state on subscriber
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = 'outer';");
+is($result, qq(1), 'transaction is prepared on subscriber');
+
+# COMMIT
+$node_publisher->safe_psql('postgres', "
+ COMMIT PREPARED 'outer';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check the tx state on subscriber
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = 'outer';");
+is($result, qq(0), 'transaction is ended on subscriber');
+
+# check inserts are visible. 22 should be rolled back. 21 should be committed.
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM tab_full where a IN (21);");
+is($result, qq(1), 'Rows committed are on the subscriber');
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM tab_full where a IN (22);");
+is($result, qq(0), 'Rows rolled back are not on the subscriber');
+
+###############################
+# Test using empty GID
+###############################
+
+# check that 2PC gets replicated to subscriber
+$node_publisher->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab_full VALUES (51);
+ PREPARE TRANSACTION '';");
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is in prepared state on subscriber
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = '';");
+is($result, qq(1), 'transaction is prepared on subscriber');
+
+# ROLLBACK
+$node_publisher->safe_psql('postgres',
+ "ROLLBACK PREPARED '';");
+
+# check that 2PC gets aborted on subscriber
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = '';");
+is($result, qq(0), 'transaction is aborted on subscriber');
+
+###############################
+# Test cases involving DDL.
+###############################
+
+# TODO This can be added after we add functionality to replicate DDL changes to subscriber.
+
+###############################
+# check all the cleanup
+###############################
+
+$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_subscription");
+is($result, qq(0), 'check subscription was dropped on subscriber');
+
+$result = $node_publisher->safe_psql('postgres',
+ "SELECT count(*) FROM pg_replication_slots");
+is($result, qq(0), 'check replication slot was dropped on publisher');
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_subscription_rel");
+is($result, qq(0),
+ 'check subscription relation status was dropped on subscriber');
+
+$result = $node_publisher->safe_psql('postgres',
+ "SELECT count(*) FROM pg_replication_slots");
+is($result, qq(0), 'check replication slot was dropped on publisher');
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_replication_origin");
+is($result, qq(0), 'check replication origin was dropped on subscriber');
+
+$node_subscriber->stop('fast');
+$node_publisher->stop('fast');
diff --git a/src/test/subscription/t/021_twophase_streaming.pl b/src/test/subscription/t/021_twophase_streaming.pl
new file mode 100644
index 0000000..3cfe3dc
--- /dev/null
+++ b/src/test/subscription/t/021_twophase_streaming.pl
@@ -0,0 +1,521 @@
+# logical replication of 2PC test
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 28;
+
+###############################
+# Test setup
+###############################
+
+# Initialize publisher node
+my $node_publisher = get_new_node('publisher');
+$node_publisher->init(allows_streaming => 'logical');
+$node_publisher->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_publisher->append_conf('postgresql.conf', qq(logical_decoding_work_mem = 64kB));
+$node_publisher->start;
+
+# Create subscriber node
+my $node_subscriber = get_new_node('subscriber');
+$node_subscriber->init(allows_streaming => 'logical');
+$node_subscriber->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_subscriber->start;
+
+# Create some pre-existing content on publisher (uses same DDL as 015_stream test)
+$node_publisher->safe_psql('postgres',
+ "CREATE TABLE test_tab (a int primary key, b varchar)");
+$node_publisher->safe_psql('postgres',
+ "INSERT INTO test_tab VALUES (1, 'foo'), (2, 'bar')");
+
+# Setup structure on subscriber (columns a and b are compatible with same table name on publisher)
+$node_subscriber->safe_psql('postgres',
+ "CREATE TABLE test_tab (a int primary key, b text, c timestamptz DEFAULT now(), d bigint DEFAULT 999)");
+
+# Setup logical replication (streaming = on)
+my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
+$node_publisher->safe_psql('postgres', "CREATE PUBLICATION tap_pub FOR TABLE test_tab");
+
+my $appname = 'tap_sub';
+$node_subscriber->safe_psql('postgres', "
+ CREATE SUBSCRIPTION tap_sub
+ CONNECTION '$publisher_connstr application_name=$appname'
+ PUBLICATION tap_pub
+ WITH (streaming = on)");
+
+# Wait for subscriber to finish initialization
+my $caughtup_query =
+ "SELECT pg_current_wal_lsn() <= replay_lsn FROM pg_stat_replication WHERE application_name = '$appname';";
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# Also wait for initial table sync to finish
+my $synced_query =
+ "SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT IN ('r', 's');";
+$node_subscriber->poll_query_until('postgres', $synced_query)
+ or die "Timed out while waiting for subscriber to synchronize data";
+
+###############################
+# Check initial data was copied to subscriber
+###############################
+my $result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*), count(c), count(d = 999) FROM test_tab");
+is($result, qq(2|2|2), 'check initial data was copied to subscriber');
+
+###############################
+# Test 2PC PREPARE / COMMIT PREPARED
+# 1. Data is streamed as a 2PC transaction.
+# 2. Then do commit prepared.
+#
+# Expect all data is replicated on subscriber side after the commit.
+###############################
+
+# check that 2PC gets replicated to subscriber
+# Insert, update and delete enough rows to exceed the 64kB limit.
+$node_publisher->safe_psql('postgres', q{
+ BEGIN;
+ INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(3, 5000) s(i);
+ UPDATE test_tab SET b = md5(b) WHERE mod(a,2) = 0;
+ DELETE FROM test_tab WHERE mod(a,3) = 0;
+ PREPARE TRANSACTION 'test_prepared_tab';});
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is in prepared state on subscriber
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab';");
+is($result, qq(1), 'transaction is prepared on subscriber');
+
+# 2PC gets committed
+$node_publisher->safe_psql('postgres',
+ "COMMIT PREPARED 'test_prepared_tab';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is committed on subscriber
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*), count(c), count(d = 999) FROM test_tab");
+is($result, qq(3334|3334|3334), 'Rows inserted by 2PC have committed on subscriber, and extra columns contain local defaults');
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab';");
+is($result, qq(0), 'transaction is committed on subscriber');
+
+###############################
+# Test 2PC PREPARE / ROLLBACK PREPARED.
+# 1. Table is deleted back to 2 rows which are replicated on subscriber.
+# 2. Data is streamed using 2PC
+# 3. Do rollback prepared.
+#
+# Expect data rolls back leaving only the original 2 rows.
+###############################
+
+# First, delete the data except for 2 rows (will be replicated)
+$node_publisher->safe_psql('postgres', q{
+ DELETE FROM test_tab WHERE a > 2;});
+
+# Then insert, update and delete enough rows to exceed the 64kB limit.
+$node_publisher->safe_psql('postgres', q{
+ BEGIN;
+ INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(3, 5000) s(i);
+ UPDATE test_tab SET b = md5(b) WHERE mod(a,2) = 0;
+ DELETE FROM test_tab WHERE mod(a,3) = 0;
+ PREPARE TRANSACTION 'test_prepared_tab';});
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is in prepared state on subscriber
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab';");
+is($result, qq(1), 'transaction is prepared on subscriber');
+
+# 2PC tx gets aborted
+$node_publisher->safe_psql('postgres',
+ "ROLLBACK PREPARED 'test_prepared_tab';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is aborted on subscriber
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*), count(c), count(d = 999) FROM test_tab");
+is($result, qq(2|2|2), 'check initial data was copied to subscriber');
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab';");
+is($result, qq(0), 'transaction is aborted on subscriber');
+
+###############################
+# Check that 2PC ROLLBACK PREPARED is decoded properly on crash restart.
+# 1. insert, update and delete enough rows to exceed the 64kB limit.
+# 2. Then server crashes before the 2PC transaction is rolled back.
+# 3. After servers are restarted the pending transaction is rolled back.
+#
+# Expect all inserted data is gone.
+# (Note: both publisher and subscriber crash/restart)
+###############################
+
+$node_publisher->safe_psql('postgres', q{
+ BEGIN;
+ INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(3, 5000) s(i);
+ UPDATE test_tab SET b = md5(b) WHERE mod(a,2) = 0;
+ DELETE FROM test_tab WHERE mod(a,3) = 0;
+ PREPARE TRANSACTION 'test_prepared_tab';});
+
+$node_subscriber->stop('immediate');
+$node_publisher->stop('immediate');
+
+$node_publisher->start;
+$node_subscriber->start;
+
+# rollback post the restart
+$node_publisher->safe_psql('postgres',
+ "ROLLBACK PREPARED 'test_prepared_tab';");
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check inserts are rolled back
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*), count(c), count(d = 999) FROM test_tab");
+is($result, qq(2|2|2), 'Rows inserted by 2PC are rolled back, leaving only the original 2 rows');
+
+###############################
+# Check that 2PC COMMIT PREPARED is decoded properly on crash restart.
+# 1. insert, update and delete enough rows to exceed the 64kB limit.
+# 2. Then server crashes before the 2PC transaction is committed.
+# 3. After servers are restarted the pending transaction is committed.
+#
+# Expect all data is replicated on subscriber side after the commit.
+# (Note: both publisher and subscriber crash/restart)
+###############################
+
+$node_publisher->safe_psql('postgres', q{
+ BEGIN;
+ INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(3, 5000) s(i);
+ UPDATE test_tab SET b = md5(b) WHERE mod(a,2) = 0;
+ DELETE FROM test_tab WHERE mod(a,3) = 0;
+ PREPARE TRANSACTION 'test_prepared_tab';});
+
+$node_subscriber->stop('immediate');
+$node_publisher->stop('immediate');
+
+$node_publisher->start;
+$node_subscriber->start;
+
+# commit post the restart
+$node_publisher->safe_psql('postgres',
+ "COMMIT PREPARED 'test_prepared_tab';");
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check inserts are visible
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*), count(c), count(d = 999) FROM test_tab");
+is($result, qq(3334|3334|3334), 'Rows inserted by 2PC have committed on subscriber, and extra columns contain local defaults');
+
+###############################
+# Check that 2PC COMMIT PREPARED is decoded properly on crash restart.
+# 1. insert, update and delete enough rows to exceed the 64kB limit.
+# 2. Then 1 server crashes before the 2PC transaction is committed.
+# 3. After servers are restarted the pending transaction is committed.
+#
+# Expect all data is replicated on subscriber side after the commit.
+# (Note: only subscriber crashes)
+###############################
+
+# First, delete the data except for 2 rows (will be replicated)
+$node_publisher->safe_psql('postgres', q{
+ DELETE FROM test_tab WHERE a > 2;});
+
+# insert, update, delete enough data to cause streaming
+$node_publisher->safe_psql('postgres', q{
+ BEGIN;
+ INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(3, 5000) s(i);
+ UPDATE test_tab SET b = md5(b) WHERE mod(a,2) = 0;
+ DELETE FROM test_tab WHERE mod(a,3) = 0;
+ PREPARE TRANSACTION 'test_prepared_tab';});
+
+$node_subscriber->stop('immediate');
+$node_subscriber->start;
+
+# commit post the restart
+$node_publisher->safe_psql('postgres',
+ "COMMIT PREPARED 'test_prepared_tab';");
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check inserts are visible
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*), count(c), count(d = 999) FROM test_tab");
+is($result, qq(3334|3334|3334), 'Rows inserted by 2PC have committed on subscriber');
+
+###############################
+# Check that 2PC COMMIT PREPARED is decoded properly on crash restart.
+# 1. insert, update and delete enough rows to exceed the 64kB limit.
+# 2. Then 1 server crashes before the 2PC transaction is committed.
+# 3. After servers are restarted the pending transaction is committed.
+#
+# Expect all data is replicated on subscriber side after the commit.
+# (Note: only publisher crashes)
+###############################
+
+# First, delete the data except for 2 rows (will be replicated)
+$node_publisher->safe_psql('postgres', q{
+ DELETE FROM test_tab WHERE a > 2;});
+
+# insert, update, delete enough data to cause streaming
+$node_publisher->safe_psql('postgres', q{
+ BEGIN;
+ INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(3, 5000) s(i);
+ UPDATE test_tab SET b = md5(b) WHERE mod(a,2) = 0;
+ DELETE FROM test_tab WHERE mod(a,3) = 0;
+ PREPARE TRANSACTION 'test_prepared_tab';});
+
+$node_publisher->stop('immediate');
+$node_publisher->start;
+
+# commit post the restart
+$node_publisher->safe_psql('postgres',
+ "COMMIT PREPARED 'test_prepared_tab';");
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check inserts are visible
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*), count(c), count(d = 999) FROM test_tab");
+is($result, qq(3334|3334|3334), 'Rows inserted by 2PC have committed on subscriber');
+
+###############################
+# Do INSERT after the PREPARE but before ROLLBACK PREPARED.
+# 1. Table is deleted back to 2 rows which are replicated on subscriber.
+# 2. Data is streamed using 2PC.
+# 3. A single row INSERT is done which is after the PREPARE
+# 4. Then do a ROLLBACK PREPARED.
+#
+# Expect the 2PC data rolls back leaving only 3 rows on the subscriber.
+# (the original 2 + inserted 1)
+###############################
+
+# First, delete the data except for 2 rows (will be replicated)
+$node_publisher->safe_psql('postgres', q{
+ DELETE FROM test_tab WHERE a > 2;});
+
+# Then insert, update and delete enough rows to exceed the 64kB limit.
+$node_publisher->safe_psql('postgres', q{
+ BEGIN;
+ INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(3, 5000) s(i);
+ UPDATE test_tab SET b = md5(b) WHERE mod(a,2) = 0;
+ DELETE FROM test_tab WHERE mod(a,3) = 0;
+ PREPARE TRANSACTION 'test_prepared_tab';});
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is in prepared state on subscriber
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab';");
+is($result, qq(1), 'transaction is prepared on subscriber');
+
+# Insert a different record (now we are outside of the 2PC tx)
+# Note: the 2PC tx still holds row locks so make sure this insert is for a separate primary key
+$node_publisher->safe_psql('postgres',
+ "INSERT INTO test_tab VALUES (99999, 'foobar')");
+
+# 2PC tx gets aborted
+$node_publisher->safe_psql('postgres',
+ "ROLLBACK PREPARED 'test_prepared_tab';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is aborted on subscriber,
+# but the extra INSERT outside of the 2PC still was replicated
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*), count(c), count(d = 999) FROM test_tab");
+is($result, qq(3|3|3), 'check the outside insert was copied to subscriber');
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab';");
+is($result, qq(0), 'transaction is aborted on subscriber');
+
+###############################
+# Do INSERT after the PREPARE but before COMMIT PREPARED.
+# 1. Table is deleted back to 2 rows which are replicated on subscriber.
+# 2. Data is streamed using 2PC.
+# 3. A single row INSERT is done which is after the PREPARE.
+# 4. Then do a COMMIT PREPARED.
+#
+# Expect 2PC data + the extra row are on the subscriber.
+# (the 3334 + inserted 1 = 3335)
+###############################
+
+# First, delete the data except for 2 rows (will be replicated)
+$node_publisher->safe_psql('postgres', q{
+ DELETE FROM test_tab WHERE a > 2;});
+
+# Then insert, update and delete enough rows to exceed the 64kB limit.
+$node_publisher->safe_psql('postgres', q{
+ BEGIN;
+ INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(3, 5000) s(i);
+ UPDATE test_tab SET b = md5(b) WHERE mod(a,2) = 0;
+ DELETE FROM test_tab WHERE mod(a,3) = 0;
+ PREPARE TRANSACTION 'test_prepared_tab';});
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is in prepared state on subscriber
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab';");
+is($result, qq(1), 'transaction is prepared on subscriber');
+
+# Insert a different record (now we are outside of the 2PC tx)
+# Note: the 2PC tx still holds row locks so make sure this insert is for a separare primary key
+$node_publisher->safe_psql('postgres',
+ "INSERT INTO test_tab VALUES (99999, 'foobar')");
+
+# 2PC tx gets committed
+$node_publisher->safe_psql('postgres',
+ "COMMIT PREPARED 'test_prepared_tab';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is committed on subscriber
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*), count(c), count(d = 999) FROM test_tab");
+is($result, qq(3335|3335|3335), 'Rows inserted by 2PC (as well as outside insert) have committed on subscriber, and extra columns contain local defaults');
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab';");
+is($result, qq(0), 'transaction is committed on subscriber');
+
+###############################
+# Do DELETE after PREPARE but before COMMIT PREPARED.
+# 1. Table is deleted back to 2 rows which are replicated on subscriber.
+# 2. Data is streamed using 2PC.
+# 3. A single row DELETE is done for one of the records that was inserted by the 2PC transaction
+# 4. Then there is a COMMIT PREPARED.
+#
+# Expect all the 2PC data rows on the subscriber (since in fact delete at step 3 would do nothing
+# because that record was not yet committed at the time of the delete).
+###############################
+
+# First, delete the data except for 2 rows (will be replicated)
+$node_publisher->safe_psql('postgres', q{
+ DELETE FROM test_tab WHERE a > 2;});
+
+# Then insert, update and delete enough rows to exceed the 64kB limit.
+$node_publisher->safe_psql('postgres', q{
+ BEGIN;
+ INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(3, 5000) s(i);
+ UPDATE test_tab SET b = md5(b) WHERE mod(a,2) = 0;
+ DELETE FROM test_tab WHERE mod(a,3) = 0;
+ PREPARE TRANSACTION 'test_prepared_tab';});
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is in prepared state on subscriber
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab';");
+is($result, qq(1), 'transaction is prepared on subscriber');
+
+# DELETE one of the prepared 2PC records before they get committed (we are outside of the 2PC tx)
+$node_publisher->safe_psql('postgres',
+ "DELETE FROM test_tab WHERE a = 5");
+
+# 2PC tx gets committed
+$node_publisher->safe_psql('postgres',
+ "COMMIT PREPARED 'test_prepared_tab';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is committed on subscriber
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*), count(c), count(d = 999) FROM test_tab");
+is($result, qq(3334|3334|3334), 'Rows inserted by 2PC have committed on subscriber. Nothing was deleted');
+
+# confirm the "deleted" row was in fact not deleted
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM test_tab WHERE a = 5");
+is($result, qq(1), 'The row we deleted before the commit till exists on subscriber.');
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab';");
+is($result, qq(0), 'transaction is committed on subscriber');
+
+###############################
+# Try 2PC tx works using an empty GID literal
+###############################
+
+# First, delete the data except for 2 rows (will be replicated)
+$node_publisher->safe_psql('postgres', q{
+ DELETE FROM test_tab WHERE a > 2;});
+
+# Then insert, update and delete enough rows to exceed the 64kB limit.
+$node_publisher->safe_psql('postgres', q{
+ BEGIN;
+ INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(3, 5000) s(i);
+ UPDATE test_tab SET b = md5(b) WHERE mod(a,2) = 0;
+ DELETE FROM test_tab WHERE mod(a,3) = 0;
+ PREPARE TRANSACTION '';});
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is in prepared state on subscriber
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = '';");
+is($result, qq(1), 'transaction is prepared on subscriber');
+
+# 2PC tx gets committed
+$node_publisher->safe_psql('postgres',
+ "COMMIT PREPARED '';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is committed on subscriber
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*), count(c), count(d = 999) FROM test_tab");
+is($result, qq(3334|3334|3334), 'Rows inserted by 2PC have committed on subscriber');
+
+###############################
+# Test cases involving DDL
+###############################
+
+# TODO This can be added after we add functionality to replicate DDL changes to subscriber.
+
+###############################
+# check all the cleanup
+###############################
+
+$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_subscription");
+is($result, qq(0), 'check subscription was dropped on subscriber');
+
+$result = $node_publisher->safe_psql('postgres',
+ "SELECT count(*) FROM pg_replication_slots");
+is($result, qq(0), 'check replication slot was dropped on publisher');
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_subscription_rel");
+is($result, qq(0),
+ 'check subscription relation status was dropped on subscriber');
+
+$result = $node_publisher->safe_psql('postgres',
+ "SELECT count(*) FROM pg_replication_slots");
+is($result, qq(0), 'check replication slot was dropped on publisher');
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_replication_origin");
+is($result, qq(0), 'check replication origin was dropped on subscriber');
+
+$node_subscriber->stop('fast');
+$node_publisher->stop('fast');
--
1.8.3.1
Hi Ajin.
I checked the to see how my previous review comments (of v10) were
addressed by the latest patches (currently v12)
There are a couple of remaining items.
---
====================
v12-0001. File: doc/src/sgml/logicaldecoding.sgml
====================
COMMENT
Section 49.6.1
Says:
An output plugin may also define functions to support streaming of
large, in-progress transactions. The stream_start_cb, stream_stop_cb,
stream_abort_cb, stream_commit_cb, stream_change_cb, and
stream_prepare_cb are required, while stream_message_cb and
stream_truncate_cb are optional.
An output plugin may also define functions to support two-phase
commits, which are decoded on PREPARE TRANSACTION. The prepare_cb,
commit_prepared_cb and rollback_prepared_cb callbacks are required,
while filter_prepare_cb is optional.
~
I was not sure how the paragraphs are organised. e.g. 1st seems to be
about streams and 2nd seems to be about two-phase commit. But they are
not mutually exclusive, so I guess I thought it was odd that
stream_prepare_cb was not mentioned in the 2nd paragraph.
Or maybe it is OK as-is?
====================
v12-0002. File: contrib/test_decoding/expected/two_phase.out
====================
COMMENT
Line 26
PREPARE TRANSACTION 'test_prepared#1';
--
SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
NULL, 'two-phase-commit', '1', 'include-xids', '0',
'skip-empty-xacts', '1');
~
Seems like a missing comment to explain the expectation of that select.
---
COMMENT
Line 80
-- The insert should show the newly altered column.
~
Do you also need to mention something about the DDL not being present
in the decoding?
====================
v12-0002. File: src/backend/replication/logical/reorderbuffer.c
====================
COMMENT
Line 1807
/* Here we are streaming and part of the PREPARE of a two-phase commit
* The full cleanup will happen as part of the COMMIT PREPAREDs, so now
* just truncate txn by removing changes and tuple_cids
*/
~
Something seems strange about the first sentence of that comment
---
COMMENT
Line 1944
/* Discard the changes that we just streamed.
* This can only be called if streaming and not part of a PREPARE in
* a two-phase commit, so set prepared flag as false.
*/
~
I thought since this comment that is asserting various things, that
should also actually be written as code Assert.
---
COMMENT
Line 2401
/*
* We are here due to one of the 3 scenarios:
* 1. As part of streaming in-progress transactions
* 2. Prepare of a two-phase commit
* 3. Commit of a transaction.
*
* If we are streaming the in-progress transaction then discard the
* changes that we just streamed, and mark the transactions as
* streamed (if they contained changes), set prepared flag as false.
* If part of a prepare of a two-phase commit set the prepared flag
* as true so that we can discard changes and cleanup tuplecids.
* Otherwise, remove all the
* changes and deallocate the ReorderBufferTXN.
*/
~
The above comment is beyond my understanding. Anything you could do to
simplify it would be good.
For example, when viewing this function in isolation I have never
understood why the streaming flag and rbtxn_prepared(txn) flag are not
possible to be set at the same time?
Perhaps the code is relying on just internal knowledge of how this
helper function gets called? And if it is just that, then IMO there
really should be some Asserts in the code to give more assurance about
that. (Or maybe use completely different flags to represent those 3
scenarios instead of bending the meanings of the existing flags)
====================
v12-0003. File: src/backend/access/transam/twophase.c
====================
COMMENT
Line 557
@@ -548,6 +548,33 @@ MarkAsPrepared(GlobalTransaction gxact, bool lock_held)
}
/*
+ * LookupGXact
+ * Check if the prepared transaction with the given GID is around
+ */
+bool
+LookupGXact(const char *gid)
+{
+ int i;
+ bool found = false;
~
Alignment of the variable declarations in LookupGXact function
---
Kind Regards,
Peter Smith.
Fujitsu Australia
On Mon, Oct 26, 2020 at 6:49 PM Peter Smith <smithpb2250@gmail.com> wrote:
Hi Ajin.
I checked the to see how my previous review comments (of v10) were
addressed by the latest patches (currently v12)There are a couple of remaining items.
---
====================
v12-0001. File: doc/src/sgml/logicaldecoding.sgml
====================COMMENT
Section 49.6.1
Says:
An output plugin may also define functions to support streaming of
large, in-progress transactions. The stream_start_cb, stream_stop_cb,
stream_abort_cb, stream_commit_cb, stream_change_cb, and
stream_prepare_cb are required, while stream_message_cb and
stream_truncate_cb are optional.An output plugin may also define functions to support two-phase
commits, which are decoded on PREPARE TRANSACTION. The prepare_cb,
commit_prepared_cb and rollback_prepared_cb callbacks are required,
while filter_prepare_cb is optional.
~
I was not sure how the paragraphs are organised. e.g. 1st seems to be
about streams and 2nd seems to be about two-phase commit. But they are
not mutually exclusive, so I guess I thought it was odd that
stream_prepare_cb was not mentioned in the 2nd paragraph.Or maybe it is OK as-is?
I've added stream_prepare_cb to the 2nd paragraph as well.
====================
v12-0002. File: contrib/test_decoding/expected/two_phase.out
====================COMMENT
Line 26
PREPARE TRANSACTION 'test_prepared#1';
--
SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
NULL, 'two-phase-commit', '1', 'include-xids', '0',
'skip-empty-xacts', '1');
~
Seems like a missing comment to explain the expectation of that select.---
Updated.
COMMENT
Line 80
-- The insert should show the newly altered column.
~
Do you also need to mention something about the DDL not being present
in the decoding?
Updated.
====================
v12-0002. File: src/backend/replication/logical/reorderbuffer.c
====================COMMENT
Line 1807
/* Here we are streaming and part of the PREPARE of a two-phase commit
* The full cleanup will happen as part of the COMMIT PREPAREDs, so now
* just truncate txn by removing changes and tuple_cids
*/
~
Something seems strange about the first sentence of that comment---
COMMENT
Line 1944
/* Discard the changes that we just streamed.
* This can only be called if streaming and not part of a PREPARE in
* a two-phase commit, so set prepared flag as false.
*/
~
I thought since this comment that is asserting various things, that
should also actually be written as code Assert.---
Added an assert.
COMMENT
Line 2401
/*
* We are here due to one of the 3 scenarios:
* 1. As part of streaming in-progress transactions
* 2. Prepare of a two-phase commit
* 3. Commit of a transaction.
*
* If we are streaming the in-progress transaction then discard the
* changes that we just streamed, and mark the transactions as
* streamed (if they contained changes), set prepared flag as false.
* If part of a prepare of a two-phase commit set the prepared flag
* as true so that we can discard changes and cleanup tuplecids.
* Otherwise, remove all the
* changes and deallocate the ReorderBufferTXN.
*/
~
The above comment is beyond my understanding. Anything you could do to
simplify it would be good.For example, when viewing this function in isolation I have never
understood why the streaming flag and rbtxn_prepared(txn) flag are not
possible to be set at the same time?Perhaps the code is relying on just internal knowledge of how this
helper function gets called? And if it is just that, then IMO there
really should be some Asserts in the code to give more assurance about
that. (Or maybe use completely different flags to represent those 3
scenarios instead of bending the meanings of the existing flags)
Left this for now, probably re-look at this at a later review.
But just to explain; this function is what does the main decoding of
changes of a transaction.
At what point this decoding happens is what this feature and the
streaming in-progress feature is about. As of PG13, this decoding only
happens at commit time. With the streaming of in-progress txn feature,
this began to happen (if streaming enabled) at the time when the
memory limit for decoding transactions was crossed. This 2PC feature
is supporting decoding at the time of a PREPARE transaction.
Now, if streaming is enabled and streaming has started as a result of
crossing the memory threshold, then there is no need to
again begin streaming at a PREPARE transaction as the transaction that
is being prepared has already been streamed. Which is why this
function will not be called when a streaming transaction is prepared
as part of a two-phase commit.
====================
v12-0003. File: src/backend/access/transam/twophase.c
====================COMMENT
Line 557
@@ -548,6 +548,33 @@ MarkAsPrepared(GlobalTransaction gxact, bool lock_held)
}/* + * LookupGXact + * Check if the prepared transaction with the given GID is around + */ +bool +LookupGXact(const char *gid) +{ + int i; + bool found = false; ~ Alignment of the variable declarations in LookupGXact function---
Updated.
Amit, I have also updated your comment about removing function
declaration from commit 1 and I've added it to commit 2. Also removed
whitespace errors.
regards,
Ajin Cherian
Fujitsu Australia
Attachments:
v13-0001-Support-2PC-txn-base.patchapplication/octet-stream; name=v13-0001-Support-2PC-txn-base.patchDownload
From 69c8c31ce3fec377957bee7411136d7eb0d61628 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Tue, 27 Oct 2020 05:05:22 -0400
Subject: [PATCH v13] Support-2PC-txn-base.
Until now two-phase transaction commands were translated into regular transactions
on the subscriber, and the GID was not forwarded to it. None of the two-phase semantics
were communicated to the subscriber.
This patch provides infrastructure for logical decoding plugins to be informed of
two-phase commands Like PREPARE TRANSACTION, COMMIT PREPARED
and ROLLBACK PREPARED commands with the corresponding GID.
Include logical decoding plugin API infrastructure changes.
Includes contrib/test_decoding changes.
Includes documentation changes.
---
contrib/test_decoding/test_decoding.c | 192 +++++++++++++++++++++++++
doc/src/sgml/logicaldecoding.sgml | 145 ++++++++++++++++++-
src/backend/replication/logical/logical.c | 228 ++++++++++++++++++++++++++++++
src/include/replication/logical.h | 5 +
src/include/replication/output_plugin.h | 46 ++++++
src/include/replication/reorderbuffer.h | 56 ++++++++
6 files changed, 665 insertions(+), 7 deletions(-)
diff --git a/contrib/test_decoding/test_decoding.c b/contrib/test_decoding/test_decoding.c
index 8e33614..7eb13ce 100644
--- a/contrib/test_decoding/test_decoding.c
+++ b/contrib/test_decoding/test_decoding.c
@@ -11,12 +11,16 @@
*-------------------------------------------------------------------------
*/
#include "postgres.h"
+#include "miscadmin.h"
+#include "access/transam.h"
#include "catalog/pg_type.h"
#include "replication/logical.h"
#include "replication/origin.h"
+#include "storage/procarray.h"
+
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -36,6 +40,7 @@ typedef struct
bool skip_empty_xacts;
bool xact_wrote_changes;
bool only_local;
+ TransactionId check_xid_aborted; /* track abort of this txid */
} TestDecodingData;
static void pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
@@ -73,6 +78,9 @@ static void pg_decode_stream_stop(LogicalDecodingContext *ctx,
static void pg_decode_stream_abort(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr abort_lsn);
+static void pg_decode_stream_prepare(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
static void pg_decode_stream_commit(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
@@ -88,6 +96,18 @@ static void pg_decode_stream_truncate(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
int nrelations, Relation relations[],
ReorderBufferChange *change);
+static bool pg_decode_filter_prepare(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid);
+static void pg_decode_prepare_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+static void pg_decode_commit_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+static void pg_decode_rollback_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
void
_PG_init(void)
@@ -112,10 +132,15 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb)
cb->stream_start_cb = pg_decode_stream_start;
cb->stream_stop_cb = pg_decode_stream_stop;
cb->stream_abort_cb = pg_decode_stream_abort;
+ cb->stream_prepare_cb = pg_decode_stream_prepare;
cb->stream_commit_cb = pg_decode_stream_commit;
cb->stream_change_cb = pg_decode_stream_change;
cb->stream_message_cb = pg_decode_stream_message;
cb->stream_truncate_cb = pg_decode_stream_truncate;
+ cb->filter_prepare_cb = pg_decode_filter_prepare;
+ cb->prepare_cb = pg_decode_prepare_txn;
+ cb->commit_prepared_cb = pg_decode_commit_prepared_txn;
+ cb->rollback_prepared_cb = pg_decode_rollback_prepared_txn;
}
@@ -127,6 +152,7 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
ListCell *option;
TestDecodingData *data;
bool enable_streaming = false;
+ bool enable_2pc = false;
data = palloc0(sizeof(TestDecodingData));
data->context = AllocSetContextCreate(ctx->context,
@@ -136,6 +162,7 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
data->include_timestamp = false;
data->skip_empty_xacts = false;
data->only_local = false;
+ data->check_xid_aborted = InvalidTransactionId;
ctx->output_plugin_private = data;
@@ -227,6 +254,40 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
errmsg("could not parse value \"%s\" for parameter \"%s\"",
strVal(elem->arg), elem->defname)));
}
+ else if (strcmp(elem->defname, "two-phase-commit") == 0)
+ {
+ if (elem->arg == NULL)
+ continue;
+ else if (!parse_bool(strVal(elem->arg), &enable_2pc))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("could not parse value \"%s\" for parameter \"%s\"",
+ strVal(elem->arg), elem->defname)));
+ }
+ else if (strcmp(elem->defname, "check-xid-aborted") == 0)
+ {
+ if (elem->arg == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("check-xid-aborted needs an input value")));
+ else
+ {
+ long xid;
+
+ errno = 0;
+ xid = strtoul(strVal(elem->arg), NULL, 0);
+ if (xid == 0 || errno != 0)
+ data->check_xid_aborted = InvalidTransactionId;
+ else
+ data->check_xid_aborted = (TransactionId)xid;
+
+ if (!TransactionIdIsValid(data->check_xid_aborted))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("check-xid-aborted is not a valid xid: \"%s\"",
+ strVal(elem->arg))));
+ }
+ }
else
{
ereport(ERROR,
@@ -238,6 +299,7 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
}
ctx->streaming &= enable_streaming;
+ ctx->twophase &= enable_2pc;
}
/* cleanup this plugin's resources */
@@ -297,6 +359,93 @@ pg_decode_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
OutputPluginWrite(ctx, true);
}
+/*
+ * Filter out two-phase transactions.
+ *
+ * Each plugin can implement its own filtering logic. Here
+ * we demonstrate a simple logic by checking the GID. If the
+ * GID contains the "_nodecode" substring, then we filter
+ * it out.
+ */
+static bool
+pg_decode_filter_prepare(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid)
+{
+ if (strstr(gid, "_nodecode") != NULL)
+ return true;
+
+ return false;
+}
+
+/* PREPARE callback */
+static void
+pg_decode_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ if (data->skip_empty_xacts && !data->xact_wrote_changes)
+ return;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ appendStringInfo(ctx->out, "PREPARE TRANSACTION %s",
+ quote_literal_cstr(txn->gid));
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, " %u", txn->xid);
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
+/* COMMIT PREPARED callback */
+static void
+pg_decode_commit_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ appendStringInfo(ctx->out, "COMMIT PREPARED %s",
+ quote_literal_cstr(txn->gid));
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, " %u", txn->xid);
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
+/* ROLLBACK PREPARED callback */
+static void
+pg_decode_rollback_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ appendStringInfo(ctx->out, "ROLLBACK PREPARED %s",
+ quote_literal_cstr(txn->gid));
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, " %u", txn->xid);
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
static bool
pg_decode_filter(LogicalDecodingContext *ctx,
RepOriginId origin_id)
@@ -455,6 +604,25 @@ pg_decode_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
}
data->xact_wrote_changes = true;
+ /* if check_xid_aborted is a valid xid, then it was passed in
+ * as an option to check if the transaction having this xid would be aborted.
+ * This is to test concurrent aborts.
+ */
+ if (TransactionIdIsValid(data->check_xid_aborted))
+ {
+ elog(LOG, "waiting for %u to abort", data->check_xid_aborted);
+ while (TransactionIdIsInProgress(data->check_xid_aborted))
+ {
+ CHECK_FOR_INTERRUPTS();
+ pg_usleep(10000L);
+ }
+ if (!TransactionIdIsInProgress(data->check_xid_aborted) &&
+ !TransactionIdDidCommit(data->check_xid_aborted))
+ elog(LOG, "%u aborted", data->check_xid_aborted);
+
+ Assert(TransactionIdDidAbort(data->check_xid_aborted));
+ }
+
class_form = RelationGetForm(relation);
tupdesc = RelationGetDescr(relation);
@@ -646,6 +814,30 @@ pg_decode_stream_abort(LogicalDecodingContext *ctx,
}
static void
+pg_decode_stream_prepare(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ TestDecodingData *data = ctx->output_plugin_private;
+
+ if (data->skip_empty_xacts && !data->xact_wrote_changes)
+ return;
+
+ OutputPluginPrepareWrite(ctx, true);
+
+ if (data->include_xids)
+ appendStringInfo(ctx->out, "preparing streamed transaction TXN %u", txn->xid);
+ else
+ appendStringInfo(ctx->out, "preparing streamed transaction");
+
+ if (data->include_timestamp)
+ appendStringInfo(ctx->out, " (at %s)",
+ timestamptz_to_str(txn->commit_time));
+
+ OutputPluginWrite(ctx, true);
+}
+
+static void
pg_decode_stream_commit(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr commit_lsn)
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index 813a037..ad8991d 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -387,11 +387,16 @@ typedef struct OutputPluginCallbacks
LogicalDecodeTruncateCB truncate_cb;
LogicalDecodeCommitCB commit_cb;
LogicalDecodeMessageCB message_cb;
+ LogicalDecodeFilterPrepareCB filter_prepare_cb;
+ LogicalDecodePrepareCB prepare_cb;
+ LogicalDecodeCommitPreparedCB commit_prepared_cb;
+ LogicalDecodeRollbackPreparedCB rollback_prepared_cb;
LogicalDecodeFilterByOriginCB filter_by_origin_cb;
LogicalDecodeShutdownCB shutdown_cb;
LogicalDecodeStreamStartCB stream_start_cb;
LogicalDecodeStreamStopCB stream_stop_cb;
LogicalDecodeStreamAbortCB stream_abort_cb;
+ LogicalDecodeStreamPrepareCB stream_prepare_cb;
LogicalDecodeStreamCommitCB stream_commit_cb;
LogicalDecodeStreamChangeCB stream_change_cb;
LogicalDecodeStreamMessageCB stream_message_cb;
@@ -413,10 +418,19 @@ typedef void (*LogicalOutputPluginInit) (struct OutputPluginCallbacks *cb);
An output plugin may also define functions to support streaming of large,
in-progress transactions. The <function>stream_start_cb</function>,
<function>stream_stop_cb</function>, <function>stream_abort_cb</function>,
- <function>stream_commit_cb</function> and <function>stream_change_cb</function>
+ <function>stream_commit_cb</function>, <function>stream_change_cb</function>,
+ and <function>stream_prepare_cb</function>
are required, while <function>stream_message_cb</function> and
<function>stream_truncate_cb</function> are optional.
</para>
+
+ <para>
+ An output plugin may also define functions to support two-phase commits, which are
+ decoded on <command>PREPARE TRANSACTION</command>. The <function>prepare_cb</function>,
+ <function>stream_prepare_cb</function>, <function>commit_prepared_cb</function>
+ and <function>rollback_prepared_cb</function>
+ callbacks are required, while <function>filter_prepare_cb</function> is optional.
+ </para>
</sect2>
<sect2 id="logicaldecoding-capabilities">
@@ -477,7 +491,13 @@ CREATE TABLE another_catalog_table(data text) WITH (user_catalog_table = true);
never get
decoded. Successful savepoints are
folded into the transaction containing them in the order they were
- executed within that transaction.
+ executed within that transaction. A transaction that is prepared for
+ a two-phase commit using <command>PREPARE TRANSACTION</command> will
+ also be decoded if the output plugin callbacks needed for decoding
+ them are provided. It is possible that the current transaction which
+ is being decoded is aborted concurrently via a <command>ROLLBACK PREPARED</command>
+ command. In that case, the logical decoding of this transaction will
+ be aborted too.
</para>
<note>
@@ -578,6 +598,55 @@ typedef void (*LogicalDecodeCommitCB) (struct LogicalDecodingContext *ctx,
</para>
</sect3>
+ <sect3 id="logicaldecoding-output-plugin-prepare">
+ <title>Transaction Prepare Callback</title>
+
+ <para>
+ The required <function>prepare_cb</function> callback is called whenever
+ a transaction which is prepared for two-phase commit has been
+ decoded. The <function>change_cb</function> callbacks for all modified
+ rows will have been called before this, if there have been any modified
+ rows.
+<programlisting>
+typedef void (*LogicalDecodePrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
+ <sect3 id="logicaldecoding-output-plugin-commit-prepared">
+ <title>Transaction Commit Prepared Callback</title>
+
+ <para>
+ The required <function>commit_prepared_cb</function> callback is called whenever
+ a transaction commit prepared has been decoded. The <parameter>gid</parameter> field,
+ which is part of the <parameter>txn</parameter> parameter can be used in this
+ callback.
+<programlisting>
+typedef void (*LogicalDecodeCommitPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
+ <sect3 id="logicaldecoding-output-plugin-rollback-prepared">
+ <title>Transaction Rollback Prepared Callback</title>
+
+ <para>
+ The required <function>rollback_prepared_cb</function> callback is called whenever
+ a transaction rollback prepared has been decoded. The <parameter>gid</parameter> field,
+ which is part of the <parameter>txn</parameter> parameter can be used in this
+ callback.
+<programlisting>
+typedef void (*LogicalDecodeRollbackPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr rollback_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
<sect3 id="logicaldecoding-output-plugin-change">
<title>Change Callback</title>
@@ -587,7 +656,13 @@ typedef void (*LogicalDecodeCommitCB) (struct LogicalDecodingContext *ctx,
an <command>INSERT</command>, <command>UPDATE</command>,
or <command>DELETE</command>. Even if the original command modified
several rows at once the callback will be called individually for each
- row.
+ row. The <function>change_cb</function> callback may access system or
+ user catalog tables to aid in the process of outputting the row
+ modification details. In case of decoding a prepared (but yet
+ uncommitted) transaction or decoding of an uncommitted transaction, this
+ change callback might also error out due to simultaneous rollback of
+ this very same transaction. In that case, the logical decoding of this
+ aborted transaction is stopped gracefully.
<programlisting>
typedef void (*LogicalDecodeChangeCB) (struct LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
@@ -664,6 +739,39 @@ typedef bool (*LogicalDecodeFilterByOriginCB) (struct LogicalDecodingContext *ct
</para>
</sect3>
+ <sect3 id="logicaldecoding-output-plugin-filter-prepare">
+ <title>Prepare Filter Callback</title>
+
+ <para>
+ The optional <function>filter_prepare_cb</function> callback
+ is called to determine whether data that is part of the current
+ two-phase commit transaction should be considered for decode
+ at this prepare stage or as a regular one-phase transaction at
+ <command>COMMIT PREPARED</command> time later. To signal that
+ decoding should be skipped, return <literal>true</literal>;
+ <literal>false</literal> otherwise. When the callback is not
+ defined, <literal>false</literal> is assumed (i.e. nothing is
+ filtered).
+<programlisting>
+typedef bool (*LogicalDecodeFilterPrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ TransactionId xid,
+ const char *gid);
+</programlisting>
+ The <parameter>ctx</parameter> parameter has the same contents
+ as for the other callbacks. The <parameter>txn</parameter> parameter
+ contains meta information about the transaction. The <parameter>xid</parameter>
+ contains the XID because <parameter>txn</parameter> can be NULL in some cases.
+ The <parameter>gid</parameter> is the identifier that later identifies this
+ transaction for <command>COMMIT PREPARED</command> or <command>ROLLBACK PREPARED</command>.
+ </para>
+ <para>
+ The callback has to provide the same static answer for a given combination of
+ <parameter>xid</parameter> and <parameter>gid</parameter> every time it is
+ called.
+ </para>
+ </sect3>
+
<sect3 id="logicaldecoding-output-plugin-message">
<title>Generic Message Callback</title>
@@ -685,7 +793,13 @@ typedef void (*LogicalDecodeMessageCB) (struct LogicalDecodingContext *ctx,
non-transactional and the XID was not assigned yet in the transaction
which logged the message. The <parameter>lsn</parameter> has WAL
location of the message. The <parameter>transactional</parameter> says
- if the message was sent as transactional or not.
+ if the message was sent as transactional or not. Similar to the change
+ callback, in case of decoding a prepared (but yet uncommitted)
+ transaction or decoding of an uncommitted transaction, this message
+ callback might also error out due to simultaneous rollback of
+ this very same transaction. In that case, the logical decoding of this
+ aborted transaction is stopped gracefully.
+
The <parameter>prefix</parameter> is arbitrary null-terminated prefix
which can be used for identifying interesting messages for the current
plugin. And finally the <parameter>message</parameter> parameter holds
@@ -735,6 +849,19 @@ typedef void (*LogicalDecodeStreamAbortCB) (struct LogicalDecodingContext *ctx,
</para>
</sect3>
+ <sect3 id="logicaldecoding-output-plugin-stream-prepare">
+ <title>Stream Prepare Callback</title>
+ <para>
+ The <function>stream_prepare_cb</function> callback is called to prepare
+ a previously streamed transaction as part of a two-phase commit.
+<programlisting>
+typedef void (*LogicalDecodeStreamPrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+</programlisting>
+ </para>
+ </sect3>
+
<sect3 id="logicaldecoding-output-plugin-stream-commit">
<title>Stream Commit Callback</title>
<para>
@@ -913,9 +1040,13 @@ OutputPluginWrite(ctx, true);
When streaming an in-progress transaction, the changes (and messages) are
streamed in blocks demarcated by <function>stream_start_cb</function>
and <function>stream_stop_cb</function> callbacks. Once all the decoded
- changes are transmitted, the transaction is committed using the
- <function>stream_commit_cb</function> callback (or possibly aborted using
- the <function>stream_abort_cb</function> callback).
+ changes are transmitted, the transaction can be committed using the
+ the <function>stream_commit_cb</function> callback
+ (or possibly aborted using the <function>stream_abort_cb</function> callback).
+ If two-phase commits are supported, the transaction can be prepared using the
+ <function>stream_prepare_cb</function> callback, commit prepared using the
+ <function>commit_prepared_cb</function> callback or aborted using the
+ <function>rollback_prepared_cb</function>.
</para>
<para>
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 8675832..0f605ef 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -59,6 +59,14 @@ static void shutdown_cb_wrapper(LogicalDecodingContext *ctx);
static void begin_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn);
static void commit_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
+static bool filter_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid);
+static void prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+static void commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+static void rollback_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
static void change_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
Relation relation, ReorderBufferChange *change);
static void truncate_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
@@ -74,6 +82,8 @@ static void stream_stop_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr last_lsn);
static void stream_abort_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr abort_lsn);
+static void stream_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
static void stream_commit_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
static void stream_change_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
@@ -207,6 +217,10 @@ StartupDecodingContext(List *output_plugin_options,
ctx->reorder->apply_change = change_cb_wrapper;
ctx->reorder->apply_truncate = truncate_cb_wrapper;
ctx->reorder->commit = commit_cb_wrapper;
+ ctx->reorder->filter_prepare = filter_prepare_cb_wrapper;
+ ctx->reorder->prepare = prepare_cb_wrapper;
+ ctx->reorder->commit_prepared = commit_prepared_cb_wrapper;
+ ctx->reorder->rollback_prepared = rollback_prepared_cb_wrapper;
ctx->reorder->message = message_cb_wrapper;
/*
@@ -221,12 +235,26 @@ StartupDecodingContext(List *output_plugin_options,
ctx->streaming = (ctx->callbacks.stream_start_cb != NULL) ||
(ctx->callbacks.stream_stop_cb != NULL) ||
(ctx->callbacks.stream_abort_cb != NULL) ||
+ (ctx->callbacks.stream_prepare_cb != NULL) ||
(ctx->callbacks.stream_commit_cb != NULL) ||
(ctx->callbacks.stream_change_cb != NULL) ||
(ctx->callbacks.stream_message_cb != NULL) ||
(ctx->callbacks.stream_truncate_cb != NULL);
/*
+ * To support two-phase logical decoding, we require prepare/commit-prepare/abort-prepare
+ * callbacks. The filter-prepare callback is optional. We however enable two-phase logical
+ * decoding when at least one of the methods is enabled so that we can easily identify
+ * missing methods.
+ *
+ * We decide it here, but only check it later in the wrappers.
+ */
+ ctx->twophase = (ctx->callbacks.prepare_cb != NULL) ||
+ (ctx->callbacks.commit_prepared_cb != NULL) ||
+ (ctx->callbacks.rollback_prepared_cb != NULL) ||
+ (ctx->callbacks.filter_prepare_cb != NULL);
+
+ /*
* streaming callbacks
*
* stream_message and stream_truncate callbacks are optional, so we do not
@@ -237,6 +265,7 @@ StartupDecodingContext(List *output_plugin_options,
ctx->reorder->stream_start = stream_start_cb_wrapper;
ctx->reorder->stream_stop = stream_stop_cb_wrapper;
ctx->reorder->stream_abort = stream_abort_cb_wrapper;
+ ctx->reorder->stream_prepare = stream_prepare_cb_wrapper;
ctx->reorder->stream_commit = stream_commit_cb_wrapper;
ctx->reorder->stream_change = stream_change_cb_wrapper;
ctx->reorder->stream_message = stream_message_cb_wrapper;
@@ -783,6 +812,120 @@ commit_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
}
static void
+prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ /* We're only supposed to call this when two-phase commits are supported */
+ Assert(ctx->twophase);
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "prepare";
+ state.report_location = txn->final_lsn; /* beginning of commit record */
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn; /* points to the end of the record */
+
+ /* If the plugin supports two-phase commits then prepare callback is mandatory */
+ if (ctx->callbacks.prepare_cb == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("Output plugin did not register prepare_cb callback")));
+
+ /* do the actual work: call callback */
+ ctx->callbacks.prepare_cb(ctx, txn, prepare_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
+commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ /* We're only supposed to call this when two-phase commits are supported */
+ Assert(ctx->twophase);
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "commit_prepared";
+ state.report_location = txn->final_lsn; /* beginning of commit record */
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn; /* points to the end of the record */
+
+ /* If the plugin support two-phase commits then commit prepared callback is mandatory */
+ if (ctx->callbacks.commit_prepared_cb == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("Output plugin did not register commit_prepared_cb callback")));
+
+ /* do the actual work: call callback */
+ ctx->callbacks.commit_prepared_cb(ctx, txn, commit_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
+rollback_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ /* We're only supposed to call this when two-phase commits are supported */
+ Assert(ctx->twophase);
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "rollback_prepared";
+ state.report_location = txn->final_lsn; /* beginning of commit record */
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn; /* points to the end of the record */
+
+ /* If the plugin support two-phase commits then abort prepared callback is mandatory */
+ if (ctx->callbacks.rollback_prepared_cb == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("Output plugin did not register rollback_prepared_cb callback")));
+
+ /* do the actual work: call callback */
+ ctx->callbacks.rollback_prepared_cb(ctx, txn, abort_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
change_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
Relation relation, ReorderBufferChange *change)
{
@@ -859,6 +1002,51 @@ truncate_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
error_context_stack = errcallback.previous;
}
+static bool
+filter_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ TransactionId xid, const char *gid)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+ bool ret;
+
+ /*
+ * Skip if decoding of two-phase transactions at PREPARE time is not enabled. In that
+ * case all two-phase transactions are considered filtered out and will be
+ * applied as regular transactions at COMMIT PREPARED.
+ */
+ if (!ctx->twophase)
+ return true;
+
+ /*
+ * The filter_prepare callback is optional. When not supplied, all
+ * prepared transactions should go through.
+ */
+ if (!ctx->callbacks.filter_prepare_cb)
+ return false;
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "filter_prepare";
+ state.report_location = InvalidXLogRecPtr;
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = false;
+
+ /* do the actual work: call callback */
+ ret = ctx->callbacks.filter_prepare_cb(ctx, txn, xid, gid);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+
+ return ret;
+}
+
bool
filter_by_origin_cb_wrapper(LogicalDecodingContext *ctx, RepOriginId origin_id)
{
@@ -1057,6 +1245,46 @@ stream_abort_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
}
static void
+stream_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ LogicalDecodingContext *ctx = cache->private_data;
+ LogicalErrorCallbackState state;
+ ErrorContextCallback errcallback;
+
+ Assert(!ctx->fast_forward);
+
+ /* We're only supposed to call this when streaming and two-phase commits are supported. */
+ Assert(ctx->streaming);
+ Assert(ctx->twophase);
+
+ /* Push callback + info on the error context stack */
+ state.ctx = ctx;
+ state.callback_name = "stream_prepare";
+ state.report_location = txn->final_lsn;
+ errcallback.callback = output_plugin_error_callback;
+ errcallback.arg = (void *) &state;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* set output state */
+ ctx->accept_writes = true;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = txn->end_lsn;
+
+ /* in streaming mode with two-phase commits, stream_prepare_cb is required */
+ if (ctx->callbacks.stream_prepare_cb == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical streaming commits requires a stream_prepare_cb callback")));
+
+ ctx->callbacks.stream_prepare_cb(ctx, txn, prepare_lsn);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+}
+
+static void
stream_commit_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr commit_lsn)
{
diff --git a/src/include/replication/logical.h b/src/include/replication/logical.h
index 40bab7e..7f4384b 100644
--- a/src/include/replication/logical.h
+++ b/src/include/replication/logical.h
@@ -85,6 +85,11 @@ typedef struct LogicalDecodingContext
bool streaming;
/*
+ * Does the output plugin support two-phase decoding, and is it enabled?
+ */
+ bool twophase;
+
+ /*
* State for writing output.
*/
bool accept_writes;
diff --git a/src/include/replication/output_plugin.h b/src/include/replication/output_plugin.h
index b78c796..4c1341f 100644
--- a/src/include/replication/output_plugin.h
+++ b/src/include/replication/output_plugin.h
@@ -77,6 +77,39 @@ typedef void (*LogicalDecodeCommitCB) (struct LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
+ /*
+ * Called before decoding of PREPARE record to decide whether this
+ * transaction should be decoded with separate calls to prepare and
+ * commit_prepared/rollback_prepared callbacks or wait till COMMIT PREPARED and
+ * sent as usual transaction.
+ */
+typedef bool (*LogicalDecodeFilterPrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ TransactionId xid,
+ const char *gid);
+
+/*
+ * Called for PREPARE record unless it was filtered by filter_prepare()
+ * callback.
+ */
+typedef void (*LogicalDecodePrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+
+/*
+ * Called for COMMIT PREPARED.
+ */
+typedef void (*LogicalDecodeCommitPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+/*
+ * Called for ROLLBACK PREPARED.
+ */
+typedef void (*LogicalDecodeRollbackPreparedCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+
/*
* Called for the generic logical decoding messages.
*/
@@ -124,6 +157,14 @@ typedef void (*LogicalDecodeStreamAbortCB) (struct LogicalDecodingContext *ctx,
XLogRecPtr abort_lsn);
/*
+ * Called to prepare changes streamed to remote node from in-progress
+ * transaction. This is called as part of a two-phase commit.
+ */
+typedef void (*LogicalDecodeStreamPrepareCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+
+/*
* Called to apply changes streamed to remote node from in-progress
* transaction.
*/
@@ -171,12 +212,17 @@ typedef struct OutputPluginCallbacks
LogicalDecodeTruncateCB truncate_cb;
LogicalDecodeCommitCB commit_cb;
LogicalDecodeMessageCB message_cb;
+ LogicalDecodeFilterPrepareCB filter_prepare_cb;
+ LogicalDecodePrepareCB prepare_cb;
+ LogicalDecodeCommitPreparedCB commit_prepared_cb;
+ LogicalDecodeRollbackPreparedCB rollback_prepared_cb;
LogicalDecodeFilterByOriginCB filter_by_origin_cb;
LogicalDecodeShutdownCB shutdown_cb;
/* streaming of changes */
LogicalDecodeStreamStartCB stream_start_cb;
LogicalDecodeStreamStopCB stream_stop_cb;
LogicalDecodeStreamAbortCB stream_abort_cb;
+ LogicalDecodeStreamPrepareCB stream_prepare_cb;
LogicalDecodeStreamCommitCB stream_commit_cb;
LogicalDecodeStreamChangeCB stream_change_cb;
LogicalDecodeStreamMessageCB stream_message_cb;
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 1c77819..1d9bfb0 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -10,6 +10,7 @@
#define REORDERBUFFER_H
#include "access/htup_details.h"
+#include "access/twophase.h"
#include "lib/ilist.h"
#include "storage/sinval.h"
#include "utils/hsearch.h"
@@ -174,6 +175,9 @@ typedef struct ReorderBufferChange
#define RBTXN_IS_STREAMED 0x0010
#define RBTXN_HAS_TOAST_INSERT 0x0020
#define RBTXN_HAS_SPEC_INSERT 0x0040
+#define RBTXN_PREPARE 0x0080
+#define RBTXN_COMMIT_PREPARED 0x0100
+#define RBTXN_ROLLBACK_PREPARED 0x0200
/* Does the transaction have catalog changes? */
#define rbtxn_has_catalog_changes(txn) \
@@ -233,6 +237,24 @@ typedef struct ReorderBufferChange
((txn)->txn_flags & RBTXN_IS_STREAMED) != 0 \
)
+/* Has this transaction been prepared? */
+#define rbtxn_prepared(txn) \
+( \
+ ((txn)->txn_flags & RBTXN_PREPARE) != 0 \
+)
+
+/* Has this prepared transaction been committed? */
+#define rbtxn_commit_prepared(txn) \
+( \
+ ((txn)->txn_flags & RBTXN_COMMIT_PREPARED) != 0 \
+)
+
+/* Has this prepared transaction been rollbacked? */
+#define rbtxn_rollback_prepared(txn) \
+( \
+ ((txn)->txn_flags & RBTXN_ROLLBACK_PREPARED) != 0 \
+)
+
typedef struct ReorderBufferTXN
{
/* See above */
@@ -244,6 +266,9 @@ typedef struct ReorderBufferTXN
/* Xid of top-level transaction, if known */
TransactionId toplevel_xid;
+ /* In case of two-phase commit we need to pass GID to output plugin */
+ char *gid;
+
/*
* LSN of the first data carrying, WAL record with knowledge about this
* xid. This is allowed to *not* be first record adorned with this xid, if
@@ -405,6 +430,26 @@ typedef void (*ReorderBufferCommitCB) (ReorderBuffer *rb,
ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
+typedef bool (*ReorderBufferFilterPrepareCB) (ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ TransactionId xid,
+ const char *gid);
+
+/* prepare callback signature */
+typedef void (*ReorderBufferPrepareCB) (ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+
+/* commit prepared callback signature */
+typedef void (*ReorderBufferCommitPreparedCB) (ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn);
+
+/* rollback prepared callback signature */
+typedef void (*ReorderBufferRollbackPreparedCB) (ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
+
/* message callback signature */
typedef void (*ReorderBufferMessageCB) (ReorderBuffer *rb,
ReorderBufferTXN *txn,
@@ -431,6 +476,12 @@ typedef void (*ReorderBufferStreamAbortCB) (
ReorderBufferTXN *txn,
XLogRecPtr abort_lsn);
+/* prepare streamed transaction callback signature */
+typedef void (*ReorderBufferStreamPrepareCB) (
+ ReorderBuffer *rb,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+
/* commit streamed transaction callback signature */
typedef void (*ReorderBufferStreamCommitCB) (
ReorderBuffer *rb,
@@ -497,6 +548,10 @@ struct ReorderBuffer
ReorderBufferApplyChangeCB apply_change;
ReorderBufferApplyTruncateCB apply_truncate;
ReorderBufferCommitCB commit;
+ ReorderBufferFilterPrepareCB filter_prepare;
+ ReorderBufferPrepareCB prepare;
+ ReorderBufferCommitPreparedCB commit_prepared;
+ ReorderBufferRollbackPreparedCB rollback_prepared;
ReorderBufferMessageCB message;
/*
@@ -505,6 +560,7 @@ struct ReorderBuffer
ReorderBufferStreamStartCB stream_start;
ReorderBufferStreamStopCB stream_stop;
ReorderBufferStreamAbortCB stream_abort;
+ ReorderBufferStreamPrepareCB stream_prepare;
ReorderBufferStreamCommitCB stream_commit;
ReorderBufferStreamChangeCB stream_change;
ReorderBufferStreamMessageCB stream_message;
--
1.8.3.1
v13-0002-Support-2PC-txn-backend-and-tests.patchapplication/octet-stream; name=v13-0002-Support-2PC-txn-backend-and-tests.patchDownload
From b6bda03755fa60ae52ce8c77c2f4970dff97b316 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Tue, 27 Oct 2020 05:19:18 -0400
Subject: [PATCH v13] Support 2PC txn backend and tests.
Until now two-phase transactions were decoded at COMMIT, just like
regular transaction. During replay, two-phase transactions were
translated into regular transactions on the subscriber, and the GID
was not forwarded to it.
This patch allows PREPARE-time decoding two-phase transactions (if
the output plugin supports this capability), in which case the
transactions are replayed at PREPARE and then committed later when
COMMIT PREPARED arrives.
Includes backend changes to support decoding of PREPARE TRANSACTION,
COMMIT PREPARED and ROLLBACK PREPARED.
Includes two-phase commit test code (for test_decoding).
---
contrib/test_decoding/Makefile | 4 +-
contrib/test_decoding/expected/two_phase.out | 228 ++++++++++++++++++
contrib/test_decoding/sql/two_phase.sql | 119 ++++++++++
contrib/test_decoding/t/001_twophase.pl | 121 ++++++++++
src/backend/replication/logical/decode.c | 250 ++++++++++++++++++-
src/backend/replication/logical/reorderbuffer.c | 303 +++++++++++++++++++++---
src/include/replication/reorderbuffer.h | 14 ++
7 files changed, 987 insertions(+), 52 deletions(-)
create mode 100644 contrib/test_decoding/expected/two_phase.out
create mode 100644 contrib/test_decoding/sql/two_phase.sql
create mode 100644 contrib/test_decoding/t/001_twophase.pl
diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index 9a4c76f..2abf3ce 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -4,11 +4,13 @@ MODULES = test_decoding
PGFILEDESC = "test_decoding - example of a logical decoding output plugin"
REGRESS = ddl xact rewrite toast permissions decoding_in_xact \
- decoding_into_rel binary prepared replorigin time messages \
+ decoding_into_rel binary prepared replorigin time two_phase messages \
spill slot truncate stream stats
ISOLATION = mxact delayed_startup ondisk_startup concurrent_ddl_dml \
oldest_xmin snapshot_transfer subxact_without_top concurrent_stream
+TAP_TESTS = 1
+
REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
ISOLATION_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
diff --git a/contrib/test_decoding/expected/two_phase.out b/contrib/test_decoding/expected/two_phase.out
new file mode 100644
index 0000000..e5e34b4
--- /dev/null
+++ b/contrib/test_decoding/expected/two_phase.out
@@ -0,0 +1,228 @@
+-- Test two-phased transactions. When two-phase-commit is enabled, transactions are
+-- decoded at PREPARE time rather than at COMMIT PREPARED time.
+SET synchronous_commit = on;
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+ ?column?
+----------
+ init
+(1 row)
+
+CREATE TABLE test_prepared1(id integer primary key);
+CREATE TABLE test_prepared2(id integer primary key);
+-- Test 1:
+-- Test that commands in a two phase xact are only decoded at PREPARE.
+-- Decoding after COMMIT PREPARED should only have the COMMIT PREPARED command and not the
+-- rest of the commands in the transaction.
+BEGIN;
+INSERT INTO test_prepared1 VALUES (1);
+INSERT INTO test_prepared1 VALUES (2);
+-- should show nothing because the xact has not been prepared yet.
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+------
+(0 rows)
+
+PREPARE TRANSACTION 'test_prepared#1';
+-- should show both the above inserts and the PREPARE TRANSACTION.
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+----------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:1
+ table public.test_prepared1: INSERT: id[integer]:2
+ PREPARE TRANSACTION 'test_prepared#1'
+(4 rows)
+
+COMMIT PREPARED 'test_prepared#1';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-----------------------------------
+ COMMIT PREPARED 'test_prepared#1'
+(1 row)
+
+-- Test 2:
+-- Test that rollback of a prepared xact is decoded.
+BEGIN;
+INSERT INTO test_prepared1 VALUES (3);
+PREPARE TRANSACTION 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+----------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:3
+ PREPARE TRANSACTION 'test_prepared#2'
+(3 rows)
+
+ROLLBACK PREPARED 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-------------------------------------
+ ROLLBACK PREPARED 'test_prepared#2'
+(1 row)
+
+-- Test 3:
+-- Test prepare of a xact containing ddl. Leaving xact uncommitted for next test.
+BEGIN;
+ALTER TABLE test_prepared1 ADD COLUMN data text;
+INSERT INTO test_prepared1 VALUES (4, 'frakbar');
+PREPARE TRANSACTION 'test_prepared#3';
+-- confirm that exclusive lock from the ALTER command is held on test_prepared1 table
+SELECT 'test_prepared_1' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'test_prepared1'::regclass;
+ relation | locktype | mode
+-----------------+----------+---------------------
+ test_prepared_1 | relation | RowExclusiveLock
+ test_prepared_1 | relation | AccessExclusiveLock
+(2 rows)
+
+-- The insert should show the newly altered column but not the DDL.
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-------------------------------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:4 data[text]:'frakbar'
+ PREPARE TRANSACTION 'test_prepared#3'
+(3 rows)
+
+-- Test 4:
+-- Test that we decode correctly while an uncommitted prepared xact
+-- with ddl exists.
+--
+-- Use a separate table for the concurrent transaction because the lock from
+-- the ALTER will stop us inserting into the other one.
+--
+INSERT INTO test_prepared2 VALUES (5);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+----------------------------------------------------
+ BEGIN
+ table public.test_prepared2: INSERT: id[integer]:5
+ COMMIT
+(3 rows)
+
+COMMIT PREPARED 'test_prepared#3';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-----------------------------------
+ COMMIT PREPARED 'test_prepared#3'
+(1 row)
+
+-- make sure stuff still works
+INSERT INTO test_prepared1 VALUES (6);
+INSERT INTO test_prepared2 VALUES (7);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+--------------------------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:6 data[text]:null
+ COMMIT
+ BEGIN
+ table public.test_prepared2: INSERT: id[integer]:7
+ COMMIT
+(6 rows)
+
+-- Test 5:
+-- Check `CLUSTER` (as operation that hold exclusive lock) doesn't block
+-- logical decoding.
+BEGIN;
+INSERT INTO test_prepared1 VALUES (8, 'othercol');
+CLUSTER test_prepared1 USING test_prepared1_pkey;
+INSERT INTO test_prepared1 VALUES (9, 'othercol2');
+PREPARE TRANSACTION 'test_prepared_lock';
+SELECT 'test_prepared1' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'test_prepared1'::regclass;
+ relation | locktype | mode
+----------------+----------+---------------------
+ test_prepared1 | relation | RowExclusiveLock
+ test_prepared1 | relation | ShareLock
+ test_prepared1 | relation | AccessExclusiveLock
+(3 rows)
+
+-- The above CLUSTER command shouldn't cause a timeout on 2pc decoding. The call should return
+-- within a second.
+SET statement_timeout = '1s';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+---------------------------------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:8 data[text]:'othercol'
+ table public.test_prepared1: INSERT: id[integer]:9 data[text]:'othercol2'
+ PREPARE TRANSACTION 'test_prepared_lock'
+(4 rows)
+
+RESET statement_timeout;
+COMMIT PREPARED 'test_prepared_lock';
+-- consume the commit
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+--------------------------------------
+ COMMIT PREPARED 'test_prepared_lock'
+(1 row)
+
+-- Test 6:
+-- Test savepoints and sub-xacts. Creating savepoints will create sub-xacts implicitly.
+BEGIN;
+CREATE TABLE test_prepared_savepoint (a int);
+INSERT INTO test_prepared_savepoint VALUES (1);
+SAVEPOINT test_savepoint;
+INSERT INTO test_prepared_savepoint VALUES (2);
+ROLLBACK TO SAVEPOINT test_savepoint;
+PREPARE TRANSACTION 'test_prepared_savepoint';
+-- should show only 1, not 2
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+------------------------------------------------------------
+ BEGIN
+ table public.test_prepared_savepoint: INSERT: a[integer]:1
+ PREPARE TRANSACTION 'test_prepared_savepoint'
+(3 rows)
+
+COMMIT PREPARED 'test_prepared_savepoint';
+-- consume the commit
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+-------------------------------------------
+ COMMIT PREPARED 'test_prepared_savepoint'
+(1 row)
+
+-- Test 7:
+-- test that a GID containing "_nodecode" gets decoded at commit prepared time
+BEGIN;
+INSERT INTO test_prepared1 VALUES (20);
+PREPARE TRANSACTION 'test_prepared_nodecode';
+-- should show nothing
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+------
+(0 rows)
+
+COMMIT PREPARED 'test_prepared_nodecode';
+-- should be decoded now
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+---------------------------------------------------------------------
+ BEGIN
+ table public.test_prepared1: INSERT: id[integer]:20 data[text]:null
+ COMMIT
+(3 rows)
+
+-- Test 8:
+-- cleanup and make sure results are also empty
+DROP TABLE test_prepared1;
+DROP TABLE test_prepared2;
+-- show results. There should be nothing to show
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ data
+------
+(0 rows)
+
+SELECT pg_drop_replication_slot('regression_slot');
+ pg_drop_replication_slot
+--------------------------
+
+(1 row)
+
diff --git a/contrib/test_decoding/sql/two_phase.sql b/contrib/test_decoding/sql/two_phase.sql
new file mode 100644
index 0000000..4ed5266
--- /dev/null
+++ b/contrib/test_decoding/sql/two_phase.sql
@@ -0,0 +1,119 @@
+-- Test two-phased transactions. When two-phase-commit is enabled, transactions are
+-- decoded at PREPARE time rather than at COMMIT PREPARED time.
+SET synchronous_commit = on;
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+
+CREATE TABLE test_prepared1(id integer primary key);
+CREATE TABLE test_prepared2(id integer primary key);
+
+-- Test 1:
+-- Test that commands in a two phase xact are only decoded at PREPARE.
+-- Decoding after COMMIT PREPARED should only have the COMMIT PREPARED command and not the
+-- rest of the commands in the transaction.
+BEGIN;
+INSERT INTO test_prepared1 VALUES (1);
+INSERT INTO test_prepared1 VALUES (2);
+-- should show nothing because the xact has not been prepared yet.
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+PREPARE TRANSACTION 'test_prepared#1';
+-- should show both the above inserts and the PREPARE TRANSACTION.
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+COMMIT PREPARED 'test_prepared#1';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- Test 2:
+-- Test that rollback of a prepared xact is decoded.
+BEGIN;
+INSERT INTO test_prepared1 VALUES (3);
+PREPARE TRANSACTION 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+ROLLBACK PREPARED 'test_prepared#2';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- Test 3:
+-- Test prepare of a xact containing ddl. Leaving xact uncommitted for next test.
+BEGIN;
+ALTER TABLE test_prepared1 ADD COLUMN data text;
+INSERT INTO test_prepared1 VALUES (4, 'frakbar');
+PREPARE TRANSACTION 'test_prepared#3';
+-- confirm that exclusive lock from the ALTER command is held on test_prepared1 table
+SELECT 'test_prepared_1' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'test_prepared1'::regclass;
+-- The insert should show the newly altered column but not the DDL.
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- Test 4:
+-- Test that we decode correctly while an uncommitted prepared xact
+-- with ddl exists.
+--
+-- Use a separate table for the concurrent transaction because the lock from
+-- the ALTER will stop us inserting into the other one.
+--
+INSERT INTO test_prepared2 VALUES (5);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+COMMIT PREPARED 'test_prepared#3';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+-- make sure stuff still works
+INSERT INTO test_prepared1 VALUES (6);
+INSERT INTO test_prepared2 VALUES (7);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- Test 5:
+-- Check `CLUSTER` (as operation that hold exclusive lock) doesn't block
+-- logical decoding.
+BEGIN;
+INSERT INTO test_prepared1 VALUES (8, 'othercol');
+CLUSTER test_prepared1 USING test_prepared1_pkey;
+INSERT INTO test_prepared1 VALUES (9, 'othercol2');
+PREPARE TRANSACTION 'test_prepared_lock';
+
+SELECT 'test_prepared1' AS relation, locktype, mode
+FROM pg_locks
+WHERE locktype = 'relation'
+ AND relation = 'test_prepared1'::regclass;
+-- The above CLUSTER command shouldn't cause a timeout on 2pc decoding. The call should return
+-- within a second.
+SET statement_timeout = '1s';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+RESET statement_timeout;
+COMMIT PREPARED 'test_prepared_lock';
+-- consume the commit
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- Test 6:
+-- Test savepoints and sub-xacts. Creating savepoints will create sub-xacts implicitly.
+BEGIN;
+CREATE TABLE test_prepared_savepoint (a int);
+INSERT INTO test_prepared_savepoint VALUES (1);
+SAVEPOINT test_savepoint;
+INSERT INTO test_prepared_savepoint VALUES (2);
+ROLLBACK TO SAVEPOINT test_savepoint;
+PREPARE TRANSACTION 'test_prepared_savepoint';
+-- should show only 1, not 2
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+COMMIT PREPARED 'test_prepared_savepoint';
+-- consume the commit
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- Test 7:
+-- test that a GID containing "_nodecode" gets decoded at commit prepared time
+BEGIN;
+INSERT INTO test_prepared1 VALUES (20);
+PREPARE TRANSACTION 'test_prepared_nodecode';
+-- should show nothing
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+COMMIT PREPARED 'test_prepared_nodecode';
+-- should be decoded now
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- Test 8:
+-- cleanup and make sure results are also empty
+DROP TABLE test_prepared1;
+DROP TABLE test_prepared2;
+-- show results. There should be nothing to show
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+
+SELECT pg_drop_replication_slot('regression_slot');
diff --git a/contrib/test_decoding/t/001_twophase.pl b/contrib/test_decoding/t/001_twophase.pl
new file mode 100644
index 0000000..1555582
--- /dev/null
+++ b/contrib/test_decoding/t/001_twophase.pl
@@ -0,0 +1,121 @@
+# logical replication of 2PC test
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 2;
+use Time::HiRes qw(usleep);
+use Scalar::Util qw(looks_like_number);
+
+# Initialize node
+my $node_logical = get_new_node('logical');
+$node_logical->init(allows_streaming => 'logical');
+$node_logical->append_conf(
+ 'postgresql.conf', qq(
+ max_prepared_transactions = 10
+));
+$node_logical->start;
+
+# Create some pre-existing content on logical
+$node_logical->safe_psql('postgres', "CREATE TABLE tab (a int PRIMARY KEY)");
+$node_logical->safe_psql('postgres',
+ "INSERT INTO tab SELECT generate_series(1,10)");
+$node_logical->safe_psql('postgres',
+ "SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');");
+
+#Test 1:
+# This test is specifically for testing concurrent abort while logical decode
+# is ongoing. We will pass in the xid of the 2PC to the plugin as an option.
+# On the receipt of a valid "check-xid-aborted", the change API in the test decoding
+# plugin will wait for it to be aborted.
+#
+# We will fire off a ROLLBACK from another session when this decode
+# is waiting.
+#
+# The status of "check-xid-aborted" will change from in-progress to not-committed
+# (hence aborted) and we will stop decoding because the subsequent
+# system catalog scan will error out.
+
+$node_logical->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab VALUES (11);
+ INSERT INTO tab VALUES (12);
+ ALTER TABLE tab ADD COLUMN b INT;
+ INSERT INTO tab VALUES (13,14);
+ PREPARE TRANSACTION 'test_prepared_tab';");
+# get XID of the above two-phase transaction
+my $xid2pc = $node_logical->safe_psql('postgres', "SELECT transaction FROM pg_prepared_xacts WHERE gid = 'test_prepared_tab'");
+is(looks_like_number($xid2pc), qq(1), 'Got a valid two-phase XID');
+
+# start decoding the above by passing the "check-xid-aborted"
+my $logical_connstr = $node_logical->connstr . ' dbname=postgres';
+
+# decode now, it should include an ABORT entry because of the ROLLBACK below
+system_log("psql -d \"$logical_connstr\" -c \"SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'check-xid-aborted', '$xid2pc');\" \&");
+
+# check that decode starts waiting for this $xid2pc
+poll_output_until("waiting for $xid2pc to abort")
+ or die "no wait happened for the abort";
+
+# rollback the prepared transaction
+$node_logical->safe_psql('postgres', "ROLLBACK PREPARED 'test_prepared_tab';");
+
+# check for occurrence of the log about stopping this decoding
+poll_output_until("stopping decoding of test_prepared_tab")
+ or die "no decoding stop for the rollback";
+
+# consume any remaining changes
+$node_logical->safe_psql('postgres', "SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');");
+
+# Test 2:
+# Check that commit prepared is decoded properly on immediate restart
+$node_logical->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab VALUES (11);
+ INSERT INTO tab VALUES (12);
+ ALTER TABLE tab ADD COLUMN b INT;
+ INSERT INTO tab VALUES (13, 11);
+ PREPARE TRANSACTION 'test_prepared_tab';");
+# consume changes
+$node_logical->safe_psql('postgres', "SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');");
+$node_logical->stop('immediate');
+$node_logical->start;
+
+# commit post the restart
+$node_logical->safe_psql('postgres', "COMMIT PREPARED 'test_prepared_tab';");
+$node_logical->safe_psql('postgres', "SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');");
+
+# check inserts are visible
+my $result = $node_logical->safe_psql('postgres', "SELECT count(*) FROM tab where a IN (11,12) OR b IN (11);");
+is($result, qq(3), 'Rows inserted via 2PC are visible on restart');
+
+$node_logical->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot');");
+$node_logical->stop('fast');
+
+sub poll_output_until
+{
+ my ($expected) = @_;
+
+ $expected = 'xxxxxx' unless defined($expected); # default junk value
+
+ my $max_attempts = 180 * 10;
+ my $attempts = 0;
+
+ my $output_file = '';
+ while ($attempts < $max_attempts)
+ {
+ $output_file = slurp_file($node_logical->logfile());
+
+ if ($output_file =~ $expected)
+ {
+ return 1;
+ }
+
+ # Wait 0.1 second before retrying.
+ usleep(100_000);
+ $attempts++;
+ }
+
+ # The output result didn't change in 180 seconds. Give up
+ return 0;
+}
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 3f84ee9..fd961d4 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -68,8 +68,15 @@ static void DecodeSpecConfirm(LogicalDecodingContext *ctx, XLogRecordBuffer *buf
static void DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
xl_xact_parsed_commit *parsed, TransactionId xid);
+static void DecodeCommitPrepared(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_commit *parsed, TransactionId xid);
static void DecodeAbort(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
xl_xact_parsed_abort *parsed, TransactionId xid);
+static void DecodeAbortPrepared(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_abort *parsed, TransactionId xid);
+static void DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_prepare * parsed);
+
/* common function to decode tuples */
static void DecodeXLogTuple(char *data, Size len, ReorderBufferTupleBuf *tup);
@@ -239,7 +246,6 @@ DecodeXactOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
switch (info)
{
case XLOG_XACT_COMMIT:
- case XLOG_XACT_COMMIT_PREPARED:
{
xl_xact_commit *xlrec;
xl_xact_parsed_commit parsed;
@@ -256,8 +262,24 @@ DecodeXactOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
DecodeCommit(ctx, buf, &parsed, xid);
break;
}
+ case XLOG_XACT_COMMIT_PREPARED:
+ {
+ xl_xact_commit *xlrec;
+ xl_xact_parsed_commit parsed;
+ TransactionId xid;
+
+ xlrec = (xl_xact_commit *) XLogRecGetData(r);
+ ParseCommitRecord(XLogRecGetInfo(buf->record), xlrec, &parsed);
+
+ if (!TransactionIdIsValid(parsed.twophase_xid))
+ xid = XLogRecGetXid(r);
+ else
+ xid = parsed.twophase_xid;
+
+ DecodeCommitPrepared(ctx, buf, &parsed, xid);
+ break;
+ }
case XLOG_XACT_ABORT:
- case XLOG_XACT_ABORT_PREPARED:
{
xl_xact_abort *xlrec;
xl_xact_parsed_abort parsed;
@@ -274,6 +296,23 @@ DecodeXactOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
DecodeAbort(ctx, buf, &parsed, xid);
break;
}
+ case XLOG_XACT_ABORT_PREPARED:
+ {
+ xl_xact_abort *xlrec;
+ xl_xact_parsed_abort parsed;
+ TransactionId xid;
+
+ xlrec = (xl_xact_abort *) XLogRecGetData(r);
+ ParseAbortRecord(XLogRecGetInfo(buf->record), xlrec, &parsed);
+
+ if (!TransactionIdIsValid(parsed.twophase_xid))
+ xid = XLogRecGetXid(r);
+ else
+ xid = parsed.twophase_xid;
+
+ DecodeAbortPrepared(ctx, buf, &parsed, xid);
+ break;
+ }
case XLOG_XACT_ASSIGNMENT:
/*
@@ -312,17 +351,35 @@ DecodeXactOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
}
break;
case XLOG_XACT_PREPARE:
+ {
+ xl_xact_parsed_prepare parsed;
+ xl_xact_prepare *xlrec;
- /*
- * Currently decoding ignores PREPARE TRANSACTION and will just
- * decode the transaction when the COMMIT PREPARED is sent or
- * throw away the transaction's contents when a ROLLBACK PREPARED
- * is received. In the future we could add code to expose prepared
- * transactions in the changestream allowing for a kind of
- * distributed 2PC.
- */
- ReorderBufferProcessXid(reorder, XLogRecGetXid(r), buf->origptr);
- break;
+ /* check that output plugin is capable of two-phase decoding */
+ if (!ctx->twophase)
+ {
+ ReorderBufferProcessXid(reorder, XLogRecGetXid(r), buf->origptr);
+ break;
+ }
+
+ /* ok, parse it */
+ xlrec = (xl_xact_prepare *)XLogRecGetData(r);
+ ParsePrepareRecord(XLogRecGetInfo(buf->record),
+ xlrec, &parsed);
+
+ /* does output plugin want this particular transaction? */
+ if (ctx->callbacks.filter_prepare_cb &&
+ ReorderBufferPrepareNeedSkip(reorder, parsed.twophase_xid,
+ parsed.twophase_gid))
+ {
+ ReorderBufferProcessXid(reorder, parsed.twophase_xid,
+ buf->origptr);
+ break;
+ }
+
+ DecodePrepare(ctx, buf, &parsed);
+ break;
+ }
default:
elog(ERROR, "unexpected RM_XACT_ID record type: %u", info);
}
@@ -659,6 +716,131 @@ DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
}
/*
+ * Consolidated commit record handling between the different form of commit
+ * records.
+ */
+static void
+DecodeCommitPrepared(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_commit *parsed, TransactionId xid)
+{
+ XLogRecPtr origin_lsn = InvalidXLogRecPtr;
+ TimestampTz commit_time = parsed->xact_time;
+ RepOriginId origin_id = XLogRecGetOrigin(buf->record);
+ int i;
+
+ if (parsed->xinfo & XACT_XINFO_HAS_ORIGIN)
+ {
+ origin_lsn = parsed->origin_lsn;
+ commit_time = parsed->origin_timestamp;
+ }
+
+ SnapBuildCommitTxn(ctx->snapshot_builder, buf->origptr, xid,
+ parsed->nsubxacts, parsed->subxacts);
+
+ /* ----
+ * Check whether we are interested in this specific transaction, and tell
+ * the reorderbuffer to forget the content of the (sub-)transactions
+ * if not.
+ *
+ * There can be several reasons we might not be interested in this
+ * transaction:
+ * 1) We might not be interested in decoding transactions up to this
+ * LSN. This can happen because we previously decoded it and now just
+ * are restarting or if we haven't assembled a consistent snapshot yet.
+ * 2) The transaction happened in another database.
+ * 3) The output plugin is not interested in the origin.
+ * 4) We are doing fast-forwarding
+ *
+ * We can't just use ReorderBufferAbort() here, because we need to execute
+ * the transaction's invalidations. This currently won't be needed if
+ * we're just skipping over the transaction because currently we only do
+ * so during startup, to get to the first transaction the client needs. As
+ * we have reset the catalog caches before starting to read WAL, and we
+ * haven't yet touched any catalogs, there can't be anything to invalidate.
+ * But if we're "forgetting" this commit because it's it happened in
+ * another database, the invalidations might be important, because they
+ * could be for shared catalogs and we might have loaded data into the
+ * relevant syscaches.
+ * ---
+ */
+ if (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
+ (parsed->dbId != InvalidOid && parsed->dbId != ctx->slot->data.database) ||
+ ctx->fast_forward || FilterByOrigin(ctx, origin_id))
+ {
+ for (i = 0; i < parsed->nsubxacts; i++)
+ {
+ ReorderBufferForget(ctx->reorder, parsed->subxacts[i], buf->origptr);
+ }
+ ReorderBufferForget(ctx->reorder, xid, buf->origptr);
+
+ return;
+ }
+
+ /* tell the reorderbuffer about the surviving subtransactions */
+ for (i = 0; i < parsed->nsubxacts; i++)
+ {
+ ReorderBufferCommitChild(ctx->reorder, xid, parsed->subxacts[i],
+ buf->origptr, buf->endptr);
+ }
+
+ /*
+ * For COMMIT PREPARED, the changes have already been replayed at
+ * PREPARE time, so we only need to notify the subscriber that the GID
+ * finally committed.
+ * If filter check present and this needs to be skipped, do a regular commit.
+ */
+ if (ctx->callbacks.filter_prepare_cb &&
+ ReorderBufferPrepareNeedSkip(ctx->reorder, xid, parsed->twophase_gid))
+ {
+ ReorderBufferCommit(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn);
+ }
+ else
+ {
+ ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn,
+ parsed->twophase_gid, true);
+ }
+
+}
+
+/*
+ * Decode PREPARE record. Similar logic as in COMMIT
+ */
+static void
+DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_prepare * parsed)
+{
+ XLogRecPtr origin_lsn = parsed->origin_lsn;
+ TimestampTz commit_time = parsed->origin_timestamp;
+ XLogRecPtr origin_id = XLogRecGetOrigin(buf->record);
+ int i;
+ TransactionId xid = parsed->twophase_xid;
+
+ if (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
+ (parsed->dbId != InvalidOid && parsed->dbId != ctx->slot->data.database) ||
+ ctx->fast_forward || FilterByOrigin(ctx, origin_id))
+ return;
+
+ /*
+ * Tell the reorderbuffer about the surviving subtransactions. We need to
+ * do this because the main transaction itself has not committed since we
+ * are in the prepare phase right now. So we need to be sure the snapshot
+ * is setup correctly for the main transaction in case all changes
+ * happened in subtransanctions
+ */
+ for (i = 0; i < parsed->nsubxacts; i++)
+ {
+ ReorderBufferCommitChild(ctx->reorder, xid, parsed->subxacts[i],
+ buf->origptr, buf->endptr);
+ }
+
+ /* replay actions of all transaction + subtransactions in order */
+ ReorderBufferPrepare(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn, parsed->twophase_gid);
+}
+
+/*
* Get the data from the various forms of abort records and pass it on to
* snapbuild.c and reorderbuffer.c
*/
@@ -681,6 +863,50 @@ DecodeAbort(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
}
/*
+ * Get the data from the various forms of abort records and pass it on to
+ * snapbuild.c and reorderbuffer.c
+ */
+static void
+DecodeAbortPrepared(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
+ xl_xact_parsed_abort *parsed, TransactionId xid)
+{
+ int i;
+ XLogRecPtr origin_lsn = InvalidXLogRecPtr;
+ TimestampTz commit_time = 0;
+ XLogRecPtr origin_id = XLogRecGetOrigin(buf->record);
+
+ if (parsed->xinfo & XACT_XINFO_HAS_ORIGIN)
+ {
+ origin_lsn = parsed->origin_lsn;
+ commit_time = parsed->origin_timestamp;
+ }
+
+ /*
+ * If it passes through the filters handle the ROLLBACK via callbacks
+ */
+ if(!FilterByOrigin(ctx, origin_id) &&
+ !SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) &&
+ !ReorderBufferPrepareNeedSkip(ctx->reorder, xid, parsed->twophase_gid))
+ {
+ Assert(TransactionIdIsValid(xid));
+ Assert(parsed->dbId == ctx->slot->data.database);
+
+ ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, buf->endptr,
+ commit_time, origin_id, origin_lsn,
+ parsed->twophase_gid, false);
+ return;
+ }
+
+ for (i = 0; i < parsed->nsubxacts; i++)
+ {
+ ReorderBufferAbort(ctx->reorder, parsed->subxacts[i],
+ buf->record->EndRecPtr);
+ }
+
+ ReorderBufferAbort(ctx->reorder, xid, buf->record->EndRecPtr);
+}
+
+/*
* Parse XLOG_HEAP_INSERT (not MULTI_INSERT!) records into tuplebufs.
*
* Deletes can contain the new tuple.
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 7a8bf76..c5620e1 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -251,7 +251,8 @@ static Size ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn
static void ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
char *change);
static void ReorderBufferRestoreCleanup(ReorderBuffer *rb, ReorderBufferTXN *txn);
-static void ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn);
+static void ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
+ bool txn_prepared);
static void ReorderBufferCleanupSerializedTXNs(const char *slotname);
static void ReorderBufferSerializedPath(char *path, ReplicationSlot *slot,
TransactionId xid, XLogSegNo segno);
@@ -418,6 +419,12 @@ ReorderBufferReturnTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
/* free data that's contained */
+ if (txn->gid != NULL)
+ {
+ pfree(txn->gid);
+ txn->gid = NULL;
+ }
+
if (txn->tuplecid_hash != NULL)
{
hash_destroy(txn->tuplecid_hash);
@@ -1511,12 +1518,14 @@ ReorderBufferCleanupTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
}
/*
- * Discard changes from a transaction (and subtransactions), after streaming
- * them. Keep the remaining info - transactions, tuplecids, invalidations and
- * snapshots.
+ * Discard changes from a transaction (and subtransactions), either after streaming or
+ * after a PREPARE.
+ * The flag txn_prepared indicates if this is called after a PREPARE.
+ * If streaming, keep the remaining info - transactions, tuplecids, invalidations and
+ * snapshots. If after a PREPARE, keep only the invalidations and snapshots.
*/
static void
-ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
+ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, bool txn_prepared)
{
dlist_mutable_iter iter;
@@ -1535,7 +1544,7 @@ ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
Assert(rbtxn_is_known_subxact(subtxn));
Assert(subtxn->nsubtxns == 0);
- ReorderBufferTruncateTXN(rb, subtxn);
+ ReorderBufferTruncateTXN(rb, subtxn, txn_prepared);
}
/* cleanup changes in the txn */
@@ -1569,9 +1578,33 @@ ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
* about the toplevel xact (we send the XID in all messages), but we never
* stream XIDs of empty subxacts.
*/
- if ((!txn->toptxn) || (txn->nentries_mem != 0))
+ if ((!txn_prepared) && ((!txn->toptxn) || (txn->nentries_mem != 0)))
txn->txn_flags |= RBTXN_IS_STREAMED;
+ if (txn_prepared)
+ {
+ /*
+ * If this is a prepared txn, cleanup the tuplecids we stored for decoding
+ * catalog snapshot access.
+ * They are always stored in the toplevel transaction.
+ */
+ dlist_foreach_modify(iter, &txn->tuplecids)
+ {
+ ReorderBufferChange *change;
+
+ change = dlist_container(ReorderBufferChange, node, iter.cur);
+
+ /* Check we're not mixing changes from different transactions. */
+ Assert(change->txn == txn);
+ Assert(change->action == REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID);
+
+ /* Remove the change from its containing list. */
+ dlist_delete(&change->node);
+
+ ReorderBufferReturnChange(rb, change, true);
+ }
+ }
+
/*
* Destroy the (relfilenode, ctid) hashtable, so that we don't leak any
* memory. We could also keep the hash table and update it with new ctid
@@ -1765,9 +1798,22 @@ ReorderBufferStreamCommit(ReorderBuffer *rb, ReorderBufferTXN *txn)
ReorderBufferStreamTXN(rb, txn);
- rb->stream_commit(rb, txn, txn->final_lsn);
-
- ReorderBufferCleanupTXN(rb, txn);
+ if (rbtxn_prepared(txn))
+ {
+ rb->stream_prepare(rb, txn, txn->final_lsn);
+ /* This is a PREPARED transaction, part of a two-phase commit.
+ * The full cleanup will happen as part of the COMMIT PREPAREDs, so now
+ * just truncate txn by removing changes and tuple_cids
+ */
+ ReorderBufferTruncateTXN(rb, txn, true);
+ /* Reset the CheckXidAlive */
+ CheckXidAlive = InvalidTransactionId;
+ }
+ else
+ {
+ rb->stream_commit(rb, txn, txn->final_lsn);
+ ReorderBufferCleanupTXN(rb, txn);
+ }
}
/*
@@ -1895,8 +1941,12 @@ ReorderBufferResetTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
XLogRecPtr last_lsn,
ReorderBufferChange *specinsert)
{
- /* Discard the changes that we just streamed */
- ReorderBufferTruncateTXN(rb, txn);
+ /* Discard the changes that we just streamed.
+ * This can only be called if streaming and not part of a PREPARE in
+ * a two-phase commit, so set prepared flag as false.
+ */
+ Assert(!rbtxn_prepared(txn));
+ ReorderBufferTruncateTXN(rb, txn, false);
/* Free all resources allocated for toast reconstruction */
ReorderBufferToastReset(rb, txn);
@@ -1918,6 +1968,11 @@ ReorderBufferResetTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
/*
* Helper function for ReorderBufferCommit and ReorderBufferStreamTXN.
*
+ * We are here due to one of the 3 scenarios:
+ * 1. As part of streaming an in-progress transactions
+ * 2. Prepare of a two-phase commit
+ * 3. Commit of a transaction.
+ *
* Send data of a transaction (and its subtransactions) to the
* output plugin. We iterate over the top and subtransactions (using a k-way
* merge) and replay the changes in lsn order.
@@ -2003,7 +2058,7 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
prev_lsn = change->lsn;
/* Set the current xid to detect concurrent aborts. */
- if (streaming)
+ if (streaming || rbtxn_prepared(change->txn))
{
curtxn = change->txn;
SetupCheckXidLive(curtxn->xid);
@@ -2294,7 +2349,16 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
}
}
else
- rb->commit(rb, txn, commit_lsn);
+ {
+ /*
+ * Call either PREPARE (for two-phase transactions) or COMMIT
+ * (for regular ones).
+ */
+ if (rbtxn_prepared(txn))
+ rb->prepare(rb, txn, commit_lsn);
+ else
+ rb->commit(rb, txn, commit_lsn);
+ }
/* this is just a sanity check against bad output plugin behaviour */
if (GetCurrentTransactionIdIfAny() != InvalidTransactionId)
@@ -2328,18 +2392,32 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
RollbackAndReleaseCurrentSubTransaction();
/*
+ * We are here due to one of the 3 scenarios:
+ * 1. As part of streaming in-progress transactions
+ * 2. Prepare of a two-phase commit
+ * 3. Commit of a transaction.
+ *
* If we are streaming the in-progress transaction then discard the
* changes that we just streamed, and mark the transactions as
- * streamed (if they contained changes). Otherwise, remove all the
+ * streamed (if they contained changes), set prepared flag as false.
+ * If part of a prepare of a two-phase commit set the prepared flag
+ * as true so that we can discard changes and cleanup tuplecids.
+ * Otherwise, remove all the
* changes and deallocate the ReorderBufferTXN.
*/
if (streaming)
{
- ReorderBufferTruncateTXN(rb, txn);
+ ReorderBufferTruncateTXN(rb, txn, false);
/* Reset the CheckXidAlive */
CheckXidAlive = InvalidTransactionId;
}
+ else if (rbtxn_prepared(txn))
+ {
+ ReorderBufferTruncateTXN(rb, txn, true);
+ /* Reset the CheckXidAlive */
+ CheckXidAlive = InvalidTransactionId;
+ }
else
ReorderBufferCleanupTXN(rb, txn);
}
@@ -2369,17 +2447,20 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
/*
* The error code ERRCODE_TRANSACTION_ROLLBACK indicates a concurrent
- * abort of the (sub)transaction we are streaming. We need to do the
+ * abort of the (sub)transaction we are streaming or preparing. We need to do the
* cleanup and return gracefully on this error, see SetupCheckXidLive.
*/
if (errdata->sqlerrcode == ERRCODE_TRANSACTION_ROLLBACK)
{
/*
- * This error can only occur when we are sending the data in
- * streaming mode and the streaming is not finished yet.
+ * This error can only occur either when we are sending the data in
+ * streaming mode and the streaming is not finished yet or when we are
+ * sending the data out on a PREPARE during a two-phase commit.
+ * Both conditions can't be true either, it should be one of them.
*/
- Assert(streaming);
- Assert(stream_started);
+ Assert(streaming || rbtxn_prepared(txn));
+ Assert(stream_started || rbtxn_prepared(txn));
+ Assert(!(streaming && rbtxn_prepared(txn)));
/* Cleanup the temporary error state. */
FlushErrorState();
@@ -2387,10 +2468,21 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
errdata = NULL;
curtxn->concurrent_abort = true;
- /* Reset the TXN so that it is allowed to stream remaining data. */
- ReorderBufferResetTXN(rb, txn, snapshot_now,
- command_id, prev_lsn,
- specinsert);
+ /*
+ * If streaming, reset the TXN so that it is allowed to stream remaining data.
+ */
+ if (streaming)
+ {
+ ReorderBufferResetTXN(rb, txn, snapshot_now,
+ command_id, prev_lsn,
+ specinsert);
+ }
+ else
+ {
+ elog(LOG, "stopping decoding of %s (%u)",
+ txn->gid[0] != '\0'? txn->gid:"", txn->xid);
+ ReorderBufferTruncateTXN(rb, txn, true);
+ }
}
else
{
@@ -2412,23 +2504,16 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
* This interface is called once a toplevel commit is read for both streamed
* as well as non-streamed transactions.
*/
-void
-ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
- XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
- TimestampTz commit_time,
- RepOriginId origin_id, XLogRecPtr origin_lsn)
+static void
+ReorderBufferCommitInternal(ReorderBufferTXN *txn,
+ ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn)
{
- ReorderBufferTXN *txn;
Snapshot snapshot_now;
CommandId command_id = FirstCommandId;
- txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
- false);
-
- /* unknown transaction, nothing to replay */
- if (txn == NULL)
- return;
-
txn->final_lsn = commit_lsn;
txn->end_lsn = end_lsn;
txn->commit_time = commit_time;
@@ -2470,6 +2555,141 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
}
/*
+ * Ask output plugin whether we want to skip this PREPARE and send
+ * this transaction as a regular commit later.
+ */
+bool
+ReorderBufferPrepareNeedSkip(ReorderBuffer *rb, TransactionId xid, const char *gid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr, false);
+
+ return rb->filter_prepare(rb, txn, xid, gid);
+}
+
+
+/*
+ * Commit a transaction.
+ *
+ * See comments for ReorderBufferCommitInternal()
+ */
+void
+ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+
+ /* unknown transaction, nothing to replay */
+ if (txn == NULL)
+ return;
+
+ ReorderBufferCommitInternal(txn, rb, xid, commit_lsn, end_lsn,
+ commit_time, origin_id, origin_lsn);
+}
+
+/*
+ * Prepare a two-phase transaction. It calls ReorderBufferCommitInternal()
+ * since all prepared transactions need to be decoded at PREPARE time.
+ */
+void
+ReorderBufferPrepare(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+
+ /* unknown transaction, nothing to replay */
+ if (txn == NULL)
+ return;
+
+ txn->txn_flags |= RBTXN_PREPARE;
+ txn->gid = palloc(strlen(gid) + 1); /* trailing '\0' */
+ strcpy(txn->gid, gid);
+
+ ReorderBufferCommitInternal(txn, rb, xid, commit_lsn, end_lsn,
+ commit_time, origin_id, origin_lsn);
+}
+
+/*
+ * Check whether this transaction was sent as prepared to subscribers.
+ * Called while handling commit|abort prepared.
+ */
+bool
+ReorderBufferTxnIsPrepared(ReorderBuffer *rb, TransactionId xid,
+ const char *gid)
+{
+ ReorderBufferTXN *txn;
+
+ txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr,
+ false);
+
+ /*
+ * Always call the prepare filter. It's the job of the prepare filter to
+ * give us the *same* response for a given xid across multiple calls
+ * (including ones on restart)
+ */
+ return !(rb->filter_prepare(rb, txn, xid, gid));
+}
+
+/*
+ * Send standalone xact event. This is used to handle COMMIT/ROLLBACK PREPARED.
+ */
+void
+ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid, bool is_commit)
+{
+ ReorderBufferTXN *txn;
+
+ /*
+ * The transaction may or may not exist (during restarts for example).
+ * Anyway, two-phase transactions do not contain any reorderbuffers. So allow
+ * it to be created below.
+ */
+ txn = ReorderBufferTXNByXid(rb, xid, true, NULL, commit_lsn,
+ true);
+
+ txn->final_lsn = commit_lsn;
+ txn->end_lsn = end_lsn;
+ txn->commit_time = commit_time;
+ txn->origin_id = origin_id;
+ txn->origin_lsn = origin_lsn;
+ /* this txn is obviously prepared */
+ txn->txn_flags |= RBTXN_PREPARE;
+ txn->gid = palloc(strlen(gid) + 1); /* trailing '\0' */
+ strcpy(txn->gid, gid);
+
+ if (is_commit)
+ txn->txn_flags |= RBTXN_COMMIT_PREPARED;
+ else
+ txn->txn_flags |= RBTXN_ROLLBACK_PREPARED;
+
+ if (rbtxn_commit_prepared(txn))
+ rb->commit_prepared(rb, txn, commit_lsn);
+ else if (rbtxn_rollback_prepared(txn))
+ rb->rollback_prepared(rb, txn, commit_lsn);
+
+
+ /* cleanup: make sure there's no cache pollution */
+ ReorderBufferExecuteInvalidations(txn->ninvalidations,
+ txn->invalidations);
+ ReorderBufferCleanupTXN(rb, txn);
+}
+
+/*
* Abort a transaction that possibly has previous changes. Needs to be first
* called for subtransactions and then for the toplevel xid.
*
@@ -2512,7 +2732,12 @@ ReorderBufferAbort(ReorderBuffer *rb, TransactionId xid, XLogRecPtr lsn)
/* cosmetic... */
txn->final_lsn = lsn;
- /* remove potential on-disk data, and deallocate */
+ /*
+ * remove potential on-disk data, and deallocate.
+ *
+ * We remove it even for prepared transactions (GID is enough to
+ * commit/abort those later).
+ */
ReorderBufferCleanupTXN(rb, txn);
}
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 1d9bfb0..28c7337 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -630,6 +630,11 @@ void ReorderBufferQueueMessage(ReorderBuffer *, TransactionId, Snapshot snapsho
void ReorderBufferCommit(ReorderBuffer *, TransactionId,
XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
TimestampTz commit_time, RepOriginId origin_id, XLogRecPtr origin_lsn);
+void ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid, bool is_commit);
void ReorderBufferAssignChild(ReorderBuffer *, TransactionId, TransactionId, XLogRecPtr commit_lsn);
void ReorderBufferCommitChild(ReorderBuffer *, TransactionId, TransactionId,
XLogRecPtr commit_lsn, XLogRecPtr end_lsn);
@@ -653,6 +658,15 @@ void ReorderBufferXidSetCatalogChanges(ReorderBuffer *, TransactionId xid, XLog
bool ReorderBufferXidHasCatalogChanges(ReorderBuffer *, TransactionId xid);
bool ReorderBufferXidHasBaseSnapshot(ReorderBuffer *, TransactionId xid);
+bool ReorderBufferPrepareNeedSkip(ReorderBuffer *rb, TransactionId xid,
+ const char *gid);
+bool ReorderBufferTxnIsPrepared(ReorderBuffer *rb, TransactionId xid,
+ const char *gid);
+void ReorderBufferPrepare(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+ TimestampTz commit_time,
+ RepOriginId origin_id, XLogRecPtr origin_lsn,
+ char *gid);
ReorderBufferTXN *ReorderBufferGetOldestTXN(ReorderBuffer *);
TransactionId ReorderBufferGetOldestXmin(ReorderBuffer *rb);
--
1.8.3.1
v13-0003-Support-2PC-txn-pgoutput.patchapplication/octet-stream; name=v13-0003-Support-2PC-txn-pgoutput.patchDownload
From 4f8354c70dc8b685d26c214fce239870d0695b4d Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Tue, 27 Oct 2020 05:35:28 -0400
Subject: [PATCH v13] Support 2PC txn - pgoutput.
This patch adds support in the pgoutput plugin and subscriber for handling
of two-phase commits.
Includes pgoutput changes.
Includes subscriber changes.
Includes two-phase commit test code (streaming and not streaming).
---
src/backend/access/transam/twophase.c | 27 ++
src/backend/replication/logical/proto.c | 148 +++++-
src/backend/replication/logical/worker.c | 286 +++++++++++-
src/backend/replication/pgoutput/pgoutput.c | 75 +++-
src/include/access/twophase.h | 1 +
src/include/replication/logicalproto.h | 33 ++
src/test/subscription/t/020_twophase.pl | 345 ++++++++++++++
src/test/subscription/t/021_twophase_streaming.pl | 521 ++++++++++++++++++++++
8 files changed, 1414 insertions(+), 22 deletions(-)
create mode 100644 src/test/subscription/t/020_twophase.pl
create mode 100644 src/test/subscription/t/021_twophase_streaming.pl
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 7940060..129afe9 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -548,6 +548,33 @@ MarkAsPrepared(GlobalTransaction gxact, bool lock_held)
}
/*
+ * LookupGXact
+ * Check if the prepared transaction with the given GID is around
+ */
+bool
+LookupGXact(const char *gid)
+{
+ int i;
+ bool found = false;
+
+ LWLockAcquire(TwoPhaseStateLock, LW_SHARED);
+ for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
+ {
+ GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
+
+ /* Ignore not-yet-valid GIDs */
+ if (gxact->valid && strcmp(gxact->gid, gid) == 0)
+ {
+ found = true;
+ break;
+ }
+
+ }
+ LWLockRelease(TwoPhaseStateLock);
+ return found;
+}
+
+/*
* LockGXact
* Locate the prepared transaction and mark it busy for COMMIT or PREPARE.
*/
diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index eb19142..ecd734b 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -78,7 +78,7 @@ logicalrep_write_commit(StringInfo out, ReorderBufferTXN *txn,
pq_sendbyte(out, 'C'); /* sending COMMIT */
- /* send the flags field (unused for now) */
+ /* send the flags field */
pq_sendbyte(out, flags);
/* send fields */
@@ -99,6 +99,7 @@ logicalrep_read_commit(StringInfo in, LogicalRepCommitData *commit_data)
if (flags != 0)
elog(ERROR, "unrecognized flags %u in commit message", flags);
+
/* read fields */
commit_data->commit_lsn = pq_getmsgint64(in);
commit_data->end_lsn = pq_getmsgint64(in);
@@ -106,6 +107,151 @@ logicalrep_read_commit(StringInfo in, LogicalRepCommitData *commit_data)
}
/*
+ * Write PREPARE to the output stream.
+ */
+void
+logicalrep_write_prepare(StringInfo out, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ uint8 flags = 0;
+
+ pq_sendbyte(out, 'P'); /* sending PREPARE protocol */
+
+ /*
+ * This should only ever happen for two-phase commit transactions. In which case we
+ * expect to have a valid GID.
+ */
+ Assert(rbtxn_prepared(txn));
+ Assert(txn->gid != NULL);
+
+ /*
+ * Flags are determined from the state of the transaction. We know we
+ * always get PREPARE first and then [COMMIT|ROLLBACK] PREPARED, so if
+ * it's already marked as committed then it has to be COMMIT PREPARED (and
+ * likewise for abort / ROLLBACK PREPARED).
+ */
+ if (rbtxn_commit_prepared(txn))
+ flags = LOGICALREP_IS_COMMIT_PREPARED;
+ else if (rbtxn_rollback_prepared(txn))
+ flags = LOGICALREP_IS_ROLLBACK_PREPARED;
+ else
+ flags = LOGICALREP_IS_PREPARE;
+
+ /* Make sure exactly one of the expected flags is set. */
+ if (!PrepareFlagsAreValid(flags))
+ elog(ERROR, "unrecognized flags %u in prepare message", flags);
+
+ /* send the flags field */
+ pq_sendbyte(out, flags);
+
+ /* send fields */
+ pq_sendint64(out, prepare_lsn);
+ pq_sendint64(out, txn->end_lsn);
+ pq_sendint64(out, txn->commit_time);
+
+ /* send gid */
+ pq_sendstring(out, txn->gid);
+}
+
+/*
+ * Read transaction PREPARE from the stream.
+ */
+void
+logicalrep_read_prepare(StringInfo in, LogicalRepPrepareData * prepare_data)
+{
+ /* read flags */
+ uint8 flags = pq_getmsgbyte(in);
+
+ if (!PrepareFlagsAreValid(flags))
+ elog(ERROR, "unrecognized flags %u in prepare message", flags);
+
+ /* set the action (reuse the constants used for the flags) */
+ prepare_data->prepare_type = flags;
+
+ /* read fields */
+ prepare_data->prepare_lsn = pq_getmsgint64(in);
+ prepare_data->end_lsn = pq_getmsgint64(in);
+ prepare_data->preparetime = pq_getmsgint64(in);
+
+ /* read gid (copy it into a pre-allocated buffer) */
+ strcpy(prepare_data->gid, pq_getmsgstring(in));
+}
+
+/*
+ * Write STREAM PREPARE to the output stream.
+ * (For stream PREPARE, stream COMMIT PREPARED, stream ROLLBACK PREPARED)
+ */
+void
+logicalrep_write_stream_prepare(StringInfo out,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ uint8 flags = 0;
+
+ pq_sendbyte(out, 'p'); /* sending STREAM PREPARE protocol */
+
+ /*
+ * This should only ever happen for two-phase transactions. In which case we
+ * expect to have a valid GID.
+ */
+ Assert(rbtxn_prepared(txn));
+ Assert(txn->gid != NULL);
+
+ /*
+ * For streaming APIs only PREPARE is supported. [COMMIT|ROLLBACK] PREPARED
+ * uses non-streaming APIs
+ */
+ flags = LOGICALREP_IS_PREPARE;
+
+ /* transaction ID */
+ Assert(TransactionIdIsValid(txn->xid));
+ pq_sendint32(out, txn->xid);
+
+ /* send the flags field */
+ pq_sendbyte(out, flags);
+
+ /* send fields */
+ pq_sendint64(out, prepare_lsn);
+ pq_sendint64(out, txn->end_lsn);
+ pq_sendint64(out, txn->commit_time);
+
+ /* send gid */
+ pq_sendstring(out, txn->gid);
+}
+
+/*
+ * Read STREAM PREPARE from the output stream.
+ * (For stream PREPARE, stream COMMIT PREPARED, stream ROLLBACK PREPARED)
+ */
+TransactionId
+logicalrep_read_stream_prepare(StringInfo in, LogicalRepPrepareData *prepare_data)
+{
+ TransactionId xid;
+ uint8 flags;
+
+ xid = pq_getmsgint(in, 4);
+
+ /* read flags */
+ flags = pq_getmsgbyte(in);
+
+ if (!PrepareFlagsAreValid(flags))
+ elog(ERROR, "unrecognized flags %u in prepare message", flags);
+
+ /* set the action (reuse the constants used for the flags) */
+ prepare_data->prepare_type = flags;
+
+ /* read fields */
+ prepare_data->prepare_lsn = pq_getmsgint64(in);
+ prepare_data->end_lsn = pq_getmsgint64(in);
+ prepare_data->preparetime = pq_getmsgint64(in);
+
+ /* read gid (copy it into a pre-allocated buffer) */
+ strcpy(prepare_data->gid, pq_getmsgstring(in));
+
+ return xid;
+}
+
+/*
* Write ORIGIN to the output stream.
*/
void
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 3a5b733..2f6514e 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -246,6 +246,8 @@ static void apply_handle_tuple_routing(ResultRelInfo *relinfo,
LogicalRepRelMapEntry *relmapentry,
CmdType operation);
+static int apply_spooled_messages(TransactionId xid, XLogRecPtr lsn);
+
/*
* Should this worker apply changes for given relation.
*
@@ -722,6 +724,7 @@ apply_handle_commit(StringInfo s)
replorigin_session_origin_timestamp = commit_data.committime;
CommitTransactionCommand();
+
pgstat_report_stat(false);
store_flush_position(commit_data.end_lsn);
@@ -742,6 +745,225 @@ apply_handle_commit(StringInfo s)
}
/*
+ * Called from apply_handle_prepare to handle a PREPARE TRANSACTION.
+ */
+static void
+apply_handle_prepare_txn(LogicalRepPrepareData * prepare_data)
+{
+ Assert(prepare_data->prepare_lsn == remote_final_lsn);
+
+ /* The synchronization worker runs in single transaction. */
+ if (IsTransactionState() && !am_tablesync_worker())
+ {
+ /* End the earlier transaction and start a new one */
+ BeginTransactionBlock();
+ CommitTransactionCommand();
+ StartTransactionCommand();
+
+ /*
+ * Update origin state so we can restart streaming from correct
+ * position in case of crash.
+ */
+ replorigin_session_origin_lsn = prepare_data->end_lsn;
+ replorigin_session_origin_timestamp = prepare_data->preparetime;
+
+ PrepareTransactionBlock(prepare_data->gid);
+ CommitTransactionCommand();
+ pgstat_report_stat(false);
+
+ store_flush_position(prepare_data->end_lsn);
+ }
+ else
+ {
+ /* Process any invalidation messages that might have accumulated. */
+ AcceptInvalidationMessages();
+ maybe_reread_subscription();
+ }
+
+ in_remote_transaction = false;
+
+ /* Process any tables that are being synchronized in parallel. */
+ process_syncing_tables(prepare_data->end_lsn);
+
+ pgstat_report_activity(STATE_IDLE, NULL);
+}
+
+/*
+ * Called from apply_handle_prepare to handle a COMMIT PREPARED of a previously
+ * PREPARED transaction.
+ */
+static void
+apply_handle_commit_prepared_txn(LogicalRepPrepareData * prepare_data)
+{
+ /* there is no transaction when COMMIT PREPARED is called */
+ ensure_transaction();
+
+ /*
+ * Update origin state so we can restart streaming from correct position
+ * in case of crash.
+ */
+ replorigin_session_origin_lsn = prepare_data->end_lsn;
+ replorigin_session_origin_timestamp = prepare_data->preparetime;
+
+ FinishPreparedTransaction(prepare_data->gid, true);
+ CommitTransactionCommand();
+ pgstat_report_stat(false);
+
+ store_flush_position(prepare_data->end_lsn);
+ in_remote_transaction = false;
+
+ /* Process any tables that are being synchronized in parallel. */
+ process_syncing_tables(prepare_data->end_lsn);
+
+ pgstat_report_activity(STATE_IDLE, NULL);
+}
+
+/*
+ * Called from apply_handle_prepare to handle a ROLLBACK PREPARED of a previously
+ * PREPARED TRANSACTION.
+ */
+static void
+apply_handle_rollback_prepared_txn(LogicalRepPrepareData * prepare_data)
+{
+ /*
+ * Update origin state so we can restart streaming from correct position
+ * in case of crash.
+ */
+ replorigin_session_origin_lsn = prepare_data->end_lsn;
+ replorigin_session_origin_timestamp = prepare_data->preparetime;
+
+ /*
+ * During logical decoding, on the apply side, it's possible that a
+ * prepared transaction got aborted while decoding. In that case, we stop
+ * the decoding and abort the transaction immediately. However the
+ * ROLLBACK prepared processing still reaches the subscriber. In that case
+ * it's ok to have a missing gid
+ */
+ if (LookupGXact(prepare_data->gid))
+ {
+ /* there is no transaction when ABORT/ROLLBACK PREPARED is called */
+ ensure_transaction();
+ FinishPreparedTransaction(prepare_data->gid, false);
+ CommitTransactionCommand();
+ }
+
+ pgstat_report_stat(false);
+
+ store_flush_position(prepare_data->end_lsn);
+ in_remote_transaction = false;
+
+ /* Process any tables that are being synchronized in parallel. */
+ process_syncing_tables(prepare_data->end_lsn);
+
+ pgstat_report_activity(STATE_IDLE, NULL);
+}
+
+/*
+ * Handle PREPARE message.
+ */
+static void
+apply_handle_prepare(StringInfo s)
+{
+ LogicalRepPrepareData prepare_data;
+
+ logicalrep_read_prepare(s, &prepare_data);
+
+ switch (prepare_data.prepare_type)
+ {
+ case LOGICALREP_IS_PREPARE:
+ apply_handle_prepare_txn(&prepare_data);
+ break;
+
+ case LOGICALREP_IS_COMMIT_PREPARED:
+ apply_handle_commit_prepared_txn(&prepare_data);
+ break;
+
+ case LOGICALREP_IS_ROLLBACK_PREPARED:
+ apply_handle_rollback_prepared_txn(&prepare_data);
+ break;
+
+ default:
+ ereport(ERROR,
+ (errcode(ERRCODE_PROTOCOL_VIOLATION),
+ errmsg("unexpected type of prepare message: %d",
+ prepare_data.prepare_type)));
+ }
+}
+
+/*
+ * Handle STREAM PREPARE.
+ *
+ * Logic is in two parts:
+ * 1. Replay all the spooled operations
+ * 2. Mark the transaction as prepared
+ */
+static void
+apply_handle_stream_prepare(StringInfo s)
+{
+ int nchanges = 0;
+ LogicalRepPrepareData prepare_data;
+ TransactionId xid;
+
+ Assert(!in_streamed_transaction);
+
+ xid = logicalrep_read_stream_prepare(s, &prepare_data);
+ elog(DEBUG1, "received prepare for streamed transaction %u", xid);
+
+ /*
+ * This should be a PREPARE only. The COMMIT PREPARED and ROLLBACK PREPARED
+ * for streaming are handled by the non-streaming APIs.
+ */
+ Assert(prepare_data.prepare_type == LOGICALREP_IS_PREPARE);
+
+ /*
+ * ========================================
+ * 1. Replay all the spooled operations
+ * - This code is same as what apply_handle_stream_commit does for NON two-phase stream commit
+ * ========================================
+ */
+
+ ensure_transaction();
+
+ nchanges = apply_spooled_messages(xid, prepare_data.prepare_lsn);
+
+ /*
+ * ========================================
+ * 2. Mark the transaction as prepared.
+ * - This code is same as what apply_handle_prepare_txn does for two-phase prepare of the non-streamed tx
+ * ========================================
+ */
+ BeginTransactionBlock();
+ CommitTransactionCommand();
+ StartTransactionCommand();
+
+ /*
+ * Update origin state so we can restart streaming from correct position
+ * in case of crash.
+ */
+ replorigin_session_origin_lsn = prepare_data.end_lsn;
+ replorigin_session_origin_timestamp = prepare_data.preparetime;
+
+ PrepareTransactionBlock(prepare_data.gid);
+ CommitTransactionCommand();
+
+ pgstat_report_stat(false);
+
+ store_flush_position(prepare_data.end_lsn);
+
+ elog(DEBUG1, "apply_handle_stream_prepare_txn: replayed %d (all) changes.", nchanges);
+
+ in_remote_transaction = false;
+
+ /* Process any tables that are being synchronized in parallel. */
+ process_syncing_tables(prepare_data.end_lsn);
+
+ /* unlink the files with serialized changes and subxact info */
+ stream_cleanup_files(MyLogicalRepWorker->subid, xid);
+
+ pgstat_report_activity(STATE_IDLE, NULL);
+}
+
+/*
* Handle ORIGIN message.
*
* TODO, support tracking of multiple origins
@@ -935,30 +1157,21 @@ apply_handle_stream_abort(StringInfo s)
}
/*
- * Handle STREAM COMMIT message.
+ * Common spoolfile processing.
+ * Returns how many changes were applied.
*/
-static void
-apply_handle_stream_commit(StringInfo s)
+static int
+apply_spooled_messages(TransactionId xid, XLogRecPtr lsn)
{
- TransactionId xid;
StringInfoData s2;
int nchanges;
char path[MAXPGPATH];
char *buffer = NULL;
bool found;
- LogicalRepCommitData commit_data;
StreamXidHash *ent;
MemoryContext oldcxt;
BufFile *fd;
- Assert(!in_streamed_transaction);
-
- xid = logicalrep_read_stream_commit(s, &commit_data);
-
- elog(DEBUG1, "received commit for streamed transaction %u", xid);
-
- ensure_transaction();
-
/*
* Allocate file handle and memory required to process all the messages in
* TopTransactionContext to avoid them getting reset after each message is
@@ -981,7 +1194,7 @@ apply_handle_stream_commit(StringInfo s)
MemoryContextSwitchTo(oldcxt);
- remote_final_lsn = commit_data.commit_lsn;
+ remote_final_lsn = lsn;
/*
* Make sure the handle apply_dispatch methods are aware we're in a remote
@@ -1050,6 +1263,35 @@ apply_handle_stream_commit(StringInfo s)
BufFileClose(fd);
+ pfree(buffer);
+ pfree(s2.data);
+
+ elog(DEBUG1, "replayed %d (all) changes from file \"%s\"",
+ nchanges, path);
+
+ return nchanges;
+}
+
+/*
+ * Handle STREAM COMMIT message.
+ */
+static void
+apply_handle_stream_commit(StringInfo s)
+{
+ TransactionId xid;
+ LogicalRepCommitData commit_data;
+ int nchanges = 0;
+
+ Assert(!in_streamed_transaction);
+
+ xid = logicalrep_read_stream_commit(s, &commit_data);
+
+ elog(DEBUG1, "received commit for streamed transaction %u", xid);
+
+ ensure_transaction();
+
+ nchanges = apply_spooled_messages(xid, commit_data.commit_lsn);
+
/*
* Update origin state so we can restart streaming from correct position
* in case of crash.
@@ -1057,16 +1299,12 @@ apply_handle_stream_commit(StringInfo s)
replorigin_session_origin_lsn = commit_data.end_lsn;
replorigin_session_origin_timestamp = commit_data.committime;
- pfree(buffer);
- pfree(s2.data);
-
CommitTransactionCommand();
pgstat_report_stat(false);
store_flush_position(commit_data.end_lsn);
- elog(DEBUG1, "replayed %d (all) changes from file \"%s\"",
- nchanges, path);
+ elog(DEBUG1, "apply_handle_stream_commit: replayed %d (all) changes.", nchanges);
in_remote_transaction = false;
@@ -1905,10 +2143,14 @@ apply_dispatch(StringInfo s)
case 'B':
apply_handle_begin(s);
break;
- /* COMMIT */
+ /* COMMIT/ABORT */
case 'C':
apply_handle_commit(s);
break;
+ /* PREPARE and [COMMIT|ROLLBACK] PREPARED */
+ case 'P':
+ apply_handle_prepare(s);
+ break;
/* INSERT */
case 'I':
apply_handle_insert(s);
@@ -1953,6 +2195,10 @@ apply_dispatch(StringInfo s)
case 'c':
apply_handle_stream_commit(s);
break;
+ /* STREAM PREPARE */
+ case 'p':
+ apply_handle_stream_prepare(s);
+ break;
default:
ereport(ERROR,
(errcode(ERRCODE_PROTOCOL_VIOLATION),
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 9c997ae..b4f2c9a 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -47,6 +47,12 @@ static void pgoutput_truncate(LogicalDecodingContext *ctx,
ReorderBufferChange *change);
static bool pgoutput_origin_filter(LogicalDecodingContext *ctx,
RepOriginId origin_id);
+static void pgoutput_prepare_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, XLogRecPtr prepare_lsn);
+static void pgoutput_commit_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, XLogRecPtr prepare_lsn);
+static void pgoutput_rollback_prepared_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, XLogRecPtr prepare_lsn);
static void pgoutput_stream_start(struct LogicalDecodingContext *ctx,
ReorderBufferTXN *txn);
static void pgoutput_stream_stop(struct LogicalDecodingContext *ctx,
@@ -57,7 +63,8 @@ static void pgoutput_stream_abort(struct LogicalDecodingContext *ctx,
static void pgoutput_stream_commit(struct LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
-
+static void pgoutput_stream_prepare_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, XLogRecPtr prepare_lsn);
static bool publications_valid;
static bool in_streaming;
@@ -143,6 +150,10 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb)
cb->change_cb = pgoutput_change;
cb->truncate_cb = pgoutput_truncate;
cb->commit_cb = pgoutput_commit_txn;
+
+ cb->prepare_cb = pgoutput_prepare_txn;
+ cb->commit_prepared_cb = pgoutput_commit_prepared_txn;
+ cb->rollback_prepared_cb = pgoutput_rollback_prepared_txn;
cb->filter_by_origin_cb = pgoutput_origin_filter;
cb->shutdown_cb = pgoutput_shutdown;
@@ -153,6 +164,8 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb)
cb->stream_commit_cb = pgoutput_stream_commit;
cb->stream_change_cb = pgoutput_change;
cb->stream_truncate_cb = pgoutput_truncate;
+ /* transaction streaming - two-phase commit */
+ cb->stream_prepare_cb = pgoutput_stream_prepare_txn;
}
static void
@@ -378,6 +391,48 @@ pgoutput_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
}
/*
+ * PREPARE callback
+ */
+static void
+pgoutput_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ OutputPluginUpdateProgress(ctx);
+
+ OutputPluginPrepareWrite(ctx, true);
+ logicalrep_write_prepare(ctx->out, txn, prepare_lsn);
+ OutputPluginWrite(ctx, true);
+}
+
+/*
+ * COMMIT PREPARED callback
+ */
+static void
+pgoutput_commit_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ OutputPluginUpdateProgress(ctx);
+
+ OutputPluginPrepareWrite(ctx, true);
+ logicalrep_write_prepare(ctx->out, txn, prepare_lsn);
+ OutputPluginWrite(ctx, true);
+}
+
+/*
+ * ROLLBACK PREPARED callback
+ */
+static void
+pgoutput_rollback_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ OutputPluginUpdateProgress(ctx);
+
+ OutputPluginPrepareWrite(ctx, true);
+ logicalrep_write_prepare(ctx->out, txn, prepare_lsn);
+ OutputPluginWrite(ctx, true);
+}
+
+/*
* Write the current schema of the relation and its ancestor (if any) if not
* done yet.
*/
@@ -857,6 +912,24 @@ pgoutput_stream_commit(struct LogicalDecodingContext *ctx,
}
/*
+ * PREPARE callback (for streaming two-phase commit).
+ *
+ * Notify the downstream to prepare the transaction.
+ */
+static void
+pgoutput_stream_prepare_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn)
+{
+ Assert(rbtxn_is_streamed(txn));
+
+ OutputPluginUpdateProgress(ctx);
+ OutputPluginPrepareWrite(ctx, true);
+ logicalrep_write_stream_prepare(ctx->out, txn, prepare_lsn);
+ OutputPluginWrite(ctx, true);
+}
+
+/*
* Initialize the relation schema sync cache for a decoding session.
*
* The hash table is destroyed at the end of a decoding session. While
diff --git a/src/include/access/twophase.h b/src/include/access/twophase.h
index 2ca71c3..b2628ea 100644
--- a/src/include/access/twophase.h
+++ b/src/include/access/twophase.h
@@ -44,6 +44,7 @@ extern GlobalTransaction MarkAsPreparing(TransactionId xid, const char *gid,
extern void StartPrepare(GlobalTransaction gxact);
extern void EndPrepare(GlobalTransaction gxact);
extern bool StandbyTransactionIdIsPrepared(TransactionId xid);
+extern bool LookupGXact(const char *gid);
extern TransactionId PrescanPreparedTransactions(TransactionId **xids_p,
int *nxids_p);
diff --git a/src/include/replication/logicalproto.h b/src/include/replication/logicalproto.h
index 0c2cda2..ee38f89 100644
--- a/src/include/replication/logicalproto.h
+++ b/src/include/replication/logicalproto.h
@@ -87,6 +87,7 @@ typedef struct LogicalRepBeginData
TransactionId xid;
} LogicalRepBeginData;
+/* Commit (and abort) information */
typedef struct LogicalRepCommitData
{
XLogRecPtr commit_lsn;
@@ -94,6 +95,28 @@ typedef struct LogicalRepCommitData
TimestampTz committime;
} LogicalRepCommitData;
+
+/* Prepare protocol information */
+typedef struct LogicalRepPrepareData
+{
+ uint8 prepare_type;
+ XLogRecPtr prepare_lsn;
+ XLogRecPtr end_lsn;
+ TimestampTz preparetime;
+ char gid[GIDSIZE];
+} LogicalRepPrepareData;
+
+/* types of the prepare protocol message */
+#define LOGICALREP_IS_PREPARE 0x01
+#define LOGICALREP_IS_COMMIT_PREPARED 0x02
+#define LOGICALREP_IS_ROLLBACK_PREPARED 0x04
+
+/* prepare can be exactly one of PREPARE, [COMMIT|ROLLBACK] PREPARED*/
+#define PrepareFlagsAreValid(flags) \
+ (((flags) == LOGICALREP_IS_PREPARE) || \
+ ((flags) == LOGICALREP_IS_COMMIT_PREPARED) || \
+ ((flags) == LOGICALREP_IS_ROLLBACK_PREPARED))
+
extern void logicalrep_write_begin(StringInfo out, ReorderBufferTXN *txn);
extern void logicalrep_read_begin(StringInfo in,
LogicalRepBeginData *begin_data);
@@ -101,6 +124,10 @@ extern void logicalrep_write_commit(StringInfo out, ReorderBufferTXN *txn,
XLogRecPtr commit_lsn);
extern void logicalrep_read_commit(StringInfo in,
LogicalRepCommitData *commit_data);
+extern void logicalrep_write_prepare(StringInfo out, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+extern void logicalrep_read_prepare(StringInfo in,
+ LogicalRepPrepareData * prepare_data);
extern void logicalrep_write_origin(StringInfo out, const char *origin,
XLogRecPtr origin_lsn);
extern char *logicalrep_read_origin(StringInfo in, XLogRecPtr *origin_lsn);
@@ -144,4 +171,10 @@ extern void logicalrep_write_stream_abort(StringInfo out, TransactionId xid,
extern void logicalrep_read_stream_abort(StringInfo in, TransactionId *xid,
TransactionId *subxid);
+extern void logicalrep_write_stream_prepare(StringInfo out, ReorderBufferTXN *txn,
+ XLogRecPtr prepare_lsn);
+extern TransactionId logicalrep_read_stream_prepare(StringInfo in,
+ LogicalRepPrepareData *prepare_data);
+
+
#endif /* LOGICAL_PROTO_H */
diff --git a/src/test/subscription/t/020_twophase.pl b/src/test/subscription/t/020_twophase.pl
new file mode 100644
index 0000000..f489f47
--- /dev/null
+++ b/src/test/subscription/t/020_twophase.pl
@@ -0,0 +1,345 @@
+# logical replication of 2PC test
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 21;
+
+###############################
+# Setup
+###############################
+
+# Initialize publisher node
+my $node_publisher = get_new_node('publisher');
+$node_publisher->init(allows_streaming => 'logical');
+$node_publisher->append_conf('postgresql.conf',
+ qq(max_prepared_transactions = 10));
+$node_publisher->start;
+
+# Create subscriber node
+my $node_subscriber = get_new_node('subscriber');
+$node_subscriber->init(allows_streaming => 'logical');
+$node_subscriber->append_conf('postgresql.conf',
+ qq(max_prepared_transactions = 10));
+$node_subscriber->start;
+
+# Create some pre-existing content on publisher
+$node_publisher->safe_psql('postgres',
+ "CREATE TABLE tab_full (a int PRIMARY KEY)");
+$node_publisher->safe_psql('postgres',
+ "INSERT INTO tab_full SELECT generate_series(1,10)");
+$node_publisher->safe_psql('postgres',
+ "CREATE TABLE tab_full2 (x text)");
+$node_publisher->safe_psql('postgres',
+ "INSERT INTO tab_full2 VALUES ('a'), ('b'), ('b')");
+
+# Setup structure on subscriber
+$node_subscriber->safe_psql('postgres',
+ "CREATE TABLE tab_full (a int PRIMARY KEY)");
+$node_subscriber->safe_psql('postgres',
+ "CREATE TABLE tab_full2 (x text)");
+
+# Setup logical replication
+my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
+$node_publisher->safe_psql('postgres',
+ "CREATE PUBLICATION tap_pub");
+$node_publisher->safe_psql('postgres',
+ "ALTER PUBLICATION tap_pub ADD TABLE tab_full, tab_full2");
+
+my $appname = 'tap_sub';
+$node_subscriber->safe_psql('postgres', "
+ CREATE SUBSCRIPTION tap_sub
+ CONNECTION '$publisher_connstr application_name=$appname'
+ PUBLICATION tap_pub");
+
+# Wait for subscriber to finish initialization
+my $caughtup_query =
+ "SELECT pg_current_wal_lsn() <= replay_lsn FROM pg_stat_replication WHERE application_name = '$appname';";
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# Also wait for initial table sync to finish
+my $synced_query =
+ "SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT IN ('r', 's');";
+$node_subscriber->poll_query_until('postgres', $synced_query)
+ or die "Timed out while waiting for subscriber to synchronize data";
+
+###############################
+# check that 2PC gets replicated to subscriber
+# then COMMIT PREPARED
+###############################
+
+$node_publisher->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab_full VALUES (11);
+ PREPARE TRANSACTION 'test_prepared_tab_full';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is in prepared state on subscriber
+my $result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab_full';");
+is($result, qq(1), 'transaction is prepared on subscriber');
+
+# check that 2PC gets committed on subscriber
+$node_publisher->safe_psql('postgres',
+ "COMMIT PREPARED 'test_prepared_tab_full';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is committed on subscriber
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM tab_full where a = 11;");
+is($result, qq(1), 'Row inserted via 2PC has committed on subscriber');
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab_full';");
+is($result, qq(0), 'transaction is committed on subscriber');
+
+###############################
+# check that 2PC gets replicated to subscriber
+# then ROLLBACK PREPARED
+###############################
+
+$node_publisher->safe_psql('postgres',"
+ BEGIN;
+ INSERT INTO tab_full VALUES (12);
+ PREPARE TRANSACTION 'test_prepared_tab_full';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is in prepared state on subscriber
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab_full';");
+is($result, qq(1), 'transaction is prepared on subscriber');
+
+# check that 2PC gets aborted on subscriber
+$node_publisher->safe_psql('postgres',
+ "ROLLBACK PREPARED 'test_prepared_tab_full';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is aborted on subscriber
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM tab_full where a = 12;");
+is($result, qq(0), 'Row inserted via 2PC is not present on subscriber');
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab_full';");
+is($result, qq(0), 'transaction is aborted on subscriber');
+
+###############################
+# Check that ROLLBACK PREPARED is decoded properly on crash restart
+# (publisher and subscriber crash)
+###############################
+
+$node_publisher->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab_full VALUES (12);
+ INSERT INTO tab_full VALUES (13);
+ PREPARE TRANSACTION 'test_prepared_tab';");
+
+$node_subscriber->stop('immediate');
+$node_publisher->stop('immediate');
+
+$node_publisher->start;
+$node_subscriber->start;
+
+# rollback post the restart
+$node_publisher->safe_psql('postgres',
+ "ROLLBACK PREPARED 'test_prepared_tab';");
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check inserts are rolled back
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM tab_full where a IN (12,13);");
+is($result, qq(0), 'Rows inserted via 2PC are visible on the subscriber');
+
+###############################
+# Check that COMMIT PREPARED is decoded properly on crash restart
+# (publisher and subscriber crash)
+###############################
+
+$node_publisher->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab_full VALUES (12);
+ INSERT INTO tab_full VALUES (13);
+ PREPARE TRANSACTION 'test_prepared_tab';");
+
+$node_subscriber->stop('immediate');
+$node_publisher->stop('immediate');
+
+$node_publisher->start;
+$node_subscriber->start;
+
+# commit post the restart
+$node_publisher->safe_psql('postgres',
+ "COMMIT PREPARED 'test_prepared_tab';");
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check inserts are visible
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM tab_full where a IN (12,13);");
+is($result, qq(2), 'Rows inserted via 2PC are visible on the subscriber');
+
+###############################
+# Check that COMMIT PREPARED is decoded properly on crash restart
+# (subscriber only crash)
+###############################
+
+$node_publisher->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab_full VALUES (14);
+ INSERT INTO tab_full VALUES (15);
+ PREPARE TRANSACTION 'test_prepared_tab';");
+
+$node_subscriber->stop('immediate');
+$node_subscriber->start;
+
+# commit post the restart
+$node_publisher->safe_psql('postgres',
+ "COMMIT PREPARED 'test_prepared_tab';");
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check inserts are visible
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM tab_full where a IN (14,15);");
+is($result, qq(2), 'Rows inserted via 2PC are visible on the subscriber');
+
+###############################
+# Check that COMMIT PREPARED is decoded properly on crash restart
+# (publisher only crash)
+###############################
+
+$node_publisher->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab_full VALUES (16);
+ INSERT INTO tab_full VALUES (17);
+ PREPARE TRANSACTION 'test_prepared_tab';");
+
+$node_publisher->stop('immediate');
+$node_publisher->start;
+
+# commit post the restart
+$node_publisher->safe_psql('postgres',
+ "COMMIT PREPARED 'test_prepared_tab';");
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check inserts are visible
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM tab_full where a IN (16,17);");
+is($result, qq(2), 'Rows inserted via 2PC are visible on the subscriber');
+
+###############################
+# Test nested transaction with 2PC
+###############################
+
+# check that 2PC gets replicated to subscriber
+$node_publisher->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab_full VALUES (21);
+ SAVEPOINT sp_inner;
+ INSERT INTO tab_full VALUES (22);
+ ROLLBACK TO SAVEPOINT sp_inner;
+ PREPARE TRANSACTION 'outer';
+ ");
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is in prepared state on subscriber
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = 'outer';");
+is($result, qq(1), 'transaction is prepared on subscriber');
+
+# COMMIT
+$node_publisher->safe_psql('postgres', "
+ COMMIT PREPARED 'outer';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check the tx state on subscriber
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = 'outer';");
+is($result, qq(0), 'transaction is ended on subscriber');
+
+# check inserts are visible. 22 should be rolled back. 21 should be committed.
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM tab_full where a IN (21);");
+is($result, qq(1), 'Rows committed are on the subscriber');
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM tab_full where a IN (22);");
+is($result, qq(0), 'Rows rolled back are not on the subscriber');
+
+###############################
+# Test using empty GID
+###############################
+
+# check that 2PC gets replicated to subscriber
+$node_publisher->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab_full VALUES (51);
+ PREPARE TRANSACTION '';");
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is in prepared state on subscriber
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = '';");
+is($result, qq(1), 'transaction is prepared on subscriber');
+
+# ROLLBACK
+$node_publisher->safe_psql('postgres',
+ "ROLLBACK PREPARED '';");
+
+# check that 2PC gets aborted on subscriber
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = '';");
+is($result, qq(0), 'transaction is aborted on subscriber');
+
+###############################
+# Test cases involving DDL.
+###############################
+
+# TODO This can be added after we add functionality to replicate DDL changes to subscriber.
+
+###############################
+# check all the cleanup
+###############################
+
+$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_subscription");
+is($result, qq(0), 'check subscription was dropped on subscriber');
+
+$result = $node_publisher->safe_psql('postgres',
+ "SELECT count(*) FROM pg_replication_slots");
+is($result, qq(0), 'check replication slot was dropped on publisher');
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_subscription_rel");
+is($result, qq(0),
+ 'check subscription relation status was dropped on subscriber');
+
+$result = $node_publisher->safe_psql('postgres',
+ "SELECT count(*) FROM pg_replication_slots");
+is($result, qq(0), 'check replication slot was dropped on publisher');
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_replication_origin");
+is($result, qq(0), 'check replication origin was dropped on subscriber');
+
+$node_subscriber->stop('fast');
+$node_publisher->stop('fast');
diff --git a/src/test/subscription/t/021_twophase_streaming.pl b/src/test/subscription/t/021_twophase_streaming.pl
new file mode 100644
index 0000000..9a03b83
--- /dev/null
+++ b/src/test/subscription/t/021_twophase_streaming.pl
@@ -0,0 +1,521 @@
+# logical replication of 2PC test
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 28;
+
+###############################
+# Test setup
+###############################
+
+# Initialize publisher node
+my $node_publisher = get_new_node('publisher');
+$node_publisher->init(allows_streaming => 'logical');
+$node_publisher->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_publisher->append_conf('postgresql.conf', qq(logical_decoding_work_mem = 64kB));
+$node_publisher->start;
+
+# Create subscriber node
+my $node_subscriber = get_new_node('subscriber');
+$node_subscriber->init(allows_streaming => 'logical');
+$node_subscriber->append_conf('postgresql.conf', qq(max_prepared_transactions = 10));
+$node_subscriber->start;
+
+# Create some pre-existing content on publisher (uses same DDL as 015_stream test)
+$node_publisher->safe_psql('postgres',
+ "CREATE TABLE test_tab (a int primary key, b varchar)");
+$node_publisher->safe_psql('postgres',
+ "INSERT INTO test_tab VALUES (1, 'foo'), (2, 'bar')");
+
+# Setup structure on subscriber (columns a and b are compatible with same table name on publisher)
+$node_subscriber->safe_psql('postgres',
+ "CREATE TABLE test_tab (a int primary key, b text, c timestamptz DEFAULT now(), d bigint DEFAULT 999)");
+
+# Setup logical replication (streaming = on)
+my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
+$node_publisher->safe_psql('postgres', "CREATE PUBLICATION tap_pub FOR TABLE test_tab");
+
+my $appname = 'tap_sub';
+$node_subscriber->safe_psql('postgres', "
+ CREATE SUBSCRIPTION tap_sub
+ CONNECTION '$publisher_connstr application_name=$appname'
+ PUBLICATION tap_pub
+ WITH (streaming = on)");
+
+# Wait for subscriber to finish initialization
+my $caughtup_query =
+ "SELECT pg_current_wal_lsn() <= replay_lsn FROM pg_stat_replication WHERE application_name = '$appname';";
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# Also wait for initial table sync to finish
+my $synced_query =
+ "SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT IN ('r', 's');";
+$node_subscriber->poll_query_until('postgres', $synced_query)
+ or die "Timed out while waiting for subscriber to synchronize data";
+
+###############################
+# Check initial data was copied to subscriber
+###############################
+my $result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*), count(c), count(d = 999) FROM test_tab");
+is($result, qq(2|2|2), 'check initial data was copied to subscriber');
+
+###############################
+# Test 2PC PREPARE / COMMIT PREPARED
+# 1. Data is streamed as a 2PC transaction.
+# 2. Then do commit prepared.
+#
+# Expect all data is replicated on subscriber side after the commit.
+###############################
+
+# check that 2PC gets replicated to subscriber
+# Insert, update and delete enough rows to exceed the 64kB limit.
+$node_publisher->safe_psql('postgres', q{
+ BEGIN;
+ INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(3, 5000) s(i);
+ UPDATE test_tab SET b = md5(b) WHERE mod(a,2) = 0;
+ DELETE FROM test_tab WHERE mod(a,3) = 0;
+ PREPARE TRANSACTION 'test_prepared_tab';});
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is in prepared state on subscriber
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab';");
+is($result, qq(1), 'transaction is prepared on subscriber');
+
+# 2PC gets committed
+$node_publisher->safe_psql('postgres',
+ "COMMIT PREPARED 'test_prepared_tab';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is committed on subscriber
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*), count(c), count(d = 999) FROM test_tab");
+is($result, qq(3334|3334|3334), 'Rows inserted by 2PC have committed on subscriber, and extra columns contain local defaults');
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab';");
+is($result, qq(0), 'transaction is committed on subscriber');
+
+###############################
+# Test 2PC PREPARE / ROLLBACK PREPARED.
+# 1. Table is deleted back to 2 rows which are replicated on subscriber.
+# 2. Data is streamed using 2PC
+# 3. Do rollback prepared.
+#
+# Expect data rolls back leaving only the original 2 rows.
+###############################
+
+# First, delete the data except for 2 rows (will be replicated)
+$node_publisher->safe_psql('postgres', q{
+ DELETE FROM test_tab WHERE a > 2;});
+
+# Then insert, update and delete enough rows to exceed the 64kB limit.
+$node_publisher->safe_psql('postgres', q{
+ BEGIN;
+ INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(3, 5000) s(i);
+ UPDATE test_tab SET b = md5(b) WHERE mod(a,2) = 0;
+ DELETE FROM test_tab WHERE mod(a,3) = 0;
+ PREPARE TRANSACTION 'test_prepared_tab';});
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is in prepared state on subscriber
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab';");
+is($result, qq(1), 'transaction is prepared on subscriber');
+
+# 2PC tx gets aborted
+$node_publisher->safe_psql('postgres',
+ "ROLLBACK PREPARED 'test_prepared_tab';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is aborted on subscriber
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*), count(c), count(d = 999) FROM test_tab");
+is($result, qq(2|2|2), 'check initial data was copied to subscriber');
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab';");
+is($result, qq(0), 'transaction is aborted on subscriber');
+
+###############################
+# Check that 2PC ROLLBACK PREPARED is decoded properly on crash restart.
+# 1. insert, update and delete enough rows to exceed the 64kB limit.
+# 2. Then server crashes before the 2PC transaction is rolled back.
+# 3. After servers are restarted the pending transaction is rolled back.
+#
+# Expect all inserted data is gone.
+# (Note: both publisher and subscriber crash/restart)
+###############################
+
+$node_publisher->safe_psql('postgres', q{
+ BEGIN;
+ INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(3, 5000) s(i);
+ UPDATE test_tab SET b = md5(b) WHERE mod(a,2) = 0;
+ DELETE FROM test_tab WHERE mod(a,3) = 0;
+ PREPARE TRANSACTION 'test_prepared_tab';});
+
+$node_subscriber->stop('immediate');
+$node_publisher->stop('immediate');
+
+$node_publisher->start;
+$node_subscriber->start;
+
+# rollback post the restart
+$node_publisher->safe_psql('postgres',
+ "ROLLBACK PREPARED 'test_prepared_tab';");
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check inserts are rolled back
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*), count(c), count(d = 999) FROM test_tab");
+is($result, qq(2|2|2), 'Rows inserted by 2PC are rolled back, leaving only the original 2 rows');
+
+###############################
+# Check that 2PC COMMIT PREPARED is decoded properly on crash restart.
+# 1. insert, update and delete enough rows to exceed the 64kB limit.
+# 2. Then server crashes before the 2PC transaction is committed.
+# 3. After servers are restarted the pending transaction is committed.
+#
+# Expect all data is replicated on subscriber side after the commit.
+# (Note: both publisher and subscriber crash/restart)
+###############################
+
+$node_publisher->safe_psql('postgres', q{
+ BEGIN;
+ INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(3, 5000) s(i);
+ UPDATE test_tab SET b = md5(b) WHERE mod(a,2) = 0;
+ DELETE FROM test_tab WHERE mod(a,3) = 0;
+ PREPARE TRANSACTION 'test_prepared_tab';});
+
+$node_subscriber->stop('immediate');
+$node_publisher->stop('immediate');
+
+$node_publisher->start;
+$node_subscriber->start;
+
+# commit post the restart
+$node_publisher->safe_psql('postgres',
+ "COMMIT PREPARED 'test_prepared_tab';");
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check inserts are visible
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*), count(c), count(d = 999) FROM test_tab");
+is($result, qq(3334|3334|3334), 'Rows inserted by 2PC have committed on subscriber, and extra columns contain local defaults');
+
+###############################
+# Check that 2PC COMMIT PREPARED is decoded properly on crash restart.
+# 1. insert, update and delete enough rows to exceed the 64kB limit.
+# 2. Then 1 server crashes before the 2PC transaction is committed.
+# 3. After servers are restarted the pending transaction is committed.
+#
+# Expect all data is replicated on subscriber side after the commit.
+# (Note: only subscriber crashes)
+###############################
+
+# First, delete the data except for 2 rows (will be replicated)
+$node_publisher->safe_psql('postgres', q{
+ DELETE FROM test_tab WHERE a > 2;});
+
+# insert, update, delete enough data to cause streaming
+$node_publisher->safe_psql('postgres', q{
+ BEGIN;
+ INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(3, 5000) s(i);
+ UPDATE test_tab SET b = md5(b) WHERE mod(a,2) = 0;
+ DELETE FROM test_tab WHERE mod(a,3) = 0;
+ PREPARE TRANSACTION 'test_prepared_tab';});
+
+$node_subscriber->stop('immediate');
+$node_subscriber->start;
+
+# commit post the restart
+$node_publisher->safe_psql('postgres',
+ "COMMIT PREPARED 'test_prepared_tab';");
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check inserts are visible
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*), count(c), count(d = 999) FROM test_tab");
+is($result, qq(3334|3334|3334), 'Rows inserted by 2PC have committed on subscriber');
+
+###############################
+# Check that 2PC COMMIT PREPARED is decoded properly on crash restart.
+# 1. insert, update and delete enough rows to exceed the 64kB limit.
+# 2. Then 1 server crashes before the 2PC transaction is committed.
+# 3. After servers are restarted the pending transaction is committed.
+#
+# Expect all data is replicated on subscriber side after the commit.
+# (Note: only publisher crashes)
+###############################
+
+# First, delete the data except for 2 rows (will be replicated)
+$node_publisher->safe_psql('postgres', q{
+ DELETE FROM test_tab WHERE a > 2;});
+
+# insert, update, delete enough data to cause streaming
+$node_publisher->safe_psql('postgres', q{
+ BEGIN;
+ INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(3, 5000) s(i);
+ UPDATE test_tab SET b = md5(b) WHERE mod(a,2) = 0;
+ DELETE FROM test_tab WHERE mod(a,3) = 0;
+ PREPARE TRANSACTION 'test_prepared_tab';});
+
+$node_publisher->stop('immediate');
+$node_publisher->start;
+
+# commit post the restart
+$node_publisher->safe_psql('postgres',
+ "COMMIT PREPARED 'test_prepared_tab';");
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check inserts are visible
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*), count(c), count(d = 999) FROM test_tab");
+is($result, qq(3334|3334|3334), 'Rows inserted by 2PC have committed on subscriber');
+
+###############################
+# Do INSERT after the PREPARE but before ROLLBACK PREPARED.
+# 1. Table is deleted back to 2 rows which are replicated on subscriber.
+# 2. Data is streamed using 2PC.
+# 3. A single row INSERT is done which is after the PREPARE
+# 4. Then do a ROLLBACK PREPARED.
+#
+# Expect the 2PC data rolls back leaving only 3 rows on the subscriber.
+# (the original 2 + inserted 1)
+###############################
+
+# First, delete the data except for 2 rows (will be replicated)
+$node_publisher->safe_psql('postgres', q{
+ DELETE FROM test_tab WHERE a > 2;});
+
+# Then insert, update and delete enough rows to exceed the 64kB limit.
+$node_publisher->safe_psql('postgres', q{
+ BEGIN;
+ INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(3, 5000) s(i);
+ UPDATE test_tab SET b = md5(b) WHERE mod(a,2) = 0;
+ DELETE FROM test_tab WHERE mod(a,3) = 0;
+ PREPARE TRANSACTION 'test_prepared_tab';});
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is in prepared state on subscriber
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab';");
+is($result, qq(1), 'transaction is prepared on subscriber');
+
+# Insert a different record (now we are outside of the 2PC tx)
+# Note: the 2PC tx still holds row locks so make sure this insert is for a separate primary key
+$node_publisher->safe_psql('postgres',
+ "INSERT INTO test_tab VALUES (99999, 'foobar')");
+
+# 2PC tx gets aborted
+$node_publisher->safe_psql('postgres',
+ "ROLLBACK PREPARED 'test_prepared_tab';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is aborted on subscriber,
+# but the extra INSERT outside of the 2PC still was replicated
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*), count(c), count(d = 999) FROM test_tab");
+is($result, qq(3|3|3), 'check the outside insert was copied to subscriber');
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab';");
+is($result, qq(0), 'transaction is aborted on subscriber');
+
+###############################
+# Do INSERT after the PREPARE but before COMMIT PREPARED.
+# 1. Table is deleted back to 2 rows which are replicated on subscriber.
+# 2. Data is streamed using 2PC.
+# 3. A single row INSERT is done which is after the PREPARE.
+# 4. Then do a COMMIT PREPARED.
+#
+# Expect 2PC data + the extra row are on the subscriber.
+# (the 3334 + inserted 1 = 3335)
+###############################
+
+# First, delete the data except for 2 rows (will be replicated)
+$node_publisher->safe_psql('postgres', q{
+ DELETE FROM test_tab WHERE a > 2;});
+
+# Then insert, update and delete enough rows to exceed the 64kB limit.
+$node_publisher->safe_psql('postgres', q{
+ BEGIN;
+ INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(3, 5000) s(i);
+ UPDATE test_tab SET b = md5(b) WHERE mod(a,2) = 0;
+ DELETE FROM test_tab WHERE mod(a,3) = 0;
+ PREPARE TRANSACTION 'test_prepared_tab';});
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is in prepared state on subscriber
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab';");
+is($result, qq(1), 'transaction is prepared on subscriber');
+
+# Insert a different record (now we are outside of the 2PC tx)
+# Note: the 2PC tx still holds row locks so make sure this insert is for a separare primary key
+$node_publisher->safe_psql('postgres',
+ "INSERT INTO test_tab VALUES (99999, 'foobar')");
+
+# 2PC tx gets committed
+$node_publisher->safe_psql('postgres',
+ "COMMIT PREPARED 'test_prepared_tab';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is committed on subscriber
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*), count(c), count(d = 999) FROM test_tab");
+is($result, qq(3335|3335|3335), 'Rows inserted by 2PC (as well as outside insert) have committed on subscriber, and extra columns contain local defaults');
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab';");
+is($result, qq(0), 'transaction is committed on subscriber');
+
+###############################
+# Do DELETE after PREPARE but before COMMIT PREPARED.
+# 1. Table is deleted back to 2 rows which are replicated on subscriber.
+# 2. Data is streamed using 2PC.
+# 3. A single row DELETE is done for one of the records that was inserted by the 2PC transaction
+# 4. Then there is a COMMIT PREPARED.
+#
+# Expect all the 2PC data rows on the subscriber (since in fact delete at step 3 would do nothing
+# because that record was not yet committed at the time of the delete).
+###############################
+
+# First, delete the data except for 2 rows (will be replicated)
+$node_publisher->safe_psql('postgres', q{
+ DELETE FROM test_tab WHERE a > 2;});
+
+# Then insert, update and delete enough rows to exceed the 64kB limit.
+$node_publisher->safe_psql('postgres', q{
+ BEGIN;
+ INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(3, 5000) s(i);
+ UPDATE test_tab SET b = md5(b) WHERE mod(a,2) = 0;
+ DELETE FROM test_tab WHERE mod(a,3) = 0;
+ PREPARE TRANSACTION 'test_prepared_tab';});
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is in prepared state on subscriber
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab';");
+is($result, qq(1), 'transaction is prepared on subscriber');
+
+# DELETE one of the prepared 2PC records before they get committed (we are outside of the 2PC tx)
+$node_publisher->safe_psql('postgres',
+ "DELETE FROM test_tab WHERE a = 5");
+
+# 2PC tx gets committed
+$node_publisher->safe_psql('postgres',
+ "COMMIT PREPARED 'test_prepared_tab';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is committed on subscriber
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*), count(c), count(d = 999) FROM test_tab");
+is($result, qq(3334|3334|3334), 'Rows inserted by 2PC have committed on subscriber. Nothing was deleted');
+
+# confirm the "deleted" row was in fact not deleted
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM test_tab WHERE a = 5");
+is($result, qq(1), 'The row we deleted before the commit till exists on subscriber.');
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = 'test_prepared_tab';");
+is($result, qq(0), 'transaction is committed on subscriber');
+
+###############################
+# Try 2PC tx works using an empty GID literal
+###############################
+
+# First, delete the data except for 2 rows (will be replicated)
+$node_publisher->safe_psql('postgres', q{
+ DELETE FROM test_tab WHERE a > 2;});
+
+# Then insert, update and delete enough rows to exceed the 64kB limit.
+$node_publisher->safe_psql('postgres', q{
+ BEGIN;
+ INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(3, 5000) s(i);
+ UPDATE test_tab SET b = md5(b) WHERE mod(a,2) = 0;
+ DELETE FROM test_tab WHERE mod(a,3) = 0;
+ PREPARE TRANSACTION '';});
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is in prepared state on subscriber
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_prepared_xacts where gid = '';");
+is($result, qq(1), 'transaction is prepared on subscriber');
+
+# 2PC tx gets committed
+$node_publisher->safe_psql('postgres',
+ "COMMIT PREPARED '';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+ or die "Timed out while waiting for subscriber to catch up";
+
+# check that transaction is committed on subscriber
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*), count(c), count(d = 999) FROM test_tab");
+is($result, qq(3334|3334|3334), 'Rows inserted by 2PC have committed on subscriber');
+
+###############################
+# Test cases involving DDL
+###############################
+
+# TODO This can be added after we add functionality to replicate DDL changes to subscriber.
+
+###############################
+# check all the cleanup
+###############################
+
+$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_subscription");
+is($result, qq(0), 'check subscription was dropped on subscriber');
+
+$result = $node_publisher->safe_psql('postgres',
+ "SELECT count(*) FROM pg_replication_slots");
+is($result, qq(0), 'check replication slot was dropped on publisher');
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_subscription_rel");
+is($result, qq(0),
+ 'check subscription relation status was dropped on subscriber');
+
+$result = $node_publisher->safe_psql('postgres',
+ "SELECT count(*) FROM pg_replication_slots");
+is($result, qq(0), 'check replication slot was dropped on publisher');
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM pg_replication_origin");
+is($result, qq(0), 'check replication origin was dropped on subscriber');
+
+$node_subscriber->stop('fast');
+$node_publisher->stop('fast');
--
1.8.3.1
FYI - Please find attached code coverage reports which I generated
(based on the v12 patches) after running the following tests:
1. cd contrib/test_decoding; make check
2. cd src/test/subscriber; make check
Kind Regards,
Peter Smith.
Fujitsu Australia
Show quoted text
On Tue, Oct 27, 2020 at 8:55 PM Ajin Cherian <itsajin@gmail.com> wrote:
On Mon, Oct 26, 2020 at 6:49 PM Peter Smith <smithpb2250@gmail.com> wrote:
Hi Ajin.
I checked the to see how my previous review comments (of v10) were
addressed by the latest patches (currently v12)There are a couple of remaining items.
---
====================
v12-0001. File: doc/src/sgml/logicaldecoding.sgml
====================COMMENT
Section 49.6.1
Says:
An output plugin may also define functions to support streaming of
large, in-progress transactions. The stream_start_cb, stream_stop_cb,
stream_abort_cb, stream_commit_cb, stream_change_cb, and
stream_prepare_cb are required, while stream_message_cb and
stream_truncate_cb are optional.An output plugin may also define functions to support two-phase
commits, which are decoded on PREPARE TRANSACTION. The prepare_cb,
commit_prepared_cb and rollback_prepared_cb callbacks are required,
while filter_prepare_cb is optional.
~
I was not sure how the paragraphs are organised. e.g. 1st seems to be
about streams and 2nd seems to be about two-phase commit. But they are
not mutually exclusive, so I guess I thought it was odd that
stream_prepare_cb was not mentioned in the 2nd paragraph.Or maybe it is OK as-is?
I've added stream_prepare_cb to the 2nd paragraph as well.
====================
v12-0002. File: contrib/test_decoding/expected/two_phase.out
====================COMMENT
Line 26
PREPARE TRANSACTION 'test_prepared#1';
--
SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
NULL, 'two-phase-commit', '1', 'include-xids', '0',
'skip-empty-xacts', '1');
~
Seems like a missing comment to explain the expectation of that select.---
Updated.
COMMENT
Line 80
-- The insert should show the newly altered column.
~
Do you also need to mention something about the DDL not being present
in the decoding?Updated.
====================
v12-0002. File: src/backend/replication/logical/reorderbuffer.c
====================COMMENT
Line 1807
/* Here we are streaming and part of the PREPARE of a two-phase commit
* The full cleanup will happen as part of the COMMIT PREPAREDs, so now
* just truncate txn by removing changes and tuple_cids
*/
~
Something seems strange about the first sentence of that comment---
COMMENT
Line 1944
/* Discard the changes that we just streamed.
* This can only be called if streaming and not part of a PREPARE in
* a two-phase commit, so set prepared flag as false.
*/
~
I thought since this comment that is asserting various things, that
should also actually be written as code Assert.---
Added an assert.
COMMENT
Line 2401
/*
* We are here due to one of the 3 scenarios:
* 1. As part of streaming in-progress transactions
* 2. Prepare of a two-phase commit
* 3. Commit of a transaction.
*
* If we are streaming the in-progress transaction then discard the
* changes that we just streamed, and mark the transactions as
* streamed (if they contained changes), set prepared flag as false.
* If part of a prepare of a two-phase commit set the prepared flag
* as true so that we can discard changes and cleanup tuplecids.
* Otherwise, remove all the
* changes and deallocate the ReorderBufferTXN.
*/
~
The above comment is beyond my understanding. Anything you could do to
simplify it would be good.For example, when viewing this function in isolation I have never
understood why the streaming flag and rbtxn_prepared(txn) flag are not
possible to be set at the same time?Perhaps the code is relying on just internal knowledge of how this
helper function gets called? And if it is just that, then IMO there
really should be some Asserts in the code to give more assurance about
that. (Or maybe use completely different flags to represent those 3
scenarios instead of bending the meanings of the existing flags)Left this for now, probably re-look at this at a later review.
But just to explain; this function is what does the main decoding of
changes of a transaction.
At what point this decoding happens is what this feature and the
streaming in-progress feature is about. As of PG13, this decoding only
happens at commit time. With the streaming of in-progress txn feature,
this began to happen (if streaming enabled) at the time when the
memory limit for decoding transactions was crossed. This 2PC feature
is supporting decoding at the time of a PREPARE transaction.
Now, if streaming is enabled and streaming has started as a result of
crossing the memory threshold, then there is no need to
again begin streaming at a PREPARE transaction as the transaction that
is being prepared has already been streamed. Which is why this
function will not be called when a streaming transaction is prepared
as part of a two-phase commit.====================
v12-0003. File: src/backend/access/transam/twophase.c
====================COMMENT
Line 557
@@ -548,6 +548,33 @@ MarkAsPrepared(GlobalTransaction gxact, bool lock_held)
}/* + * LookupGXact + * Check if the prepared transaction with the given GID is around + */ +bool +LookupGXact(const char *gid) +{ + int i; + bool found = false; ~ Alignment of the variable declarations in LookupGXact function---
Updated.
Amit, I have also updated your comment about removing function
declaration from commit 1 and I've added it to commit 2. Also removed
whitespace errors.regards,
Ajin Cherian
Fujitsu Australia
Attachments:
coverage_replication.tar.gzapplication/gzip; name=coverage_replication.tar.gzDownload
� C��_ �]]l$�U��Ivq6�GAw{��=��u��=�Y���3��h%V������TW�VU����� �G%D"�xE< �/�(H(
EH �Oy EZ��U�]u�n����;!��]wWu�{���;��S_�����5NYynvM��h��_��i��Q���F�N+5Z��4Z�����P�Qz��27p<��e��w�������>���Y��'�^�*�_���Zl�)��^��m�������o�M�"OMv���6}�����~r�:C�C��g�|�v,�%7��'N�|��yY%7�y�G��-G'wE�5���|r,���1��L� �:n���wc�O��7���M�?d����k��c���pCX�����N8c��$u�nW�g�}��!(���x���6�x}���o��
�wV�!1�����S���l��|�����������N�Pm���:���5��u����1���Ny�{l������-8f�0H{t����=��8V'C�3@)�����'��xJz���
��=)!�|�mXQ��3���z ��70������� ���a
Y�o�)5,�Z�{R
&5
��7N�e�)0dB�-��1?��
8Pr��\�����&r������
�������3jg��4���z�X�]�������������������B�aj����u������fcw��=�u��c�E�s�����hwwgS��5�sV�8��myT���)��h���#V�v�.�M��������z�T 9�Kz�k�C��-�����B�����v5�Wp rH�\��$������M�Dz]��W�dH�m���;0!\��� �?�������l���0����ky��elm�I��WX��7��E'����O��`1��kw����P�k����l��c+����P�BB��)����5:g��,�wlGX��b'��|
�%4�)v�p���!:�����r��p��^-&����
�Q*6������"0^��{���{K�<C;dn^rqQFI�5J��>���������5�+���/0�9yT4����EM%9�` �"|l>`)r����������~�s����������%����{f��L�aZ^)^�����L�|w�����8�I���e���a�*y�8rf5/;��e���A0����X���p�,\+�VL� I�R�yn.��dV�s�z ��<,Ps�����I�3Fg'H�'�5���n�x�?f���Y����v �U���Eg����)��3G����
�������6�m���^��=�:��&ty~MT#��J�2�&��Jy�#��`����L����Vd�����i�
��E�a�'���-6��T�2����SiVr�*�M��D�9�=r�\���X{���`���-���x\��wks;��q����)%��@�������+�d9S��?�F�����w��vw�zE+4��'�R�qfd�|Y���bY,r'6n2�AF7f�b?>�>�E���5��@�+.���m��&O��OMg����qn��n�L���66w���k��^�@��fpo�x���W\��O��p�, .���d���vN\��~�%}t�T�D����1�b�z%�\��� �SZ�.l6S7c�;X��]��@e9/t$2������a�����������tM�����g�U��rU�{��� �6� l��M�{<�+(�A�O���o3�8��S��l4���;���a�>��7�?n����6��>��:+������j�Ne�gMo�����h�}x��g��C�3�{�����^�����o�-�y�������_z�������/V�����S|�'��/|�_��8������ss�{����I��������������_�����#��������~>��������������������a��6��p�q��Y��?�����k�_E��?��/
�������?k��~��?������������o������?!_�~�����7^����C��_�_�����'���_���s��+�����_�>�5���6~��2<o&����jC~���h\��*Z��?s]���������������������/'�tg{��>���9E�%?���?��o~v.���~gNd{���r��#�������|6��"��������������������$��G5�h�?T��������k#�{�s�<��z���6��s�����s�?�'\��yk#��T�1��i�7��5m2�/|�����77��w���w���<�����g�������_��������~���������V����������w��2��Bu�0�����4�����_EC����\�f&��������xUP��[����h�Rm����k:��u�wm����[�o�����������-RZ.�U�������DuE���5l��naX��������wb��z���ft6� Y�3� =�,�w������c�������D����g*�y?�H�g����.7K��B�����o�erL��/Z�������}w����dT�Y�%���c�/D%j{^��\�]/������d�hk�@��5�������&����[/QM{�DN���u���eq�� �
��D�u����v7��i�[1 ��).x:����q}����q3����6���)�P\��^O)�Bz��&���z����?vF�G��U�c���It�1 X0�g�~�����u�u���U�C�&� t��!��iw�3�J�3 w������:�#����dx'P�m�%�)W� !%�<-#������r;#�[{|���2e#�(`��P�����>���g�����
�1���,r'c���_��������_�H�y����pnzoa)mT����Ju����FE[���/�����&0����2���&�tU�����n�)�^����46�i+���< v�:�W�N��RD���${����aB�B���
J������9�$�w] c�������utktl�q�H1��#Z����xu�X;q7d�3~6�6pY�I�i��Op� 2Ah��.��puV<�2����A��+�O����$����������j|M��#�������n*H7U7YIE��*�s���5���R�����W!���0mo��G��������w���N�y��������YK�SG����e�=1\�p��+d���X�)%�'�[����M��@�i(�1��M;>