Re: [HACKERS] logical decoding of two-phase transactions
Hi,
PFA, latest patchset which incorporates the additional feedback.
There's an additional test case in
0005-Additional-test-case-to-demonstrate-decoding-rollbac.patch which
uses a sleep in the "change" plugin API to allow a concurrent rollback
on the 2PC being currently decoded. Andres generally doesn't like this
approach :-), but there are no timing/interlocking issues now, and the
sleep just helps us do a concurrent rollback, so it might be ok now,
all things considered. Anyways, it's an additional patch for now.Yea, I still don't think it's ok. The tests won't be reliable. There's
ways to make this reliable, e.g. by forcing a lock to be acquired that's
externally held or such. Might even be doable just with a weird custom
datatype.Ok, I will look at ways to do away with the sleep.
The attached patchset implements a non-sleep based approached by
sending the 2PC XID to the pg_logical_slot_get_changes() function as
an option for the test_decoding plugin. So, an example invocation
will now look like:
SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
NULL, 'skip-empty-xacts', '1', 'check-xid', '$xid2pc');
The test_decoding pg_decode_change() API if it sees a valid xid
argument will wait for it to be aborted. Another backend can then come
in and merrily abort this ongoing 2PC in the background. Once it's
aborted, the pg_decode_change API will go ahead and will hit an ERROR
in the systable scan APIs. That should take care of Andres' concern
about using sleep in the tests. The relevant tap test has been added
to this patchset.
@@ -423,6 +423,16 @@ systable_getnext(SysScanDesc sysscan)
else
htup = heap_getnext(sysscan->scan, ForwardScanDirection);+ /* + * If CheckXidAlive is valid, then we check if it aborted. If it did, we + * error out + */ + if (TransactionIdIsValid(CheckXidAlive) && + TransactionIdDidAbort(CheckXidAlive)) + ereport(ERROR, + (errcode(ERRCODE_TRANSACTION_ROLLBACK), + errmsg("transaction aborted during system catalog scan"))); + return htup; }Don't we have to check TransactionIdIsInProgress() first? C.f. header
comments in tqual.c. Note this is also not guaranteed to be correct
after a crash (where no clog entry will exist for an aborted xact), but
we probably shouldn't get here in that case - but better be safe.I suspect it'd be better reformulated as
TransactionIdIsValid(CheckXidAlive) &&
!TransactionIdIsInProgress(CheckXidAlive) &&
!TransactionIdDidCommit(CheckXidAlive)What do you think?
Modified the checks are per the above suggestion.
I was wondering if anything else would be needed for user-defined
catalog tables..
We don't need to do anything else for user-defined catalog tables
since they will also get accessed via the systable_* scan APIs.
Hmm, lemme see if we can do it outside of the plugin. But note that a
plugin might want to decode some 2PC at prepare time and another are
"commit prepared" time.
The test_decoding pg_decode_filter_prepare() API implements a simple
filter strategy now. If the GID contains a substring "nodecode", then
it filters out decoding of such a 2PC at prepare time. Have added
steps to test this in the relevant test case in this patch.
I believe this patchset handles all pending issues along with relevant
test cases. Comments, further feedback appreciated.
Regards,
Nikhils
--
Nikhil Sontakke http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
0001-Cleaning-up-of-flags-in-ReorderBufferTXN-structure.patchapplication/octet-stream; name=0001-Cleaning-up-of-flags-in-ReorderBufferTXN-structure.patchDownload+37-31
0002-Support-decoding-of-two-phase-transactions-at-PREPAR.patchapplication/octet-stream; name=0002-Support-decoding-of-two-phase-transactions-at-PREPAR.patchDownload+746-33
0003-Gracefully-handle-concurrent-aborts-of-uncommitted-t.patchapplication/octet-stream; name=0003-Gracefully-handle-concurrent-aborts-of-uncommitted-t.patchDownload+88-9
0004-Teach-test_decoding-plugin-to-work-with-2PC.patchapplication/octet-stream; name=0004-Teach-test_decoding-plugin-to-work-with-2PC.patchDownload+532-22
On 01/08/18 16:00, Nikhil Sontakke wrote:
I was wondering if anything else would be needed for user-defined
catalog tables..We don't need to do anything else for user-defined catalog tables
since they will also get accessed via the systable_* scan APIs.
They can be, but currently they might not be. So this requires at least
big fat warning in docs and description on how to access user catalogs
from plugins correctly (ie to always use systable_* API on them). It
would be nice if we could check for it in Assert builds at least.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2018-08-01 21:55:18 +0200, Petr Jelinek wrote:
On 01/08/18 16:00, Nikhil Sontakke wrote:
I was wondering if anything else would be needed for user-defined
catalog tables..We don't need to do anything else for user-defined catalog tables
since they will also get accessed via the systable_* scan APIs.They can be, but currently they might not be. So this requires at least
big fat warning in docs and description on how to access user catalogs
from plugins correctly (ie to always use systable_* API on them). It
would be nice if we could check for it in Assert builds at least.
Yea, I agree. I think we should just consider putting similar checks in
the general scan APIs. With an unlikely() and the easy predictability of
these checks, I think we should be fine, overhead-wise.
Greetings,
Andres Freund
They can be, but currently they might not be. So this requires at least
big fat warning in docs and description on how to access user catalogs
from plugins correctly (ie to always use systable_* API on them). It
would be nice if we could check for it in Assert builds at least.
Ok, modified the sgml documentation for the above.
Yea, I agree. I think we should just consider putting similar checks in
the general scan APIs. With an unlikely() and the easy predictability of
these checks, I think we should be fine, overhead-wise.
Ok, added unlikely() checks in the heap_* scan APIs.
Revised patchset attached.
Regards,
Nikhils
Greetings,
Andres Freund
--
Nikhil Sontakke http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
0003-Gracefully-handle-concurrent-aborts-of-uncommitted-t.patchapplication/octet-stream; name=0003-Gracefully-handle-concurrent-aborts-of-uncommitted-t.patchDownload+143-10
0004-Teach-test_decoding-plugin-to-work-with-2PC.patchapplication/octet-stream; name=0004-Teach-test_decoding-plugin-to-work-with-2PC.patchDownload+532-22
0001-Cleaning-up-of-flags-in-ReorderBufferTXN-structure.patchapplication/octet-stream; name=0001-Cleaning-up-of-flags-in-ReorderBufferTXN-structure.patchDownload+37-31
0002-Support-decoding-of-two-phase-transactions-at-PREPAR.patchapplication/octet-stream; name=0002-Support-decoding-of-two-phase-transactions-at-PREPAR.patchDownload+746-33
Hello,
I have looked through the patches. I will first describe relativaly
serious issues I see and then proceed with small nitpicking.
- On decoding of aborted xacts. The idea to throw an error once we
detect the abort is appealing, however I think you will have problems
with subxacts in the current implementation. What if subxact issues
DDL and then aborted, but main transaction successfully committed?
- Decoding transactions at PREPARE record changes rules of the "we ship
all commits after lsn 'x'" game. Namely, it will break initial
tablesync: what if consistent snapshot was formed *after* PREPARE, but
before COMMIT PREPARED, and the plugin decides to employ 2pc? Instead
of getting inital contents + continious stream of changes the receiver
will miss the prepared xact contents and raise 'prepared xact doesn't
exist' error. I think the starting point to address this is to forbid
two-phase decoding of xacts with lsn of PREPARE less than
snapbuilder's start_decoding_at.
- Currently we will call abort_prepared cb even if we failed to actually
prepare xact due to concurrent abort. I think it is confusing for
users. We should either handle this by remembering not to invoke
abort_prepared in these cases or at least document this behaviour,
leaving this problem to the receiver side.
- I find it suspicious that DecodePrepare completely ignores actions of
SnapBuildCommitTxn. For example, to execute invalidations, the latter
sets base snapshot if our xact (or subxacts) did DDL and the snapshot
not set yet. My fantasy doesn't hint me the concrete example
where this would burn at the moment, but it should be considered.
Now, the bikeshedding.
First patch:
- I am one of those people upthread who don't think that converting
flags to bitmask is beneficial -- especially given that many of them
are mutually exclusive, e.g. xact can't be committed and aborted at
the same time. Apparently you have left this to the committer though.
Second patch:
- Applying gives me
Applying: Support decoding of two-phase transactions at PREPARE
.git/rebase-apply/patch:871: trailing whitespace.
+ row. The <function>change_cb</function> callback may access system or
+ user catalog tables to aid in the process of outputting the row
+ modification details. In case of decoding a prepared (but yet
+ uncommitted) transaction or decoding of an uncommitted transaction, this
+ change callback is ensured sane access to catalog tables regardless of
+ simultaneous rollback by another backend of this very same transaction.
I don't think we should explain this, at least in such words. As
mentioned upthread, we should warn about allowed systable_* accesses
instead. Same for message_cb.
+ /*
+ * Tell the reorderbuffer about the surviving subtransactions. We need to
+ * do this because the main transaction itself has not committed since we
+ * are in the prepare phase right now. So we need to be sure the snapshot
+ * is setup correctly for the main transaction in case all changes
+ * happened in subtransanctions
+ */
While we do certainly need to associate subxacts here, the explanation
looks weird to me. I would leave just the 'Tell the reorderbuffer about
the surviving subtransactions' as in DecodeCommit.
}
-
/*
* There's a speculative insertion remaining, just clean in up, it
* can't have been successful, otherwise we'd gotten a confirmation
Spurious newline deletion.
- I would rename ReorderBufferCommitInternal to ReorderBufferReplay:
we replay the xact there, not commit.
- If xact is empty, we will not prepare it (and call cb),
even if the output plugin asked us. However, we will call
commit_prepared cb.
- ReorderBufferTxnIsPrepared and ReorderBufferPrepareNeedSkip do the
same and should be merged with comments explaining that the answer
must be stable.
- filter_prepare_cb callback existence is checked in both decode.c and
in filter_prepare_cb_wrapper.
+ /*
+ * The transaction may or may not exist (during restarts for example).
+ * Anyways, 2PC transactions do not contain any reorderbuffers. So allow
+ * it to be created below.
+ */
Code around looks sane, but I think that ReorderBufferTXN for our xact
must *not* exist at this moment: if we are going to COMMIT/ABORT
PREPARED it, it must have been replayed and RBTXN purged immediately
after. Also, instead of misty '2PC transactions do not contain any
reorderbuffers' I would say something like 'create dummy
ReorderBufferTXN to pass it to the callback'.
- In DecodeAbort:
+ /*
+ * If it's ROLLBACK PREPARED then handle it via callbacks.
+ */
+ if (TransactionIdIsValid(xid) &&
+ !SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) &&
+
How xid can be invalid here?
- It might be worthwile to put the check
+ !SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) &&
+ parsed->dbId == ctx->slot->data.database &&
+ !FilterByOrigin(ctx, origin_id) &&
which appears 3 times now into separate function.
+ * two-phase transactions - we either have to have all of them or none.
+ * The filter_prepare callback is optional, but can only be defined when
Kind of controversial (all of them or none, but optional), might be
formulated more accurately.
+ /*
+ * Capabilities of the output plugin.
+ */
+ bool enable_twophase;
I would rename this to 'supports_twophase' since this is not an option
but a description of the plugin capabilities.
+ /* filter_prepare is optional, but requires two-phase decoding */
+ if ((ctx->callbacks.filter_prepare_cb != NULL) && (!ctx->enable_twophase))
+ ereport(ERROR,
+ (errmsg("Output plugin does not support two-phase decoding, but "
+ "registered
filter_prepared callback.")));
Don't think we need to check that...
+ * Otherwise call either PREPARE (for twophase transactions) or COMMIT
+ * (for regular ones).
+ */
+ if (rbtxn_rollback(txn))
+ rb->abort(rb, txn, commit_lsn);
This is the dead code since we don't have decoding of in-progress xacts
yet.
Third patch:
+/*
+ * An xid value pointing to a possibly ongoing or a prepared transaction.
+ * Currently used in logical decoding. It's possible that such transactions
+ * can get aborted while the decoding is ongoing.
+ */
I would explain here that this xid is checked for abort after each
catalog scan, and sent for the details to SetupHistoricSnapshot.
+ /*
+ * If CheckXidAlive is valid, then we check if it aborted. If it did, we
+ * error out
+ */
+ if (TransactionIdIsValid(CheckXidAlive) &&
+ !TransactionIdIsInProgress(CheckXidAlive) &&
+ !TransactionIdDidCommit(CheckXidAlive))
+ ereport(ERROR,
+ (errcode(ERRCODE_TRANSACTION_ROLLBACK),
+ errmsg("transaction aborted during system catalog scan")));
Probably centralize checks in one function? As well as 'We don't expect
direct calls to heap_fetch...' ones.
P.S. Looks like you have torn the thread chain: In-Reply-To header of
mail [1]/messages/by-id/CAMGcDxeqEpWj3fTXwqhSwBdXd2RS9jzwWscO-XbeCfso6ts3+Q@mail.gmail.com is missing. Please don't do that.
[1]: /messages/by-id/CAMGcDxeqEpWj3fTXwqhSwBdXd2RS9jzwWscO-XbeCfso6ts3+Q@mail.gmail.com
--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 2018-08-06 21:06:13 +0300, Arseny Sher wrote:
Hello,
I have looked through the patches. I will first describe relativaly
serious issues I see and then proceed with small nitpicking.- On decoding of aborted xacts. The idea to throw an error once we
detect the abort is appealing, however I think you will have problems
with subxacts in the current implementation. What if subxact issues
DDL and then aborted, but main transaction successfully committed?
I don't see a fundamental issue here. I've not reviewed the current
patchset meaningfully, however. Do you see a fundamental issue here?
- Decoding transactions at PREPARE record changes rules of the "we ship
all commits after lsn 'x'" game. Namely, it will break initial
tablesync: what if consistent snapshot was formed *after* PREPARE, but
before COMMIT PREPARED, and the plugin decides to employ 2pc? Instead
of getting inital contents + continious stream of changes the receiver
will miss the prepared xact contents and raise 'prepared xact doesn't
exist' error. I think the starting point to address this is to forbid
two-phase decoding of xacts with lsn of PREPARE less than
snapbuilder's start_decoding_at.
Yea, that sounds like it need to be addressed.
- Currently we will call abort_prepared cb even if we failed to actually
prepare xact due to concurrent abort. I think it is confusing for
users. We should either handle this by remembering not to invoke
abort_prepared in these cases or at least document this behaviour,
leaving this problem to the receiver side.
What precisely do you mean by "concurrent abort"?
- I find it suspicious that DecodePrepare completely ignores actions of
SnapBuildCommitTxn. For example, to execute invalidations, the latter
sets base snapshot if our xact (or subxacts) did DDL and the snapshot
not set yet. My fantasy doesn't hint me the concrete example
where this would burn at the moment, but it should be considered.
Yea, I think this need to mirror the actions (and thus generalize the
code to avoid duplication)
Now, the bikeshedding.
First patch:
- I am one of those people upthread who don't think that converting
flags to bitmask is beneficial -- especially given that many of them
are mutually exclusive, e.g. xact can't be committed and aborted at
the same time. Apparently you have left this to the committer though.
Similar.
- Andres
Andres Freund <andres@anarazel.de> writes:
- On decoding of aborted xacts. The idea to throw an error once we
detect the abort is appealing, however I think you will have problems
with subxacts in the current implementation. What if subxact issues
DDL and then aborted, but main transaction successfully committed?I don't see a fundamental issue here. I've not reviewed the current
patchset meaningfully, however. Do you see a fundamental issue here?
Hmm, yes, this is not an issue for this patch because after reading
PREPARE record we know all aborted subxacts and won't try to decode
their changes. However, this will be raised once we decide to decode
in-progress transactions. Checking for all subxids is expensive;
moreover, WAL doesn't provide all of them until commit... it might be
easier to prevent vacuuming of aborted stuff while decoding needs it.
Matter for another patch, anyway.
- Currently we will call abort_prepared cb even if we failed to actually
prepare xact due to concurrent abort. I think it is confusing for
users. We should either handle this by remembering not to invoke
abort_prepared in these cases or at least document this behaviour,
leaving this problem to the receiver side.What precisely do you mean by "concurrent abort"?
With current patch, the following is possible:
* We start decoding of some prepared xact;
* Xact aborts (ABORT PREPARED) for any reason;
* Decoding processs notices this on catalog scan and calls abort()
callback;
* Later decoding process reads abort record and calls abort_prepared
callback.
--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Hi Arseny,
- Decoding transactions at PREPARE record changes rules of the "we ship
all commits after lsn 'x'" game. Namely, it will break initial
tablesync: what if consistent snapshot was formed *after* PREPARE, but
before COMMIT PREPARED, and the plugin decides to employ 2pc? Instead
of getting inital contents + continious stream of changes the receiver
will miss the prepared xact contents and raise 'prepared xact doesn't
exist' error. I think the starting point to address this is to forbid
two-phase decoding of xacts with lsn of PREPARE less than
snapbuilder's start_decoding_at.
It will be the job of the plugin to return a consistent answer for
every GID that is encountered. In this case, the plugin will decode
the transaction at COMMIT PREPARED time and not at PREPARE time.
- Currently we will call abort_prepared cb even if we failed to actually
prepare xact due to concurrent abort. I think it is confusing for
users. We should either handle this by remembering not to invoke
abort_prepared in these cases or at least document this behaviour,
leaving this problem to the receiver side.
The point is, when we reach the "ROLLBACK PREPARED", we have no idea
if the "PREPARE" was aborted by this rollback happening concurrently.
So it's possible that the 2PC has been successfully decoded and we
would have to send the rollback to the other side which would need to
check if it needs to rollback locally.
- I find it suspicious that DecodePrepare completely ignores actions of
SnapBuildCommitTxn. For example, to execute invalidations, the latter
sets base snapshot if our xact (or subxacts) did DDL and the snapshot
not set yet. My fantasy doesn't hint me the concrete example
where this would burn at the moment, but it should be considered.
I had discussed this area with Petr and we didn't see any issues as well, then.
Now, the bikeshedding.
First patch:
- I am one of those people upthread who don't think that converting
flags to bitmask is beneficial -- especially given that many of them
are mutually exclusive, e.g. xact can't be committed and aborted at
the same time. Apparently you have left this to the committer though.
Hmm, there seems to be divided opinion on this. I am willing to go
back to using the booleans if there's opposition and if the committer
so wishes. Note that this patch will end up adding 4/5 more booleans
in that case (we add new ones for prepare, commit prepare, abort,
rollback prepare etc).
Second patch:
- Applying gives me
Applying: Support decoding of two-phase transactions at PREPARE
.git/rebase-apply/patch:871: trailing whitespace.+ row. The <function>change_cb</function> callback may access system or + user catalog tables to aid in the process of outputting the row + modification details. In case of decoding a prepared (but yet + uncommitted) transaction or decoding of an uncommitted transaction, this + change callback is ensured sane access to catalog tables regardless of + simultaneous rollback by another backend of this very same transaction.I don't think we should explain this, at least in such words. As
mentioned upthread, we should warn about allowed systable_* accesses
instead. Same for message_cb.
Looks like you are looking at an earlier patchset. The latest patchset
has removed the above.
+ /* + * Tell the reorderbuffer about the surviving subtransactions. We need to + * do this because the main transaction itself has not committed since we + * are in the prepare phase right now. So we need to be sure the snapshot + * is setup correctly for the main transaction in case all changes + * happened in subtransanctions + */While we do certainly need to associate subxacts here, the explanation
looks weird to me. I would leave just the 'Tell the reorderbuffer about
the surviving subtransactions' as in DecodeCommit.}
-
/*
* There's a speculative insertion remaining, just clean in up, it
* can't have been successful, otherwise we'd gotten a confirmationSpurious newline deletion.
- I would rename ReorderBufferCommitInternal to ReorderBufferReplay:
we replay the xact there, not commit.- If xact is empty, we will not prepare it (and call cb),
even if the output plugin asked us. However, we will call
commit_prepared cb.- ReorderBufferTxnIsPrepared and ReorderBufferPrepareNeedSkip do the
same and should be merged with comments explaining that the answer
must be stable.- filter_prepare_cb callback existence is checked in both decode.c and
in filter_prepare_cb_wrapper.+ /* + * The transaction may or may not exist (during restarts for example). + * Anyways, 2PC transactions do not contain any reorderbuffers. So allow + * it to be created below. + */Code around looks sane, but I think that ReorderBufferTXN for our xact
must *not* exist at this moment: if we are going to COMMIT/ABORT
PREPARED it, it must have been replayed and RBTXN purged immediately
after. Also, instead of misty '2PC transactions do not contain any
reorderbuffers' I would say something like 'create dummy
ReorderBufferTXN to pass it to the callback'.- In DecodeAbort: + /* + * If it's ROLLBACK PREPARED then handle it via callbacks. + */ + if (TransactionIdIsValid(xid) && + !SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) && +How xid can be invalid here?
- It might be worthwile to put the check + !SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) && + parsed->dbId == ctx->slot->data.database && + !FilterByOrigin(ctx, origin_id) &&which appears 3 times now into separate function.
+ * two-phase transactions - we either have to have all of them or none. + * The filter_prepare callback is optional, but can only be defined whenKind of controversial (all of them or none, but optional), might be
formulated more accurately.+ /* + * Capabilities of the output plugin. + */ + bool enable_twophase;I would rename this to 'supports_twophase' since this is not an option
but a description of the plugin capabilities.+ /* filter_prepare is optional, but requires two-phase decoding */ + if ((ctx->callbacks.filter_prepare_cb != NULL) && (!ctx->enable_twophase)) + ereport(ERROR, + (errmsg("Output plugin does not support two-phase decoding, but " + "registered filter_prepared callback.")));Don't think we need to check that...
+ * Otherwise call either PREPARE (for twophase transactions) or COMMIT + * (for regular ones). + */ + if (rbtxn_rollback(txn)) + rb->abort(rb, txn, commit_lsn);This is the dead code since we don't have decoding of in-progress xacts
yet.
Yes, the above check can be done away with it.
Third patch: +/* + * An xid value pointing to a possibly ongoing or a prepared transaction. + * Currently used in logical decoding. It's possible that such transactions + * can get aborted while the decoding is ongoing. + */I would explain here that this xid is checked for abort after each
catalog scan, and sent for the details to SetupHistoricSnapshot.+ /* + * If CheckXidAlive is valid, then we check if it aborted. If it did, we + * error out + */ + if (TransactionIdIsValid(CheckXidAlive) && + !TransactionIdIsInProgress(CheckXidAlive) && + !TransactionIdDidCommit(CheckXidAlive)) + ereport(ERROR, + (errcode(ERRCODE_TRANSACTION_ROLLBACK), + errmsg("transaction aborted during system catalog scan")));Probably centralize checks in one function? As well as 'We don't expect
direct calls to heap_fetch...' ones.P.S. Looks like you have torn the thread chain: In-Reply-To header of
mail [1] is missing. Please don't do that.
That wasn't me. I was also annoyed and surprised to see a new email
thread separate from the earlier containing 100 or so messages.
Regards,
Nikhils
--
Nikhil Sontakke http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Nikhil Sontakke <nikhils@2ndquadrant.com> writes:
- Decoding transactions at PREPARE record changes rules of the "we ship
all commits after lsn 'x'" game. Namely, it will break initial
tablesync: what if consistent snapshot was formed *after* PREPARE, but
before COMMIT PREPARED, and the plugin decides to employ 2pc? Instead
of getting inital contents + continious stream of changes the receiver
will miss the prepared xact contents and raise 'prepared xact doesn't
exist' error. I think the starting point to address this is to forbid
two-phase decoding of xacts with lsn of PREPARE less than
snapbuilder's start_decoding_at.It will be the job of the plugin to return a consistent answer for
every GID that is encountered. In this case, the plugin will decode
the transaction at COMMIT PREPARED time and not at PREPARE time.
I can't imagine a scenario in which plugin would want to send COMMIT
PREPARED instead of replaying xact fully on CP record given it had never
seen PREPARE record. On the other hand, tracking such situations on
plugins side would make plugins life unneccessary complicated: either it
has to dig into snapbuilder/slot internals to learn when the snapshot
became consistent (which currently is impossible as this lsn is not
saved anywhere btw), or it must fsync each its decision to do or not to
do 2PC.
Technically, my concern covers not only tablesync, but just plain
decoding start: we don't want to ship COMMIT PREPARED if the downstream
had never had chance to see PREPARE.
As for tablesync, looking at current implementation I contemplate that
we would need to do something along the following lines:
- Tablesync worker performs COPY.
- It then speaks with main apply worker to arrange (origin)
lsn of sync point, as it does now.
- Tablesync worker applies changes up to arranged lsn; it never uses
two-phase decoding, all xacts are replayed on COMMIT PREPARED.
Moreover, instead of going into SYNCDONE state immediately after
reaching needed lsn, it stops replaying usual commits but continues
to receive changes to finish all transactions which were prepared
before sync point (we would need some additional support from
reorderbuffer to learn when this happens). Only then it goes into
SYNCDONE.
- Behaviour of the main apply worker doesn't change: it
ignores changes of the table in question before sync point and
applies them after sync point. It also can use 2PC decoding of any
transaction or not, as it desires.
I believe this approach would implement tablesync correctly (all changes
are applied, but only once) with minimal fuss.
- Currently we will call abort_prepared cb even if we failed to actually
prepare xact due to concurrent abort. I think it is confusing for
users. We should either handle this by remembering not to invoke
abort_prepared in these cases or at least document this behaviour,
leaving this problem to the receiver side.The point is, when we reach the "ROLLBACK PREPARED", we have no idea
if the "PREPARE" was aborted by this rollback happening concurrently.
So it's possible that the 2PC has been successfully decoded and we
would have to send the rollback to the other side which would need to
check if it needs to rollback locally.
I understand this. But I find this confusing for the users, so I propose
to
- Either document that "you might get abort_prepared cb called even
after abort cb was invoked for the same transaction";
- Or consider adding some infrastructure to reorderbuffer to
remember not to call abort_prepared in these cases. Due to possible
reboots, I think this means that we need not to
ReorderBufferCleanupTXN immediately after failed attempt to replay
xact on PREPARE, but mark it as 'aborted' and keep it until we see
ABORT PREPARED record. If we see that xact is marked as aborted, we
don't call abort_prepared_cb. That way even if the decoder restarts
in between, we will see PREPARE in WAL, inquire xact status (even
if we skip it as already replayed) and mark it as aborted again.
- I find it suspicious that DecodePrepare completely ignores actions of
SnapBuildCommitTxn. For example, to execute invalidations, the latter
sets base snapshot if our xact (or subxacts) did DDL and the snapshot
not set yet. My fantasy doesn't hint me the concrete example
where this would burn at the moment, but it should be considered.I had discussed this area with Petr and we didn't see any issues as well, then.
Ok, simplifying, what SnapBuildCommitTxn practically does is
* Decide whether we are interested in tracking this xact effects, and
if we are, mark it as committed.
* Build and distribute snapshot to all RBTXNs, if it is important.
* Set base snap of our xact if it did DDL, to execute invalidations
during replay.
I see that we don't need to do first two bullets during DecodePrepare:
xact effects are still invisible for everyone but itself after
PREPARE. As for seeing xacts own own changes, it is implemented via
logging cmin/cmax and we don't need to mark xact as committed for that
(c.f ReorderBufferCopySnap).
Regarding the third point... I think in 2PC decoding we might need to
execute invalidations twice:
1) After replaying xact on PREPARE to forget about catalog changes
xact did -- it is not yet committed and must be invisible to
other xacts until CP. In the latest patchset invalidations are
executed only if there is at least one change in xact (it has base
snap). It looks fine: we can't spoil catalogs if there were nothing
to decode. Better to explain that somewhere.
2) After decoding COMMIT PREPARED to make changes visible. In current
patchset it is always done. Actually, *this* is the reason
RBTXN might already exist when we enter ReorderBufferFinishPrepared,
not "(during restarts for example)" as comment says there:
if there were inval messages, RBTXN will be created
in DecodeCommit during their addition.
BTW, "that we might need to execute invalidations, add snapshot" in
SnapBuildCommitTxn looks like a cludge to me: I suppose it is better to
do that at ReorderBufferXidSetCatalogChanges.
Now, another issue is registering xact as committed in
SnapBuildCommitTxn during COMMIT PREPARED processing. Since RBTXN is
always purged after xact replay on PREPARE, the only medium we have for
noticing catalog changes during COMMIT PREPARED is invalidation messages
attached to the CP record. This raises the following question.
* If there is a guarantee that whenever xact makes catalog changes it
generates invalidation messages, then this code is fine. However,
currently ReorderBufferXidSetCatalogChanges is also called on
XLOG_HEAP_INPLACE processing and in SnapBuildProcessNewCid, which
is useless if such guarantee exists.
* If, on the other hand, there is no such guarantee, this code is
broken.
- I am one of those people upthread who don't think that converting
flags to bitmask is beneficial -- especially given that many of them
are mutually exclusive, e.g. xact can't be committed and aborted at
the same time. Apparently you have left this to the committer though.Hmm, there seems to be divided opinion on this. I am willing to go
back to using the booleans if there's opposition and if the committer
so wishes. Note that this patch will end up adding 4/5 more booleans
in that case (we add new ones for prepare, commit prepare, abort,
rollback prepare etc).
Well, you can unite mutually exclusive fields into one enum or char with
macros defining possible values. Transaction can't be committed and
aborted at the same time, etc.
+ row. The <function>change_cb</function> callback may access system or + user catalog tables to aid in the process of outputting the row + modification details. In case of decoding a prepared (but yet + uncommitted) transaction or decoding of an uncommitted transaction, this + change callback is ensured sane access to catalog tables regardless of + simultaneous rollback by another backend of this very same transaction.I don't think we should explain this, at least in such words. As
mentioned upthread, we should warn about allowed systable_* accesses
instead. Same for message_cb.Looks like you are looking at an earlier patchset. The latest patchset
has removed the above.
I see, sorry.
--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Hello,
Trying to revive this patch which attempts to support logical decoding of
two phase transactions. I've rebased and polished Nikhil's patch on the
current HEAD. Some of the logic in the previous patchset has already been
committed as part of large-in-progress transaction commits, like the
handling of concurrent aborts, so all that logic has been left out. I think
some of the earlier comments have already been addressed or are no longer
relevant. Do have a look at the patch and let me know what you think.I will
try and address any pending issues going forward.
regards,
Ajin Cherian
Fujitsu Australia
Attachments:
0001-Support-decoding-of-two-phase-transactions-at-PREPAR.patchapplication/octet-stream; name=0001-Support-decoding-of-two-phase-transactions-at-PREPAR.patchDownload+1159-56
On Mon, Sep 7, 2020 at 10:54 AM Ajin Cherian <itsajin@gmail.com> wrote:
Hello,
Trying to revive this patch which attempts to support logical decoding of two phase transactions. I've rebased and polished Nikhil's patch on the current HEAD. Some of the logic in the previous patchset has already been committed as part of large-in-progress transaction commits, like the handling of concurrent aborts, so all that logic has been left out.
I am not sure about your point related to concurrent aborts. I think
we need some changes related to this patch. Have you tried to test
this behavior? Basically, we have the below code in
ReorderBufferProcessTXN() which will be hit for concurrent aborts, and
currently, the Asserts shown below will fail.
if (errdata->sqlerrcode == ERRCODE_TRANSACTION_ROLLBACK)
{
/*
* This error can only occur when we are sending the data in
* streaming mode and the streaming is not finished yet.
*/
Assert(streaming);
Assert(stream_started);
Nikhil has a test for the same
(0004-Teach-test_decoding-plugin-to-work-with-2PC.Jan4) in his last
email [1]. You might want to use it to test this behavior. I think you
can also keep the tests as a separate patch as Nikhil had.
One other comment:
===================
@@ -27,6 +27,7 @@ typedef struct OutputPluginOptions
{
OutputPluginOutputType output_type;
bool receive_rewrites;
+ bool enable_twophase;
} OutputPluginOptions;
..
..
@@ -684,6 +699,33 @@ startup_cb_wrapper(LogicalDecodingContext *ctx,
OutputPluginOptions *opt, bool i
/* do the actual work: call callback */
ctx->callbacks.startup_cb(ctx, opt, is_init);
+ /*
+ * If the plugin claims to support two-phase transactions, then
+ * check that the plugin implements all callbacks necessary to decode
+ * two-phase transactions - we either have to have all of them or none.
+ * The filter_prepare callback is optional, but can only be defined when
+ * two-phase decoding is enabled (i.e. the three other callbacks are
+ * defined).
+ */
+ if (opt->enable_twophase)
+ {
+ int twophase_callbacks = (ctx->callbacks.prepare_cb != NULL) +
+ (ctx->callbacks.commit_prepared_cb != NULL) +
+ (ctx->callbacks.abort_prepared_cb != NULL);
+
+ /* Plugins with incorrect number of two-phase callbacks are broken. */
+ if ((twophase_callbacks != 3) && (twophase_callbacks != 0))
+ ereport(ERROR,
+ (errmsg("Output plugin registered only %d twophase callbacks. ",
+ twophase_callbacks)));
+ }
I don't know why the patch has used this way to implement an option to
enable two-phase. Can't we use how we implement 'stream-changes'
option in commit 7259736a6e? Just refer how we set ctx->streaming and
you can use a similar way to set this parameter.
--
With Regards,
Amit Kapila.
On Mon, Sep 7, 2020 at 11:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Nikhil has a test for the same
(0004-Teach-test_decoding-plugin-to-work-with-2PC.Jan4) in his last
email [1]. You might want to use it to test this behavior. I think you
can also keep the tests as a separate patch as Nikhil had.Done. I've added the tests and also tweaked code to make sure that the
aborts during 2 phase commits are also handled.
I don't know why the patch has used this way to implement an option to
enable two-phase. Can't we use how we implement 'stream-changes'
option in commit 7259736a6e? Just refer how we set ctx->streaming and
you can use a similar way to set this parameter.
Done, I've moved the checks for callbacks to inside the corresponding
wrappers.
Regards,
Ajin Cherian
Fujitsu Australia
Attachments:
0001-Support-decoding-of-two-phase-transactions-at-PREPAR.patchapplication/octet-stream; name=0001-Support-decoding-of-two-phase-transactions-at-PREPAR.patchDownload+1165-62
0002-Tap-test-to-test-concurrent-aborts-during-2-phase-co.patchapplication/octet-stream; name=0002-Tap-test-to-test-concurrent-aborts-during-2-phase-co.patchDownload+120-1
On Wed, Sep 9, 2020 at 3:33 PM Ajin Cherian <itsajin@gmail.com> wrote:
On Mon, Sep 7, 2020 at 11:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Nikhil has a test for the same
(0004-Teach-test_decoding-plugin-to-work-with-2PC.Jan4) in his last
email [1]. You might want to use it to test this behavior. I think you
can also keep the tests as a separate patch as Nikhil had.Done. I've added the tests and also tweaked code to make sure that the aborts during 2 phase commits are also handled.
Okay, I'll look into your changes but before that today, I have gone
through this entire thread to check if there are any design problems
and found that there were two major issues in the original proposal,
(a) one was to handle concurrent aborts which I think we should be
able to deal in a way similar to what we have done for decoding of
in-progress transactions and (b) what if someone specifically locks
pg_class or pg_attribute in exclusive mode (say be Lock pg_attribute
...), it seems the deadlock can happen in that case [0]/messages/by-id/20170328012546.473psm6546bgsi2c@alap3.anarazel.de. AFAIU, people
seem to think if there is no realistic scenario where deadlock can
happen apart from user explicitly locking the system catalog then we
might be able to get away by just ignoring such xacts to be decoded at
prepare time or would block it in some other way as any way that will
block the entire system. I am not sure what is the right thing but
something has to be done to avoid any sort of deadlock for this.
Another thing, I noticed is that originally we have subscriber-side
support as well, see [1]/messages/by-id/CAMGcDxchx=0PeQBVLzrgYG2AQ49QSRxHj5DCp7yy0QrJR0S0nA@mail.gmail.com (see *pgoutput* patch) but later dropped it
due to some reasons [2]/messages/by-id/CAMGcDxc-kuO9uq0zRCRwbHWBj_rePY9=raR7M9pZGWoj9EOGdg@mail.gmail.com. I think we should have pgoutput support as
well, so see what is required to get that incorporated.
I would also like to summarize my thinking on the usefulness of this
feature. One of the authors of this patch Stats wants this for a
conflict-free logical replication, see more details [3]/messages/by-id/CAMsr+YHQzGxnR-peT4SbX2-xiG2uApJMTgZ4a3TiRBM6COyfqg@mail.gmail.com. Craig seems
to suggest [3]/messages/by-id/CAMsr+YHQzGxnR-peT4SbX2-xiG2uApJMTgZ4a3TiRBM6COyfqg@mail.gmail.com that this will allow us to avoid conflicting schema
changes at different nodes though it is not clear to me if that is
possible without some external code support because we don't send
schema changes in logical replication, maybe Craig can shed some light
on this. Another use-case, I am thinking is if this can be used for
scaling-out reads as well. Because of 2PC, we can ensure that on
subscribers we have all the data committed on the master. Now, we can
design a system where different nodes are owners of some set of tables
and we can always get the data of those tables reliably from those
nodes, and then one can have some external process that will route the
reads accordingly. I know that the last idea is a bit of a hand-waving
but it seems to be possible after this feature.
[0]: /messages/by-id/20170328012546.473psm6546bgsi2c@alap3.anarazel.de
[1]: /messages/by-id/CAMGcDxchx=0PeQBVLzrgYG2AQ49QSRxHj5DCp7yy0QrJR0S0nA@mail.gmail.com
[2]: /messages/by-id/CAMGcDxc-kuO9uq0zRCRwbHWBj_rePY9=raR7M9pZGWoj9EOGdg@mail.gmail.com
[3]: /messages/by-id/CAMsr+YHQzGxnR-peT4SbX2-xiG2uApJMTgZ4a3TiRBM6COyfqg@mail.gmail.com
--
With Regards,
Amit Kapila.
On Sat, Sep 12, 2020 at 9:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Another thing, I noticed is that originally we have subscriber-side
support as well, see [1] (see *pgoutput* patch) but later dropped it
due to some reasons [2]. I think we should have pgoutput support as
well, so see what is required to get that incorporated.I have added the rebased patch-set for pgoutput and subscriber side
changes as well. This also includes a test case in subscriber.
regards,
Ajin Cherian
Attachments:
0001-Support-decoding-of-two-phase-transactions.patchapplication/octet-stream; name=0001-Support-decoding-of-two-phase-transactions.patchDownload+1165-62
0002-Tap-test-to-test-concurrent-aborts-during-2-phase-co.patchapplication/octet-stream; name=0002-Tap-test-to-test-concurrent-aborts-during-2-phase-co.patchDownload+121-1
0003-pgoutput-output-plugin-support-for-logical-decoding-.patchapplication/octet-stream; name=0003-pgoutput-output-plugin-support-for-logical-decoding-.patchDownload+532-10
On Wed, Sep 9, 2020 at 3:33 PM Ajin Cherian <itsajin@gmail.com> wrote:
On Mon, Sep 7, 2020 at 11:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Nikhil has a test for the same
(0004-Teach-test_decoding-plugin-to-work-with-2PC.Jan4) in his last
email [1]. You might want to use it to test this behavior. I think you
can also keep the tests as a separate patch as Nikhil had.Done. I've added the tests and also tweaked code to make sure that the aborts during 2 phase commits are also handled.
I don't think it is complete yet.
*
* This error can only occur when we are sending the data in
* streaming mode and the streaming is not finished yet.
*/
- Assert(streaming);
- Assert(stream_started);
+ Assert(streaming || rbtxn_prepared(txn));
+ Assert(stream_started || rbtxn_prepared(txn));
Here, you have updated the code but comments are still not updated.
*
@@ -2370,10 +2391,19 @@ ReorderBufferProcessTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn,
errdata = NULL;
curtxn->concurrent_abort = true;
- /* Reset the TXN so that it is allowed to stream remaining data. */
- ReorderBufferResetTXN(rb, txn, snapshot_now,
- command_id, prev_lsn,
- specinsert);
+ /* If streaming, reset the TXN so that it is allowed to stream
remaining data. */
+ if (streaming && stream_started)
+ {
+ ReorderBufferResetTXN(rb, txn, snapshot_now,
+ command_id, prev_lsn,
+ specinsert);
+ }
+ else
+ {
+ elog(LOG, "stopping decoding of %s (%u)",
+ txn->gid[0] != '\0'? txn->gid:"", txn->xid);
+ rb->abort(rb, txn, commit_lsn);
+ }
I don't think we need to perform abort here. Later we will anyway
encounter the WAL for Rollback Prepared for which we will call
abort_prepared_cb. As we have set the 'concurrent_abort' flag, it will
allow us to skip all the intermediate records. Here, we need only
enough state in ReorderBufferTxn that it can be later used for
ReorderBufferFinishPrepared(). Basically, you need functionality
similar to ReorderBufferTruncateTXN where except for invalidations you
can free memory for everything else. You can either write a new
function ReorderBufferTruncatePreparedTxn or pass another bool
parameter in ReorderBufferTruncateTXN to indicate it is prepared_xact
and then clean up additional things that are not required for prepared
xact.
*
Similarly, I don't understand why we need below code:
ReorderBufferProcessTXN()
{
..
+ if (rbtxn_rollback(txn))
+ rb->abort(rb, txn, commit_lsn);
..
}
There is nowhere we are setting the RBTXN_ROLLBACK flag, so how will
this check be true? If we decide to remove this code then don't forget
to update the comments.
*
If my previous two comments are correct then I don't think we need the
below interface.
+ <sect3 id="logicaldecoding-output-plugin-abort">
+ <title>Transaction Abort Callback</title>
+
+ <para>
+ The required <function>abort_cb</function> callback is called whenever
+ a transaction abort has to be initiated. This can happen if we are
+ decoding a transaction that has been prepared for two-phase commit and
+ a concurrent rollback happens while we are decoding it.
+<programlisting>
+typedef void (*LogicalDecodeAbortCB) (struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ XLogRecPtr abort_lsn);
I don't know why the patch has used this way to implement an option to
enable two-phase. Can't we use how we implement 'stream-changes'
option in commit 7259736a6e? Just refer how we set ctx->streaming and
you can use a similar way to set this parameter.Done, I've moved the checks for callbacks to inside the corresponding wrappers.
This is not what I suggested. Please study the commit 7259736a6e and
see how streaming option is implemented. I want later subscribers can
specify whether they want transactions to be decoded at prepare time
similar to what we have done for streaming. Also, search for
ctx->streaming in the code and see how it is set to get the idea.
Note: Please use version number while sending patches, you can use
something like git format-patch -N -v n to do that. It makes easier
for the reviewer to compare it with the previous version.
Few other comments:
===================
1.
ReorderBufferProcessTXN()
{
..
if (streaming)
{
ReorderBufferTruncateTXN(rb, txn);
/* Reset the CheckXidAlive */
CheckXidAlive = InvalidTransactionId;
}
else
ReorderBufferCleanupTXN(rb, txn);
..
}
I don't think we can perform ReorderBufferCleanupTXN for the prepared
transactions because if we have removed the ReorderBufferTxn before
commit, the later code might not consider such a transaction in the
system and compute the wrong value of restart_lsn for a slot.
Basically, in SnapBuildProcessRunningXacts() when we call
ReorderBufferGetOldestTXN(), it should show the ReorderBufferTxn of
the prepared transaction which is not yet committed but because we
have removed it after prepare, it won't get that TXN and then that
leads to wrong computation of restart_lsn. Once we start from a wrong
point in WAL, the snapshot built was incorrect which will lead to the
wrong result. This is the same reason why the patch is not doing
ReorderBufferForget in DecodePrepare when we decide to skip the
transaction. Also, here, we need to set CheckXidAlive =
InvalidTransactionId; for prepared xact as well.
2. Have you thought about the interaction of streaming with prepared
transactions? You can try writing some tests using pg_logical* APIs
and see the behaviour. For ex. there is no handling in
ReorderBufferStreamCommit for the same. I think you need to introduce
stream_prepare API similar to stream_commit and then use the same.
3.
- if (streaming)
+ if (streaming || rbtxn_prepared(change->txn))
{
curtxn = change->txn;
SetupCheckXidLive(curtxn->xid);
@@ -2249,7 +2254,6 @@ ReorderBufferProcessTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn,
break;
}
}
-
/*
Spurious line removal.
--
With Regards,
Amit Kapila.
On Tue, Sep 15, 2020 at 5:27 PM Ajin Cherian <itsajin@gmail.com> wrote:
On Sat, Sep 12, 2020 at 9:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Another thing, I noticed is that originally we have subscriber-side
support as well, see [1] (see *pgoutput* patch) but later dropped it
due to some reasons [2]. I think we should have pgoutput support as
well, so see what is required to get that incorporated.I have added the rebased patch-set for pgoutput and subscriber side changes as well. This also includes a test case in subscriber.
As mentioned in my email there were some reasons due to which the
support has been left for later, have you checked those and if so, can
you please explain how you have addressed those or why they are not
relevant now if that is the case?
--
With Regards,
Amit Kapila.
On Tue, Sep 15, 2020 at 10:43 PM Amit Kapila <amit.kapila16@gmail.com>
wrote:
Few other comments:
===================
1.
ReorderBufferProcessTXN()
{
..
if (streaming)
{
ReorderBufferTruncateTXN(rb, txn);/* Reset the CheckXidAlive */
CheckXidAlive = InvalidTransactionId;
}
else
ReorderBufferCleanupTXN(rb, txn);
..
}I don't think we can perform ReorderBufferCleanupTXN for the prepared
transactions because if we have removed the ReorderBufferTxn before
commit, the later code might not consider such a transaction in the
system and compute the wrong value of restart_lsn for a slot.
Basically, in SnapBuildProcessRunningXacts() when we call
ReorderBufferGetOldestTXN(), it should show the ReorderBufferTxn of
the prepared transaction which is not yet committed but because we
have removed it after prepare, it won't get that TXN and then that
leads to wrong computation of restart_lsn. Once we start from a wrong
point in WAL, the snapshot built was incorrect which will lead to the
wrong result. This is the same reason why the patch is not doing
ReorderBufferForget in DecodePrepare when we decide to skip the
transaction. Also, here, we need to set CheckXidAlive =
InvalidTransactionId; for prepared xact as well.
Just to confirm what you are expecting here. so after we send out the
prepare transaction to the plugin, you are suggesting to NOT do a
ReorderBufferCleanupTXN, but what to do instead?. Are you suggesting to do
what you suggested
as part of concurrent abort handling? Something equivalent
to ReorderBufferTruncateTXN()? remove all changes of the transaction but
keep the invalidations and tuplecids etc? Do you think we should have a new
flag in txn to indicate that this transaction has already been decoded?
(prepare_decoded?) Any other special handling you think is required?
regards,
Ajin Cherian
Fujitsu Australia
On Thu, Sep 17, 2020 at 2:02 PM Ajin Cherian <itsajin@gmail.com> wrote:
On Tue, Sep 15, 2020 at 10:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Few other comments:
===================
1.
ReorderBufferProcessTXN()
{
..
if (streaming)
{
ReorderBufferTruncateTXN(rb, txn);/* Reset the CheckXidAlive */
CheckXidAlive = InvalidTransactionId;
}
else
ReorderBufferCleanupTXN(rb, txn);
..
}I don't think we can perform ReorderBufferCleanupTXN for the prepared
transactions because if we have removed the ReorderBufferTxn before
commit, the later code might not consider such a transaction in the
system and compute the wrong value of restart_lsn for a slot.
Basically, in SnapBuildProcessRunningXacts() when we call
ReorderBufferGetOldestTXN(), it should show the ReorderBufferTxn of
the prepared transaction which is not yet committed but because we
have removed it after prepare, it won't get that TXN and then that
leads to wrong computation of restart_lsn. Once we start from a wrong
point in WAL, the snapshot built was incorrect which will lead to the
wrong result. This is the same reason why the patch is not doing
ReorderBufferForget in DecodePrepare when we decide to skip the
transaction. Also, here, we need to set CheckXidAlive =
InvalidTransactionId; for prepared xact as well.Just to confirm what you are expecting here. so after we send out the prepare transaction to the plugin, you are suggesting to NOT do a ReorderBufferCleanupTXN, but what to do instead?. Are you suggesting to do what you suggested
as part of concurrent abort handling?
Yes.
Something equivalent to ReorderBufferTruncateTXN()? remove all changes of the transaction but keep the invalidations and tuplecids etc?
I don't think you don't need tuplecids. I have checked
ReorderBufferFinishPrepared() and that seems to require only
invalidations, check if anything else is required.
Do you think we should have a new flag in txn to indicate that this transaction has already been decoded? (prepare_decoded?)
Yeah, I think that would be better. How about if name the new variable
as cleanup_prepared?
--
With Regards,
Amit Kapila.
On Tue, Sep 15, 2020 at 10:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
I don't think it is complete yet. * * This error can only occur when we are sending the data in * streaming mode and the streaming is not finished yet. */ - Assert(streaming); - Assert(stream_started); + Assert(streaming || rbtxn_prepared(txn)); + Assert(stream_started || rbtxn_prepared(txn));Here, you have updated the code but comments are still not updated.
Updated the comments.
I don't think we need to perform abort here. Later we will anyway
encounter the WAL for Rollback Prepared for which we will call
abort_prepared_cb. As we have set the 'concurrent_abort' flag, it will
allow us to skip all the intermediate records. Here, we need only
enough state in ReorderBufferTxn that it can be later used for
ReorderBufferFinishPrepared(). Basically, you need functionality
similar to ReorderBufferTruncateTXN where except for invalidations you
can free memory for everything else. You can either write a new
function ReorderBufferTruncatePreparedTxn or pass another bool
parameter in ReorderBufferTruncateTXN to indicate it is prepared_xact
and then clean up additional things that are not required for prepared
xact.
Added a new parameter to ReorderBufferTruncatePreparedTxn for
prepared transactions and did cleanup of tupulecids as well, I have
left snapshots and transactions.
As a result of this, I also had to create a new function
ReorderBufferCleanupPreparedTXN which will clean up the rest as part
of FinishPrepared handling as we can't call
ReorderBufferCleanupTXN again after this.
*
Similarly, I don't understand why we need below code:
ReorderBufferProcessTXN()
{
..
+ if (rbtxn_rollback(txn))
+ rb->abort(rb, txn, commit_lsn);
..
}There is nowhere we are setting the RBTXN_ROLLBACK flag, so how will
this check be true? If we decide to remove this code then don't forget
to update the comments.
Removed.
* If my previous two comments are correct then I don't think we need the below interface. + <sect3 id="logicaldecoding-output-plugin-abort"> + <title>Transaction Abort Callback</title> + + <para> + The required <function>abort_cb</function> callback is called whenever + a transaction abort has to be initiated. This can happen if we are + decoding a transaction that has been prepared for two-phase commit and + a concurrent rollback happens while we are decoding it. +<programlisting> +typedef void (*LogicalDecodeAbortCB) (struct LogicalDecodingContext *ctx, + ReorderBufferTXN *txn, + XLogRecPtr abort_lsn);
Removed.
I don't know why the patch has used this way to implement an option to
enable two-phase. Can't we use how we implement 'stream-changes'
option in commit 7259736a6e? Just refer how we set ctx->streaming and
you can use a similar way to set this parameter.Done, I've moved the checks for callbacks to inside the corresponding wrappers.
This is not what I suggested. Please study the commit 7259736a6e and
see how streaming option is implemented. I want later subscribers can
specify whether they want transactions to be decoded at prepare time
similar to what we have done for streaming. Also, search for
ctx->streaming in the code and see how it is set to get the idea.
Changed it similar to ctx->streaming logic.
Note: Please use version number while sending patches, you can use
something like git format-patch -N -v n to do that. It makes easier
for the reviewer to compare it with the previous version.
Done.
Few other comments:
===================
1.
ReorderBufferProcessTXN()
{
..
if (streaming)
{
ReorderBufferTruncateTXN(rb, txn);/* Reset the CheckXidAlive */
CheckXidAlive = InvalidTransactionId;
}
else
ReorderBufferCleanupTXN(rb, txn);
..
}I don't think we can perform ReorderBufferCleanupTXN for the prepared
transactions because if we have removed the ReorderBufferTxn before
commit, the later code might not consider such a transaction in the
system and compute the wrong value of restart_lsn for a slot.
Basically, in SnapBuildProcessRunningXacts() when we call
ReorderBufferGetOldestTXN(), it should show the ReorderBufferTxn of
the prepared transaction which is not yet committed but because we
have removed it after prepare, it won't get that TXN and then that
leads to wrong computation of restart_lsn. Once we start from a wrong
point in WAL, the snapshot built was incorrect which will lead to the
wrong result. This is the same reason why the patch is not doing
ReorderBufferForget in DecodePrepare when we decide to skip the
transaction. Also, here, we need to set CheckXidAlive =
InvalidTransactionId; for prepared xact as well.
Updated as suggested above.
2. Have you thought about the interaction of streaming with prepared
transactions? You can try writing some tests using pg_logical* APIs
and see the behaviour. For ex. there is no handling in
ReorderBufferStreamCommit for the same. I think you need to introduce
stream_prepare API similar to stream_commit and then use the same.
This is pending. I will look at it in the next iteration. Also pending
is the investigation as to why the pgoutput changes were not added
initially.
regards,
Ajin Cherian
Fujitsu Australia
Attachments:
v4-0002-Tap-test-to-test-concurrent-aborts-during-2-phase.patchapplication/octet-stream; name=v4-0002-Tap-test-to-test-concurrent-aborts-during-2-phase.patchDownload+121-1
v4-0001-Support-decoding-of-two-phase-transactions.patchapplication/octet-stream; name=v4-0001-Support-decoding-of-two-phase-transactions.patchDownload+1183-74
v4-0003-pgoutput-output-plugin-support-for-logical-decodi.patchapplication/octet-stream; name=v4-0003-pgoutput-output-plugin-support-for-logical-decodi.patchDownload+514-10
On Thu, Sep 17, 2020 at 10:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Yeah, I think that would be better. How about if name the new variable
as cleanup_prepared?
I haven't added a new flag to indicate that the prepare was cleaned
up, as that wasn' really necessary. Instead I used a new function to
do partial cleanup to do whatever was not done in the truncate. If you
think, using a flag and doing special handling in
ReorderBufferCleanupTXN was a better idea, let me know.
regards,
Ajin Cherian
Fujitsu Australia