Skip collecting decoded changes of already-aborted transactions

sawada.mshk@gmail.com

almost 3 years ago

In reply to: Andres Freund (#2)

Re: Skip collecting decoded changes of already-aborted transactions

On Sun, Jun 11, 2023 at 5:31 AM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2023-06-09 14:16:44 +0900, Masahiko Sawada wrote:

In logical decoding, we don't need to collect decoded changes of
aborted transactions. While streaming changes, we can detect
concurrent abort of the (sub)transaction but there is no mechanism to
skip decoding changes of transactions that are known to already be
aborted. With the attached WIP patch, we check CLOG when decoding the
transaction for the first time. If it's already known to be aborted,
we skip collecting decoded changes of such transactions. That way,
when the logical replication is behind or restarts, we don't need to
decode large transactions that already aborted, which helps improve
the decoding performance.

Thank you for the comment.

It's very easy to get uses of TransactionIdDidAbort() wrong. For one, it won't
return true when a transaction was implicitly aborted due to a crash /
restart. You're also supposed to use it only after a preceding
TransactionIdIsInProgress() call.

I'm not sure there are issues with not checking TransactionIdIsInProgress()
first in this case, but I'm also not sure there aren't.

Yeah, it seems to be better to use !TransactionIdDidCommit() with a
preceding TransactionIdIsInProgress() check.

A separate issue is that TransactionIdDidAbort() can end up being very slow if
a lot of transactions are in progress concurrently. As soon as the clog
buffers are extended all time is spent copying pages from the kernel
pagecache. I'd not at all be surprised if this changed causes a substantial
slowdown in workloads with lots of small transactions, where most transactions
commit.

Indeed. So it should check the transaction status less frequently. It
doesn't benefit much even if we can skip collecting decoded changes of
small transactions. Another idea is that we check the status of only
large transactions. That is, when the size of decoded changes of an
aborted transaction exceeds logical_decoding_work_mem, we mark it as
aborted , free its changes decoded so far, and skip further
collection.

Regards

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

amit.kapila16@gmail.com

almost 3 years ago

In reply to: Masahiko Sawada (#3)

Re: Skip collecting decoded changes of already-aborted transactions

On Tue, Jun 13, 2023 at 2:06 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sun, Jun 11, 2023 at 5:31 AM Andres Freund <andres@anarazel.de> wrote:

A separate issue is that TransactionIdDidAbort() can end up being very slow if
a lot of transactions are in progress concurrently. As soon as the clog
buffers are extended all time is spent copying pages from the kernel
pagecache. I'd not at all be surprised if this changed causes a substantial
slowdown in workloads with lots of small transactions, where most transactions
commit.

Indeed. So it should check the transaction status less frequently. It
doesn't benefit much even if we can skip collecting decoded changes of
small transactions. Another idea is that we check the status of only
large transactions. That is, when the size of decoded changes of an
aborted transaction exceeds logical_decoding_work_mem, we mark it as
aborted , free its changes decoded so far, and skip further
collection.

Your idea might work for large transactions but I have not come across
reports where this is reported as a problem. Do you see any such
reports and can we see how much is the benefit with large
transactions? Because we do have the handling of concurrent aborts
during sys table scans and that might help sometimes for large
transactions.

--
With Regards,
Amit Kapila.

sawada.mshk@gmail.com

almost 3 years ago

In reply to: Amit Kapila (#4)

Re: Skip collecting decoded changes of already-aborted transactions

On Thu, Jun 15, 2023 at 7:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Jun 13, 2023 at 2:06 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sun, Jun 11, 2023 at 5:31 AM Andres Freund <andres@anarazel.de> wrote:

A separate issue is that TransactionIdDidAbort() can end up being very slow if
a lot of transactions are in progress concurrently. As soon as the clog
buffers are extended all time is spent copying pages from the kernel
pagecache. I'd not at all be surprised if this changed causes a substantial
slowdown in workloads with lots of small transactions, where most transactions
commit.

Indeed. So it should check the transaction status less frequently. It
doesn't benefit much even if we can skip collecting decoded changes of
small transactions. Another idea is that we check the status of only
large transactions. That is, when the size of decoded changes of an
aborted transaction exceeds logical_decoding_work_mem, we mark it as
aborted , free its changes decoded so far, and skip further
collection.

Your idea might work for large transactions but I have not come across
reports where this is reported as a problem. Do you see any such
reports and can we see how much is the benefit with large
transactions? Because we do have the handling of concurrent aborts
during sys table scans and that might help sometimes for large
transactions.

I've heard there was a case where a user had 29 million deletes in a
single transaction with each one wrapped in a savepoint and rolled it
back, which led to 11TB of spill files. If decoding such a large
transaction fails for some reasons (e.g. a disk full), it would try
decoding the same transaction again and again.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

amit.kapila16@gmail.com

almost 3 years ago

In reply to: Masahiko Sawada (#5)

Re: Skip collecting decoded changes of already-aborted transactions

On Wed, Jun 21, 2023 at 8:12 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Jun 15, 2023 at 7:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Jun 13, 2023 at 2:06 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sun, Jun 11, 2023 at 5:31 AM Andres Freund <andres@anarazel.de> wrote:

A separate issue is that TransactionIdDidAbort() can end up being very slow if
a lot of transactions are in progress concurrently. As soon as the clog
buffers are extended all time is spent copying pages from the kernel
pagecache. I'd not at all be surprised if this changed causes a substantial
slowdown in workloads with lots of small transactions, where most transactions
commit.

Indeed. So it should check the transaction status less frequently. It
doesn't benefit much even if we can skip collecting decoded changes of
small transactions. Another idea is that we check the status of only
large transactions. That is, when the size of decoded changes of an
aborted transaction exceeds logical_decoding_work_mem, we mark it as
aborted , free its changes decoded so far, and skip further
collection.

Your idea might work for large transactions but I have not come across
reports where this is reported as a problem. Do you see any such
reports and can we see how much is the benefit with large
transactions? Because we do have the handling of concurrent aborts
during sys table scans and that might help sometimes for large
transactions.

I've heard there was a case where a user had 29 million deletes in a
single transaction with each one wrapped in a savepoint and rolled it
back, which led to 11TB of spill files. If decoding such a large
transaction fails for some reasons (e.g. a disk full), it would try
decoding the same transaction again and again.

I was thinking why the existing handling of concurrent aborts doesn't
handle such a case and it seems that we check that only on catalog
access. However, in your case, the user probably is accessing the same
relation without any concurrent DDL on the same table, so it would
just be a cache look-up for catalogs. Your idea of checking aborts
every logical_decoding_work_mem should work for such cases.

--
With Regards,
Amit Kapila.

dilipbalaut@gmail.com

almost 3 years ago

In reply to: Masahiko Sawada (#1)

Re: Skip collecting decoded changes of already-aborted transactions

On Fri, Jun 9, 2023 at 10:47 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Hi,

In logical decoding, we don't need to collect decoded changes of
aborted transactions. While streaming changes, we can detect
concurrent abort of the (sub)transaction but there is no mechanism to
skip decoding changes of transactions that are known to already be
aborted. With the attached WIP patch, we check CLOG when decoding the
transaction for the first time. If it's already known to be aborted,
we skip collecting decoded changes of such transactions. That way,
when the logical replication is behind or restarts, we don't need to
decode large transactions that already aborted, which helps improve
the decoding performance.

+1 for the idea of checking the transaction status only when we need
to flush it to the disk or send it downstream (if streaming in
progress is enabled). Although this check is costly since we are
planning only for large transactions then it is worth it if we can
occasionally avoid disk or network I/O for the aborted transactions.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

sawada.mshk@gmail.com

almost 3 years ago

In reply to: Dilip Kumar (#7)

Re: Skip collecting decoded changes of already-aborted transactions

On Fri, Jun 23, 2023 at 12:39 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Fri, Jun 9, 2023 at 10:47 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Hi,

In logical decoding, we don't need to collect decoded changes of
aborted transactions. While streaming changes, we can detect
concurrent abort of the (sub)transaction but there is no mechanism to
skip decoding changes of transactions that are known to already be
aborted. With the attached WIP patch, we check CLOG when decoding the
transaction for the first time. If it's already known to be aborted,
we skip collecting decoded changes of such transactions. That way,
when the logical replication is behind or restarts, we don't need to
decode large transactions that already aborted, which helps improve
the decoding performance.

+1 for the idea of checking the transaction status only when we need
to flush it to the disk or send it downstream (if streaming in
progress is enabled). Although this check is costly since we are
planning only for large transactions then it is worth it if we can
occasionally avoid disk or network I/O for the aborted transactions.

Thanks.

I've attached the updated patch. With this patch, we check the
transaction status for only large-transactions when eviction. For
regression test purposes, I disable this transaction status check when
logical_replication_mode is set to 'immediate'.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

vignesh21@gmail.com

over 2 years ago

In reply to: Masahiko Sawada (#8)

Re: Skip collecting decoded changes of already-aborted transactions

On Mon, 3 Jul 2023 at 07:16, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Jun 23, 2023 at 12:39 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Fri, Jun 9, 2023 at 10:47 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Hi,

In logical decoding, we don't need to collect decoded changes of
aborted transactions. While streaming changes, we can detect
concurrent abort of the (sub)transaction but there is no mechanism to
skip decoding changes of transactions that are known to already be
aborted. With the attached WIP patch, we check CLOG when decoding the
transaction for the first time. If it's already known to be aborted,
we skip collecting decoded changes of such transactions. That way,
when the logical replication is behind or restarts, we don't need to
decode large transactions that already aborted, which helps improve
the decoding performance.

+1 for the idea of checking the transaction status only when we need
to flush it to the disk or send it downstream (if streaming in
progress is enabled). Although this check is costly since we are
planning only for large transactions then it is worth it if we can
occasionally avoid disk or network I/O for the aborted transactions.

Thanks.

I've attached the updated patch. With this patch, we check the
transaction status for only large-transactions when eviction. For
regression test purposes, I disable this transaction status check when
logical_replication_mode is set to 'immediate'.

May be there is some changes that are missing in the patch, which is
giving the following errors:
reorderbuffer.c: In function ‘ReorderBufferCheckTXNAbort’:
reorderbuffer.c:3584:22: error: ‘logical_replication_mode’ undeclared
(first use in this function)
3584 | if (unlikely(logical_replication_mode ==
LOGICAL_REP_MODE_IMMEDIATE))
| ^~~~~~~~~~~~~~~~~~~~~~~~

Regards,
Vignesh

#10

vignesh21@gmail.com

about 2 years ago

In reply to: vignesh C (#9)

Re: Skip collecting decoded changes of already-aborted transactions

On Tue, 3 Oct 2023 at 15:54, vignesh C <vignesh21@gmail.com> wrote:

On Mon, 3 Jul 2023 at 07:16, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Jun 23, 2023 at 12:39 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Fri, Jun 9, 2023 at 10:47 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Hi,

In logical decoding, we don't need to collect decoded changes of
aborted transactions. While streaming changes, we can detect
concurrent abort of the (sub)transaction but there is no mechanism to
skip decoding changes of transactions that are known to already be
aborted. With the attached WIP patch, we check CLOG when decoding the
transaction for the first time. If it's already known to be aborted,
we skip collecting decoded changes of such transactions. That way,
when the logical replication is behind or restarts, we don't need to
decode large transactions that already aborted, which helps improve
the decoding performance.

+1 for the idea of checking the transaction status only when we need
to flush it to the disk or send it downstream (if streaming in
progress is enabled). Although this check is costly since we are
planning only for large transactions then it is worth it if we can
occasionally avoid disk or network I/O for the aborted transactions.

Thanks.

I've attached the updated patch. With this patch, we check the
transaction status for only large-transactions when eviction. For
regression test purposes, I disable this transaction status check when
logical_replication_mode is set to 'immediate'.

May be there is some changes that are missing in the patch, which is
giving the following errors:
reorderbuffer.c: In function ‘ReorderBufferCheckTXNAbort’:
reorderbuffer.c:3584:22: error: ‘logical_replication_mode’ undeclared
(first use in this function)
3584 | if (unlikely(logical_replication_mode ==
LOGICAL_REP_MODE_IMMEDIATE))
| ^~~~~~~~~~~~~~~~~~~~~~~~

With no update to the thread and the compilation still failing I'm
marking this as returned with feedback. Please feel free to resubmit
to the next CF when there is a new version of the patch.

Regards,
Vignesh

#11

sawada.mshk@gmail.com

about 2 years ago

In reply to: vignesh C (#10)

Re: Skip collecting decoded changes of already-aborted transactions

On Fri, Feb 2, 2024 at 12:48 AM vignesh C <vignesh21@gmail.com> wrote:

On Tue, 3 Oct 2023 at 15:54, vignesh C <vignesh21@gmail.com> wrote:

On Mon, 3 Jul 2023 at 07:16, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Jun 23, 2023 at 12:39 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Fri, Jun 9, 2023 at 10:47 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Hi,

In logical decoding, we don't need to collect decoded changes of
aborted transactions. While streaming changes, we can detect
concurrent abort of the (sub)transaction but there is no mechanism to
skip decoding changes of transactions that are known to already be
aborted. With the attached WIP patch, we check CLOG when decoding the
transaction for the first time. If it's already known to be aborted,
we skip collecting decoded changes of such transactions. That way,
when the logical replication is behind or restarts, we don't need to
decode large transactions that already aborted, which helps improve
the decoding performance.

+1 for the idea of checking the transaction status only when we need
to flush it to the disk or send it downstream (if streaming in
progress is enabled). Although this check is costly since we are
planning only for large transactions then it is worth it if we can
occasionally avoid disk or network I/O for the aborted transactions.

Thanks.

I've attached the updated patch. With this patch, we check the
transaction status for only large-transactions when eviction. For
regression test purposes, I disable this transaction status check when
logical_replication_mode is set to 'immediate'.

May be there is some changes that are missing in the patch, which is
giving the following errors:
reorderbuffer.c: In function ‘ReorderBufferCheckTXNAbort’:
reorderbuffer.c:3584:22: error: ‘logical_replication_mode’ undeclared
(first use in this function)
3584 | if (unlikely(logical_replication_mode ==
LOGICAL_REP_MODE_IMMEDIATE))
| ^~~~~~~~~~~~~~~~~~~~~~~~

With no update to the thread and the compilation still failing I'm
marking this as returned with feedback. Please feel free to resubmit
to the next CF when there is a new version of the patch.

I resumed working on this item. I've attached the new version patch.

I rebased the patch to the current HEAD and updated comments and
commit messages. The patch is straightforward and I'm somewhat
satisfied with it, but I'm thinking of adding some tests for it.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#12

Ajin Cherian

itsajin@gmail.com

about 2 years ago

In reply to: Masahiko Sawada (#11)

Re: Skip collecting decoded changes of already-aborted transactions

On Fri, Mar 15, 2024 at 3:17 PM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:

I resumed working on this item. I've attached the new version patch.

I rebased the patch to the current HEAD and updated comments and
commit messages. The patch is straightforward and I'm somewhat
satisfied with it, but I'm thinking of adding some tests for it.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

I just had a look at the patch, the patch no longer applies because of a
removal of a header in a recent commit. Overall the patch looks fine, and I
didn't find any issues. Some cosmetic comments:
in ReorderBufferCheckTXNAbort()
+ /* Quick return if we've already knew the transaction status */
+ if (txn->aborted)
+ return true;

knew/know

/*
+ * If logical_replication_mode is "immediate", we don't check the
+ * transaction status so the caller always process this transaction.
+ */
+ if (debug_logical_replication_streaming ==
DEBUG_LOGICAL_REP_STREAMING_IMMEDIATE)
+ return false;

/process/processes

regards,
Ajin Cherian
Fujitsu Australia

#13

sawada.mshk@gmail.com

about 2 years ago

In reply to: Ajin Cherian (#12)

Re: Skip collecting decoded changes of already-aborted transactions

On Fri, Mar 15, 2024 at 1:21 PM Ajin Cherian <itsajin@gmail.com> wrote:

On Fri, Mar 15, 2024 at 3:17 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I resumed working on this item. I've attached the new version patch.

I rebased the patch to the current HEAD and updated comments and
commit messages. The patch is straightforward and I'm somewhat
satisfied with it, but I'm thinking of adding some tests for it.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

I just had a look at the patch, the patch no longer applies because of a removal of a header in a recent commit. Overall the patch looks fine, and I didn't find any issues. Some cosmetic comments:

Thank you for your review comments.

in ReorderBufferCheckTXNAbort()
+ /* Quick return if we've already knew the transaction status */
+ if (txn->aborted)
+ return true;

knew/know

Maybe it should be "known"?

/*
+ * If logical_replication_mode is "immediate", we don't check the
+ * transaction status so the caller always process this transaction.
+ */
+ if (debug_logical_replication_streaming == DEBUG_LOGICAL_REP_STREAMING_IMMEDIATE)
+ return false;

/process/processes

Fixed.

In addition to these changes, I've made some changes to the latest
patch. Here is the summary:

- Use txn_flags field to record the transaction status instead of two
'committed' and 'aborted' flags.
- Add regression tests.
- Update commit message.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#14

Ajin Cherian

itsajin@gmail.com

about 2 years ago

In reply to: Masahiko Sawada (#13)

Re: Skip collecting decoded changes of already-aborted transactions

On Mon, Mar 18, 2024 at 7:50 PM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:

In addition to these changes, I've made some changes to the latest
patch. Here is the summary:

- Use txn_flags field to record the transaction status instead of two
'committed' and 'aborted' flags.
- Add regression tests.
- Update commit message.

Regards,

Hi Sawada-san,

Thanks for the updated patch. Some comments:

1.
+ * already aborted, we discards all changes accumulated so far and ignore
+ * future changes, and return true. Otherwise return false.

we discards/we discard

2. In function ReorderBufferCheckTXNAbort(): I haven't tested this but I
wonder how prepared transactions would be considered, they are neither
committed, nor in progress.

regards,
Ajin Cherian
Fujitsu Australia

#15

sawada.mshk@gmail.com

about 2 years ago

In reply to: Ajin Cherian (#14)

Re: Skip collecting decoded changes of already-aborted transactions

On Wed, Mar 27, 2024 at 8:49 PM Ajin Cherian <itsajin@gmail.com> wrote:

On Mon, Mar 18, 2024 at 7:50 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

In addition to these changes, I've made some changes to the latest
patch. Here is the summary:

- Use txn_flags field to record the transaction status instead of two
'committed' and 'aborted' flags.
- Add regression tests.
- Update commit message.

Regards,

Hi Sawada-san,

Thanks for the updated patch. Some comments:

Thank you for the view comments!

1.
+ * already aborted, we discards all changes accumulated so far and ignore
+ * future changes, and return true. Otherwise return false.

we discards/we discard

Will fix it.

2. In function ReorderBufferCheckTXNAbort(): I haven't tested this but I wonder how prepared transactions would be considered, they are neither committed, nor in progress.

The transaction that is prepared but not resolved yet is considered as
in-progress.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#16

smithpb2250@gmail.com

almost 2 years ago

In reply to: Masahiko Sawada (#13)

Re: Skip collecting decoded changes of already-aborted transactions

Hi, here are some review comments for your patch v4-0001.

======
contrib/test_decoding/sql/stats.sql

1.
Huh? The test fails because the "expected results" file for these new
tests is missing from the patch.

======
.../replication/logical/reorderbuffer.c

2.
 static void ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
- bool txn_prepared);
+ bool txn_prepared, bool mark_streamed);

IIUC this new 'mark_streamed' parameter is more like a prerequisite
for the other conditions to decide to mark the tx as streamed -- i.e.
it is more like 'can_mark_streamed', so I felt the name should be
changed to be like that (everywhere it is used).

~~~

3. ReorderBufferTruncateTXN

- * 'txn_prepared' indicates that we have decoded the transaction at prepare
- * time.
+ * If mark_streamed is true, we could mark the transaction as streamed.
+ *
+ * 'streaming_txn' indicates that the given transaction is a
streaming transaction.
  */
 static void
-ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
bool txn_prepared)
+ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
bool txn_prepared,
+ bool mark_streamed)

What's that new comment about 'streaming_txn' for? It seemed unrelated
to the patch code.

~~~

4.
/*
* Mark the transaction as streamed.
*
* The top-level transaction, is marked as streamed always, even if it
* does not contain any changes (that is, when all the changes are in
* subtransactions).
*
* For subtransactions, we only mark them as streamed when there are
* changes in them.
*
* We do it this way because of aborts - we don't want to send aborts for
* XIDs the downstream is not aware of. And of course, it always knows
* about the toplevel xact (we send the XID in all messages), but we never
* stream XIDs of empty subxacts.
*/
if (mark_streamed && (!txn_prepared) &&
(rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0)))
txn->txn_flags |= RBTXN_IS_STREAMED;

With the patch introduction of the new parameter, I felt this code
might be better if it was refactored as follows:

/* Mark the transaction as streamed, if appropriate. */
if (can_mark_streamed)
{
/*
... large comment
*/
if ((!txn_prepared) && (rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0)))
txn->txn_flags |= RBTXN_IS_STREAMED;
}

~~~

5. ReorderBufferPrepare

- if (txn->concurrent_abort && !rbtxn_is_streamed(txn))
+ if (!txn_aborted && rbtxn_did_abort(txn) && !rbtxn_is_streamed(txn))
  rb->prepare(rb, txn, txn->final_lsn);

Maybe I misunderstood this logic, but won't a "concurrent abort" cause
your new Assert added in ReorderBufferProcessTXN to fail?

+ /* Update transaction status */
+ Assert((curtxn->txn_flags & (RBTXN_COMMITTED | RBTXN_ABORTED)) == 0);

~~~

6. ReorderBufferCheckTXNAbort

+ /* Check the transaction status using CLOG lookup */
+ if (TransactionIdIsInProgress(txn->xid))
+ return false;
+
+ if (TransactionIdDidCommit(txn->xid))
+ {
+ /*
+ * Remember the transaction is committed so that we can skip CLOG
+ * check next time, avoiding the pressure on CLOG lookup.
+ */
+ txn->txn_flags |= RBTXN_COMMITTED;
+ return false;
+ }

IIUC the purpose of the TransactionIdDidCommit() was to avoid the
overhead of calling the TransactionIdIsInProgress(). So, shouldn't the
order of these checks be swapped? Otherwise, there might be 1 extra
unnecessary call to TransactionIdIsInProgress() next time.

======
src/include/replication/reorderbuffer.h

7.
#define RBTXN_PREPARE 0x0040
#define RBTXN_SKIPPED_PREPARE 0x0080
#define RBTXN_HAS_STREAMABLE_CHANGE 0x0100
+#define RBTXN_COMMITTED 0x0200
+#define RBTXN_ABORTED 0x0400

For consistency with the existing bitmask names, I guess these should be named:
- RBTXN_COMMITTED --> RBTXN_IS_COMMITTED
- RBTXN_ABORTED --> RBTXN_IS_ABORTED

~~~

8.
Similarly, IMO the macros should have the same names as the bitmasks,
like the other nearby ones generally seem to.

rbtxn_did_commit --> rbtxn_is_committed
rbtxn_did_abort --> rbtxn_is_aborted

======

9.
Also, attached is a top-up patch for other cosmetic nitpicks:
- comment wording
- typos in comments
- excessive or missing blank lines
- etc.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

#17

sawada.mshk@gmail.com

over 1 year ago

In reply to: Peter Smith (#16)

Re: Skip collecting decoded changes of already-aborted transactions

Sorry for the late reply.

On Tue, Jun 11, 2024 at 7:41 PM Peter Smith <smithpb2250@gmail.com> wrote:

Hi, here are some review comments for your patch v4-0001.

Thank you for reviewing the patch!

======
contrib/test_decoding/sql/stats.sql

1.
Huh? The test fails because the "expected results" file for these new
tests is missing from the patch.

Fixed.

======
.../replication/logical/reorderbuffer.c
2.
static void ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
- bool txn_prepared);
+ bool txn_prepared, bool mark_streamed);
IIUC this new 'mark_streamed' parameter is more like a prerequisite
for the other conditions to decide to mark the tx as streamed -- i.e.
it is more like 'can_mark_streamed', so I felt the name should be
changed to be like that (everywhere it is used).

Agreed. I think 'txn_streaming' sounds better and consistent with
'txn_prepared'.

~~~

3. ReorderBufferTruncateTXN

- * 'txn_prepared' indicates that we have decoded the transaction at prepare
- * time.
+ * If mark_streamed is true, we could mark the transaction as streamed.
+ *
+ * 'streaming_txn' indicates that the given transaction is a
streaming transaction.
*/
static void
-ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
bool txn_prepared)
+ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
bool txn_prepared,
+ bool mark_streamed)

What's that new comment about 'streaming_txn' for? It seemed unrelated
to the patch code.

Removed.

~~~

4.
/*
* Mark the transaction as streamed.
*
* The top-level transaction, is marked as streamed always, even if it
* does not contain any changes (that is, when all the changes are in
* subtransactions).
*
* For subtransactions, we only mark them as streamed when there are
* changes in them.
*
* We do it this way because of aborts - we don't want to send aborts for
* XIDs the downstream is not aware of. And of course, it always knows
* about the toplevel xact (we send the XID in all messages), but we never
* stream XIDs of empty subxacts.
*/
if (mark_streamed && (!txn_prepared) &&
(rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0)))
txn->txn_flags |= RBTXN_IS_STREAMED;

~~

With the patch introduction of the new parameter, I felt this code
might be better if it was refactored as follows:

/* Mark the transaction as streamed, if appropriate. */
if (can_mark_streamed)
{
/*
... large comment
*/
if ((!txn_prepared) && (rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0)))
txn->txn_flags |= RBTXN_IS_STREAMED;
}

I think we don't necessarily need to make nested if statements just
for comments.

~~~

5. ReorderBufferPrepare
- if (txn->concurrent_abort && !rbtxn_is_streamed(txn))
+ if (!txn_aborted && rbtxn_did_abort(txn) && !rbtxn_is_streamed(txn))
rb->prepare(rb, txn, txn->final_lsn);
~

Maybe I misunderstood this logic, but won't a "concurrent abort" cause
your new Assert added in ReorderBufferProcessTXN to fail?
+ /* Update transaction status */
+ Assert((curtxn->txn_flags & (RBTXN_COMMITTED | RBTXN_ABORTED)) == 0);

I changed txn_flags checks, which should cover your concerns.

~~~

6. ReorderBufferCheckTXNAbort
+ /* Check the transaction status using CLOG lookup */
+ if (TransactionIdIsInProgress(txn->xid))
+ return false;
+
+ if (TransactionIdDidCommit(txn->xid))
+ {
+ /*
+ * Remember the transaction is committed so that we can skip CLOG
+ * check next time, avoiding the pressure on CLOG lookup.
+ */
+ txn->txn_flags |= RBTXN_COMMITTED;
+ return false;
+ }
IIUC the purpose of the TransactionIdDidCommit() was to avoid the
overhead of calling the TransactionIdIsInProgress(). So, shouldn't the
order of these checks be swapped? Otherwise, there might be 1 extra
unnecessary call to TransactionIdIsInProgress() next time.

I'm not sure I understand your comment. IIUC we should use
TransactionIdDidCommit() with a preceding TransactionIdIsInProgress()
check. Also I think once we found the transaction is committed, we no
longer check the transaction status on CLOG nor call
TransactionIdIsInProgress().

======
src/include/replication/reorderbuffer.h

7.
#define RBTXN_PREPARE 0x0040
#define RBTXN_SKIPPED_PREPARE 0x0080
#define RBTXN_HAS_STREAMABLE_CHANGE 0x0100
+#define RBTXN_COMMITTED 0x0200
+#define RBTXN_ABORTED 0x0400

For consistency with the existing bitmask names, I guess these should be named:
- RBTXN_COMMITTED --> RBTXN_IS_COMMITTED
- RBTXN_ABORTED --> RBTXN_IS_ABORTED

Agreed and changed.

~~~

8.
Similarly, IMO the macros should have the same names as the bitmasks,
like the other nearby ones generally seem to.

rbtxn_did_commit --> rbtxn_is_committed
rbtxn_did_abort --> rbtxn_is_aborted

Changed.

======

9.
Also, attached is a top-up patch for other cosmetic nitpicks:
- comment wording
- typos in comments
- excessive or missing blank lines
- etc.

Applied your patch.

I've attached the updated patch. Will register it for the next commit fest.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#18

[1]: /messages/by-id/CAD21AoDJE-bLdxt9T_z1rw74RN=E0n0+esYU0eo+-_P32EbuVg@mail.gmail.com

sawada.mshk@gmail.com

over 1 year ago

In reply to: Ajin Cherian (#14)

Re: Skip collecting decoded changes of already-aborted transactions

On Wed, Mar 27, 2024 at 4:49 AM Ajin Cherian <itsajin@gmail.com> wrote:

On Mon, Mar 18, 2024 at 7:50 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

In addition to these changes, I've made some changes to the latest
patch. Here is the summary:

- Use txn_flags field to record the transaction status instead of two
'committed' and 'aborted' flags.
- Add regression tests.
- Update commit message.

Regards,

Hi Sawada-san,

Thanks for the updated patch. Some comments:
1.
+ * already aborted, we discards all changes accumulated so far and ignore
+ * future changes, and return true. Otherwise return false.
we discards/we discard

This comment is incorporated into the latest v5 patch I've just sent[1]/messages/by-id/CAD21AoDJE-bLdxt9T_z1rw74RN=E0n0+esYU0eo+-_P32EbuVg@mail.gmail.com.

2. In function ReorderBufferCheckTXNAbort(): I haven't tested this but I wonder how prepared transactions would be considered, they are neither committed, nor in progress.

IIUC prepared transactions are considered as in-progress.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#19

smithpb2250@gmail.com

over 1 year ago

In reply to: Masahiko Sawada (#18)

Re: Skip collecting decoded changes of already-aborted transactions

Hi Sawada-San, here are some review comments for the patch v5-0001.

======
Commit message.

1.
This commit introduces an additional check to determine if a
transaction is already aborted by a CLOG lookup, so the logical
decoding skips further change also when it doesn't touch system
catalogs.

Is that wording backwards? Is it meant to say:

This commit introduces an additional CLOG lookup check to determine if
a transaction is already aborted, so the ...

======
contrib/test_decoding/sql/stats.sql

2
+SELECT slot_name, spill_txns = 0 AS spill_txn, spill_count = 0 AS
spill_count FROM pg_stat_replication_slots WHERE slot_name =
'regression_slot_stats4_twophase';

Why do the SELECT "= 0" like this, instead of just having zeros in the
"expected" results?

======
.../replication/logical/reorderbuffer.c

3.
 static void ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
- bool txn_prepared);
+ bool txn_prepared, bool mark_streamed);

That last parameter name ('mark_streamed') does not match the same
parameter name in this function's definition.

~~~

ReorderBufferTruncateTXN:

4.
if (txn_streaming && (!txn_prepared) &&
(rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0)))
txn->txn_flags |= RBTXN_IS_STREAMED;

if (txn_prepared)
{
~

Since the following condition was already "if (txn_prepared)" would it
be better remove the "(!txn_prepared)" here and instead just refactor
the code like:

if (txn_prepared)
{
...
}
else if (txn_streaming && (rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0)))
{
...
}

~~~

ReorderBufferProcessTXN:

5.
+
+ /* Remember the transaction is aborted */
+ Assert((curtxn->txn_flags & RBTXN_IS_COMMITTED) == 0);
+ curtxn->txn_flags |= RBTXN_IS_ABORTED;

Missing period on comment.

~~~

ReorderBufferCheckTXNAbort:

6.
+ * If GUC 'debug_logical_replication_streaming' is "immediate", we don't
+ * check the transaction status, so the caller always processes this
+ * transaction. This is to disable this check for regression tests.
+ */
+static bool
+ReorderBufferCheckTXNAbort(ReorderBuffer *rb, ReorderBufferTXN *txn)
+{
+ /*
+ * If GUC 'debug_logical_replication_streaming' is "immediate", we don't
+ * check the transaction status, so the caller always processes this
+ * transaction.
+ */
+ if (unlikely(debug_logical_replication_streaming ==
DEBUG_LOGICAL_REP_STREAMING_IMMEDIATE))
+ return false;
+

The wording of the sentence "This is to disable..." seemed a bit
confusing. Maybe this area can be simplified by doing the following.

6a.
Change the function comment to say more like below:

When the GUC 'debug_logical_replication_streaming' is set to
"immediate", we don't check the transaction status, meaning the caller
will always process this transaction. This mode is used by regression
tests to avoid unnecessary transaction status checking.

6b.
It is not necessary for this 2nd comment to repeat everything that was
already said in the function comment. A simpler comment here might be
all you need:

SUGGESTION:
Quick return for regression tests.

~~~

7.
Is it worth mentioning about this skipping of the transaction status
check in the docs for this GUC? [1]https://www.postgresql.org/docs/devel/runtime-config-developer.html

======
[1]: https://www.postgresql.org/docs/devel/runtime-config-developer.html

Kind Regards,
Peter Smith.
Fujitsu Australia.

#20

sawada.mshk@gmail.com

over 1 year ago

In reply to: Peter Smith (#19)

Re: Skip collecting decoded changes of already-aborted transactions

On Sun, Nov 10, 2024 at 11:24 PM Peter Smith <smithpb2250@gmail.com> wrote:

Hi Sawada-San, here are some review comments for the patch v5-0001.

Thank you for reviewing the patch!

======
Commit message.

1.
This commit introduces an additional check to determine if a
transaction is already aborted by a CLOG lookup, so the logical
decoding skips further change also when it doesn't touch system
catalogs.

~

Is that wording backwards? Is it meant to say:

This commit introduces an additional CLOG lookup check to determine if
a transaction is already aborted, so the ...

Fixed.

======
contrib/test_decoding/sql/stats.sql

2
+SELECT slot_name, spill_txns = 0 AS spill_txn, spill_count = 0 AS
spill_count FROM pg_stat_replication_slots WHERE slot_name =
'regression_slot_stats4_twophase';

Why do the SELECT "= 0" like this, instead of just having zeros in the
"expected" results?

Indeed. I used "=0" like other queries in the same file do, but it
makes sense to me just to have zeros in the expected file. That way,
it would make it a bit easier to investigate in case of failures.

======
.../replication/logical/reorderbuffer.c
3.
static void ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
- bool txn_prepared);
+ bool txn_prepared, bool mark_streamed);
That last parameter name ('mark_streamed') does not match the same
parameter name in this function's definition.

Fixed.

~~~

ReorderBufferTruncateTXN:

4.
if (txn_streaming && (!txn_prepared) &&
(rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0)))
txn->txn_flags |= RBTXN_IS_STREAMED;

if (txn_prepared)
{
~

Since the following condition was already "if (txn_prepared)" would it
be better remove the "(!txn_prepared)" here and instead just refactor
the code like:

if (txn_prepared)
{
...
}
else if (txn_streaming && (rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0)))
{
...
}

Good idea.

~~~

ReorderBufferProcessTXN:

5.
+
+ /* Remember the transaction is aborted */
+ Assert((curtxn->txn_flags & RBTXN_IS_COMMITTED) == 0);
+ curtxn->txn_flags |= RBTXN_IS_ABORTED;

Missing period on comment.

Fixed.

~~~

ReorderBufferCheckTXNAbort:
6.
+ * If GUC 'debug_logical_replication_streaming' is "immediate", we don't
+ * check the transaction status, so the caller always processes this
+ * transaction. This is to disable this check for regression tests.
+ */
+static bool
+ReorderBufferCheckTXNAbort(ReorderBuffer *rb, ReorderBufferTXN *txn)
+{
+ /*
+ * If GUC 'debug_logical_replication_streaming' is "immediate", we don't
+ * check the transaction status, so the caller always processes this
+ * transaction.
+ */
+ if (unlikely(debug_logical_replication_streaming ==
DEBUG_LOGICAL_REP_STREAMING_IMMEDIATE))
+ return false;
+
The wording of the sentence "This is to disable..." seemed a bit
confusing. Maybe this area can be simplified by doing the following.

6a.
Change the function comment to say more like below:

When the GUC 'debug_logical_replication_streaming' is set to
"immediate", we don't check the transaction status, meaning the caller
will always process this transaction. This mode is used by regression
tests to avoid unnecessary transaction status checking.

~

6b.
It is not necessary for this 2nd comment to repeat everything that was
already said in the function comment. A simpler comment here might be
all you need:

SUGGESTION:
Quick return for regression tests.

Agreed with the above two comments. Fixed.

~~~

7.
Is it worth mentioning about this skipping of the transaction status
check in the docs for this GUC? [1]

If we want to mention this optimization in the docs, we have to
explain how the optimization works too. I think it's too detailed.

I've attached the updated patch.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#21

smithpb2250@gmail.com

over 1 year ago

In reply to: Masahiko Sawada (#20)

#22

vignesh21@gmail.com

over 1 year ago

In reply to: Masahiko Sawada (#20)

#23

sawada.mshk@gmail.com

over 1 year ago

In reply to: Peter Smith (#21)

#24

sawada.mshk@gmail.com

over 1 year ago

In reply to: vignesh C (#22)

#25

smithpb2250@gmail.com

over 1 year ago

In reply to: Masahiko Sawada (#23)

#26

sawada.mshk@gmail.com

over 1 year ago

In reply to: Peter Smith (#25)

#27

smithpb2250@gmail.com

over 1 year ago

In reply to: Masahiko Sawada (#26)

#28

sawada.mshk@gmail.com

over 1 year ago

In reply to: Peter Smith (#27)

#29

vignesh21@gmail.com

over 1 year ago

In reply to: Masahiko Sawada (#28)

#30

smithpb2250@gmail.com

over 1 year ago

In reply to: Masahiko Sawada (#28)

#31

sawada.mshk@gmail.com

over 1 year ago

In reply to: vignesh C (#29)

#32

sawada.mshk@gmail.com

over 1 year ago

In reply to: Peter Smith (#30)

#33

vignesh21@gmail.com

over 1 year ago

In reply to: Masahiko Sawada (#31)

#34

sawada.mshk@gmail.com

over 1 year ago

In reply to: vignesh C (#33)

#35

amit.kapila16@gmail.com

over 1 year ago

In reply to: Masahiko Sawada (#32)

#36

dilipbalaut@gmail.com

over 1 year ago

In reply to: Masahiko Sawada (#32)

#37

amit.kapila16@gmail.com

over 1 year ago

In reply to: Dilip Kumar (#36)

#38

amit.kapila16@gmail.com

over 1 year ago

In reply to: Amit Kapila (#35)

#39

dilipbalaut@gmail.com

over 1 year ago

In reply to: Amit Kapila (#37)

#40

sawada.mshk@gmail.com

over 1 year ago

In reply to: Dilip Kumar (#39)

#41

dilipbalaut@gmail.com

over 1 year ago

In reply to: Masahiko Sawada (#40)

#42

amit.kapila16@gmail.com

over 1 year ago

In reply to: Dilip Kumar (#41)

#43

dilipbalaut@gmail.com

over 1 year ago

In reply to: Amit Kapila (#42)

#44

sawada.mshk@gmail.com

over 1 year ago

In reply to: Dilip Kumar (#43)

#45

dilipbalaut@gmail.com

over 1 year ago

In reply to: Masahiko Sawada (#44)

#46

amit.kapila16@gmail.com

over 1 year ago

In reply to: Dilip Kumar (#45)

#47

sawada.mshk@gmail.com

over 1 year ago

In reply to: Amit Kapila (#35)

#48

amit.kapila16@gmail.com

over 1 year ago

In reply to: Masahiko Sawada (#47)

#49

sawada.mshk@gmail.com

over 1 year ago

In reply to: Amit Kapila (#48)

#50

amit.kapila16@gmail.com

over 1 year ago

In reply to: Masahiko Sawada (#49)

#51

sawada.mshk@gmail.com

over 1 year ago

In reply to: Amit Kapila (#50)

#52

smithpb2250@gmail.com

over 1 year ago

In reply to: Masahiko Sawada (#51)

#53

amit.kapila16@gmail.com

over 1 year ago

In reply to: Peter Smith (#52)

#54

sawada.mshk@gmail.com

over 1 year ago

In reply to: Amit Kapila (#53)

#55

sawada.mshk@gmail.com

over 1 year ago

In reply to: Peter Smith (#52)

#56

smithpb2250@gmail.com

over 1 year ago

In reply to: Masahiko Sawada (#55)

#57

smithpb2250@gmail.com

over 1 year ago

In reply to: Masahiko Sawada (#55)

#58

amit.kapila16@gmail.com

over 1 year ago

In reply to: Peter Smith (#57)

#59

sawada.mshk@gmail.com

over 1 year ago

In reply to: Peter Smith (#56)

#60

sawada.mshk@gmail.com

over 1 year ago

In reply to: Amit Kapila (#58)

#61

amit.kapila16@gmail.com

over 1 year ago

In reply to: Masahiko Sawada (#60)

#62

smithpb2250@gmail.com

over 1 year ago

In reply to: Amit Kapila (#61)

#63

sawada.mshk@gmail.com

over 1 year ago

In reply to: Peter Smith (#62)

#64

amit.kapila16@gmail.com

over 1 year ago

In reply to: Masahiko Sawada (#63)

#65

sawada.mshk@gmail.com

over 1 year ago

In reply to: Amit Kapila (#64)

#66

smithpb2250@gmail.com

over 1 year ago

In reply to: Masahiko Sawada (#65)

#67

amit.kapila16@gmail.com

over 1 year ago

In reply to: Peter Smith (#66)

#68

smithpb2250@gmail.com

over 1 year ago

In reply to: Amit Kapila (#67)

#69

sawada.mshk@gmail.com

over 1 year ago

In reply to: Peter Smith (#68)

#70

amit.kapila16@gmail.com

about 1 year ago

In reply to: Masahiko Sawada (#69)

#71

sawada.mshk@gmail.com

about 1 year ago

In reply to: Amit Kapila (#70)

#72

smithpb2250@gmail.com

about 1 year ago

In reply to: Masahiko Sawada (#71)

#73

sawada.mshk@gmail.com

about 1 year ago

In reply to: Peter Smith (#72)

#74

smithpb2250@gmail.com

about 1 year ago

In reply to: Masahiko Sawada (#73)

#75

smithpb2250@gmail.com

about 1 year ago

In reply to: Masahiko Sawada (#73)

#76

sawada.mshk@gmail.com

about 1 year ago

In reply to: Peter Smith (#74)

#77

smithpb2250@gmail.com

about 1 year ago

In reply to: Masahiko Sawada (#76)

#78

sawada.mshk@gmail.com

about 1 year ago

In reply to: Peter Smith (#75)

#79

sawada.mshk@gmail.com

about 1 year ago

In reply to: Peter Smith (#77)

#80

sawada.mshk@gmail.com

about 1 year ago

In reply to: Masahiko Sawada (#78)

#81

smithpb2250@gmail.com

about 1 year ago

In reply to: Masahiko Sawada (#78)

#82

sawada.mshk@gmail.com

about 1 year ago

In reply to: Peter Smith (#81)

#83

Hayato Kuroda (Fujitsu)

kuroda.hayato@fujitsu.com

about 1 year ago

In reply to: Masahiko Sawada (#82)

#84