[PATCH] Present all committed transactions to the output plugin

Started by Markus Wanner, almost 5 years ago, 8 messages
#1 Markus Wanner
markus.wanner@enterprisedb.com
1 attachment(s)

Hi,

attached is a patch that I think is cleaning up the API between Postgres
and the logical decoding plugin. Up until now, not only transactions
rolled back, but also some committed transactions were filtered and not
presented to the output plugin. While it is documented that aborted
transactions are not decoded, the second exception has not been documented.

The difference is with committed empty transactions that have a snapshot
versus those that do not. I think that's arbitrary and propose to
remove this distinction, so that all committed transactions are decoded.

In the case of decoding a two-phase transaction, I argue that this is
even more important, as the gid potentially carries information.

Please consider the attached patch, which drops the mentioned filter.
It also adjusts tests to show the difference and provides a minor
clarification to the documentation.

Regards

Markus

Attachments:

0001-Present-committed-transactions-to-output-plugin.patch (text/x-patch)
From: Markus Wanner <markus.wanner@enterprisedb.com>
Date: Fri, 19 Feb 2021 13:34:22 +0100
Subject: Present all committed transactions to the output plugin

Drop the filtering of some empty transactions and more
consistently present all committed transactions to the output
plugin.  Changes behavior for empty transactions that do not even
have a base_snapshot.

While clearly not making any substantial changes to the database,
an empty transaction still increments the local TransactionId
and possibly carries a gid (in the case of two-phase
transactions).

Adjust the test_decoding plugin to show the difference between empty
transactions that do not carry a snapshot and those that
do. Adjust tests to cover decoding of committed empty
transactions with and without snapshots.

Add a minor clarification in the docs.
---
 .../replication/logical/reorderbuffer.c       | 41 ++++++++-----------
 doc/src/sgml/logicaldecoding.sgml             | 10 +++--
 contrib/test_decoding/expected/mxact.out      |  2 +-
 .../test_decoding/expected/ondisk_startup.out |  2 +-
 contrib/test_decoding/expected/prepared.out   | 33 +++++++++++++--
 contrib/test_decoding/expected/xact.out       | 39 ++++++++++++++++++
 contrib/test_decoding/sql/prepared.sql        | 23 +++++++++--
 contrib/test_decoding/sql/xact.sql            | 21 ++++++++++
 contrib/test_decoding/test_decoding.c         | 12 ++++++
 9 files changed, 147 insertions(+), 36 deletions(-)

diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 5a62ab8bbc1..1e9ed81c3c1 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -2016,7 +2016,8 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 	ReorderBufferBuildTupleCidHash(rb, txn);
 
 	/* setup the initial snapshot */
-	SetupHistoricSnapshot(snapshot_now, txn->tuplecid_hash);
+	if (snapshot_now)
+		SetupHistoricSnapshot(snapshot_now, txn->tuplecid_hash);
 
 	/*
 	 * Decoding needs access to syscaches et al., which in turn use
@@ -2289,6 +2290,8 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 					break;
 
 				case REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT:
+					Assert(snapshot_now);
+
 					/* get rid of the old */
 					TeardownHistoricSnapshot(false);
 
@@ -2321,6 +2324,7 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 					break;
 
 				case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
+					Assert(snapshot_now);
 					Assert(change->data.command_id != InvalidCommandId);
 
 					if (command_id < change->data.command_id)
@@ -2397,8 +2401,11 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 		 * streaming mode.
 		 */
 		if (streaming)
+		{
+			Assert(snapshot_now);
 			ReorderBufferSaveTXNSnapshot(rb, txn, snapshot_now, command_id);
-		else if (snapshot_now->copied)
+		}
+		else if (snapshot_now && snapshot_now->copied)
 			ReorderBufferFreeSnap(rb, snapshot_now);
 
 		/* cleanup */
@@ -2520,7 +2527,6 @@ ReorderBufferReplay(ReorderBufferTXN *txn,
 					TimestampTz commit_time,
 					RepOriginId origin_id, XLogRecPtr origin_lsn)
 {
-	Snapshot	snapshot_now;
 	CommandId	command_id = FirstCommandId;
 
 	txn->final_lsn = commit_lsn;
@@ -2544,28 +2550,15 @@ ReorderBufferReplay(ReorderBufferTXN *txn,
 	}
 
 	/*
-	 * If this transaction has no snapshot, it didn't make any changes to the
-	 * database, so there's nothing to decode.  Note that
-	 * ReorderBufferCommitChild will have transferred any snapshots from
-	 * subtransactions if there were any.
+	 * Process and send the changes to output plugin.
+	 *
+	 * Note that for empty transactions, txn->base_snapshot may well be
+	 * NULL. The corresponding callbacks will still be invoked, as even an
+	 * empty transaction carries information (LSN increments, the gid in
+	 * case of a two-phase transaction).  This is unlike versions prior to
+	 * 14 which optimized away transactions without a snapshot.
 	 */
-	if (txn->base_snapshot == NULL)
-	{
-		Assert(txn->ninvalidations == 0);
-
-		/*
-		 * Removing this txn before a commit might result in the computation
-		 * of an incorrect restart_lsn. See SnapBuildProcessRunningXacts.
-		 */
-		if (!rbtxn_prepared(txn))
-			ReorderBufferCleanupTXN(rb, txn);
-		return;
-	}
-
-	snapshot_now = txn->base_snapshot;
-
-	/* Process and send the changes to output plugin. */
-	ReorderBufferProcessTXN(rb, txn, commit_lsn, snapshot_now,
+	ReorderBufferProcessTXN(rb, txn, commit_lsn, txn->base_snapshot,
 							command_id, false);
 }
 
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index cf705ed9cda..832acee8270 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -540,9 +540,8 @@ CREATE TABLE another_catalog_table(data text) WITH (user_catalog_table = true);
      Concurrent transactions are decoded in commit order, and only changes
      belonging to a specific transaction are decoded between
      the <literal>begin</literal> and <literal>commit</literal>
-     callbacks. Transactions that were rolled back explicitly or implicitly
-     never get
-     decoded. Successful savepoints are
+     callbacks. All transactions are presented to the output plugin, whether
+     they made changes or not. Successful savepoints are
      folded into the transaction containing them in the order they were
      executed within that transaction. A transaction that is prepared for
      a two-phase commit using <command>PREPARE TRANSACTION</command> will
@@ -632,7 +631,10 @@ typedef void (*LogicalDecodeBeginCB) (struct LogicalDecodingContext *ctx,
 </programlisting>
       The <parameter>txn</parameter> parameter contains meta information about
       the transaction, like the time stamp at which it has been committed and
-      its XID.
+      its XID. A transaction without a snapshot in
+      <literal>txn-&gt;base_snapshot</literal> will certainly not contain any
+      changes.  Presence of a snapshot does not guarantee that a change will
+      follow prior to the commit, though.
      </para>
     </sect3>
 
diff --git a/contrib/test_decoding/expected/mxact.out b/contrib/test_decoding/expected/mxact.out
index f0d96cc67d0..9a3e3f21bc7 100644
--- a/contrib/test_decoding/expected/mxact.out
+++ b/contrib/test_decoding/expected/mxact.out
@@ -55,7 +55,7 @@ step s0start: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NUL
 data           
 
 BEGIN          
-COMMIT         
+COMMIT (empty with snapshot)
 BEGIN          
 table public.do_write: INSERT: id[integer]:1 ts[timestamp with time zone]:null
 COMMIT         
diff --git a/contrib/test_decoding/expected/ondisk_startup.out b/contrib/test_decoding/expected/ondisk_startup.out
index 586b03d75db..d390d90fd26 100644
--- a/contrib/test_decoding/expected/ondisk_startup.out
+++ b/contrib/test_decoding/expected/ondisk_startup.out
@@ -43,7 +43,7 @@ BEGIN
 table public.do_write: INSERT: id[integer]:2 addedbys2[integer]:null
 COMMIT         
 BEGIN          
-COMMIT         
+COMMIT (empty with snapshot)
 BEGIN          
 table public.do_write: INSERT: id[integer]:3 addedbys2[integer]:null addedbys1[integer]:null
 COMMIT         
diff --git a/contrib/test_decoding/expected/prepared.out b/contrib/test_decoding/expected/prepared.out
index 46e915d4ffa..6656d132069 100644
--- a/contrib/test_decoding/expected/prepared.out
+++ b/contrib/test_decoding/expected/prepared.out
@@ -35,9 +35,6 @@ COMMIT PREPARED 'test_prepared#3';
 -- make sure stuff still works
 INSERT INTO test_prepared1 VALUES (8);
 INSERT INTO test_prepared2 VALUES (9);
--- cleanup
-DROP TABLE test_prepared1;
-DROP TABLE test_prepared2;
 -- show results
 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                                   data                                   
@@ -66,6 +63,36 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'inc
  COMMIT
 (22 rows)
 
+-- test decoding of an empty two-phase transaction
+BEGIN;
+SELECT 1 FROM test_prepared1 FOR UPDATE LIMIT 1;
+ ?column? 
+----------
+        1
+(1 row)
+
+PREPARE TRANSACTION 'test_prepared#4';
+COMMIT PREPARED 'test_prepared#4';
+-- test decoding of an empty transaction for which the reorderbuffer
+-- had to create a snapshot
+BEGIN;
+SAVEPOINT barf;
+INSERT INTO test_prepared1 VALUES (10);
+ROLLBACK TO SAVEPOINT barf;
+PREPARE TRANSACTION 'test_prepared#5';
+COMMIT PREPARED 'test_prepared#5';
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '0');
+             data             
+------------------------------
+ BEGIN (w/o snapshot)
+ COMMIT (empty w/o snapshot)
+ BEGIN
+ COMMIT (empty with snapshot)
+(4 rows)
+
+-- cleanup
+DROP TABLE test_prepared1;
+DROP TABLE test_prepared2;
 SELECT pg_drop_replication_slot('regression_slot');
  pg_drop_replication_slot 
 --------------------------
diff --git a/contrib/test_decoding/expected/xact.out b/contrib/test_decoding/expected/xact.out
index ec4745005d7..3b50f8d073b 100644
--- a/contrib/test_decoding/expected/xact.out
+++ b/contrib/test_decoding/expected/xact.out
@@ -55,6 +55,45 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'inc
  COMMIT
 (3 rows)
 
+-- test decoding of an aborted transaction
+BEGIN;
+INSERT INTO xact_test VALUES ('aborted-txn');
+ROLLBACK;
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '0');
+ data 
+------
+(0 rows)
+
+-- test decoding of an empty but not aborted transaction
+BEGIN;
+SELECT 1 FROM xact_test FOR UPDATE LIMIT 1;
+ ?column? 
+----------
+        1
+(1 row)
+
+COMMIT;
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '0');
+            data             
+-----------------------------
+ BEGIN (w/o snapshot)
+ COMMIT (empty w/o snapshot)
+(2 rows)
+
+-- test decoding of an empty transaction for which the reorderbuffer
+-- had to create a snapshot
+BEGIN;
+SAVEPOINT barf;
+INSERT INTO xact_test VALUES ('change-to-roll-back');
+ROLLBACK TO SAVEPOINT barf;
+COMMIT;
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '0');
+             data             
+------------------------------
+ BEGIN
+ COMMIT (empty with snapshot)
+(2 rows)
+
 DROP TABLE xact_test;
 SELECT pg_drop_replication_slot('regression_slot');
  pg_drop_replication_slot 
diff --git a/contrib/test_decoding/sql/prepared.sql b/contrib/test_decoding/sql/prepared.sql
index e72639767ee..dcb18ecc80d 100644
--- a/contrib/test_decoding/sql/prepared.sql
+++ b/contrib/test_decoding/sql/prepared.sql
@@ -40,11 +40,28 @@ COMMIT PREPARED 'test_prepared#3';
 INSERT INTO test_prepared1 VALUES (8);
 INSERT INTO test_prepared2 VALUES (9);
 
+-- show results
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+
+-- test decoding of an empty two-phase transaction
+BEGIN;
+SELECT 1 FROM test_prepared1 FOR UPDATE LIMIT 1;
+PREPARE TRANSACTION 'test_prepared#4';
+COMMIT PREPARED 'test_prepared#4';
+
+-- test decoding of an empty transaction for which the reorderbuffer
+-- had to create a snapshot
+BEGIN;
+SAVEPOINT barf;
+INSERT INTO test_prepared1 VALUES (10);
+ROLLBACK TO SAVEPOINT barf;
+PREPARE TRANSACTION 'test_prepared#5';
+COMMIT PREPARED 'test_prepared#5';
+
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '0');
+
 -- cleanup
 DROP TABLE test_prepared1;
 DROP TABLE test_prepared2;
 
--- show results
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
-
 SELECT pg_drop_replication_slot('regression_slot');
diff --git a/contrib/test_decoding/sql/xact.sql b/contrib/test_decoding/sql/xact.sql
index aa555911e86..419b26f9b38 100644
--- a/contrib/test_decoding/sql/xact.sql
+++ b/contrib/test_decoding/sql/xact.sql
@@ -28,6 +28,27 @@ COMMIT;
 -- and now show those changes
 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
+-- test decoding of an aborted transaction
+BEGIN;
+INSERT INTO xact_test VALUES ('aborted-txn');
+ROLLBACK;
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '0');
+
+-- test decoding of an empty but not aborted transaction
+BEGIN;
+SELECT 1 FROM xact_test FOR UPDATE LIMIT 1;
+COMMIT;
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '0');
+
+-- test decoding of an empty transaction for which the reorderbuffer
+-- had to create a snapshot
+BEGIN;
+SAVEPOINT barf;
+INSERT INTO xact_test VALUES ('change-to-roll-back');
+ROLLBACK TO SAVEPOINT barf;
+COMMIT;
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '0');
+
 DROP TABLE xact_test;
 
 SELECT pg_drop_replication_slot('regression_slot');
diff --git a/contrib/test_decoding/test_decoding.c b/contrib/test_decoding/test_decoding.c
index 929255eac74..c9dabaf60ae 100644
--- a/contrib/test_decoding/test_decoding.c
+++ b/contrib/test_decoding/test_decoding.c
@@ -324,6 +324,10 @@ pg_output_begin(LogicalDecodingContext *ctx, TestDecodingData *data, ReorderBuff
 		appendStringInfo(ctx->out, "BEGIN %u", txn->xid);
 	else
 		appendStringInfoString(ctx->out, "BEGIN");
+
+	if (txn->base_snapshot == NULL)
+		appendStringInfo(ctx->out, " (w/o snapshot)");
+
 	OutputPluginWrite(ctx, last_write);
 }
 
@@ -348,6 +352,14 @@ pg_decode_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 	else
 		appendStringInfoString(ctx->out, "COMMIT");
 
+	if (!xact_wrote_changes)
+	{
+		if (txn->base_snapshot == NULL)
+			appendStringInfo(ctx->out, " (empty w/o snapshot)");
+		else
+			appendStringInfo(ctx->out, " (empty with snapshot)");
+	}
+
 	if (data->include_timestamp)
 		appendStringInfo(ctx->out, " (at %s)",
 						 timestamptz_to_str(txn->commit_time));
-- 
2.30.0

#2 Amit Kapila
amit.kapila16@gmail.com
In reply to: Markus Wanner (#1)
Re: [PATCH] Present all committed transactions to the output plugin

On Fri, Feb 19, 2021 at 6:06 PM Markus Wanner
<markus.wanner@enterprisedb.com> wrote:

Hi,

attached is a patch that I think is cleaning up the API between Postgres
and the logical decoding plugin. Up until now, not only transactions
rolled back, but also some committed transactions were filtered and not
presented to the output plugin. While it is documented that aborted
transactions are not decoded, the second exception has not been documented.

The difference is with committed empty transactions that have a snapshot
versus those that do not. I think that's arbitrary and propose to
remove this distinction, so that all committed transactions are decoded.

What exactly is the use case to send empty transactions with or
without prepared? In the past, there was a complaint [1] that such
transactions increase the network traffic.

[1]: /messages/by-id/CAMkU=1yohp9-dv48FLoSPrMqYEyyS5ZWkaZGD41RJr10xiNo_Q@mail.gmail.com

--
With Regards,
Amit Kapila.

#3 Markus Wanner
markus.wanner@enterprisedb.com
In reply to: Amit Kapila (#2)
Re: [PATCH] Present all committed transactions to the output plugin

On 20.02.21 12:15, Amit Kapila wrote:

What exactly is the use case to send empty transactions with or
without prepared?

I'm not saying that output plugins should *send* empty transactions to
the replica. I rather agree that this indeed is not wanted in most cases.

However, that's not what the patch changes. It just moves the decision
to the output plugin, giving it more flexibility. And possibly allowing
it to still take action. For example, in case of a distributed
two-phase commit scenario, where the publisher waits after its local
PREPARE for replicas to also PREPARE. If such a prepare doesn't even
get to the output plugin, that won't work. Not even thinking of a
PREPARE on one node followed by a COMMIT PREPARED from a different node.
It simply is not the business of the decoder to decide what to do with
empty transactions.

Plus, given that the decoder does not manage to reliably filter all empty
transactions, an output plugin might want to implement its own filtering
anyway (case in point: contrib/test_decoding and its 'skip-empty-xacts'
option - which rather implies it would be possible not to skip them, as
does the documentation). So I'm rather
wondering: what's the use case of filtering some, but not all empty
transactions (on the decoder side)?
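
For completeness, plugin-side filtering of empty transactions is
straightforward. The following is a rough sketch of the approach the
'skip-empty-xacts' option of test_decoding takes (a per-transaction flag
kept in txn->output_plugin_private; the names and the omitted change
formatting are placeholders): defer the BEGIN message until the first
change and suppress the COMMIT message if nothing was ever written.

#include "postgres.h"

#include "lib/stringinfo.h"
#include "replication/logical.h"
#include "replication/output_plugin.h"
#include "replication/reorderbuffer.h"

/* per-transaction state, kept in txn->output_plugin_private */
typedef struct SketchTxnData
{
	bool		wrote_changes;
} SketchTxnData;

static void
sketch_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
{
	/* allocate the flag, but emit nothing yet */
	txn->output_plugin_private =
		MemoryContextAllocZero(ctx->context, sizeof(SketchTxnData));
}

static void
sketch_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
			  Relation relation, ReorderBufferChange *change)
{
	SketchTxnData *txndata = txn->output_plugin_private;

	if (!txndata->wrote_changes)
	{
		/* first actual change: emit the deferred BEGIN */
		OutputPluginPrepareWrite(ctx, false);
		appendStringInfo(ctx->out, "BEGIN %u", txn->xid);
		OutputPluginWrite(ctx, false);
		txndata->wrote_changes = true;
	}

	/* ... format and emit the change itself ... */
}

static void
sketch_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
				  XLogRecPtr commit_lsn)
{
	SketchTxnData *txndata = txn->output_plugin_private;

	/* no BEGIN was ever written, so stay silent for the COMMIT as well */
	if (!txndata->wrote_changes)
		return;

	OutputPluginPrepareWrite(ctx, true);
	appendStringInfoString(ctx->out, "COMMIT");
	OutputPluginWrite(ctx, true);
}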

Regards

Markus

#4 Andres Freund
andres@anarazel.de
In reply to: Markus Wanner (#3)
Re: [PATCH] Present all committed transactions to the output plugin

Hi,

On 2021-02-20 13:48:49 +0100, Markus Wanner wrote:

However, that's not what the patch changes. It just moves the decision to
the output plugin, giving it more flexibility. And possibly allowing it to
still take action.

It's not free though - there are plenty of workloads where there's an xid but
no other WAL records for transactions. Threading those through the
output plugin does increase the runtime cost. And because such
transactions will typically not incur a high cost on the primary
(e.g. in case of unlogged tables, there'll be a commit record, but often
the transaction will not wait for the commit record to be flushed to
disk), increasing the replication overhead isn't great.

For example, consider a distributed two-phase commit
scenario where the publisher waits after its local PREPARE for replicas to
also PREPARE.

Why is that ever interesting to do in the case of empty transactions?
Due to the cost of doing remote PREPAREs ISTM you'd always want to
implement the optimization of not doing so for empty transactions.

So I'm rather wondering: what's the use case of filtering some, but
not all empty transactions (on the decoder side)?

I'm wondering the opposite: What's a potential use case for handing
"trivially empty" transactions to the output plugin that's worth
incurring some cost for everyone?

Greetings,

Andres Freund

#5 Markus Wanner
markus.wanner@enterprisedb.com
In reply to: Andres Freund (#4)
Re: [PATCH] Present all committed transactions to the output plugin

On 20.02.21 21:08, Andres Freund wrote:

It's not free though

Agreed. It's an additional call to a callback. Do you think that's
acceptable if limited to two-phase transactions only?

I'm wondering the opposite: What's a potential use case for handing
"trivially empty" transactions to the output plugin that's worth
incurring some cost for everyone?

Outlined in my previous mail: prepare the transaction on one node,
commit it on another one. The PREPARE of a transaction is an event a
user may well want to have replicated, without having to worry about
whether or not the transaction happens to be empty.

[ Imagine: ERROR: transaction cannot be replicated because it's empty.
HINT: add a dummy UPDATE so that Postgres always has
something to replicate, whatever else your app does
or does not do in the transaction. ]

Regards

Markus

#6 Andres Freund
andres@anarazel.de
In reply to: Markus Wanner (#5)
Re: [PATCH] Present all committed transactions to the output plugin

Hi,

On 2021-02-20 21:44:30 +0100, Markus Wanner wrote:

On 20.02.21 21:08, Andres Freund wrote:

It's not free though

Agreed. It's an additional call to a callback.

If it were just a single indirection function call I'd not be
bothered. But we need to do a fair bit more than that
(c.f. ReorderBufferProcessTXN()).

Do you think that's acceptable if limited to two-phase transactions
only?

Cost-wise, yes - a 2pc prepare/commit is expensive enough that
comparatively the replay cost is unlikely to be relevant. Behaviourally
I'm still not convinced it's useful.

I'm wondering the opposite: What's a potential use case for handing
"trivially empty" transactions to the output plugin that's worth
incurring some cost for everyone?

Outlined in my previous mail: prepare the transaction on one node, commit it
on another one. The PREPARE of a transaction is an event a user may well
want to have replicated, without having to worry about whether or not the
transaction happens to be empty.

I read the previous mails in this thread, and I don't really see an
explanation for why this is something actually useful. When is a
transaction without actual contents interesting to replicate? I don't
find the "gid potentially carries information" particularly convincing.

[ Imagine: ERROR: transaction cannot be replicated because it's empty.
HINT: add a dummy UPDATE so that Postgres always has
something to replicate, whatever else your app does
or does not do in the transaction. ]

Meh.

Greetings,

Andres Freund

#7 Markus Wanner
markus.wanner@enterprisedb.com
In reply to: Andres Freund (#6)
1 attachment(s)
Re: [PATCH] Present all committed transactions to the output plugin

On 21.02.21 03:04, Andres Freund wrote:

Cost-wise, yes - a 2pc prepare/commit is expensive enough that
comparatively the replay cost is unlikely to be relevant.

Good. I attached an updated patch that eliminates the filtering only
for empty two-phase transactions.

Behaviourally I'm still not convinced it's useful.

I don't have any further argument than: If you're promising to replicate
two phases, I expect the first phase to be replicated individually.

A database state with a transaction prepared and identified by
'woohoo-roll-me-back-if-you-can' is not the same as a state without it.
Even if the transaction is empty, or if you're actually going to roll
it back. And therefore possibly ending up at the very same state without
any useful effect.

Regards

Markus

Attachments:

0001-Present-empty-prepares-to-the-output-plugin.patch (text/x-patch)
From: Markus Wanner <markus.wanner@enterprisedb.com>
Date: Sun, 21 Feb 2021 10:43:34 +0100
Subject: [PATCH] Present empty prepares to the output plugin

While clearly not making any substantial changes to the data, a
PREPARE of an empty transaction still carries a gid that acts as a
sentinel for a prepared transaction that may be committed or rolled
back.

Drop the filtering of empty prepared transactions on the Postgres
side.  An output plugin can still filter empty prepares, if it
wishes to do so.
---
 .../replication/logical/reorderbuffer.c       | 45 ++++++++++++-------
 1 file changed, 29 insertions(+), 16 deletions(-)

diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 5a62ab8bbc1..18195255007 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -2016,7 +2016,8 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 	ReorderBufferBuildTupleCidHash(rb, txn);
 
 	/* setup the initial snapshot */
-	SetupHistoricSnapshot(snapshot_now, txn->tuplecid_hash);
+	if (snapshot_now)
+		SetupHistoricSnapshot(snapshot_now, txn->tuplecid_hash);
 
 	/*
 	 * Decoding needs access to syscaches et al., which in turn use
@@ -2289,6 +2290,8 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 					break;
 
 				case REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT:
+					Assert(snapshot_now);
+
 					/* get rid of the old */
 					TeardownHistoricSnapshot(false);
 
@@ -2321,6 +2324,7 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 					break;
 
 				case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
+					Assert(snapshot_now);
 					Assert(change->data.command_id != InvalidCommandId);
 
 					if (command_id < change->data.command_id)
@@ -2397,8 +2401,11 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 		 * streaming mode.
 		 */
 		if (streaming)
+		{
+			Assert(snapshot_now);
 			ReorderBufferSaveTXNSnapshot(rb, txn, snapshot_now, command_id);
-		else if (snapshot_now->copied)
+		}
+		else if (snapshot_now && snapshot_now->copied)
 			ReorderBufferFreeSnap(rb, snapshot_now);
 
 		/* cleanup */
@@ -2520,7 +2527,6 @@ ReorderBufferReplay(ReorderBufferTXN *txn,
 					TimestampTz commit_time,
 					RepOriginId origin_id, XLogRecPtr origin_lsn)
 {
-	Snapshot	snapshot_now;
 	CommandId	command_id = FirstCommandId;
 
 	txn->final_lsn = commit_lsn;
@@ -2547,25 +2553,32 @@ ReorderBufferReplay(ReorderBufferTXN *txn,
 	 * If this transaction has no snapshot, it didn't make any changes to the
 	 * database, so there's nothing to decode.  Note that
 	 * ReorderBufferCommitChild will have transferred any snapshots from
-	 * subtransactions if there were any.
+	 * subtransactions if there were any.  This effectively makes empty
+	 * transactions invisible to the output plugin.
+	 *
+	 * An empty two-phase transaction must not be short-circuited in the
+	 * same way, because it carries a gid which may carry useful
+	 * information, even if it is only a sentinel for an uncommitted empty
+	 * transaction.
 	 */
-	if (txn->base_snapshot == NULL)
+	if (txn->base_snapshot == NULL && !rbtxn_prepared(txn))
 	{
 		Assert(txn->ninvalidations == 0);
-
-		/*
-		 * Removing this txn before a commit might result in the computation
-		 * of an incorrect restart_lsn. See SnapBuildProcessRunningXacts.
-		 */
-		if (!rbtxn_prepared(txn))
-			ReorderBufferCleanupTXN(rb, txn);
+		ReorderBufferCleanupTXN(rb, txn);
 		return;
 	}
 
-	snapshot_now = txn->base_snapshot;
-
-	/* Process and send the changes to output plugin. */
-	ReorderBufferProcessTXN(rb, txn, commit_lsn, snapshot_now,
+	/*
+	 * Process and send the changes to output plugin.
+	 *
+	 * Note that for empty transactions, txn->base_snapshot may well be NULL
+	 * in case of an empty two-phase transaction.  The corresponding
+	 * callbacks will still be invoked, as even an empty transaction carries
+	 * information (LSN increments, the gid in case of a two-phase
+	 * transaction).  This is unlike versions prior to 14 which optimized
+	 * away empty transactions.
+	 */
+	ReorderBufferProcessTXN(rb, txn, commit_lsn, txn->base_snapshot,
 							command_id, false);
 }
 
-- 
2.30.0

#8 Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Markus Wanner (#7)
Re: [PATCH] Present all committed transactions to the output plugin

On 2/21/21 11:05 AM, Markus Wanner wrote:

On 21.02.21 03:04, Andres Freund wrote:

Cost-wise, yes - a 2pc prepare/commit is expensive enough that
comparatively the replay cost is unlikely to be relevant.

Good.  I attached an updated patch eliminating only the filtering for
empty two-phase transactions.

Behaviourally I'm still not convinced it's useful.

I don't have any further argument than: If you're promising to replicate
two phases, I expect the first phase to be replicated individually.

A database state with a transaction prepared and identified by
'woohoo-roll-me-back-if-you-can' is not the same as a state without it.
 Even if the transaction is empty, or if you're actually going to roll
it back. And therefore possibly ending up at the very same state without
any useful effect.

IMHO it's quite weird to handle the 2PC and non-2PC cases differently.

If the argument is that this is expensive, it'd be good to quantify
that, somehow. If there's a workload with a significant fraction of such
empty transactions, does that mean +1% CPU usage, +10% or more?

Why not make this configurable, i.e. the output plugin might indicate
whether it's interested in empty transactions or not? If not, we can do
what we do now; otherwise the empty transactions would be passed to the
output plugin.
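
Something along those lines, purely as an illustration - the
receive_empty_xacts field is hypothetical and does not exist in
OutputPluginOptions; everything else follows the existing startup
callback API:

#include "postgres.h"

#include "replication/logical.h"
#include "replication/output_plugin.h"

static void
sketch_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
			   bool is_init)
{
	opt->output_type = OUTPUT_PLUGIN_TEXTUAL_OUTPUT;
	opt->receive_rewrites = false;

	/* hypothetical opt-in flag, not part of the current API */
	opt->receive_empty_xacts = true;
}

The reorderbuffer could then keep the current short-circuit in
ReorderBufferReplay() whenever the flag is unset, preserving today's
behaviour for plugins that don't opt in.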

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company