Single transaction in the tablesync worker?
The tablesync worker in logical replication performs the table data
sync in a single transaction, which means it will copy the initial data
and then catch up with the apply worker in the same transaction. There
is a comment in LogicalRepSyncTableStart ("We want to do the table data
sync in a single transaction.") saying so, but I can't find the
concrete reasoning behind it. Is there any fundamental problem if
we commit the transaction after the initial copy and slot creation in
LogicalRepSyncTableStart and then allow the apply of transactions as
it happens in the apply worker? I have tried doing so in the attached (a
quick prototype to test) and didn't find any problems with the
regression tests. I have tried a few manual tests as well to see if it
works and didn't find any problem. Now, it is quite possible that it is
mandatory to do it the way we are doing it currently, or maybe something
else is required to remove this requirement, but I think we can do
better with respect to the comments in this area.
The reason why I am looking into this area is to support the logical
decoding of prepared transactions. See the problem [1] reported by
Peter Smith. Basically, when we stream prepared transactions in the
tablesync worker, it will simply commit the same due to the
requirement of maintaining a single transaction for the entire
duration of copy and streaming of transactions. Now, we can fix that
problem by disabling the decoding of prepared xacts in tablesync
worker. But that will give rise to a different kind of problem: the
prepare will not be sent by the publisher, but a later commit might
move the LSN to a later point, which will allow the tablesync worker to
catch up with the apply worker. So, now the prepared transaction will
be skipped by both the tablesync and apply workers.
I think apart from unblocking the development of 'logical decoding of
prepared xacts', it will make the code consistent between the apply and
tablesync workers and reduce the chances of future bugs in this area.
Basically, it will reduce the checks related to am_tablesync_worker()
at various places in the code.
I see that this code was added as part of commit
7c4f52409a8c7d85ed169bbbc1f6092274d03920 (Logical replication support
for initial data copy).
Thoughts?
[1]: /messages/by-id/CAHut+PuEMk4SO8oGzxc_ftzPkGA8uC-y5qi-KRqHSy_P0i30DA@mail.gmail.com
--
With Regards,
Amit Kapila.
Attachments:
v1-0001-Allow-more-than-one-transaction-in-tablesync-work.patch
From 9f2d1ff2a181136efe2d5db0e6ac43bec909a1f1 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Thu, 3 Dec 2020 14:18:19 +0530
Subject: [PATCH v1] Allow more than one transaction in tablesync worker.
---
src/backend/replication/logical/tablesync.c | 9 ++++++++-
src/backend/replication/logical/worker.c | 19 +++++--------------
2 files changed, 13 insertions(+), 15 deletions(-)
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 1904f34..886298e 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -270,7 +270,8 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
+ if (!IsTransactionState())
+ StartTransactionCommand();
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
@@ -294,6 +295,9 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn)
}
else
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+
+ if (IsTransactionState())
+ CommitTransactionCommand();
}
/*
@@ -943,6 +947,9 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
/* Make the copy visible. */
CommandCounterIncrement();
+ CommitTransactionCommand();
+ StartTransactionCommand();
+
/*
* We are done with the initial data synchronization, update the state.
*/
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 8c7fad8..af6a98a 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -807,12 +807,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -888,10 +884,7 @@ apply_handle_stream_abort(StringInfo s)
{
/* Cleanup the subxact info */
cleanup_subxact_info();
-
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -918,8 +911,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1062,8 +1054,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData* commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
--
1.8.3.1
On Thu, Dec 3, 2020 at 2:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
The tablesync worker in logical replication performs the table data
sync in a single transaction, which means it will copy the initial data
and then catch up with the apply worker in the same transaction.
..
Is there any fundamental problem if
we commit the transaction after the initial copy and slot creation in
LogicalRepSyncTableStart and then allow the apply of transactions as
it happens in the apply worker?
..
If we commit the initial copy, the data up to the initial copy's
snapshot will be visible downstream. If we apply the changes by
committing changes per transaction, the data visible to the other
transactions will differ as the apply progresses. You haven't
clarified whether we will respect the transaction boundaries in the
apply log or not. I assume we will. Whereas if we apply all the
changes in one go, other transactions either see the data before
the resync or after it, without any intermediate states. That will not
violate consistency, I think.
That's all I can think of as the reason behind doing a whole resync as
a single transaction.
--
Best Wishes,
Ashutosh Bapat
On Thu, Dec 3, 2020 at 7:04 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
On Thu, Dec 3, 2020 at 2:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
The tablesync worker in logical replication performs the table data
sync in a single transaction, which means it will copy the initial data
and then catch up with the apply worker in the same transaction.
..

If we commit the initial copy, the data up to the initial copy's
snapshot will be visible downstream. If we apply the changes by
committing changes per transaction, the data visible to the other
transactions will differ as the apply progresses.
It is not clear what you mean by the above. The way you have written
it, it appears you are saying that instead of copying the initial data,
I am saying to copy it transaction-by-transaction. But that is not the
case. I am saying copy the initial data by using REPEATABLE READ
isolation level as we are doing now, commit it and then process
transaction-by-transaction till we reach the sync point (the point up
to which the apply worker has already received the data).
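
To make the current flow concrete, this is roughly the command sequence
the tablesync worker issues on its publisher connection today (a
sketch; the slot and table names are made up for illustration):

BEGIN READ ONLY ISOLATION LEVEL REPEATABLE READ;
-- USE_SNAPSHOT makes this transaction use the snapshot at which the
-- slot becomes consistent, so the COPY sees exactly the data as of
-- the slot's starting point.
CREATE_REPLICATION_SLOT "pg_16395_sync_16385" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT;
COPY public.mytbl1 TO STDOUT;
COMMIT;
-- catch-up then streams changes from the slot's consistent point

The proposal is to commit on the subscriber side once the COPY is done
and then apply each streamed transaction in its own transaction.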
You haven't
clarified whether we will respect the transaction boundaries in the
apply log or not. I assume we will.
It will be transaction-by-transaction.
Whereas if we apply all the
changes in one go, other transactions either see the data before
resync or after it without any intermediate states.
What is the problem even if the user is able to see the data after the
initial copy?
That will not
violate consistency, I think.
I am not sure how consistency will be broken.
That's all I can think of as the reason behind doing a whole resync as
a single transaction.
Thanks for sharing your thoughts.
--
With Regards,
Amit Kapila.
On Thu, 3 Dec 2020 at 17:25, Amit Kapila <amit.kapila16@gmail.com> wrote:
Is there any fundamental problem if
we commit the transaction after the initial copy and slot creation in
LogicalRepSyncTableStart and then allow the apply of transactions as
it happens in the apply worker?
No fundamental problem. Both approaches are fine. Committing the
initial copy then doing the rest in individual txns means an
incomplete sync state for the table becomes visible, which may not be
ideal. Ideally we'd do something like sync the data into a clone of
the table then swap the table relfilenodes out once we're synced up.
IMO the main advantage of committing as we go is that it would let us
use a non-temporary slot and support recovering an incomplete sync and
finishing it after interruption by connection loss, crash, etc. That
would be advantageous for big table syncs or where the sync has lots
of lag to replay. But it means we have to remember sync states, and
give users a way to cancel/abort them. Otherwise forgotten temp slots
for syncs will cause a mess on the upstream.
It also allows the sync slot to advance, freeing any held upstream
resources before the whole sync is done, which is good if the upstream
is busy and generating lots of WAL.
Finally, committing as we go means we won't exceed the cid increment
limit in a single txn.
The reason why I am looking into this area is to support the logical
decoding of prepared transactions. See the problem [1] reported by
Peter Smith. Basically, when we stream prepared transactions in the
tablesync worker, it will simply commit the same due to the
requirement of maintaining a single transaction for the entire
duration of copy and streaming of transactions. Now, we can fix that
problem by disabling the decoding of prepared xacts in tablesync
worker.
Tablesync should indeed only receive a txn when the commit arrives; it
should not attempt to handle uncommitted prepared xacts.
But that will give rise to a different kind of problem: the
prepare will not be sent by the publisher, but a later commit might
move the LSN to a later point, which will allow the tablesync worker to
catch up with the apply worker. So, now the prepared transaction will
be skipped by both the tablesync and apply workers.
I'm not sure I understand. If what you describe is possible then
there's already a bug in prepared xact handling. Prepared xact commit
progress should be tracked by commit lsn, not by prepare lsn.
Can you set out the ordering of events in more detail?
I think apart from unblocking the development of 'logical decoding of
prepared xacts', it will make the code consistent between the apply and
tablesync workers and reduce the chances of future bugs in this area.
Basically, it will reduce the checks related to am_tablesync_worker()
at various places in the code.
I think we made similar changes in pglogical to switch to applying
sync work in individual txns.
On Fri, Dec 4, 2020 at 7:53 AM Craig Ringer
<craig.ringer@enterprisedb.com> wrote:
On Thu, 3 Dec 2020 at 17:25, Amit Kapila <amit.kapila16@gmail.com> wrote:
The reason why I am looking into this area is to support the logical
decoding of prepared transactions.
..

Tablesync should indeed only receive a txn when the commit arrives; it
should not attempt to handle uncommitted prepared xacts.
Why? If we go with the commit-as-we-go approach for individual
transactions in the tablesync worker, then this shouldn't be a problem.
But that will give rise to a different kind of problem: the
prepare will not be sent by the publisher, but a later commit might
move the LSN to a later point, which will allow the tablesync worker to
catch up with the apply worker. So, now the prepared transaction will
be skipped by both the tablesync and apply workers.

I'm not sure I understand. If what you describe is possible then
there's already a bug in prepared xact handling. Prepared xact commit
progress should be tracked by commit lsn, not by prepare lsn.
Oh no, I am talking about the commit of some other transaction.
Can you set out the ordering of events in more detail?
Sure. It will be something like the following, where the apply worker
is ahead of the sync worker.
Assume t1 has some data which the tablesync worker has to first copy.
tx1
Begin;
Insert into t1....
Prepare Transaction 'foo'
tx2
Begin;
Insert into t1....
Commit
apply worker
• tx1: replays - does not apply anything because
should_apply_changes_for_rel thinks relation is not ready
• tx2: replays - does not apply anything because
should_apply_changes_for_rel thinks relation is not ready
tablesync worder
• tx1: handles: BEGIN - INSERT - PREPARE 'xyz'; (but tablesync gets
nothing because say we disable 2-PC for it)
• tx2: handles: BEGIN - INSERT - COMMIT;
• tablelsync exits
Now the situation is that the apply worker has skipped the prepared
xact data and the tablesync worker has not received it, so it has not
applied it. Next, when we get the Commit Prepared for tx1, it will
silently commit the prepared transaction without any data being
updated. The Commit Prepared won't error out on the subscriber because
the prepare would have been successful even though the data was skipped
via should_apply_changes_for_rel.
I think apart from unblocking the development of 'logical decoding of
prepared xacts', it will make the code consistent between the apply and
tablesync workers and reduce the chances of future bugs in this area.
Basically, it will reduce the checks related to am_tablesync_worker()
at various places in the code.

I think we made similar changes in pglogical to switch to applying
sync work in individual txns.
oh, cool. Did you make some additional changes as you have mentioned
in the earlier part of the email?
--
With Regards,
Amit Kapila.
On Fri, Dec 4, 2020 at 7:53 AM Craig Ringer
<craig.ringer@enterprisedb.com> wrote:
On Thu, 3 Dec 2020 at 17:25, Amit Kapila <amit.kapila16@gmail.com> wrote:
Is there any fundamental problem if
we commit the transaction after the initial copy and slot creation in
LogicalRepSyncTableStart and then allow the apply of transactions as
it happens in the apply worker?

No fundamental problem. Both approaches are fine. Committing the
initial copy then doing the rest in individual txns means an
incomplete sync state for the table becomes visible, which may not be
ideal.
..

Finally, committing as we go means we won't exceed the cid increment
limit in a single txn.
Yeah, all these are advantages of processing
transaction-by-transaction. IIUC, we primarily need to do two things
to achieve it: one is to have an additional state in the catalog (say
'catchup') which will say that the initial copy is done. Then we need
a permanent slot through which we can track the replay progress, so
that after a restart (due to crash, connection break, etc.) we can
start from the appropriate position.
Apart from the above, I think with the current design of tablesync we
can see partial data of transactions because we allow all the
tablesync workers to run in parallel. Consider the below scenario:
CREATE TABLE mytbl1(id SERIAL PRIMARY KEY, somedata int, text varchar(120));
CREATE TABLE mytbl2(id SERIAL PRIMARY KEY, somedata int, text varchar(120));
Tx1
BEGIN;
INSERT INTO mytbl1(somedata, text) VALUES (1, 1);
INSERT INTO mytbl2(somedata, text) VALUES (1, 1);
COMMIT;
CREATE PUBLICATION mypublication FOR TABLE mytbl;
CREATE SUBSCRIPTION mysub
CONNECTION 'host=localhost port=5432 dbname=postgres'
PUBLICATION mypublication;
Tx2
BEGIN;
INSERT INTO mytbl1(somedata, text) VALUES (1, 2);
INSERT INTO mytbl2(somedata, text) VALUES (1, 2);
Commit;
Tx3
BEGIN;
INSERT INTO mytbl1(somedata, text) VALUES (1, 3);
INSERT INTO mytbl2(somedata, text) VALUES (1, 3);
Commit;
Now, I could see the below results on subscriber:
postgres=# select * from mytbl1;
id | somedata | text
----+----------+------
(0 rows)
postgres=# select * from mytbl2;
id | somedata | text
----+----------+------
1 | 1 | 1
2 | 1 | 2
3 | 1 | 3
(3 rows)
Basically, the results for Tx1, Tx2, Tx3 are visible for mytbl2 but
not for mytbl1. To reproduce this I stopped the tablesync workers
(via debugger) for mytbl1 and mytbl2 in LogicalRepSyncTableStart
before it changes the relstate to SUBREL_STATE_SYNCWAIT, then allowed
Tx2 and Tx3 to be processed by the apply worker, and then allowed the
tablesync worker for mytbl2 to proceed. After that, I can see the above
state.
Now, won't this behavior be considered transaction inconsistency,
where partial transaction data or later transaction data is visible? I
don't think we can have such a situation on the master (publisher)
node or on a physical standby.
--
With Regards,
Amit Kapila.
On Fri, Dec 4, 2020 at 10:29 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
..
CREATE TABLE mytbl1(id SERIAL PRIMARY KEY, somedata int, text varchar(120));
CREATE TABLE mytbl2(id SERIAL PRIMARY KEY, somedata int, text varchar(120));

Tx1
BEGIN;
INSERT INTO mytbl1(somedata, text) VALUES (1, 1);
INSERT INTO mytbl2(somedata, text) VALUES (1, 1);
COMMIT;

CREATE PUBLICATION mypublication FOR TABLE mytbl;
oops, the above statement should be CREATE PUBLICATION mypublication
FOR TABLE mytbl1, mytbl2;
--
With Regards,
Amit Kapila.
On Fri, Dec 4, 2020 at 10:29 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
..
Apart from the above, I think with the current design of tablesync we
can see partial data of transactions because we allow all the
tablesync workers to run in parallel.
..
Basically, the results for Tx1, Tx2, Tx3 are visible for mytbl2 but
not for mytbl1.
..
Now, won't this behavior be considered transaction inconsistency,
where partial transaction data or later transaction data is visible? I
don't think we can have such a situation on the master (publisher)
node or on a physical standby.
On briefly checking the pglogical code [1], it seems this problem
won't be there in pglogical, because it seems to first copy all the
tables (via pglogical_sync_table) in one process and then catch up with
the apply worker in a transaction-by-transaction manner. Am I reading
it correctly? If so, then why did we follow a different approach for
the in-core solution? Or is it that pglogical has improved over time
but the improvements can't all be implemented in-core because of some
missing features?
[1]: https://github.com/2ndQuadrant/pglogical
--
With Regards,
Amit Kapila.
On Thu, Dec 3, 2020 at 7:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
..
It is not clear what you mean by the above. The way you have written
it, it appears you are saying that instead of copying the initial data,
I am saying to copy it transaction-by-transaction. But that is not the
case. I am saying copy the initial data by using REPEATABLE READ
isolation level as we are doing now, commit it and then process
transaction-by-transaction till we reach the sync point (the point up
to which the apply worker has already received the data).

Craig in his mail has clarified this. The changes after the initial
COPY will be visible before the table sync catches up.

What is the problem even if the user is able to see the data after the
initial copy?

That will not violate consistency, I think.

I am not sure how consistency will be broken.
Some of the transactions applied by the apply worker may not have been
applied by the resync and vice versa. If the intermediate states of the
table resync worker are visible, this difference in applied
transactions will result in a loss of consistency if those transactions
change the table being resynced and some other table in the same
transaction. The changes won't be atomically visible. Thinking more
about this, this problem exists today for a table being resynced, but
at least it's only the table being resynced that is behind the other
tables, so it's predictable.
--
Best Wishes,
Ashutosh Bapat
On Fri, Dec 4, 2020 at 7:12 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
..
Craig in his mail has clarified this. The changes after the initial
COPY will be visible before the table sync catches up.
I think the problem is not that the changes are visible after COPY;
rather, it is that we don't have a mechanism to restart if it crashes
after COPY unless we do all the sync up in one transaction. Assume we
commit after COPY and then process transaction-by-transaction, and it
errors out (due to connection loss) or crashes in-between one of the
following transactions after COPY; then after the restart we won't know
from where to start for that relation. This is because the catalog
(pg_subscription_rel) will show the state as 'd' (data is being
copied) and the slot would have gone as it was a temporary slot. But
as mentioned in one of my emails above [1] we can solve these problems,
which Craig also seems to be advocating for, as there are many
advantages of not doing the entire sync (initial copy + stream changes
for that relation) in one single transaction. It will allow us to
support decoding of prepared xacts on the subscriber. Also, it seems
pglogical already does processing transaction-by-transaction after the
initial copy. The only thing which is not clear to me is why we didn't
decide to go that way initially, and it would probably be better
if the original authors would also chime in to at least clarify the
same.
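
To make the failure mode concrete, the per-relation sync state can be
inspected on the subscriber with a query like the one below (standard
catalogs, nothing hypothetical). After such a crash, the relation would
be stuck in state 'd' with no slot left to resume from:

-- srsubstate: 'i' = initialize, 'd' = data is being copied,
-- 's' = synchronized, 'r' = ready (caught up with the apply worker)
SELECT srrelid::regclass AS relation, srsubstate
FROM pg_subscription_rel ORDER BY 1;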
..
Some of the transactions applied by the apply worker may not have been
applied by the resync and vice versa.
..
Thinking more about this, this problem exists today for a table being
resynced, but at least it's only the table being resynced that is
behind the other tables, so it's predictable.
Yeah, I have already shown that this problem [1] exists today, and it
won't be predictable when the number of tables to be synced is larger.
I am not sure why, but it seems acceptable to the original authors that
the data of transactions is visible partially during the initial
synchronization phase of a subscription. I don't see it documented
clearly either.
[1]: /messages/by-id/CAA4eK1Ld9XaLoTZCoKF_gET7kc1fDf8CPR3CM48MQb1N1jDLYg@mail.gmail.com
--
With Regards,
Amit Kapila.
On Sat, 5 Dec 2020, 10:03 Amit Kapila, <amit.kapila16@gmail.com> wrote:
On Fri, Dec 4, 2020 at 7:12 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
..
I think the problem is not that the changes are visible after COPY;
rather, it is that we don't have a mechanism to restart if it crashes
after COPY unless we do all the sync up in one transaction.
..
The only thing which is not clear to me is why we didn't
decide to go that way initially, and it would probably be better
if the original authors would also chime in to at least clarify the
same.
It's partly a resource management issue.
Replication origins are a limited resource. We need to use a replication
origin for any sync we want to be durable across restarts.
Then again, so are slots, and we use temp slots for each sync.
If a sync fails, cleanup on the upstream side is simple with a temp slot.
With persistent slots we have more risk of creating upstream issues. But
then, so long as the subscriber exists it can deal with that. And if the
subscriber no longer exists, its primary slot is an issue too.
It'd help if we could register pg_shdepend entries between catalog entries
and slots, and from a main subscription slot to any extra slots used for
resynchronization.
And I should write a patch for a resource retention summarisation view.
I am not sure why, but it seems acceptable to the original authors that
the data of transactions is visible partially during the initial
synchronization phase of a subscription.
I don't think there's much alternative there.
Pg would need some kind of cross-commit visibility control mechanism
that separates durable commit from visibility.
Hi,
I wanted to float another idea to solve these tablesync/apply worker problems.
This idea may or may not have merit. Please consider it.
~
Basically, I was wondering why can't the "tablesync" worker just
gather messages in a similar way to how the current streaming feature
gathers messages into a "changes" file, so that they can be replayed
later.
e.g. Imagine if
A) The "tablesync" worker (after the COPY) does not ever apply any of
the incoming messages, but instead it just gobbles them into a
"changes" file until it decides it has reached SYNCDONE state and
exits.
B) Then, when the "apply" worker proceeds, if it detects the existence
of the "changes" file it will replay/apply_dispatch all those gobbled
messages before just continuing as normal.
So
- IIUC this kind of replay is like how the current code stream commit
applies the streamed "changes" file.
- "tablesync" worker would only be doing table sync (COPY) as its name
suggests. Any detected "changes" are recorded and left for the "apply"
worker to handle.
- "tablesync" worker would just operate in single tx with a temporary
slot as per current code
- Then the "apply" worker would be the *only* worker that actually
applies anything. (as its name suggests)
Thoughts?
---
Kind Regards,
Peter Smith.
Fujitsu Australia
On Mon, Dec 7, 2020 at 6:20 AM Craig Ringer
<craig.ringer@enterprisedb.com> wrote:
On Sat, 5 Dec 2020, 10:03 Amit Kapila, <amit.kapila16@gmail.com> wrote:
On Fri, Dec 4, 2020 at 7:12 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
..

It's partly a resource management issue.

Replication origins are a limited resource. We need to use a
replication origin for any sync we want to be durable across restarts.

Then again, so are slots, and we use temp slots for each sync.

If a sync fails, cleanup on the upstream side is simple with a temp
slot. With persistent slots we have more risk of creating upstream
issues. But then, so long as the subscriber exists it can deal with
that. And if the subscriber no longer exists, its primary slot is an
issue too.
I think if the only issue is slot cleanup, then the same exists today
for the slot created by the apply worker (or what I think you are
referring to as a primary slot). This can only happen if the
subscriber goes away without dropping the subscription. Also, if we
are worried about using up too many slots, then the slots used by
tablesync workers will probably be freed sooner.
It'd help if we could register pg_shdepend entries between catalog entries and slots, and from a main subscription slot to any extra slots used for resynchronization.
Which catalog entries are you referring to here?
And I should write a patch for a resource retention summarisation view.
That would be great.
I am not sure why, but it seems acceptable to the original authors that
the data of transactions is visible partially during the initial
synchronization phase of a subscription.

I don't think there's much alternative there.
I am not sure about this. I think it is primarily to allow some more
parallelism among the apply and sync workers. One primitive way to
achieve parallelism and avoid this problem is to allow the apply worker
to wait till all the tablesync workers are in the DONE state. Then we
will never have the inconsistency problem or the prepared-xact problem.
Now, surely if large copies are required for multiple relations then we
would delay the replay of transactions by the apply worker a bit, but I
don't know how much that matters compared to the transaction
visibility issue; and anyway, we would have achieved the maximum
parallelism by allowing the copy via multiple workers.
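
For illustration, the "all tablesync workers are done" condition that
the apply worker would wait on is essentially what this subscriber-side
query checks (a sketch; the subscription name is made up):

SELECT bool_and(srsubstate IN ('s', 'r')) AS all_synced
FROM pg_subscription_rel
WHERE srsubid = (SELECT oid FROM pg_subscription
                 WHERE subname = 'mysub');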
--
With Regards,
Amit Kapila.
On Mon, 7 Dec 2020 at 11:44, Peter Smith <smithpb2250@gmail.com> wrote:
Basically, I was wondering why can't the "tablesync" worker just
gather messages in a similar way to how the current streaming feature
gathers messages into a "changes" file, so that they can be replayed
later.
See the related thread "Logical archiving"
/messages/by-id/20D9328B-A189-43D1-80E2-EB25B9284AD6@yandex-team.ru
where I addressed some parts of this topic in detail earlier today.
A) The "tablesync" worker (after the COPY) does not ever apply any of
the incoming messages, but instead it just gobbles them into a
"changes" file until it decides it has reached SYNCDONE state and
exits.
This has a few issues.
Most importantly, the sync worker must cooperate with the main apply worker
to achieve a consistent end-of-sync cutover. The sync worker must have
replayed the pending changes in order to make this cut-over, because the
non-sync apply worker will need to start applying changes on top of the
resync'd table potentially as soon as the next transaction it starts
applying, so it needs to see the rows there.
Doing this would also add another round of write multiplication since the
data would get spooled then applied to WAL then heap. Write multiplication
is already an issue for logical replication so adding to it isn't
particularly desirable without a really compelling reason. With the write
multiplication comes disk space management issues for big transactions as
well as the obvious performance/throughput impact.
It adds even more latency between upstream commit and downstream apply,
something that is again already an issue for logical replication.
Right now we don't have any concept of a durable and locally flushed spool.
It's not impossible to do as you suggest but the cutover requirement makes
it far from simple. As discussed in the logical archiving thread I think
it'd be good to have something like this, and there are times the write
multiplication price would be well worth paying. But it's not easy.
B) Then, when the "apply" worker proceeds, if it detects the existence
of the "changes" file it will replay/apply_dispatch all those gobbled
messages before just continuing as normal.
That's going to introduce a really big stall in the apply worker's progress
in many cases. During that time it won't be receiving from upstream (since
we don't spool logical changes to disk at this time) so the upstream lag
will grow. That will impact synchronous replication, pg_wal size
management, catalog bloat, etc. It'll also leave the upstream logical
decoding session idle, so when it resumes it may create a spike of I/O and
CPU load as it catches up, as well as a spike of network traffic. And
depending on how close the upstream write rate is to the max decode speed,
network throughput max, and downstream apply speed max, it may take some
time to catch up over the resulting lag.
Not a big fan of that approach.
On Mon, Dec 7, 2020 at 10:02 AM Craig Ringer
<craig.ringer@enterprisedb.com> wrote:
On Mon, 7 Dec 2020 at 11:44, Peter Smith <smithpb2250@gmail.com> wrote:

Basically, I was wondering why can't the "tablesync" worker just
gather messages in a similar way to how the current streaming feature
gathers messages into a "changes" file, so that they can be replayed
later.

See the related thread "Logical archiving"
/messages/by-id/20D9328B-A189-43D1-80E2-EB25B9284AD6@yandex-team.ru
where I addressed some parts of this topic in detail earlier today.
..
This has a few issues.

Most importantly, the sync worker must cooperate with the main apply
worker to achieve a consistent end-of-sync cutover.
In this idea, there is no need to change the end-of-sync cutover. It
will work as it is now. I am not sure what makes you think so.
The sync worker must have replayed the pending changes in order to make this cut-over, because the non-sync apply worker will need to start applying changes on top of the resync'd table potentially as soon as the next transaction it starts applying, so it needs to see the rows there.
The change here would be that the apply worker will check for the
changes file and, if it exists, apply the changes from it before it
changes the relstate to SUBREL_STATE_READY in
process_syncing_tables_for_apply(). So, it will not miss seeing any
rows.
Doing this would also add another round of write multiplication since the data would get spooled then applied to WAL then heap. Write multiplication is already an issue for logical replication so adding to it isn't particularly desirable without a really compelling reason.
It will solve our problem of allowing decoding of prepared xacts in
pgoutput. I have explained the problem above [1]. The other idea which
we discussed is to allow having an additional state in
pg_subscription_rel, make the slot permanent in the tablesync worker,
and then process transaction-by-transaction in the apply worker. Does
that approach sound better? Is there any bigger change involved in this
approach (making the tablesync slot permanent) which I am missing?
With the write multiplication comes disk space management issues for big transactions as well as the obvious performance/throughput impact.
It adds even more latency between upstream commit and downstream apply, something that is again already an issue for logical replication.
Right now we don't have any concept of a durable and locally flushed spool.
I think we have a concept quite close to it for writing changes for
in-progress xacts as done in PG-14. It is not durable but that
shouldn't be a big problem if we allow syncing the changes file.
It's not impossible to do as you suggest but the cutover requirement makes it far from simple. As discussed in the logical archiving thread I think it'd be good to have something like this, and there are times the write multiplication price would be well worth paying. But it's not easy.
B) Then, when the "apply" worker proceeds, if it detects the existence
of the "changes" file it will replay/apply_dispatch all those gobbled
messages before just continuing as normal.

That's going to introduce a really big stall in the apply worker's
progress in many cases. During that time it won't be receiving from
upstream (since we don't spool logical changes to disk at this time) so
the upstream lag will grow. That will impact synchronous replication,
pg_wal size management, catalog bloat, etc. It'll also leave the
upstream logical decoding session idle, so when it resumes it may
create a spike of I/O and CPU load as it catches up, as well as a spike
of network traffic. And depending on how close the upstream write rate
is to the max decode speed, network throughput max, and downstream
apply speed max, it may take some time to catch up over the resulting
lag.
This is just for the initial tablesync phase. I think it is equivalent
to saying that during a basebackup we need to start physical
replication in parallel. I agree that sometimes it can take a lot of
time to copy large tables, but it will be just one time and no worse
than other situations like basebackup.
[1]: /messages/by-id/CAA4eK1KFsjf6x-S7b0dJLvEL3tcn9x-voBJiFoGsccyH5xgDzQ@mail.gmail.com
--
With Regards,
Amit Kapila.
On Mon, Dec 7, 2020 at 9:21 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Dec 7, 2020 at 6:20 AM Craig Ringer
<craig.ringer@enterprisedb.com> wrote:
..
I don't think there's much alternative there.

I am not sure about this. I think it is primarily to allow some more
parallelism among the apply and sync workers. One primitive way to
achieve parallelism and avoid this problem is to allow the apply worker
to wait till all the tablesync workers are in the DONE state.
As the slot of the apply worker is created before all the tablesync
workers, it should never miss any LSN which the tablesync workers would
have processed. Also, the tablesync workers should not process any
xact if the apply worker has not processed anything. I think tablesync
currently always processes one transaction (because we call
process_sync_tables at the commit of a txn) even if that is not
required to be in sync with the apply worker. This should solve both
problems: (a) visibility of partial transactions, (b) allowing prepared
transactions, because the tablesync worker no longer needs to combine
multiple transactions' data.

I think the other advantage of this would be that it would reduce the
load (both CPU and I/O) on the publisher side by decoding the data only
once, instead of once for each tablesync worker and separately for the
apply worker. I think it will use fewer resources to finish the work.
Is there any flaw in this idea which I am missing?
--
With Regards,
Amit Kapila.
On Mon, Dec 7, 2020 at 2:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Dec 7, 2020 at 9:21 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
..
As the slot of the apply worker is created before all the tablesync
workers, it should never miss any LSN which the tablesync workers would
have processed. Also, the tablesync workers should not process any
xact if the apply worker has not processed anything. I think tablesync
currently always processes one transaction (because we call
process_sync_tables at the commit of a txn) even if that is not
required to be in sync with the apply worker.
One more thing to consider here is that currently in the tablesync
worker, we create a slot with the CRS_USE_SNAPSHOT option, which
creates a transaction snapshot on the publisher, and then we use the
same snapshot for the COPY from the publisher. After this, when we
receive the data from the publisher using the same slot, it will be in
sync with the COPY. To keep the same consistency between the COPY and
the data we receive from the publisher in this approach, I think we
need to export the snapshot while creating the slot in the apply worker
by using CRS_EXPORT_SNAPSHOT, and then have all the tablesync workers
doing the copy use that same snapshot. In the tablesync workers, we can
use the SET TRANSACTION SNAPSHOT command after "BEGIN READ ONLY
ISOLATION LEVEL REPEATABLE READ" to achieve it. That way the COPY will
use the same snapshot as is used for receiving the changes in the apply
worker, and the data will be in sync.
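
A rough sketch of the SQL-level sequence this would map to (the
snapshot identifier below is made up; a real one is returned when the
slot is created, and the exporting transaction must still be open on
the apply worker's connection when the tablesync workers import it):

-- Apply worker's publisher connection: create the main slot and
-- export its snapshot (what CRS_EXPORT_SNAPSHOT corresponds to).
CREATE_REPLICATION_SLOT "mysub" LOGICAL pgoutput EXPORT_SNAPSHOT;
-- ... returns a snapshot name, say '00000003-00000002-1'

-- Each tablesync worker then copies under that same snapshot:
BEGIN READ ONLY ISOLATION LEVEL REPEATABLE READ;
SET TRANSACTION SNAPSHOT '00000003-00000002-1';
COPY public.mytbl1 TO STDOUT;
COMMIT;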
--
With Regards,
Amit Kapila.
On Mon, Dec 7, 2020 at 7:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
As the slot of the apply worker is created before all the tablesync
workers, it should never miss any LSN which the tablesync workers would
have processed.
..
I think the other advantage of this would be that it would reduce the
load (both CPU and I/O) on the publisher side by decoding the data only
once, instead of once for each tablesync worker and separately for the
apply worker. I think it will use fewer resources to finish the work.
Yes, I observed this same behavior.
IIUC the only way for the tablesync worker to go from CATCHUP mode to
SYNCDONE is via the call to process_sync_tables.
But a side-effect of this is that when messages arrive during this
CATCHUP phase, one tx will get handled by the tablesync worker before
process_sync_tables() is ever encountered.
I have created and attached a simple patch which allows the tablesync
worker to detect if there is anything to do *before* it enters the
apply main loop. Calling process_sync_tables() before the apply main
loop offers a quick way out, so the message handling will not be split
unnecessarily between the workers.
~
The result of the patch is demonstrated by the following test/logs
which are also attached.
Note: I added more logging (not in this patch) to make it easier to
see what is going on.
LOGS1. Current code.
Test: 10 x INSERTS done at CATCHUP time.
Result: tablesync worker does 1 x INSERT, then apply worker skips 1
and does remaining 9 x INSERTs.
LOGS2. Patched code.
Test: Same 10 x INSERTS done at CATCHUP time.
Result: tablesync can exit early. apply worker handles all 10 x INSERTs
LOGS3. Patched code.
Test: 2PC PREPARE then COMMIT PREPARED [1] done at CATCHUP time
psql -d test_pub -c "BEGIN;INSERT INTO test_tab VALUES(1,
'foo');PREPARE TRANSACTION 'test_prepared_tab';"
psql -d test_pub -c "COMMIT PREPARED 'test_prepared_tab';"
Result: The PREPARE and COMMIT PREPARED are both handled by the apply
worker. This avoids the complications which the split otherwise causes.
[1]: 2PC prepare test requires the v29 patch from
/messages/by-id/CAMGcDxeqEpWj3fTXwqhSwBdXd2RS9jzwWscO-XbeCfso6ts3+Q@mail.gmail.com
---
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
tablesync-early-exit.patchapplication/octet-stream; name=tablesync-early-exit.patchDownload
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 7a6c594..b05d811 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -2322,6 +2322,15 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
bool ping_sent = false;
TimeLineID tli;
+ /*
+ * Give the tablesync worker an opportunity to see if it can exit instead of
+ * handling messages which the apply worker could just handle by itself.
+ */
+ if (am_tablesync_worker())
+ {
+ process_syncing_tables(last_received);
+ }
+
/*
* Init the ApplyMessageContext which we clean up after each replication
* protocol message.
On Tue, Dec 8, 2020 at 11:53 AM Peter Smith <smithpb2250@gmail.com> wrote:
Yeah, this demonstrates the idea can work, but as mentioned in my
previous email [1] this needs much more work to make the COPY and the
later fetching of changes from the publisher consistent. So, let me
summarize the discussion so far. We wanted to enhance the tablesync
phase of Logical Replication to enable decoding of prepared
transactions [2]. The problem was that when we stream prepared
transactions in the tablesync worker, it will simply commit them due
to the requirement of maintaining a single transaction for the entire
duration of the copy and the streaming of transactions afterward. We can't
simply disable the decoding of prepared xacts for tablesync workers
because that can skip some of the prepared xacts forever on the
subscriber, as explained in one of the emails above [3]. Now, while investigating
the solutions to enhance tablesync to support decoding at prepare
time, I found that due to the current design of tablesync we can see
partial data of transactions on subscribers which is also explained in
the email above with an example [4]. This visibility problem has
existed since logical replication was introduced in PostgreSQL, and
the only answer I have received so far is that there doesn't seem to
be any other alternative, which I think is not true; I have provided
one alternative as well.
Next, we have discussed three different solutions all of which will
solve the first problem (allow the tablesync worker to decode
transactions at prepare time) and one of which solves both the first
and second problem (partial transaction data visibility).
Solution-1: Allow the table-sync worker to use multiple transactions.
The reason for doing it in a single transaction is that if we commit
after the initial COPY and then crash while streaming changes of other
transactions, the state of the table won't be known after the restart:
since we are using a temporary slot, we don't know from where to
restart syncing the table.
IIUC, we primarily need to do two things to achieve multiple
transactions: one is to have an additional state in the catalog (say
'catchup') which says that the initial copy is done. Then we need a
permanent slot whose progress we can track, so that after a restart
(due to a crash, connection break, etc.) we can start from the
appropriate position. Now, this will allow us to do
less work after recovering from a crash because we will know the
restart point. As Craig mentioned, it also allows the sync slot to
advance, freeing any held upstream resources before the whole sync is
done, which is good if the upstream is busy and generating lots of
WAL. Finally, committing as we go means we won't exceed the cid
increment limit in a single txn.
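As a rough sketch of how this extra state would surface, the per-table
sync state can already be inspected in pg_subscription_rel on the
subscriber; the 'C' (copy done) value shown below is only from the WIP
patch further down this thread and does not exist in current releases:

SELECT srrelid::regclass AS relation, srsubstate, srsublsn
FROM pg_subscription_rel;
-- srsubstate: 'i' init, 'd' data sync, 'C' copy done (proposed),
--             's' sync done, 'r' ready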
Solution-2: The next solution we discussed is to make "tablesync"
worker just gather messages after COPY in a similar way to how the
current streaming of in-progress transaction feature gathers messages
into a "changes" file so that they can be replayed later by the apply
worker. Now, as we no longer need to replay the individual
transactions in the tablesync worker inside a single transaction, the
publisher can send the decoded prepare to the subscriber. This has
some disadvantages: each transaction processed by the tablesync worker
needs to be durably written to a file, and it can also lead to some
apply lag later when the apply worker processes the same changes.
Solution-3: Allow the table-sync workers to just perform initial COPY
and then once the COPY is done for all relations the apply worker will
stream all the future changes. Now, surely if large copies are
required for multiple relations then the apply worker's replay of
transactions would be delayed a bit, but it is not clear how much that
matters compared to the transaction visibility issue, and anyway we
would have achieved the maximum parallelism by allowing the copy via
multiple workers. This would reduce the load (both CPU and I/O) on the
publisher side by decoding the data only once instead of once for each
tablesync worker and separately for the apply worker. I think it will
use fewer resources to finish the work.
Currently, in tablesync worker, we create a slot with CRS_USE_SNAPSHOT
option which creates a transaction snapshot on the publisher, and then
we use the same snapshot for COPY from the publisher. After this, when
we try to receive the data from the publisher using the same slot, it
will be in sync with the COPY. I think to keep the same consistency
between COPY and the data we receive from the publisher in this
approach, we need to export the snapshot while creating a slot in the
apply worker by using CRS_EXPORT_SNAPSHOT and then use the same
snapshot by all the tablesync workers doing the copy. In tablesync
workers, we can use the SET TRANSACTION SNAPSHOT command after "BEGIN
READ ONLY ISOLATION LEVEL REPEATABLE READ" to use the exported
snapshot. That way the COPY will use the same snapshot as is used for
receiving the changes in apply worker and the data will be in sync.
Then we also need a way to export a snapshot while the apply worker is
already receiving changes, because users can run 'ALTER SUBSCRIPTION
name REFRESH PUBLICATION', which allows new tables to be synced. I
think we need to introduce a new command in exec_replication_command()
to export a snapshot from the existing slot, which the new tablesync
worker can then use.
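For reference, a sketch of the publisher-side protocol commands this
would build on; the slot name is illustrative, and a command to export
a snapshot from an already-existing slot does not exist today - that
is exactly the new command proposed above:

-- on a replication connection, e.g. psql "dbname=postgres replication=database"
CREATE_REPLICATION_SLOT "sub1_slot" LOGICAL pgoutput EXPORT_SNAPSHOT;
-- returns: slot_name | consistent_point | snapshot_name | output_plugin
-- the snapshot_name would then be handed to each tablesync worker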
Among the above three solutions, the first two will solve the first
problem (allow the tablesync worker to decode transactions at prepare
time) and the third solution will solve both the first and second
problem (partial transaction data visibility). The third solution
requires quite some redesign of how the Logical Replication work is
synchronized between apply and tablesync workers and might turn out to
be a bigger implementation effort. I am tentatively thinking of going
with the first or second solution at this stage; if people later feel
that we need some bigger redesign, then we can go with something along
the lines of Solution-3.
Thoughts?
[1]: /messages/by-id/CAA4eK1+QC74wRQmbYT+MmOs=YbdUjuq0_A9CBbVoQMB1Ryi-OA@mail.gmail.com
[2]: /messages/by-id/CAHut+PuEMk4SO8oGzxc_ftzPkGA8uC-y5qi-KRqHSy_P0i30DA@mail.gmail.com
[3]: /messages/by-id/CAA4eK1KFsjf6x-S7b0dJLvEL3tcn9x-voBJiFoGsccyH5xgDzQ@mail.gmail.com
[4]: /messages/by-id/CAA4eK1Ld9XaLoTZCoKF_gET7kc1fDf8CPR3CM48MQb1N1jDLYg@mail.gmail.com
--
With Regards,
Amit Kapila.
On Tue, Dec 8, 2020 at 9:14 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Hi Amit,
- Solution-3 has become too complicated to be attempted by me. Anyway,
we may be better off just focusing on eliminating the new problems
exposed by the 2PC work [1], rather than burning too much effort to
fix some other quirk which has apparently existed for years.
[1]: /messages/by-id/CAHut+Ptm7E5Jj92tJWPtnnjbNjJN60_=aGGKYW3h23b7J=qeDg@mail.gmail.com
- Solution-2 has some potential lag problems, and maybe file resource
problems as well. This idea did not get a very favourable response
when I first proposed it.
- This leaves Solution-1 as the best viable option to fix the current
known 2PC trouble.
~~
So I will try to write a patch for the proposed Solution-1.
---
Kind Regards,
Peter Smith.
Fujitsu Australia
On Thu, Dec 10, 2020 at 3:19 PM Peter Smith <smithpb2250@gmail.com> wrote:
So I will try to write a patch for the proposed Solution-1.
Yeah, I also think that Solution-1 is best for solving the 2PC problem.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Thu, Dec 10, 2020 at 8:49 PM Peter Smith <smithpb2250@gmail.com> wrote:
So I will try to write a patch for the proposed Solution-1.
Hi Amit.
FYI, here is my v3 WIP patch for Solution-1.
This patch applies onto the v30 patch set [1] from the other 2PC thread:
[1]: /messages/by-id/CAFPTHDYA8yE6tEmQ2USYS68kNt+kM=SwKgj=jy4AvFD5e9-UTQ@mail.gmail.com
Although incomplete, it continues to pass all of the make check and
src/test/subscription TAP tests.
====
Coded / WIP:
* tablesync slot is now permanent instead of temporary
* the tablesync slot cleanup (drop) code is added for DropSubscription
and for finish_sync_worker functions
* tablesync worker now allows multiple tx instead of a single tx
* a new state (SUBREL_STATE_COPYDONE) is persisted after a successful
copy_table in LogicalRepSyncTableStart.
* if a relaunched tablesync finds the state is SUBREL_STATE_COPYDONE
then it will bypass the initial copy_table phase.
TODO / Known Issues:
* The tablesync replication origin/lsn logic all needs to be updated
so that tablesync knows where to restart based on information held by
the now permanent slot.
* the current implementation of tablesync drop slot (e.g. from DROP
SUBSCRIPTION) or finish_sync_worker regenerates the tablesync slot
name so it knows what slot to drop. The current code may be ok for
normal use cases, but if there is an ALTER SUBSCRIPTION ... SET
(slot_name = newname) it would be unable to find the tablesync
slot. Some redesign may be needed for this part.
* help / comments / cleanup
* There is temporary "!!>>" excessive logging of mine scattered around
which I added to help my testing during development
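As a quick way to observe the now-permanent tablesync slots noted
under "Coded / WIP" above, they should show up as non-temporary slots
on the publisher while a sync is in progress. A sketch (the slot name
follows the patch's "%s_%u_sync_%u" scheme, with illustrative OIDs):

SELECT slot_name, plugin, slot_type, temporary
FROM pg_replication_slots;
--       slot_name        |  plugin  | slot_type | temporary
-- sub1_16398_sync_16385  | pgoutput | logical   | f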
---
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
v3-0002-2PC-Solution1-WIP-20201215.patchapplication/octet-stream; name=v3-0002-2PC-Solution1-WIP-20201215.patchDownload
From 9adb04bdb827f44a91e45d53b1fad5a02213777c Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Tue, 15 Dec 2020 20:46:27 +1100
Subject: [PATCH v3] 2PC-Solution1-WIP-20201215.
This patch applies onto the v30 patch set [1] from other 2PC thread:
[1] https://www.postgresql.org/message-id/CAFPTHDYA8yE6tEmQ2USYS68kNt%2BkM%3DSwKgj%3Djy4AvFD5e9-UTQ%40mail.gmail.com
====
Coded / WIP:
* tablesync slot is now permanent instead of temporary
* the tablesync slot cleanup (drop) code is added for DropSubscription and for finish_sync_worker functions
* tablesync worker now allows multiple tx instead of a single tx
* a new state (SUBREL_STATE_COPYDONE) is persisted after a successful copy_table in LogicalRepSyncTableStart.
* if a relaunched tablesync finds the state is SUBREL_STATE_COPYDONE then it will bypass the initial copy_table phase.
TODO / Known Issues:
* The tablesync replication origin/lsn logic all needs to be updated so that tablesync knows where to restart based on information held by the now permanent slot.
* the current implementation of tablesync drop slot (e.g. from DROP SUBSCRIPTION) or finish_sync_worker regenerates the tablesync slot name so it knows what slot to drop. The current code may be ok for normal use cases, but if there is an ALTER SUBSCRIPTION ... SET (slot_name = newname) it would be unable to find the tablesync slot.
* help / comments / cleanup
* There is temporary "!!>>" excessive logging of mine scattered around which I added to help my testing during development
---
src/backend/commands/subscriptioncmds.c | 108 ++++++++++++++++++
src/backend/replication/logical/tablesync.c | 163 ++++++++++++++++++++++------
src/backend/replication/logical/worker.c | 21 +---
src/include/catalog/pg_subscription_rel.h | 1 +
src/include/commands/subscriptioncmds.h | 1 +
src/include/replication/slot.h | 1 +
6 files changed, 249 insertions(+), 46 deletions(-)
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index b0745d5..e2b9618 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -37,6 +37,7 @@
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
#include "storage/lmgr.h"
#include "utils/acl.h"
#include "utils/builtins.h"
@@ -1070,6 +1071,41 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
{
LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
+ /* Is this a tablesync worker? If yes, drop the tablesync's slot. */
+ if (OidIsValid(w->relid))
+ {
+ /* FIXME 1 - This slotname check below is a workaround needed because the tablesync slot name
+ * is derived from the subscription slot name, so if that was set slot_name = NONE then we cannot do
+ * that calculation anymore to get the tablesync slot name.
+ *
+ * FIXME 2 - If the subscription slot name changes from 'aaa' to 'bbb' then it will not be possible
+ * to get back to those tablesync slots. Some redesign is needed (e.g. store the tablesync slot name somewhere)
+ * to avoid this trouble...
+ */
+ if (slotname)
+ {
+ extern void ReplicationSlotDropAtPubNode(
+ WalReceiverConn *wrconn_given, char *conninfo, char *subname, char *slotname);
+
+ /* Calculate the name of the tablesync slot */
+ char *syncslotname = ReplicationSlotNameForTablesync(slotname, w->subid, w->relid);
+
+ elog(LOG, "!!>> DROP SUBSCRIPTION - now dropping the tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(
+ NULL,
+ conninfo, /* use conninfo to make a new connection. */
+ subname,
+ syncslotname);
+
+ pfree(syncslotname);
+ }
+ else
+ {
+ elog(LOG, "!!>> DROP SUBSCRIPTION - no slotname for relid %u.", w->relid);
+ }
+ }
+
+ /* Stop the worker. */
logicalrep_worker_stop(w->subid, w->relid);
}
list_free(subworkers);
@@ -1144,6 +1180,78 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
table_close(rel, NoLock);
}
+
+/*
+ * Drop the replication slot at the publisher node
+ * using the replication connection.
+ *
+ * If the connection is passed then just use that,
+ * otherwise connect/disconnect within this function.
+ */
+void
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn_given, char *conninfo, char *subname, char *slotname)
+{
+ StringInfoData cmd;
+
+ load_file("libpqwalreceiver", false);
+
+ initStringInfo(&cmd);
+ appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
+
+ /*
+ * If the connection was passed then use it.
+ * If the connection was not passed then make a new connection using the passed conninfo.
+ */
+ if (wrconn_given != NULL)
+ {
+ Assert (conninfo == NULL);
+ wrconn = wrconn_given;
+ }
+ else
+ {
+ char *err = NULL;
+
+ Assert(conninfo != NULL);
+ wrconn = walrcv_connect(conninfo, true, subname, &err);
+
+ if (wrconn == NULL)
+ ereport(ERROR,
+ (errmsg("could not connect to publisher when attempting to "
+ "drop the replication slot \"%s\"", slotname),
+ errdetail("The error was: %s", err)));
+ }
+
+ PG_TRY();
+ {
+ WalRcvExecResult *res;
+
+ res = walrcv_exec(wrconn, cmd.data, 0, NULL);
+
+ if (res->status != WALRCV_OK_COMMAND)
+ ereport(ERROR,
+ (errmsg("could not drop the replication slot \"%s\" on publisher",
+ slotname),
+ errdetail("The error was: %s", res->err)));
+ else
+ ereport(LOG,
+ (errmsg("dropped replication slot \"%s\" on publisher",
+ slotname)));
+
+ walrcv_clear_result(res);
+ }
+ PG_CATCH();
+ {
+ /* NOP. Just gobble any ERROR. */
+ }
+ PG_END_TRY();
+
+ /* Disconnect the connection (unless using one passed) */
+ if (wrconn_given == NULL)
+ walrcv_disconnect(wrconn);
+
+ pfree(cmd.data);
+}
+
/*
* Internal workhorse for changing a subscription owner
*/
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 1904f34..7378cb6 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -102,6 +102,7 @@
#include "replication/logicalrelation.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
#include "storage/ipc.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
@@ -139,6 +140,28 @@ finish_sync_worker(void)
get_rel_name(MyLogicalRepWorker->relid))));
CommitTransactionCommand();
+ /*
+ * Cleanup the tablesync slot.
+ */
+ {
+ extern void ReplicationSlotDropAtPubNode(
+ WalReceiverConn *wrconn_given, char *conninfo, char *subname, char *slotname);
+
+ /* Calculate the name of the tablesync slot */
+ char *syncslotname = ReplicationSlotNameForTablesync(
+ MySubscription->slotname,
+ MySubscription->oid,
+ MyLogicalRepWorker->relid);
+
+ elog(LOG, "!!>> Dropping the tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(
+ wrconn,
+ NULL, /* use the current connection. */
+ MySubscription->name, syncslotname);
+
+ pfree(syncslotname);
+ }
+
/* Find the main apply worker and signal it. */
logicalrep_worker_wakeup(MyLogicalRepWorker->subid, InvalidOid);
@@ -270,8 +293,6 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
-
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
@@ -284,6 +305,15 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn)
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ {
+ StartTransactionCommand();
+ }
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
@@ -808,6 +838,35 @@ copy_table(Relation rel)
logicalrep_rel_close(relmapentry, NoLock);
}
+
+/*
+ * Determine the tablesync slot name.
+ *
+ * The returned slot name is palloc'ed in current memory context.
+ */
+char *
+ReplicationSlotNameForTablesync(char *subslotname, Oid suboid, Oid relid)
+{
+ char *syncslotname;
+
+ /*
+ * To build a slot name for the sync work, we are limited to NAMEDATALEN -
+ * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
+ * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
+ * NAMEDATALEN on the remote that matters, but this scheme will also work
+ * reasonably if that is different.)
+ */
+ StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
+
+ syncslotname = psprintf("%.*s_%u_sync_%u",
+ NAMEDATALEN - 28,
+ subslotname,
+ suboid,
+ relid);
+
+ return syncslotname;
+}
+
/*
* Start syncing the table in the sync worker.
*
@@ -825,6 +884,7 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ bool copied_ok;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -850,16 +910,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
+ /* Calculate the name of the tablesync slot */
+ slotname = ReplicationSlotNameForTablesync(
MySubscription->slotname,
MySubscription->oid,
MyLogicalRepWorker->relid);
@@ -875,7 +927,18 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_COPYDONE);
+
+ /*
+ * The COPY phase was previously done, but tablesync then crashed/etc
+ * before it was able to finish normally.
+ */
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_COPYDONE)
+ {
+ elog(LOG, "!!>> tablesync relstate was SUBREL_STATE_COPYDONE.");
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -891,9 +954,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -919,29 +979,70 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
walrcv_clear_result(res);
/*
- * Create a new temporary logical decoding slot. This slot will be used
+ * Create a new permanent logical decoding slot. This slot will be used
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
- walrcv_create_slot(wrconn, slotname, true,
+ elog(LOG, "!!>> LogicalRepSyncTableStart calls walrcv_create_slot for \"%s\".", slotname);
+ walrcv_create_slot(wrconn, slotname, false,
CRS_USE_SNAPSHOT, origin_startpos);
- /* Now do the initial data copy */
- PushActiveSnapshot(GetTransactionSnapshot());
- copy_table(rel);
- PopActiveSnapshot();
+ /*
+ * Be sure to remove the newly created tablesync slot if the COPY fails.
+ */
+ copied_ok = false;
+ PG_TRY();
+ {
+ /* Now do the initial data copy */
+ PushActiveSnapshot(GetTransactionSnapshot());
+ copy_table(rel);
+ PopActiveSnapshot();
- res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
- (errmsg("table copy could not finish transaction on publisher"),
- errdetail("The error was: %s", res->err)));
- walrcv_clear_result(res);
+ res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
+ if (res->status != WALRCV_OK_COMMAND)
+ ereport(ERROR,
+ (errmsg("table copy could not finish transaction on publisher"),
+ errdetail("The error was: %s", res->err)));
+ walrcv_clear_result(res);
+
+ table_close(rel, NoLock);
+
+ /* Make the copy visible. */
+ CommandCounterIncrement();
+
+ copied_ok = true;
+ }
+ PG_FINALLY();
+ {
+ /* If something failed during copy table then cleanup the created slot. */
+ if (!copied_ok)
+ {
+ extern void ReplicationSlotDropAtPubNode(
+ WalReceiverConn *wrconn_given, char *conninfo, char *subname, char *slotname);
+
+ elog(LOG, "!!>> The tablesync copy failed. Drop the tablesync slot \"%s\".", slotname);
+ ReplicationSlotDropAtPubNode(
+ wrconn,
+ NULL, /* use the current connection. */
+ MySubscription->name,
+ slotname);
+
+ pfree(slotname);
+ }
+ }
+ PG_END_TRY();
- table_close(rel, NoLock);
+ CommitTransactionCommand();
+
+ /* Update the persisted state to flag COPY phase is done; make it visible to others. */
+ StartTransactionCommand();
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_COPYDONE,
+ MyLogicalRepWorker->relstate_lsn);
+ CommitTransactionCommand();
- /* Make the copy visible. */
- CommandCounterIncrement();
+copy_table_done:
/*
* We are done with the initial data synchronization, update the state.
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 9271f87..a60e9fd 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -771,8 +771,7 @@ apply_handle_prepare_txn(LogicalRepPrepareData *prepare_data)
Assert(prepare_data->prepare_lsn == remote_final_lsn);
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* BeginTransactionBlock is necessary to balance the
@@ -1079,12 +1078,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -1161,9 +1156,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -1190,8 +1183,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1350,8 +1342,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData* commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index acc2926..e9f2b3f 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,7 @@ DECLARE_UNIQUE_INDEX(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg_subs
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_COPYDONE 'C' /* tablesync copy phase is completed */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/commands/subscriptioncmds.h b/src/include/commands/subscriptioncmds.h
index 804e47b..82c09d1 100644
--- a/src/include/commands/subscriptioncmds.h
+++ b/src/include/commands/subscriptioncmds.h
@@ -27,3 +27,4 @@ extern ObjectAddress AlterSubscriptionOwner(const char *name, Oid newOwnerId);
extern void AlterSubscriptionOwner_oid(Oid subid, Oid newOwnerId);
#endif /* SUBSCRIPTIONCMDS_H */
+
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 63bab69..366a737 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -211,6 +211,7 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(char *subslotname, Oid suboid, Oid relid);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
--
1.8.3.1
v3-0001-2PC-change-tablesync-slot-to-use-same-two_phase-m.patchapplication/octet-stream; name=v3-0001-2PC-change-tablesync-slot-to-use-same-two_phase-m.patchDownload
From 27ecd449901c0b81fe9738b9fb1421c9b0d20d05 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Thu, 10 Dec 2020 16:38:05 +1100
Subject: [PATCH v3] 2PC - change tablesync slot to use same two_phase mode as
apply slot
---
src/backend/replication/logical/worker.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index e14fe62..9271f87 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -2759,7 +2759,7 @@ maybe_reread_subscription(void)
strcmp(newsub->slotname, MySubscription->slotname) != 0 ||
newsub->binary != MySubscription->binary ||
newsub->stream != MySubscription->stream ||
- (!am_tablesync_worker() && newsub->twophase != MySubscription->twophase) ||
+ newsub->twophase != MySubscription->twophase ||
!equal(newsub->publications, MySubscription->publications))
{
ereport(LOG,
@@ -3406,7 +3406,7 @@ ApplyWorkerMain(Datum main_arg)
options.proto.logical.publication_names = MySubscription->publications;
options.proto.logical.binary = MySubscription->binary;
options.proto.logical.streaming = MySubscription->stream;
- options.proto.logical.twophase = MySubscription->twophase && !am_tablesync_worker();
+ options.proto.logical.twophase = MySubscription->twophase;
/* Start normal logical streaming replication. */
walrcv_startstreaming(wrconn, &options);
--
1.8.3.1
Hi Amit.
PSA my v4 WIP patch for Solution-1.
This patch applies onto the v30 patch set [1] from the other 2PC thread:
[1]: /messages/by-id/CAFPTHDYA8yE6tEmQ2USYS68kNt+kM=SwKgj=jy4AvFD5e9-UTQ@mail.gmail.com
Although incomplete, it still passes all of the make check and
src/test/subscription TAP tests.
====
Coded / WIP:
* tablesync slot is now permanent instead of temporary
* the tablesync slot cleanup (drop) code is added for DropSubscription
and for finish_sync_worker functions
* tablesync worker now allows multiple tx instead of a single tx
* a new state (SUBREL_STATE_COPYDONE) is persisted after a successful
copy_table in LogicalRepSyncTableStart.
* if a relaunched tablesync finds the state is SUBREL_STATE_COPYDONE
then it will bypass the initial copy_table phase.
* tablesync now sets up replication origin tracking in
LogicalRepSyncTableStart (similar to what is done for the apply worker)
* tablesync replication origin tracking is cleaned up during
DropSubscription and/or process_syncing_tables_for_apply
TODO / Known Issues:
* the current implementation of tablesync drop slot (e.g. from
DropSubscription or finish_sync_worker) regenerates the tablesync slot
name so it knows what slot to drop. The current code might be ok for
normal use cases, but if there is an ALTER SUBSCRIPTION ... SET
(slot_name = newname) it would be unable to find the tablesync
slot.
* I think if there are crashed tablesync workers then they are not
known to DropSubscription, so it might be a problem to clean up slots
and/or origin tracking belonging to those unknown workers.
* help / comments / cleanup
* There is temporary "!!>>" excessive logging of mine scattered around
which I added to help my testing during development
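As a sketch of how the new origin tracking noted under "Coded / WIP"
above could be observed, the subscriber's pg_replication_origin should
show a per-table origin alongside the subscription's own; the names
follow the existing "pg_%u" (subid) scheme and the patch's "pg_%u_%u"
(subid, relid) scheme, with illustrative OIDs:

SELECT roname FROM pg_replication_origin;
--     roname
-- pg_16394         (apply worker's origin)
-- pg_16394_16385   (tablesync origin added by this WIP patch)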
---
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
v4-0002-WIP-patch-for-Solution1.patchapplication/octet-stream; name=v4-0002-WIP-patch-for-Solution1.patchDownload
From 8867bd8756d6b31b015ac39ae36d5bdf4146db0e Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Sat, 19 Dec 2020 00:01:41 +1100
Subject: [PATCH v4] WIP patch for Solution1.
This patch applies onto the v30 patch set [1] from other 2PC thread:
[1] https://www.postgresql.org/message-id/CAFPTHDYA8yE6tEmQ2USYS68kNt%2BkM%3DSwKgj%3Djy4AvFD5e9-UTQ%40mail.gmail.com
Although incomplete, it still passes all of the make check and src/test/subscription TAP tests.
====
Coded / WIP:
* tablesync slot is now permanent instead of temporary
* the tablesync slot cleanup (drop) code is added for DropSubscription and for finish_sync_worker functions
* tablesync worker now allows multiple tx instead of a single tx
* a new state (SUBREL_STATE_COPYDONE) is persisted after a successful copy_table in LogicalRepSyncTableStart.
* if a relaunched tablesync finds the state is SUBREL_STATE_COPYDONE then it will bypass the initial copy_table phase.
* tablesync now sets up replication origin tracking in LogicalRepSyncTableStart (similar to what is done for the apply worker)
* tablesync replication origin tracking is cleaned up during DropSubscription and/or process_syncing_tables_for_apply
TODO / Known Issues:
* the current implementation of tablesync drop slot (e.g. from DropSubscription or finish_sync_worker) regenerates the tablesync slot name so it knows what slot to drop. The current code might be ok for normal use cases, but if there is an ALTER SUBSCRIPTION ... SET (slot_name = newname) it would be unable to find the tablesync slot.
* I think if there are crashed tablesync workers then they are not known to DropSubscription, so it might be a problem to clean up slots and/or origin tracking belonging to those unknown workers.
* help / comments / cleanup
* There is temporary "!!>>" excessive logging of mine scattered around which I added to help my testing during development
---
src/backend/commands/subscriptioncmds.c | 124 +++++++++++++++++
src/backend/replication/logical/tablesync.c | 207 +++++++++++++++++++++++-----
src/backend/replication/logical/worker.c | 21 +--
src/include/catalog/pg_subscription_rel.h | 1 +
src/include/replication/slot.h | 1 +
5 files changed, 307 insertions(+), 47 deletions(-)
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index b0745d5..c4b02a6 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -37,6 +37,7 @@
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
#include "storage/lmgr.h"
#include "utils/acl.h"
#include "utils/builtins.h"
@@ -47,6 +48,8 @@
static List *fetch_table_list(WalReceiverConn *wrconn, List *publications);
+void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn_given, char *conninfo, char *subname, char *slotname);
+
/*
* Common option parsing function for CREATE and ALTER SUBSCRIPTION commands.
*
@@ -1070,6 +1073,55 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
{
LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
+ /*
+ * Is this a tablesync worker? If yes, drop the tablesync's slot.
+ */
+ if (OidIsValid(w->relid))
+ {
+ /*
+ * FIXME 1 - This slotname check below is a workaround needed because the tablesync slot
+ * name is derived from the subscription slot name, so if that is set slot_name = NONE
+ * then we cannot do that calculation anymore to get the tablesync slot name.
+ *
+ * FIXME 2 - Similarly, if the subscription slot name changes from 'aaa' to 'bbb' then that
+ * will also make it impossible to re-calculate the tablesync slot names. Some redesign is
+ * needed (e.g. store the tablesync slot name somewhere) to avoid this trouble...
+ *
+ * FIXME 3 - Crashed tablesync workers may also have remaining slots because I don't think
+ * such workers are even iterated by this loop, and nobody else is removing them.
+ */
+ if (slotname)
+ {
+ /* Calculate the name of the tablesync slot. */
+ char *syncslotname = ReplicationSlotNameForTablesync(slotname, w->subid, w->relid);
+
+ elog(LOG, "!!>> DropSubscription: now dropping the tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(
+ NULL,
+ conninfo, /* use conninfo to make a new connection. */
+ subname,
+ syncslotname);
+
+ pfree(syncslotname);
+ }
+ else
+ {
+ elog(LOG, "!!>> DropSubscription: no slotname for relid %u.", w->relid);
+ }
+
+ /* Remove the (tablesync's) origin tracking if it exists. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, w->relid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ {
+ elog(LOG, "!!>> DropSubscription: dropping origin tracking for \"%s\"", originname);
+ replorigin_drop(originid, false);
+ elog(LOG, "!!>> DropSubscription: dropped origin tracking for \"%s\"", originname);
+ }
+
+ }
+
+ /* Stop the worker. */
logicalrep_worker_stop(w->subid, w->relid);
}
list_free(subworkers);
@@ -1144,6 +1196,78 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
table_close(rel, NoLock);
}
+
+/*
+ * Drop the replication slot at the publisher node
+ * using the replication connection.
+ *
+ * If the connection is passed then just use that,
+ * otherwise connect/disconnect within this function.
+ */
+void
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn_given, char *conninfo, char *subname, char *slotname)
+{
+ StringInfoData cmd;
+
+ load_file("libpqwalreceiver", false);
+
+ initStringInfo(&cmd);
+ appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
+
+ /*
+ * If the connection was passed then use it.
+ * If the connection was not passed then make a new connection using the passed conninfo.
+ */
+ if (wrconn_given != NULL)
+ {
+ Assert (conninfo == NULL);
+ wrconn = wrconn_given;
+ }
+ else
+ {
+ char *err = NULL;
+
+ Assert(conninfo != NULL);
+ wrconn = walrcv_connect(conninfo, true, subname, &err);
+
+ if (wrconn == NULL)
+ ereport(ERROR,
+ (errmsg("could not connect to publisher when attempting to "
+ "drop the replication slot \"%s\"", slotname),
+ errdetail("The error was: %s", err)));
+ }
+
+ PG_TRY();
+ {
+ WalRcvExecResult *res;
+
+ res = walrcv_exec(wrconn, cmd.data, 0, NULL);
+
+ if (res->status != WALRCV_OK_COMMAND)
+ ereport(ERROR,
+ (errmsg("could not drop the replication slot \"%s\" on publisher",
+ slotname),
+ errdetail("The error was: %s", res->err)));
+ else
+ ereport(LOG,
+ (errmsg("dropped replication slot \"%s\" on publisher",
+ slotname)));
+
+ walrcv_clear_result(res);
+ }
+ PG_CATCH();
+ {
+ /* NOP. Just gobble any ERROR. */
+ }
+ PG_END_TRY();
+
+ /* Disconnect the connection (unless using one passed) */
+ if (wrconn_given == NULL)
+ walrcv_disconnect(wrconn);
+
+ pfree(cmd.data);
+}
+
/*
* Internal workhorse for changing a subscription owner
*/
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 1904f34..780cf8d 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -102,6 +102,8 @@
#include "replication/logicalrelation.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
@@ -139,6 +141,28 @@ finish_sync_worker(void)
get_rel_name(MyLogicalRepWorker->relid))));
CommitTransactionCommand();
+ /*
+ * Cleanup the tablesync slot.
+ */
+ {
+ extern void ReplicationSlotDropAtPubNode(
+ WalReceiverConn *wrconn_given, char *conninfo, char *subname, char *slotname);
+
+ /* Calculate the name of the tablesync slot */
+ char *syncslotname = ReplicationSlotNameForTablesync(
+ MySubscription->slotname,
+ MySubscription->oid,
+ MyLogicalRepWorker->relid);
+
+ elog(LOG, "!!>> finish_sync_worker: dropping the tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(
+ wrconn,
+ NULL, /* use the current connection. */
+ MySubscription->name, syncslotname);
+
+ pfree(syncslotname);
+ }
+
/* Find the main apply worker and signal it. */
logicalrep_worker_wakeup(MyLogicalRepWorker->subid, InvalidOid);
@@ -270,8 +294,6 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
-
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
@@ -284,6 +306,15 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn)
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ {
+ StartTransactionCommand();
+ }
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
@@ -416,6 +447,27 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
+ /*
+ * Remove the tablesync origin tracking if it exists.
+ *
+ * The cleanup is done here instead of in the finish_sync_worker function because
+ * if the tablesync worker process attempted to call replorigin_drop then it would
+ * hang, because replorigin_drop considers the owning tablesync PID as "busy".
+ */
+ {
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MyLogicalRepWorker->subid, rstate->relid);
+ originid = replorigin_by_name(originname, true);
+ elog(LOG, "!!>> apply worker: find tablesync origin tracking for \"%s\".", originname);
+ if (OidIsValid(originid))
+ {
+ elog(LOG, "!!>> apply worker: dropping tablesync origin tracking for \"%s\".", originname);
+ replorigin_drop(originid, false);
+ elog(LOG, "!!>> apply worker: dropped tablesync origin tracking for \"%s\".", originname);
+ }
+ }
}
}
else
@@ -808,6 +860,35 @@ copy_table(Relation rel)
logicalrep_rel_close(relmapentry, NoLock);
}
+
+/*
+ * Determine the tablesync slot name.
+ *
+ * The returned slot name is palloc'ed in current memory context.
+ */
+char *
+ReplicationSlotNameForTablesync(char *subslotname, Oid suboid, Oid relid)
+{
+ char *syncslotname;
+
+ /*
+ * To build a slot name for the sync work, we are limited to NAMEDATALEN -
+ * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
+ * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
+ * NAMEDATALEN on the remote that matters, but this scheme will also work
+ * reasonably if that is different.)
+ */
+ StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
+
+ syncslotname = psprintf("%.*s_%u_sync_%u",
+ NAMEDATALEN - 28,
+ subslotname,
+ suboid,
+ relid);
+
+ return syncslotname;
+}
+
/*
* Start syncing the table in the sync worker.
*
@@ -825,6 +906,7 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ bool copied_ok;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -850,16 +932,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(
MySubscription->slotname,
MySubscription->oid,
MyLogicalRepWorker->relid);
@@ -875,7 +949,18 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_COPYDONE);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_COPYDONE)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed/etc
+ * before it was able to finish normally.
+ */
+ elog(LOG, "!!>> LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_COPYDONE.");
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -891,9 +976,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -919,29 +1001,90 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
walrcv_clear_result(res);
/*
- * Create a new temporary logical decoding slot. This slot will be used
+ * Create a new permanent logical decoding slot. This slot will be used
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
- walrcv_create_slot(wrconn, slotname, true,
- CRS_USE_SNAPSHOT, origin_startpos);
+ elog(LOG, "!!>> LogicalRepSyncTableStart: walrcv_create_slot for \"%s\".", slotname);
+ walrcv_create_slot(wrconn, slotname, false,
+ CRS_USE_SNAPSHOT, NULL);
- /* Now do the initial data copy */
- PushActiveSnapshot(GetTransactionSnapshot());
- copy_table(rel);
- PopActiveSnapshot();
+ /*
+ * Be sure to remove the newly created tablesync slot if the COPY fails.
+ */
+ copied_ok = false;
+ PG_TRY();
+ {
+ /* Now do the initial data copy */
+ PushActiveSnapshot(GetTransactionSnapshot());
+ copy_table(rel);
+ PopActiveSnapshot();
- res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
- (errmsg("table copy could not finish transaction on publisher"),
- errdetail("The error was: %s", res->err)));
- walrcv_clear_result(res);
+ res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
+ if (res->status != WALRCV_OK_COMMAND)
+ ereport(ERROR,
+ (errmsg("table copy could not finish transaction on publisher"),
+ errdetail("The error was: %s", res->err)));
+ walrcv_clear_result(res);
+
+ table_close(rel, NoLock);
- table_close(rel, NoLock);
+ /* Make the copy visible. */
+ CommandCounterIncrement();
- /* Make the copy visible. */
- CommandCounterIncrement();
+ copied_ok = true;
+ }
+ PG_FINALLY();
+ {
+ /* If something failed during copy table then cleanup the created slot. */
+ if (!copied_ok)
+ {
+ extern void ReplicationSlotDropAtPubNode(
+ WalReceiverConn *wrconn_given, char *conninfo, char *subname, char *slotname);
+
+ elog(LOG, "!!>> LogicalRepSyncTableStart: tablesync copy failed. Drop the tablesync slot \"%s\".", slotname);
+ ReplicationSlotDropAtPubNode(
+ wrconn,
+ NULL, /* use the current connection. */
+ MySubscription->name,
+ slotname);
+
+ pfree(slotname);
+ }
+ }
+ PG_END_TRY();
+
+ CommitTransactionCommand();
+
+ /* Update the persisted state to indicate the COPY phase is done; make it visible to others. */
+ StartTransactionCommand();
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_COPYDONE,
+ MyLogicalRepWorker->relstate_lsn);
+ CommitTransactionCommand();
+
+copy_table_done:
+
+ /* Setup replication origin tracking. */
+ {
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ StartTransactionCommand();
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MySubscription->oid, MyLogicalRepWorker->relid);
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ elog(LOG, "!!>> LogicalRepSyncTableStart: create replication origin tracking \"%s\".", originname);
+ originid = replorigin_create(originname);
+ }
+ elog(LOG, "!!>> LogicalRepSyncTableStart: setup replication origin tracking \"%s\".", originname);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ *origin_startpos = replorigin_session_get_progress(false);
+ CommitTransactionCommand();
+ }
/*
* We are done with the initial data synchronization, update the state.
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 9271f87..a60e9fd 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -771,8 +771,7 @@ apply_handle_prepare_txn(LogicalRepPrepareData *prepare_data)
Assert(prepare_data->prepare_lsn == remote_final_lsn);
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* BeginTransactionBlock is necessary to balance the
@@ -1079,12 +1078,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -1161,9 +1156,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -1190,8 +1183,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1350,8 +1342,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData* commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index acc2926..e9f2b3f 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,7 @@ DECLARE_UNIQUE_INDEX(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg_subs
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_COPYDONE 'C' /* tablesync copy phase is completed */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 63bab69..366a737 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -211,6 +211,7 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(char *subslotname, Oid suboid, Oid relid);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
--
1.8.3.1
v4-0001-2PC-change-tablesync-slot-to-use-same-two_phase-m.patch (application/octet-stream)
From 27ecd449901c0b81fe9738b9fb1421c9b0d20d05 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Thu, 10 Dec 2020 16:38:05 +1100
Subject: [PATCH v4] 2PC - change tablesync slot to use same two_phase mode as
apply slot
---
src/backend/replication/logical/worker.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index e14fe62..9271f87 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -2759,7 +2759,7 @@ maybe_reread_subscription(void)
strcmp(newsub->slotname, MySubscription->slotname) != 0 ||
newsub->binary != MySubscription->binary ||
newsub->stream != MySubscription->stream ||
- (!am_tablesync_worker() && newsub->twophase != MySubscription->twophase) ||
+ newsub->twophase != MySubscription->twophase ||
!equal(newsub->publications, MySubscription->publications))
{
ereport(LOG,
@@ -3406,7 +3406,7 @@ ApplyWorkerMain(Datum main_arg)
options.proto.logical.publication_names = MySubscription->publications;
options.proto.logical.binary = MySubscription->binary;
options.proto.logical.streaming = MySubscription->stream;
- options.proto.logical.twophase = MySubscription->twophase && !am_tablesync_worker();
+ options.proto.logical.twophase = MySubscription->twophase;
/* Start normal logical streaming replication. */
walrcv_startstreaming(wrconn, &options);
--
1.8.3.1
On Fri, Dec 18, 2020 at 6:41 PM Peter Smith <smithpb2250@gmail.com> wrote:
TODO / Known Issues:
* the current implementation of tablesync drop slot (e.g. from
DropSubscription or finish_sync_worker) regenerates the tablesync slot
name so it knows what slot to drop.
If you always drop the slot at finish_sync_worker, in which case do
you need to drop it during DropSubscription? Is it when the tablesync
workers have crashed?
The current code might be ok for
normal use cases, but if there is an ALTER SUBSCRIPTION ... SET
(slot_name = newname) it would be unable to find the tablesync slot.
Sure, but the same will be true for the apply worker slot as well. I
agree the problem would be more for table sync workers but I think we
can solve it, see below.
* I think if there are crashed tablesync workers then they are not
known to DropSubscription. So this might be a problem for cleaning up
slots and/or origin tracking belonging to those unknown workers.
Yeah, I think we can do two things to avoid this and the previous
problem. (a) We can generate the slot_name for the table sync worker
based on only subscription_id and rel_id. (b) Immediately after
creating the slot, advance the replication origin with the position
(origin_startpos) we get from walrcv_create_slot; this will help us
start from the right location.
Do you see anything which will still not be addressed after doing the above?
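To make (a) and (b) concrete, here is a rough sketch of the shape I
have in mind (untested; the helper name is only illustrative):

static void
tablesync_create_slot_and_origin(WalReceiverConn *wrconn, Oid suboid, Oid relid)
{
	char	   *syncslotname;
	char		originname[NAMEDATALEN];
	RepOriginId originid;
	XLogRecPtr	origin_startpos;

	/*
	 * (a) The name depends only on the subscription and relation OIDs,
	 * so DropSubscription can regenerate it without the worker's help.
	 */
	syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);

	/* Create a permanent slot; the returned LSN is the consistent point. */
	walrcv_create_slot(wrconn, syncslotname, false,
					   CRS_USE_SNAPSHOT, &origin_startpos);

	/*
	 * (b) Advance the origin to that LSN immediately, so a restarted
	 * worker resumes from the right location.
	 */
	snprintf(originname, sizeof(originname), "pg_%u_%u", suboid, relid);
	originid = replorigin_create(originname);
	replorigin_advance(originid, origin_startpos, InvalidXLogRecPtr,
					   true /* go backward */ , true /* WAL log */ );

	pfree(syncslotname);
}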
I understand why you are trying to create this patch atop the logical
decoding of 2PC patch but I think it is better to create this as an
independent patch and then use it to test the 2PC problem. Also, please
explain what kind of testing you did to ensure that it works properly
after the table sync worker restarts after a crash.
--
With Regards,
Amit Kapila.
On Sat, Dec 19, 2020 at 12:10 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Dec 18, 2020 at 6:41 PM Peter Smith <smithpb2250@gmail.com> wrote:
I understand why you are trying to create this patch atop the logical
decoding of 2PC patch but I think it is better to create this as an
independent patch and then use it to test the 2PC problem. Also, please
explain what kind of testing you did to ensure that it works properly
after the table sync worker restarts after a crash.
Few other comments:
==================
1.
* FIXME 3 - Crashed tablesync workers may also have remaining slots
because I don't think
+ * such workers are even iterated by this loop, and nobody else is
removing them.
+ */
+ if (slotname)
+ {
The above FIXME is not clear to me. Actually, the crashed workers
should restart, finish their work, and drop the slots. So not sure
what exactly this FIXME refers to?
2.
DropSubscription()
{
..
ReplicationSlotDropAtPubNode(
+ NULL,
+ conninfo, /* use conninfo to make a new connection. */
+ subname,
+ syncslotname);
..
}
With the above call, it will form a connection with the publisher and
drop the required slots. I think we need to save the connection info
so that we don't need to connect/disconnect for each slot to be
dropped. Later in this function, we again connect and drop the apply
worker slot. I think we should connect just once and drop both the
apply and tablesync slots, if any.
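A rough sketch of the shape I mean (untested; it assumes
ReplicationSlotDropAtPubNode is reworked to take an already-open
connection, and the function name is only illustrative):

static void
drop_sub_and_tablesync_slots(char *conninfo, char *subname,
							 char *slotname, List *subworkers)
{
	WalReceiverConn *conn;
	char	   *err = NULL;
	ListCell   *lc;

	/* Connect once and reuse the connection for every slot drop. */
	conn = walrcv_connect(conninfo, true, subname, &err);
	if (conn == NULL)
		ereport(ERROR,
				(errmsg("could not connect to publisher: %s", err)));

	/* Drop the tablesync slots, if any ... */
	foreach(lc, subworkers)
	{
		LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);

		if (OidIsValid(w->relid))
		{
			char	   *syncslotname =
				ReplicationSlotNameForTablesync(w->subid, w->relid);

			ReplicationSlotDropAtPubNode(conn, syncslotname);
			pfree(syncslotname);
		}
	}

	/* ... and then the apply worker slot, on the same connection. */
	if (slotname)
		ReplicationSlotDropAtPubNode(conn, slotname);

	walrcv_disconnect(conn);
}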
3.
ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn_given, char
*conninfo, char *subname, char *slotname)
{
..
+ PG_TRY();
..
+ PG_CATCH();
+ {
+ /* NOP. Just gobble any ERROR. */
+ }
+ PG_END_TRY();
Why are we suppressing the error instead of handling it the same way
as we do while dropping the apply worker slot in DropSubscription?
4.
@@ -139,6 +141,28 @@ finish_sync_worker(void)
get_rel_name(MyLogicalRepWorker->relid))));
CommitTransactionCommand();
+ /*
+ * Cleanup the tablesync slot.
+ */
+ {
+ extern void ReplicationSlotDropAtPubNode(
+ WalReceiverConn *wrconn_given, char *conninfo, char *subname, char *slotname);
This is not how we export functions at other places?
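(For example, declare it once in a header instead of a function-local
extern; a sketch:)

/* in src/include/replication/slot.h */
extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn_given,
										 char *conninfo, char *subname,
										 char *slotname);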
--
With Regards,
Amit Kapila.
Hi Amit.
PSA my v5 WIP patch for the Solution1.
This patch still applies onto the v30 patch set [1] from the other 2PC thread:
[1]: /messages/by-id/CAFPTHDYA8yE6tEmQ2USYS68kNt+kM=SwKgj=jy4AvFD5e9-UTQ@mail.gmail.com
(I understand you would like this to be delivered as a separate patch
independent of v30. I will convert it ASAP)
====
Coded / WIP:
* tablesync slot is now permanent instead of temporary. The tablesync
slot name is no longer tied to the Subscription slot name.
* the tablesync slot cleanup (drop) code is added for DropSubscription
and for finish_sync_worker functions
* tablesync worker now allows multiple transactions instead of a single transaction
* a new state (SUBREL_STATE_COPYDONE) is persisted after a successful
copy_table in LogicalRepSyncTableStart.
* if a relaunched tablesync finds the state is SUBREL_STATE_COPYDONE
then it will bypass the initial copy_table phase.
* tablesync sets up replication origin tracking in
LogicalRepSyncTableStart (similar to what is done for the apply worker). The
origin is advanced when first created.
* tablesync replication origin tracking is cleaned up during
DropSubscription and/or process_syncing_tables_for_apply
TODO / Known Issues:
* I think if there are crashed tablesync workers they may not be known
to the current DropSubscription code. This might be a problem for
cleaning up slots and/or origin tracking belonging to those unknown
workers.
* Help / comments / cleanup
* There is temporary "!!>>" excessive logging of mine scattered around
which I added to help my testing during development
* Address review comments
---
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
v5-0001-2PC-change-tablesync-slot-to-use-same-two_phase-m.patch (application/octet-stream)
From 27ecd449901c0b81fe9738b9fb1421c9b0d20d05 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Thu, 10 Dec 2020 16:38:05 +1100
Subject: [PATCH v5] 2PC - change tablesync slot to use same two_phase mode as
apply slot
---
src/backend/replication/logical/worker.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index e14fe62..9271f87 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -2759,7 +2759,7 @@ maybe_reread_subscription(void)
strcmp(newsub->slotname, MySubscription->slotname) != 0 ||
newsub->binary != MySubscription->binary ||
newsub->stream != MySubscription->stream ||
- (!am_tablesync_worker() && newsub->twophase != MySubscription->twophase) ||
+ newsub->twophase != MySubscription->twophase ||
!equal(newsub->publications, MySubscription->publications))
{
ereport(LOG,
@@ -3406,7 +3406,7 @@ ApplyWorkerMain(Datum main_arg)
options.proto.logical.publication_names = MySubscription->publications;
options.proto.logical.binary = MySubscription->binary;
options.proto.logical.streaming = MySubscription->stream;
- options.proto.logical.twophase = MySubscription->twophase && !am_tablesync_worker();
+ options.proto.logical.twophase = MySubscription->twophase;
/* Start normal logical streaming replication. */
walrcv_startstreaming(wrconn, &options);
--
1.8.3.1
v5-0002-WIP-patch-for-the-Solution1.patch (application/octet-stream)
From 7dbaa8c89085fbede1b21643635114810274613c Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Mon, 21 Dec 2020 20:01:02 +1100
Subject: [PATCH v5] WIP patch for the Solution1.
This patch applies onto the v30 patch set [1] from other 2PC thread:
[1] https://www.postgresql.org/message-id/CAFPTHDYA8yE6tEmQ2USYS68kNt%2BkM%3DSwKgj%3Djy4AvFD5e9-UTQ%40mail.gmail.com
====
Coded / WIP:
* tablesync slot is now permanent instead of temporary. The tablesync slot name is no longer tied to the Subscription slot name.
* the tablesync slot cleanup (drop) code is added for DropSubscription and for finish_sync_worker functions
* tablesync worker now allows multiple transactions instead of a single transaction
* a new state (SUBREL_STATE_COPYDONE) is persisted after a successful copy_table in LogicalRepSyncTableStart.
* if a relaunched tablesync finds the state is SUBREL_STATE_COPYDONE then it will bypass the initial copy_table phase.
* tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar to what is done for the apply worker). The origin is advanced when first created.
* tablesync replication origin tracking is cleaned up during DropSubscription and/or process_syncing_tables_for_apply
TODO / Known Issues:
* I think if there are crashed tablesync workers they may not be currently known to DropSubscription. This might be a problem for cleaning up slots and/or origin tracking belonging to those unknown workers.
* Help / comments / cleanup
* There is temporary "!!>>" excessive logging of mine scattered around which I added to help my testing during development
* Address review comments
---
src/backend/commands/subscriptioncmds.c | 113 ++++++++++++++
src/backend/replication/logical/origin.c | 4 +-
src/backend/replication/logical/tablesync.c | 219 ++++++++++++++++++++++++----
src/backend/replication/logical/worker.c | 21 +--
src/include/catalog/pg_subscription_rel.h | 1 +
src/include/replication/slot.h | 3 +
6 files changed, 312 insertions(+), 49 deletions(-)
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index b0745d5..e6594c8 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -37,6 +37,7 @@
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
#include "storage/lmgr.h"
#include "utils/acl.h"
#include "utils/builtins.h"
@@ -47,6 +48,8 @@
static List *fetch_table_list(WalReceiverConn *wrconn, List *publications);
+void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn_given, char *conninfo, char *subname, char *slotname);
+
/*
* Common option parsing function for CREATE and ALTER SUBSCRIPTION commands.
*
@@ -1070,6 +1073,44 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
{
LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
+ /*
+ * Is this a tablesync worker?
+ *
+ * If yes, drop the tablesync's slot, and clean-up and remove replication origin tracking.
+ */
+ if (OidIsValid(w->relid))
+ {
+ /*
+ * FIXME - Crashed tablesync workers may also have remaining slots because I don't think
+ * such workers are even iterated by this loop, and nobody else is removing them.
+ */
+ {
+ /* Calculate the name of the tablesync slot. */
+ char *syncslotname = ReplicationSlotNameForTablesync(subid, w->relid);
+
+ elog(LOG, "!!>> DropSubscription: now dropping the tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(
+ NULL,
+ conninfo, /* use conninfo to make a new connection. */
+ subname,
+ syncslotname);
+
+ pfree(syncslotname);
+ }
+
+ /* Remove the tablesync's origin tracking if exists. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, w->relid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ {
+ elog(LOG, "!!>> DropSubscription: dropping origin tracking for \"%s\"", originname);
+ replorigin_drop(originid, false);
+ elog(LOG, "!!>> DropSubscription: dropped origin tracking for \"%s\"", originname);
+ }
+
+ }
+
+ /* Stop the worker. */
logicalrep_worker_stop(w->subid, w->relid);
}
list_free(subworkers);
@@ -1144,6 +1185,78 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
table_close(rel, NoLock);
}
+
+/*
+ * Drop the replication slot at the publisher node
+ * using the replication connection.
+ *
+ * If the connection is passed then just use that,
+ * otherwise connect/disconnect within this function.
+ */
+void
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn_given, char *conninfo, char *subname, char *slotname)
+{
+ StringInfoData cmd;
+
+ load_file("libpqwalreceiver", false);
+
+ initStringInfo(&cmd);
+ appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
+
+ /*
+ * If the connection was passed then use it.
+ * If the connection was not passed then make a new connection using the passed conninfo.
+ */
+ if (wrconn_given != NULL)
+ {
+ Assert (conninfo == NULL);
+ wrconn = wrconn_given;
+ }
+ else
+ {
+ char *err = NULL;
+
+ Assert(conninfo != NULL);
+ wrconn = walrcv_connect(conninfo, true, subname, &err);
+
+ if (wrconn == NULL)
+ ereport(ERROR,
+ (errmsg("could not connect to publisher when attempting to "
+ "drop the replication slot \"%s\"", slotname),
+ errdetail("The error was: %s", err)));
+ }
+
+ PG_TRY();
+ {
+ WalRcvExecResult *res;
+
+ res = walrcv_exec(wrconn, cmd.data, 0, NULL);
+
+ if (res->status != WALRCV_OK_COMMAND)
+ ereport(ERROR,
+ (errmsg("could not drop the replication slot \"%s\" on publisher",
+ slotname),
+ errdetail("The error was: %s", res->err)));
+ else
+ ereport(LOG,
+ (errmsg("dropped replication slot \"%s\" on publisher",
+ slotname)));
+
+ walrcv_clear_result(res);
+ }
+ PG_CATCH();
+ {
+ /* NOP. Just gobble any ERROR. */
+ }
+ PG_END_TRY();
+
+ /* Disconnect the connection (unless using one passed) */
+ if (wrconn_given == NULL)
+ walrcv_disconnect(wrconn);
+
+ pfree(cmd.data);
+}
+
/*
* Internal workhorse for changing a subscription owner
*/
diff --git a/src/backend/replication/logical/origin.c b/src/backend/replication/logical/origin.c
index 15ab8e7..6b79dc6 100644
--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -843,7 +843,7 @@ replorigin_redo(XLogReaderState *record)
* that originated at the LSN remote_commit on the remote node was replayed
* successfully and that we don't need to do so again. In combination with
* setting up replorigin_session_origin_lsn and replorigin_session_origin
- * that ensures we won't loose knowledge about that after a crash if the
+ * that ensures we won't lose knowledge about that after a crash if the
* transaction had a persistent effect (think of asynchronous commits).
*
* local_commit needs to be a local LSN of the commit so that we can make sure
@@ -905,7 +905,7 @@ replorigin_advance(RepOriginId node,
LWLockAcquire(&replication_state->lock, LW_EXCLUSIVE);
/* Make sure it's not used by somebody else */
- if (replication_state->acquired_by != 0)
+ if (replication_state->acquired_by != 0 && replication_state->acquired_by != MyProcPid)
{
ereport(ERROR,
(errcode(ERRCODE_OBJECT_IN_USE),
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 1904f34..5b349f5 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -102,6 +102,8 @@
#include "replication/logicalrelation.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
@@ -139,6 +141,24 @@ finish_sync_worker(void)
get_rel_name(MyLogicalRepWorker->relid))));
CommitTransactionCommand();
+ /*
+ * Cleanup the tablesync slot.
+ */
+ {
+ /* Calculate the name of the tablesync slot */
+ char *syncslotname = ReplicationSlotNameForTablesync(
+ MySubscription->oid,
+ MyLogicalRepWorker->relid);
+
+ elog(LOG, "!!>> finish_sync_worker: dropping the tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(
+ wrconn,
+ NULL, /* use the current connection. */
+ MySubscription->name, syncslotname);
+
+ pfree(syncslotname);
+ }
+
/* Find the main apply worker and signal it. */
logicalrep_worker_wakeup(MyLogicalRepWorker->subid, InvalidOid);
@@ -270,8 +290,6 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
-
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
@@ -284,6 +302,15 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn)
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ {
+ StartTransactionCommand();
+ }
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
@@ -416,6 +443,27 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
+ /*
+ * Remove the tablesync origin tracking if exists.
+ *
+ * The cleanup is done here instead of in the finish_sync_worker function because
+ * if the tablesync worker process attempted to call replorigin_drop then that will
+ * hang because the replorigin_drop considers the owning tablesync PID as "busy".
+ */
+ {
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MyLogicalRepWorker->subid, rstate->relid);
+ originid = replorigin_by_name(originname, true);
+ elog(LOG, "!!>> apply worker: find tablesync origin tracking for \"%s\".", originname);
+ if (OidIsValid(originid))
+ {
+ elog(LOG, "!!>> apply worker: dropping tablesync origin tracking for \"%s\".", originname);
+ replorigin_drop(originid, false);
+ elog(LOG, "!!>> apply worker: dropped tablesync origin tracking for \"%s\".", originname);
+ }
+ }
}
}
else
@@ -808,6 +856,32 @@ copy_table(Relation rel)
logicalrep_rel_close(relmapentry, NoLock);
}
+
+/*
+ * Determine the tablesync slot name.
+ *
+ * The returned slot name is palloc'ed in current memory context.
+ */
+char *
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid)
+{
+ char *syncslotname;
+
+ /*
+ * To build a slot name for the sync work, we are limited to NAMEDATALEN -
+ * 1 characters.
+ *
+ * The name is calculated as pg_%u_sync_%u (3 + 10 + 6 + 10 + '\0').
+ * (It's actually the NAMEDATALEN on the remote that matters, but this
+ * scheme will also work reasonably if that is different.)
+ */
+ StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
+
+ syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
+
+ return syncslotname;
+}
+
/*
* Start syncing the table in the sync worker.
*
@@ -825,6 +899,7 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ bool copied_ok;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -850,17 +925,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
- MySubscription->slotname,
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(
MySubscription->oid,
MyLogicalRepWorker->relid);
@@ -875,7 +941,18 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_COPYDONE);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_COPYDONE)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed/etc
+ * before it was able to finish normally.
+ */
+ elog(LOG, "!!>> LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_COPYDONE.");
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -891,9 +968,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -919,29 +993,110 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
walrcv_clear_result(res);
/*
- * Create a new temporary logical decoding slot. This slot will be used
+ * Create a new permanent logical decoding slot. This slot will be used
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
- walrcv_create_slot(wrconn, slotname, true,
+ elog(LOG, "!!>> LogicalRepSyncTableStart: walrcv_create_slot for \"%s\".", slotname);
+ walrcv_create_slot(wrconn, slotname, false,
CRS_USE_SNAPSHOT, origin_startpos);
- /* Now do the initial data copy */
- PushActiveSnapshot(GetTransactionSnapshot());
- copy_table(rel);
- PopActiveSnapshot();
+ /*
+ * Be sure to remove the newly created tablesync slot if the COPY fails.
+ */
+ copied_ok = false;
+ PG_TRY();
+ {
+ /* Now do the initial data copy */
+ PushActiveSnapshot(GetTransactionSnapshot());
+ copy_table(rel);
+ PopActiveSnapshot();
- res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
- (errmsg("table copy could not finish transaction on publisher"),
- errdetail("The error was: %s", res->err)));
- walrcv_clear_result(res);
+ res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
+ if (res->status != WALRCV_OK_COMMAND)
+ ereport(ERROR,
+ (errmsg("table copy could not finish transaction on publisher"),
+ errdetail("The error was: %s", res->err)));
+ walrcv_clear_result(res);
+
+ table_close(rel, NoLock);
+
+ /* Make the copy visible. */
+ CommandCounterIncrement();
+
+ copied_ok = true;
+ }
+ PG_FINALLY();
+ {
+ /* If something failed during copy table then cleanup the created slot. */
+ if (!copied_ok)
+ {
+ extern void ReplicationSlotDropAtPubNode(
+ WalReceiverConn *wrconn_given, char *conninfo, char *subname, char *slotname);
+
+ elog(LOG, "!!>> LogicalRepSyncTableStart: tablesync copy failed. Drop the tablesync slot \"%s\".", slotname);
+ ReplicationSlotDropAtPubNode(
+ wrconn,
+ NULL, /* use the current connection. */
+ MySubscription->name,
+ slotname);
+
+ pfree(slotname);
+ }
+ }
+ PG_END_TRY();
- table_close(rel, NoLock);
+ CommitTransactionCommand();
- /* Make the copy visible. */
- CommandCounterIncrement();
+ /* Update the persisted state to indicate the COPY phase is done; make it visible to others. */
+ StartTransactionCommand();
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_COPYDONE,
+ MyLogicalRepWorker->relstate_lsn);
+ CommitTransactionCommand();
+
+copy_table_done:
+
+ /* Setup replication origin tracking. */
+ {
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ StartTransactionCommand();
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MySubscription->oid, MyLogicalRepWorker->relid);
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist. Create it now, and advance to LSN got from walrcv_create_slot.
+ */
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_create \"%s\".", originname);
+ originid = replorigin_create(originname);
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_session_setup \"%s\".", originname);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_advance \"%s\".", originname);
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */ , true /* WAL log */ );
+ }
+ else
+ {
+ /*
+ * Origin tracking already exists.
+ */
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 2 replorigin_session_setup \"%s\".", originname);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 2 replorigin_session_get_progress \"%s\".", originname);
+ *origin_startpos = replorigin_session_get_progress(false);
+ }
+ elog(LOG, "!!>> LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
+ CommitTransactionCommand();
+ }
/*
* We are done with the initial data synchronization, update the state.
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 9271f87..a60e9fd 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -771,8 +771,7 @@ apply_handle_prepare_txn(LogicalRepPrepareData *prepare_data)
Assert(prepare_data->prepare_lsn == remote_final_lsn);
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* BeginTransactionBlock is necessary to balance the
@@ -1079,12 +1078,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -1161,9 +1156,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -1190,8 +1183,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1350,8 +1342,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData* commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index acc2926..e9f2b3f 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,7 @@ DECLARE_UNIQUE_INDEX(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg_subs
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_COPYDONE 'C' /* tablesync copy phase is completed */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 63bab69..bbc6b11 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -15,6 +15,7 @@
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "storage/spin.h"
+#include "replication/walreceiver.h"
/*
* Behaviour of replication slots, upon release or crash.
@@ -211,6 +212,8 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(Oid suboid, Oid relid);
+extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn_given, char *conninfo, char *subname, char *slotname);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
--
1.8.3.1
On Sat, Dec 19, 2020 at 5:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Dec 18, 2020 at 6:41 PM Peter Smith <smithpb2250@gmail.com> wrote:
TODO / Known Issues:
* the current implementation of tablesync drop slot (e.g. from
DropSubscription or finish_sync_worker) regenerates the tablesync slot
name so it knows what slot to drop.
If you always drop the slot at finish_sync_worker, in which case do
you need to drop it during DropSubscription? Is it when the tablesync
workers have crashed?
Yes. It is not the normal case. But if the tablesync never yet got to
SYNCDONE state (maybe crashed) then finish_sync_worker may not be
called.
So I think a rogue tablesync slot might still exist during DropSubscription.
The current code might be ok for
normal use cases, but if there is an ALTER SUBSCRIPTION ... SET
(slot_name = newname) it would be unable to find the tablesync slot.
Sure, but the same will be true for the apply worker slot as well. I
agree the problem would be more for table sync workers but I think we
can solve it, see below.
* I think if there are crashed tablesync workers then they are not
known to DropSubscription. So this might be a problem for cleaning up
slots and/or origin tracking belonging to those unknown workers.
Yeah, I think we can do two things to avoid this and the previous
problem. (a) We can generate the slot_name for the table sync worker
based on only subscription_id and rel_id. (b) Immediately after
creating the slot, advance the replication origin with the position
(origin_startpos) we get from walrcv_create_slot; this will help us
start from the right location.
Do you see anything which will still not be addressed after doing the above?
(a) V5 Patch is updated as suggested.
(b) V5 Patch is updated as suggested. Now calling replorigin_advance.
No problems seen so far. All TAP tests pass, but more testing needed
for the origin stuff
I understand why you are trying to create this patch atop the logical
decoding of 2PC patch but I think it is better to create this as an
independent patch and then use it to test the 2PC problem.
OK. The latest patch still applies to v30 just for my convenience
today, but I will head towards converting this to an independent patch
ASAP.
Also, please
explain what kind of testing you did to ensure that it works properly
after the table sync worker restarts after a crash.
So far tested like this - I caused the tablesync to crash after
COPYDONE (but before SYNCDONE) by sending a row to cause a PK
violation while holding the tablesync at the CATCHUP state in the
debugger. The tablesync then handles the insert, encounters the PK
violation error, and re-launches. Then I remove the extra row so the
PK violation does not recur, and the (re-launched) tablesync can
complete and finish normally. The apply worker then takes over.
I have attached some captured/annotated logging of my test scenario
which I ran using the V4 patch (the log has a lot of extra temporary
output to help see what is going on)
---
Kind Regards,
Peter Smith.
Fujitsu Australia.
Attachments:
On Mon, Dec 21, 2020 at 4:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Few other comments:
==================
Thanks for your feedback.
1.
* FIXME 3 - Crashed tablesync workers may also have remaining slots
because I don't think
+ * such workers are even iterated by this loop, and nobody else is
+ removing them.
+ */
+ if (slotname)
+ {
The above FIXME is not clear to me. Actually, the crashed workers
should restart, finish their work, and drop the slots. So not sure
what exactly this FIXME refers to?
Yes, normally if the tablesync can complete it should behave like that.
But I think there are other scenarios where it may be unable to
clean-up after itself. For example:
i) Maybe the crashed tablesync worker cannot finish, e.g. a row insert
handled by tablesync can give a PK violation, which will then crash
again and again for each re-launched/replacement tablesync worker.
This can be reproduced in the debugger. If the DropSubscription
doesn't clean-up the tablesync's slot then nobody will.
ii) Also DROP SUBSCRIPTION code has locking (see code comment) "to
ensure that the launcher doesn't restart new worker during dropping
the subscription". So executing DROP SUBSCRIPTION will prevent a newly
crashed tablesync from re-launching, so it won’t be able to take care
of its own slot. If the DropSubscription doesn't clean-up that
tablesync's slot then nobody will.
2.
DropSubscription()
{
..
ReplicationSlotDropAtPubNode(
+ NULL,
+ conninfo, /* use conninfo to make a new connection. */
+ subname,
+ syncslotname);
..
}
With the above call, it will form a connection with the publisher and
drop the required slots. I think we need to save the connection info
so that we don't need to connect/disconnect for each slot to be
dropped. Later in this function, we again connect and drop the apply
worker slot. I think we should connect just once and drop both the
apply and tablesync slots, if any.
OK. IIUC this is a suggestion for more efficient connection usage,
rather than an actual bug, right? I have added this suggestion to my TODO
list.
3.
ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn_given, char
*conninfo, char *subname, char *slotname)
{
..
+ PG_TRY();
..
+ PG_CATCH();
+ {
+ /* NOP. Just gobble any ERROR. */
+ }
+ PG_END_TRY();
Why are we suppressing the error instead of handling it the same way
as we do while dropping the apply worker slot in DropSubscription?
This function is common - it is also called from the tablesync
finish_sync_worker. But in the finish_sync_worker case I wanted to
avoid throwing an ERROR which would cause the tablesync to crash and
relaunch (and crash/relaunch/repeat...) when all it was trying to do
in the first place was just cleanup and exit the process. Perhaps the
error suppression should be conditional depending on where this function
is called from?
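For illustration, one way could be a caller-supplied flag (a minimal
sketch, untested; the extra parameter is hypothetical):

void
ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname,
							 bool fail_on_error)
{
	StringInfoData cmd;
	WalRcvExecResult *res;

	initStringInfo(&cmd);
	appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT",
					 quote_identifier(slotname));

	res = walrcv_exec(wrconn, cmd.data, 0, NULL);
	if (res->status != WALRCV_OK_COMMAND)
		/*
		 * DropSubscription would pass true and still get an ERROR; the
		 * finish_sync_worker cleanup path would pass false and only get
		 * a WARNING, so the tablesync does not crash/relaunch.
		 */
		ereport(fail_on_error ? ERROR : WARNING,
				(errmsg("could not drop the replication slot \"%s\" on publisher",
						slotname),
				 errdetail("The error was: %s", res->err)));

	walrcv_clear_result(res);
	pfree(cmd.data);
}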
4.
@@ -139,6 +141,28 @@ finish_sync_worker(void)
get_rel_name(MyLogicalRepWorker->relid))));
CommitTransactionCommand();
+ /*
+ * Cleanup the tablesync slot.
+ */
+ {
+ extern void ReplicationSlotDropAtPubNode(
+ WalReceiverConn *wrconn_given, char *conninfo, char *subname, char *slotname);
This is not how we export functions at other places?
Fixed in latest v5 patch -
/messages/by-id/CAHut+PvmDJ_EO11_up=_cRbOjhdWCMG-n7kF-mdRhjtCHcjHRA@mail.gmail.com
----
Kind Regards,
Peter Smith.
Fujitsu Australia.
On Mon, Dec 21, 2020 at 3:17 PM Peter Smith <smithpb2250@gmail.com> wrote:
On Mon, Dec 21, 2020 at 4:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Few other comments:
==================Thanks for your feedback.
1.
* FIXME 3 - Crashed tablesync workers may also have remaining slots
because I don't think
+ * such workers are even iterated by this loop, and nobody else is
+ removing them.
+ */
+ if (slotname)
+ {
The above FIXME is not clear to me. Actually, the crashed workers
should restart, finish their work, and drop the slots. So not sure
what exactly this FIXME refers to?
Yes, normally if the tablesync can complete it should behave like that.
But I think there are other scenarios where it may be unable to
clean-up after itself. For example:
i) Maybe the crashed tablesync worker cannot finish, e.g. a row insert
handled by tablesync can give a PK violation, which will then crash
again and again for each re-launched/replacement tablesync worker.
This can be reproduced in the debugger. If the DropSubscription
doesn't clean-up the tablesync's slot then nobody will.ii) Also DROP SUBSCRIPTION code has locking (see code commit) "to
ensure that the launcher doesn't restart new worker during dropping
the subscription".
Yeah, I have also read that comment but do you know how it is
preventing relaunch? How does the subscription lock help?
So executing DROP SUBSCRIPTION will prevent a newly
crashed tablesync from re-launching, so it won’t be able to take care
of its own slot. If the DropSubscription doesn't clean-up that
tablesync's slot then nobody will.
2.
DropSubscription()
{
..
ReplicationSlotDropAtPubNode(
+ NULL,
+ conninfo, /* use conninfo to make a new connection. */
+ subname,
+ syncslotname);
..
}
With the above call, it will form a connection with the publisher and
drop the required slots. I think we need to save the connection info
so that we don't need to connect/disconnect for each slot to be
dropped. Later in this function, we again connect and drop the apply
worker slot. I think we should connect just once and drop both the
apply and tablesync slots, if any.
OK. IIUC this is a suggestion for more efficient connection usage,
rather than an actual bug, right?
Yes, it is for effective connection usage.
I have added this suggestion to my TODO
list.
3.
ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn_given, char
*conninfo, char *subname, char *slotname)
{
..
+ PG_TRY();
..
+ PG_CATCH();
+ {
+ /* NOP. Just gobble any ERROR. */
+ }
+ PG_END_TRY();
Why are we suppressing the error instead of handling it the same way
as we do while dropping the apply worker slot in DropSubscription?
This function is common - it is also called from the tablesync
finish_sync_worker. But in the finish_sync_worker case I wanted to
avoid throwing an ERROR which would cause the tablesync to crash and
relaunch (and crash/relaunch/repeat...) when all it was trying to do
in the first place was just cleanup and exit the process. Perhaps the
error suppression should be conditional depending on where this function
is called from?
Yeah, that could be one way and if you follow my previous suggestion
this function might change a bit more.
--
With Regards,
Amit Kapila.
Hi Amit.
PSA my v6 WIP patch for the Solution1.
This patch still applies onto the v30 patch set [1] from the other 2PC thread:
[1]: /messages/by-id/CAFPTHDYA8yE6tEmQ2USYS68kNt+kM=SwKgj=jy4AvFD5e9-UTQ@mail.gmail.com
(I understand you would like this to be delivered as a separate patch
independent of v30. I will convert it ASAP)
====
Coded / WIP:
* tablesync slot is now permanent instead of temporary. The tablesync
slot name is no longer tied to the Subscription slot name.
* the tablesync slot cleanup (drop) code is added for DropSubscription
and for finish_sync_worker functions
* tablesync worker now allows multiple transactions instead of a single transaction
* a new state (SUBREL_STATE_COPYDONE) is persisted after a successful
copy_table in LogicalRepSyncTableStart.
* if a relaunched tablesync finds the state is SUBREL_STATE_COPYDONE
then it will bypass the initial copy_table phase.
* tablesync sets up replication origin tracking in
LogicalRepSyncTableStart (similar to what is done for the apply worker). The
origin is advanced when first created.
* tablesync replication origin tracking is cleaned up during
DropSubscription and/or process_syncing_tables_for_apply
TODO / Known Issues:
* Crashed tablesync workers may not be known to the current
DropSubscription code. This might be a problem for cleaning up slots
and/or origin tracking belonging to those unknown workers.
* There seems to be a race condition during DROP SUBSCRIPTION. It
manifests as the TAP test 007 hanging. Logging shows it seems to be
during replorigin_drop when called from DropSubscription. It is timing
related and quite rare - e.g. it only happens about once in every ten
runs of the subscription TAP tests.
* Help / comments / cleanup
* There is temporary "!!>>" excessive logging of mine scattered around
which I added to help my testing during development
* Address review comments
---
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
v6-0002-WIP-patch-for-the-Solution1.patch (application/octet-stream)
From f58809f0af819fa1efc2d4258b7434a88f9c4196 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Tue, 22 Dec 2020 21:57:36 +1100
Subject: [PATCH v6] WIP patch for the Solution1.
This patch still applies onto the v30 patch set [1] from other 2PC thread:
[1] https://www.postgresql.org/message-id/CAFPTHDYA8yE6tEmQ2USYS68kNt%2BkM%3DSwKgj%3Djy4AvFD5e9-UTQ%40mail.gmail.com
(I understand this should be delivered as a separate patch independent of v30. I will convert it ASAP)
====
Coded / WIP:
* tablesync slot is now permanent instead of temporary. The tablesync slot name is no longer tied to the Subscription slot name.
* the tablesync slot cleanup (drop) code is added for DropSubscription and for finish_sync_worker functions
* tablesync worker now allows multiple transactions instead of a single transaction
* a new state (SUBREL_STATE_COPYDONE) is persisted after a successful copy_table in LogicalRepSyncTableStart.
* if a relaunched tablesync finds the state is SUBREL_STATE_COPYDONE then it will bypass the initial copy_table phase.
* tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar to what is done for the apply worker). The origin is advanced when first created.
* tablesync replication origin tracking is cleaned up during DropSubscription and/or process_syncing_tables_for_apply
TODO / Known Issues:
* Crashed tablesync workers may not be known to the current DropSubscription code. This might be a problem for cleaning up slots and/or origin tracking belonging to those unknown workers.
* There seems to be a race condition. It manifests as TAP test 007 hanging. Logging shows it seems to be during replorigin_drop when called from DropSubscription. It is timing related and quite rare - e.g. it only happens about once in every ten runs of the subscription TAP tests.
* Help / comments / cleanup
* There is temporary "!!>>" excessive logging of mine scattered around which I added to help my testing during development
* Address review comments
---
src/backend/commands/subscriptioncmds.c | 188 ++++++++++++++++-------
src/backend/replication/logical/origin.c | 4 +-
src/backend/replication/logical/tablesync.c | 223 ++++++++++++++++++++++++----
src/backend/replication/logical/worker.c | 21 +--
src/include/catalog/pg_subscription_rel.h | 1 +
src/include/replication/slot.h | 3 +
6 files changed, 334 insertions(+), 106 deletions(-)
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index b0745d5..c557a62 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -37,6 +37,7 @@
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
#include "storage/lmgr.h"
#include "utils/acl.h"
#include "utils/builtins.h"
@@ -960,7 +961,6 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
char *err = NULL;
RepOriginId originid;
WalReceiverConn *wrconn = NULL;
- StringInfoData cmd;
Form_pg_subscription form;
/*
@@ -1048,76 +1048,154 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
ReleaseSysCache(tup);
/*
- * Stop all the subscription workers immediately.
- *
- * This is necessary if we are dropping the replication slot, so that the
- * slot becomes accessible.
+ * Try to acquire the connection necessary for dropping slots.
+ * We do this here so that the same connection may be shared
+ * for dropping the Subscription slot, as well as dropping any
+ * tablesync slots.
*
- * It is also necessary if the subscription is disabled and was disabled
- * in the same transaction. Then the workers haven't seen the disabling
- * yet and will still be running, leading to hangs later when we want to
- * drop the replication origin. If the subscription was disabled before
- * this transaction, then there shouldn't be any workers left, so this
- * won't make a difference.
- *
- * New workers won't be started because we hold an exclusive lock on the
- * subscription till the end of the transaction.
+ * Note: If the slotname is NONE/NULL then connection errors are
+ * suppressed. This is necessary so that the DROP SUBSCRIPTION
+ * can still complete even when the connection to publisher is
+ * broken.
*/
- LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
- subworkers = logicalrep_workers_find(subid, false);
- LWLockRelease(LogicalRepWorkerLock);
- foreach(lc, subworkers)
+ load_file("libpqwalreceiver", false);
+
+ wrconn = walrcv_connect(conninfo, true, subname, &err);
+ if (wrconn == NULL && slotname != NULL)
+ ereport(ERROR,
+ (errmsg("could not connect to publisher when attempting to "
+ "drop the replication slot \"%s\"", slotname),
+ errdetail("The error was: %s", err),
+ /* translator: %s is an SQL ALTER command */
+ errhint("Use %s to disassociate the subscription from the slot.",
+ "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+
+ PG_TRY();
{
- LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
+ /*
+ * Stop all the subscription workers immediately.
+ *
+ * This is necessary if we are dropping the replication slot, so that the
+ * slot becomes accessible.
+ *
+ * It is also necessary if the subscription is disabled and was disabled
+ * in the same transaction. Then the workers haven't seen the disabling
+ * yet and will still be running, leading to hangs later when we want to
+ * drop the replication origin. If the subscription was disabled before
+ * this transaction, then there shouldn't be any workers left, so this
+ * won't make a difference.
+ *
+ * New workers won't be started because we hold an exclusive lock on the
+ * subscription till the end of the transaction.
+ */
+ LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
+ subworkers = logicalrep_workers_find(subid, false);
+ LWLockRelease(LogicalRepWorkerLock);
+ foreach(lc, subworkers)
+ {
+ LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
- logicalrep_worker_stop(w->subid, w->relid);
- }
- list_free(subworkers);
+ /*
+ * Is this a tablesync worker?
+ *
+ * If yes, drop the tablesync's slot, and remove replication origin tracking.
+ */
+ if (OidIsValid(w->relid))
+ {
+ char *syncslotname;
- /* Clean up dependencies */
- deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
+ /*
+ * FIXME - Crashed tablesync workers may also have remaining slots because I don't think
+ * such workers are even iterated by this loop, and nobody else is removing them.
+ */
- /* Remove any associated relation synchronization states. */
- RemoveSubscriptionRel(subid, InvalidOid);
+ syncslotname = ReplicationSlotNameForTablesync(subid, w->relid);
+ if (!wrconn)
+ {
+ /*
+ * If the subscription slotname is NONE/NULL and the connection to publisher is
+ * broken, then the DropSubscription will still be allowed to complete. But
+ * without a connection it is not possible to drop any tablesync slots.
+ *
+ * FIXME - So what to do? OK to just log a warning?
+ */
+ elog(WARNING, "!!>> DropSubscription: no connection. Cannot drop tablesync slot \"%s\".",
+ syncslotname);
+ }
+ else
+ {
+ elog(LOG, "!!>> DropSubscription: dropping the tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ elog(LOG, "!!>> DropSubscription: dropped the tablesync slot \"%s\".", syncslotname);
+ }
+ pfree(syncslotname);
- /* Remove the origin tracking if exists. */
- snprintf(originname, sizeof(originname), "pg_%u", subid);
- originid = replorigin_by_name(originname, true);
- if (originid != InvalidRepOriginId)
- replorigin_drop(originid, false);
+ /* Remove the tablesync's origin tracking if exists. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, w->relid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ {
+ elog(LOG, "!!>> DropSubscription: dropping origin tracking for \"%s\"", originname);
+ replorigin_drop(originid, false);
+ elog(LOG, "!!>> DropSubscription: dropped origin tracking for \"%s\"", originname);
+ }
+ }
- /*
- * If there is no slot associated with the subscription, we can finish
- * here.
- */
- if (!slotname)
+ /* Stop the worker. */
+ logicalrep_worker_stop(w->subid, w->relid);
+ }
+ list_free(subworkers);
+
+ /* Clean up dependencies */
+ deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
+
+ /* Remove any associated relation synchronization states. */
+ RemoveSubscriptionRel(subid, InvalidOid);
+
+ /* Remove the origin tracking if exists. */
+ snprintf(originname, sizeof(originname), "pg_%u", subid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ replorigin_drop(originid, false);
+
+ /*
+ * If there is a slot associated with the subscription, then drop the
+ * replication slot at the publisher node using the replication
+ * connection.
+ */
+ if (slotname)
+ ReplicationSlotDropAtPubNode(wrconn, slotname);
+ }
+ PG_FINALLY();
{
+ if (wrconn)
+ walrcv_disconnect(wrconn);
+
table_close(rel, NoLock);
- return;
}
+ PG_END_TRY();
+}
+
+
+/*
+ * Drop the replication slot at the publisher node
+ * using the replication connection.
+ */
+void
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname)
+{
+ StringInfoData cmd;
+
+ Assert(wrconn);
- /*
- * Otherwise drop the replication slot at the publisher node using the
- * replication connection.
- */
load_file("libpqwalreceiver", false);
initStringInfo(&cmd);
appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
- wrconn = walrcv_connect(conninfo, true, subname, &err);
- if (wrconn == NULL)
- ereport(ERROR,
- (errmsg("could not connect to publisher when attempting to "
- "drop the replication slot \"%s\"", slotname),
- errdetail("The error was: %s", err),
- /* translator: %s is an SQL ALTER command */
- errhint("Use %s to disassociate the subscription from the slot.",
- "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
-
PG_TRY();
{
- WalRcvExecResult *res;
+ WalRcvExecResult *res;
res = walrcv_exec(wrconn, cmd.data, 0, NULL);
@@ -1135,13 +1213,9 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
}
PG_FINALLY();
{
- walrcv_disconnect(wrconn);
+ pfree(cmd.data);
}
PG_END_TRY();
-
- pfree(cmd.data);
-
- table_close(rel, NoLock);
}
/*
diff --git a/src/backend/replication/logical/origin.c b/src/backend/replication/logical/origin.c
index 15ab8e7..6b79dc6 100644
--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -843,7 +843,7 @@ replorigin_redo(XLogReaderState *record)
* that originated at the LSN remote_commit on the remote node was replayed
* successfully and that we don't need to do so again. In combination with
* setting up replorigin_session_origin_lsn and replorigin_session_origin
- * that ensures we won't loose knowledge about that after a crash if the
+ * that ensures we won't lose knowledge about that after a crash if the
* transaction had a persistent effect (think of asynchronous commits).
*
* local_commit needs to be a local LSN of the commit so that we can make sure
@@ -905,7 +905,7 @@ replorigin_advance(RepOriginId node,
LWLockAcquire(&replication_state->lock, LW_EXCLUSIVE);
/* Make sure it's not used by somebody else */
- if (replication_state->acquired_by != 0)
+ if (replication_state->acquired_by != 0 && replication_state->acquired_by != MyProcPid)
{
ereport(ERROR,
(errcode(ERRCODE_OBJECT_IN_USE),
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 1904f34..9388a84 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -102,6 +102,8 @@
#include "replication/logicalrelation.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
@@ -139,6 +141,33 @@ finish_sync_worker(void)
get_rel_name(MyLogicalRepWorker->relid))));
CommitTransactionCommand();
+ /*
+ * Cleanup the tablesync slot.
+ */
+ {
+ /* Calculate the name of the tablesync slot */
+ char *syncslotname = ReplicationSlotNameForTablesync(
+ MySubscription->oid,
+ MyLogicalRepWorker->relid);
+
+ PG_TRY();
+ {
+ elog(LOG, "!!>> finish_sync_worker: dropping the tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ elog(LOG, "!!>> finish_sync_worker: dropped the tablesync slot \"%s\".", syncslotname);
+ }
+ PG_CATCH();
+ {
+ /*
+ * NOP. Suppress any drop slot error because otherwise
+ * it would cause tablesync to fail and re-launch.
+ */
+ }
+ PG_END_TRY();
+
+ pfree(syncslotname);
+ }
+
/* Find the main apply worker and signal it. */
logicalrep_worker_wakeup(MyLogicalRepWorker->subid, InvalidOid);
@@ -270,8 +299,6 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
-
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
@@ -284,6 +311,15 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn)
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ {
+ StartTransactionCommand();
+ }
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
@@ -416,6 +452,27 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
+ /*
+ * Remove the tablesync origin tracking if exists.
+ *
+ * The cleanup is done here instead of in the finish_sync_worker function because
+ * if the tablesync worker process attempted to call replorigin_drop then that will
+ * hang because the replorigin_drop considers the owning tablesync PID as "busy".
+ */
+ {
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MyLogicalRepWorker->subid, rstate->relid);
+ originid = replorigin_by_name(originname, true);
+ elog(LOG, "!!>> apply worker: find tablesync origin tracking for \"%s\".", originname);
+ if (OidIsValid(originid))
+ {
+ elog(LOG, "!!>> apply worker: dropping tablesync origin tracking for \"%s\".", originname);
+ replorigin_drop(originid, false);
+ elog(LOG, "!!>> apply worker: dropped tablesync origin tracking for \"%s\".", originname);
+ }
+ }
}
}
else
@@ -808,6 +865,32 @@ copy_table(Relation rel)
logicalrep_rel_close(relmapentry, NoLock);
}
+
+/*
+ * Determine the tablesync slot name.
+ *
+ * The returned slot name is palloc'ed in current memory context.
+ */
+char *
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid)
+{
+ char *syncslotname;
+
+ /*
+ * To build a slot name for the sync work, we are limited to NAMEDATALEN -
+ * 1 characters.
+ *
+ * The name is calculated as pg_%u_sync_%u (3 + 10 + 6 + 10 + '\0').
+ * (It's actually the NAMEDATALEN on the remote that matters, but this
+ * scheme will also work reasonably if that is different.)
+ */
+ StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
+
+ syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
+
+ return syncslotname;
+}
+
/*
* Start syncing the table in the sync worker.
*
@@ -825,6 +908,7 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ bool copied_ok;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -850,17 +934,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
- MySubscription->slotname,
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(
MySubscription->oid,
MyLogicalRepWorker->relid);
@@ -875,7 +950,18 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_COPYDONE);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_COPYDONE)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed/etc
+ * before it was able to finish normally.
+ */
+ elog(LOG, "!!>> LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_COPYDONE.");
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -891,9 +977,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -919,29 +1002,105 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
walrcv_clear_result(res);
/*
- * Create a new temporary logical decoding slot. This slot will be used
+ * Create a new permanent logical decoding slot. This slot will be used
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
- walrcv_create_slot(wrconn, slotname, true,
+ elog(LOG, "!!>> LogicalRepSyncTableStart: walrcv_create_slot for \"%s\".", slotname);
+ walrcv_create_slot(wrconn, slotname, false,
CRS_USE_SNAPSHOT, origin_startpos);
- /* Now do the initial data copy */
- PushActiveSnapshot(GetTransactionSnapshot());
- copy_table(rel);
- PopActiveSnapshot();
+ /*
+ * Be sure to remove the newly created tablesync slot if the COPY fails.
+ */
+ copied_ok = false;
+ PG_TRY();
+ {
+ /* Now do the initial data copy */
+ PushActiveSnapshot(GetTransactionSnapshot());
+ copy_table(rel);
+ PopActiveSnapshot();
- res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
- (errmsg("table copy could not finish transaction on publisher"),
- errdetail("The error was: %s", res->err)));
- walrcv_clear_result(res);
+ res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
+ if (res->status != WALRCV_OK_COMMAND)
+ ereport(ERROR,
+ (errmsg("table copy could not finish transaction on publisher"),
+ errdetail("The error was: %s", res->err)));
+ walrcv_clear_result(res);
+
+ table_close(rel, NoLock);
- table_close(rel, NoLock);
+ /* Make the copy visible. */
+ CommandCounterIncrement();
- /* Make the copy visible. */
- CommandCounterIncrement();
+ copied_ok = true;
+ }
+ PG_FINALLY();
+ {
+ /* If something failed during the table copy then clean up the created slot. */
+ if (!copied_ok)
+ {
+ elog(LOG, "!!>> LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".", slotname);
+ ReplicationSlotDropAtPubNode(wrconn, slotname);
+ elog(LOG, "!!>> LogicalRepSyncTableStart: tablesync copy failed. Dropped the tablesync slot \"%s\".", slotname);
+
+ pfree(slotname);
+ slotname = NULL;
+ }
+ }
+ PG_END_TRY();
+
+ CommitTransactionCommand();
+
+ /* Update the persisted state to indicate the COPY phase is done; make it visible to others. */
+ StartTransactionCommand();
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_COPYDONE,
+ MyLogicalRepWorker->relstate_lsn);
+ CommitTransactionCommand();
+
+copy_table_done:
+
+ /* Setup replication origin tracking. */
+ {
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ StartTransactionCommand();
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MySubscription->oid, MyLogicalRepWorker->relid);
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist. Create it now, and advance it to the LSN obtained from walrcv_create_slot.
+ */
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_create \"%s\".", originname);
+ originid = replorigin_create(originname);
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_session_setup \"%s\".", originname);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_advance \"%s\".", originname);
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */ , true /* WAL log */ );
+ }
+ else
+ {
+ /*
+ * Origin tracking already exists.
+ */
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 2 replorigin_session_setup \"%s\".", originname);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 2 replorigin_session_get_progress \"%s\".", originname);
+ *origin_startpos = replorigin_session_get_progress(false);
+ }
+ elog(LOG, "!!>> LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
+ CommitTransactionCommand();
+ }
/*
* We are done with the initial data synchronization, update the state.
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 9271f87..a60e9fd 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -771,8 +771,7 @@ apply_handle_prepare_txn(LogicalRepPrepareData *prepare_data)
Assert(prepare_data->prepare_lsn == remote_final_lsn);
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* BeginTransactionBlock is necessary to balance the
@@ -1079,12 +1078,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -1161,9 +1156,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -1190,8 +1183,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1350,8 +1342,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData* commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index acc2926..e9f2b3f 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,7 @@ DECLARE_UNIQUE_INDEX(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg_subs
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_COPYDONE 'C' /* tablesync copy phase is completed */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 63bab69..5f19089 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -15,6 +15,7 @@
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "storage/spin.h"
+#include "replication/walreceiver.h"
/*
* Behaviour of replication slots, upon release or crash.
@@ -211,6 +212,8 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(Oid suboid, Oid relid);
+extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
--
1.8.3.1
v6-0001-2PC-change-tablesync-slot-to-use-same-two_phase-m.patch
From 27ecd449901c0b81fe9738b9fb1421c9b0d20d05 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Thu, 10 Dec 2020 16:38:05 +1100
Subject: [PATCH v6] 2PC - change tablesync slot to use same two_phase mode as
apply slot
---
src/backend/replication/logical/worker.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index e14fe62..9271f87 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -2759,7 +2759,7 @@ maybe_reread_subscription(void)
strcmp(newsub->slotname, MySubscription->slotname) != 0 ||
newsub->binary != MySubscription->binary ||
newsub->stream != MySubscription->stream ||
- (!am_tablesync_worker() && newsub->twophase != MySubscription->twophase) ||
+ newsub->twophase != MySubscription->twophase ||
!equal(newsub->publications, MySubscription->publications))
{
ereport(LOG,
@@ -3406,7 +3406,7 @@ ApplyWorkerMain(Datum main_arg)
options.proto.logical.publication_names = MySubscription->publications;
options.proto.logical.binary = MySubscription->binary;
options.proto.logical.streaming = MySubscription->stream;
- options.proto.logical.twophase = MySubscription->twophase && !am_tablesync_worker();
+ options.proto.logical.twophase = MySubscription->twophase;
/* Start normal logical streaming replication. */
walrcv_startstreaming(wrconn, &options);
--
1.8.3.1
On Mon, Dec 21, 2020 at 11:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Mon, Dec 21, 2020 at 3:17 PM Peter Smith <smithpb2250@gmail.com> wrote:
> > On Mon, Dec 21, 2020 at 4:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > Few other comments:
> > > ==================
> >
> > Thanks for your feedback.
> >
> > > 1.
> > > * FIXME 3 - Crashed tablesync workers may also have remaining slots because I don't think
> > > + * such workers are even iterated by this loop, and nobody else is removing them.
> > > + */
> > > + if (slotname)
> > > + {
> > >
> > > The above FIXME is not clear to me. Actually, the crashed workers
> > > should restart, finish their work, and drop the slots. So not sure
> > > what exactly this FIXME refers to?
> >
> > Yes, normally if the tablesync can complete it should behave like that.
> > But I think there are other scenarios where it may be unable to
> > clean-up after itself. For example:
> >
> > i) Maybe the crashed tablesync worker cannot finish. e.g. A row insert
> > handled by tablesync can give a PK violation which also will crash
> > again and again for each re-launched/replacement tablesync worker.
> > This can be reproduced in the debugger. If the DropSubscription
> > doesn't clean-up the tablesync's slot then nobody will.
> >
> > ii) Also DROP SUBSCRIPTION code has locking (see code comment) "to
> > ensure that the launcher doesn't restart new worker during dropping
> > the subscription".
>
> Yeah, I have also read that comment but do you know how it is
> preventing relaunch? How does the subscription lock help?

Hmmm. I did see there is a matching lock in get_subscription_list of
launcher.c, which may be what that code comment was referring to. But
I am also currently unsure how this lock prevents anybody (e.g.
process_syncing_tables_for_apply) from executing another
logicalrep_worker_launch.

> > So executing DROP SUBSCRIPTION will prevent a newly
> > crashed tablesync from re-launching, so it won't be able to take care
> > of its own slot. If the DropSubscription doesn't clean-up that
> > tablesync's slot then nobody will.
> >
> > > 2.
> > > DropSubscription()
> > > {
> > > ..
> > > ReplicationSlotDropAtPubNode(
> > > + NULL,
> > > + conninfo, /* use conninfo to make a new connection. */
> > > + subname,
> > > + syncslotname);
> > > ..
> > > }
> > >
> > > With the above call, it will form a connection with the publisher and
> > > drop the required slots. I think we need to save the connection info
> > > so that we don't need to connect/disconnect for each slot to be
> > > dropped. Later in this function, we again connect and drop the apply
> > > worker slot. I think we should connect just once and drop the apply
> > > and tablesync slots, if any.
> >
> > OK. IIUC this is a suggestion for more efficient connection usage,
> > rather than an actual bug, right?
>
> Yes, it is for effective connection usage.

I have addressed this in the latest patch [v6]; a rough sketch of the
single-connection cleanup follows.
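For illustration only, here is a minimal sketch of that single-connection
cleanup (the syncslotnames list is hypothetical; walrcv_connect,
walrcv_disconnect, and ReplicationSlotDropAtPubNode are used as in the
patch):

    /* Sketch only: connect once, drop every slot, then disconnect. */
    wrconn = walrcv_connect(conninfo, true, subname, &err);
    if (wrconn != NULL)
    {
        ListCell   *lc;

        /* Drop each tablesync slot over the shared connection. */
        foreach(lc, syncslotnames)
            ReplicationSlotDropAtPubNode(wrconn, (char *) lfirst(lc));

        /* Drop the subscription's own slot, if it has one. */
        if (slotname)
            ReplicationSlotDropAtPubNode(wrconn, slotname);

        walrcv_disconnect(wrconn);
    }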
> > > 3.
> > > ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn_given, char *conninfo,
> > > char *subname, char *slotname)
> > > {
> > > ..
> > > + PG_TRY();
> > > ..
> > > + PG_CATCH();
> > > + {
> > > + /* NOP. Just gobble any ERROR. */
> > > + }
> > > + PG_END_TRY();
> > >
> > > Why are we suppressing the error instead of handling the error in
> > > the same way as we do while dropping the apply worker slot in
> > > DropSubscription?
> >
> > This function is common - it is also called from the tablesync
> > finish_sync_worker. But in the finish_sync_worker case I wanted to
> > avoid throwing an ERROR which would cause the tablesync to crash and
> > relaunch (and crash/relaunch/repeat...) when all it was trying to do
> > in the first place was just cleanup and exit the process. Perhaps the
> > error suppression should be conditional depending on where this
> > function is called from?
>
> Yeah, that could be one way and if you follow my previous suggestion
> this function might change a bit more.

I have addressed this in the latest patch [v6]; one way to make the
suppression conditional is sketched below.
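For example, a sketch of one possible shape (the drop_tablesync_slot
wrapper and its fail_on_error flag are hypothetical, not from the patch;
PG_TRY/PG_CATCH/PG_RE_THROW/FlushErrorState are the standard backend
error-handling facilities):

    /*
     * Hypothetical wrapper: suppress the drop error only for callers
     * (like finish_sync_worker) that must not ERROR out and relaunch.
     */
    static void
    drop_tablesync_slot(WalReceiverConn *wrconn, char *slotname,
                        bool fail_on_error)
    {
        PG_TRY();
        {
            ReplicationSlotDropAtPubNode(wrconn, slotname);
        }
        PG_CATCH();
        {
            if (fail_on_error)
                PG_RE_THROW();      /* e.g. for DropSubscription */
            FlushErrorState();      /* otherwise swallow the error */
        }
        PG_END_TRY();
    }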
---
[v6] /messages/by-id/CAHut+PuCLty2HGNT6neyOcUmBNxOLo=ybQ2Yv-nTR4kFY-8QLw@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia.
Hi Amit.
PSA my v7 WIP patch for the Solution1.
This patch still applies on top of the v30 patch set [1] from the other 2PC thread:
[1]: /messages/by-id/CAFPTHDYA8yE6tEmQ2USYS68kNt+kM=SwKgj=jy4AvFD5e9-UTQ@mail.gmail.com
(I understand you would like this to be delivered as a separate patch
independent of v30. I will convert it ASAP)
====
Coded / WIP:
* tablesync slot is now permanent instead of temporary. The tablesync
slot name is no longer tied to the Subscription slot name.
* the tablesync slot cleanup (drop) code is added to the DropSubscription
and finish_sync_worker functions
* tablesync worker now allows multiple tx instead of a single tx
* a new state (SUBREL_STATE_COPYDONE) is persisted after a successful
copy_table in LogicalRepSyncTableStart.
* if a relaunched tablesync finds the state is SUBREL_STATE_COPYDONE
then it will bypass the initial copy_table phase (see the sketch after
this list)
* tablesync sets up replication origin tracking in
LogicalRepSyncTableStart (similar as done for the apply worker). The
origin is advanced when first created.
* tablesync replication origin tracking is cleaned up during
DropSubscription and/or process_syncing_tables_for_apply
* The v7 DropSubscription cleanup code has been rewritten since v6.
The subscription TAP tests have been executed many (7) times now
without observing any of the race problems that I previously reported
seeing when using the v6 patch.
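As a quick illustration of that relaunch behaviour, the decision in
LogicalRepSyncTableStart boils down to roughly this (simplified from the
attached patch):

    /* Sketch: a relaunched tablesync may resume after the COPY phase. */
    Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
           MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
           MyLogicalRepWorker->relstate == SUBREL_STATE_COPYDONE);

    if (MyLogicalRepWorker->relstate == SUBREL_STATE_COPYDONE)
    {
        /*
         * COPY finished previously, but the worker crashed/restarted
         * before reaching SYNCDONE; skip the copy and go to catchup.
         */
        goto copy_table_done;
    }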
TODO / Known Issues:
* Help / comments / cleanup
* There is temporary "!!>>" excessive logging scattered around which I
added to help my testing during development
* Address review comments
---
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
v7-0002-WIP-patch-for-the-Solution1.patch
From 243ffbfc7622af7bbfc69ee5aa816198568c019c Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Wed, 23 Dec 2020 17:08:30 +1100
Subject: [PATCH v7] WIP patch for the Solution1.
This patch still applies on top of the v30 patch set [1] from the other 2PC thread:
[1] https://www.postgresql.org/message-id/CAFPTHDYA8yE6tEmQ2USYS68kNt%2BkM%3DSwKgj%3Djy4AvFD5e9-UTQ%40mail.gmail.com
(I understand this should be delivered as a separate patch independent of v30. I will convert it ASAP)
====
Coded / WIP:
* tablesync slot is now permanent instead of temporary. The tablesync slot name is no longer tied to the Subscription slot name.
* the tablesync slot cleanup (drop) code is added to the DropSubscription and finish_sync_worker functions
* tablesync worker now allows multiple tx instead of a single tx
* a new state (SUBREL_STATE_COPYDONE) is persisted after a successful copy_table in LogicalRepSyncTableStart.
* if a relaunched tablesync finds the state is SUBREL_STATE_COPYDONE then it will bypass the initial copy_table phase.
* tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar as done for the apply worker). The origin is advanced when first created.
* tablesync replication origin tracking is cleaned up during DropSubscription and/or process_syncing_tables_for_apply
* The v7 DropSubscription cleanup code is quite different now from how it was in v6. The subscription TAP tests have been executed 6x now without observing any of the race problems that were sometimes seen with the v6 patch.
TODO / Known Issues:
* Help / comments / cleanup
* There is temporary "!!>>" excessive logging scattered around which I added to help my testing during development
* Address review comments
---
src/backend/commands/subscriptioncmds.c | 221 +++++++++++++++++++-------
src/backend/replication/logical/origin.c | 4 +-
src/backend/replication/logical/tablesync.c | 231 ++++++++++++++++++++++++----
src/backend/replication/logical/worker.c | 21 +--
src/include/catalog/pg_subscription_rel.h | 1 +
src/include/replication/slot.h | 3 +
6 files changed, 375 insertions(+), 106 deletions(-)
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index b0745d5..b98a7e5 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -37,6 +37,7 @@
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
#include "storage/lmgr.h"
#include "utils/acl.h"
#include "utils/builtins.h"
@@ -960,7 +961,6 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
char *err = NULL;
RepOriginId originid;
WalReceiverConn *wrconn = NULL;
- StringInfoData cmd;
Form_pg_subscription form;
/*
@@ -1048,76 +1048,187 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
ReleaseSysCache(tup);
/*
- * Stop all the subscription workers immediately.
- *
- * This is necessary if we are dropping the replication slot, so that the
- * slot becomes accessible.
+ * Try to acquire the connection necessary for dropping slots.
+ * We do this here so that the same connection may be shared
+ * for dropping the Subscription slot, as well as dropping any
+ * tablesync slots.
*
- * It is also necessary if the subscription is disabled and was disabled
- * in the same transaction. Then the workers haven't seen the disabling
- * yet and will still be running, leading to hangs later when we want to
- * drop the replication origin. If the subscription was disabled before
- * this transaction, then there shouldn't be any workers left, so this
- * won't make a difference.
- *
- * New workers won't be started because we hold an exclusive lock on the
- * subscription till the end of the transaction.
+ * Note: If the slotname is NONE/NULL then connection errors are
+ * suppressed. This is necessary so that the DROP SUBSCRIPTION
+ * can still complete even when the connection to publisher is
+ * broken.
*/
- LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
- subworkers = logicalrep_workers_find(subid, false);
- LWLockRelease(LogicalRepWorkerLock);
- foreach(lc, subworkers)
+ load_file("libpqwalreceiver", false);
+
+ wrconn = walrcv_connect(conninfo, true, subname, &err);
+ if (wrconn == NULL && slotname != NULL)
+ ereport(ERROR,
+ (errmsg("could not connect to publisher when attempting to "
+ "drop the replication slot \"%s\"", slotname),
+ errdetail("The error was: %s", err),
+ /* translator: %s is an SQL ALTER command */
+ errhint("Use %s to disassociate the subscription from the slot.",
+ "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+
+ PG_TRY();
{
- LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
+ /*
+ * Stop all the subscription workers immediately.
+ *
+ * This is necessary if we are dropping the replication slot, so that the
+ * slot becomes accessible.
+ *
+ * It is also necessary if the subscription is disabled and was disabled
+ * in the same transaction. Then the workers haven't seen the disabling
+ * yet and will still be running, leading to hangs later when we want to
+ * drop the replication origin. If the subscription was disabled before
+ * this transaction, then there shouldn't be any workers left, so this
+ * won't make a difference.
+ *
+ * New workers won't be started because we hold an exclusive lock on the
+ * subscription till the end of the transaction.
+ */
+ LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
+ subworkers = logicalrep_workers_find(subid, false);
+ LWLockRelease(LogicalRepWorkerLock);
+ foreach(lc, subworkers)
+ {
+ LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
- logicalrep_worker_stop(w->subid, w->relid);
- }
- list_free(subworkers);
+ logicalrep_worker_stop(w->subid, w->relid);
+ }
+ list_free(subworkers);
- /* Clean up dependencies */
- deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
+ /*
+ * Tablesync resource cleanup (slots and origins).
+ *
+ * Any READY-state relations should already have had their cleanup done.
+ */
+ {
+ List *rstates;
+ ListCell *lc;
- /* Remove any associated relation synchronization states. */
- RemoveSubscriptionRel(subid, InvalidOid);
+ rstates = GetSubscriptionNotReadyRelations(subid);
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
- /* Remove the origin tracking if exists. */
- snprintf(originname, sizeof(originname), "pg_%u", subid);
- originid = replorigin_by_name(originname, true);
- if (originid != InvalidRepOriginId)
- replorigin_drop(originid, false);
+ /* Only cleanup the tablesync worker resources */
+ if (!OidIsValid(relid))
+ continue;
- /*
- * If there is no slot associated with the subscription, we can finish
- * here.
- */
- if (!slotname)
+ /* Drop the tablesync slot. */
+ {
+ char *syncslotname = ReplicationSlotNameForTablesync(subid, relid);
+
+ /*
+ * If the subscription slotname is NONE/NULL and the connection to publisher is
+ * broken, then the DropSubscription should still be allowed to complete.
+ * But without a connection it is not possible to drop any tablesync slots.
+ */
+ if (!wrconn)
+ {
+ /* FIXME - OK to just log a warning? */
+ elog(WARNING, "!!>> DropSubscription: no connection. Cannot drop tablesync slot \"%s\".",
+ syncslotname);
+ }
+ else
+ {
+ PG_TRY();
+ {
+ elog(LOG, "!!>> DropSubscription: dropping the tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ elog(LOG, "!!>> DropSubscription: dropped the tablesync slot \"%s\".", syncslotname);
+ }
+ PG_CATCH();
+ {
+ /*
+ * Typically tablesync will delete its own slot after it reaches
+ * SYNCDONE state. Next the apply worker moves the tablesync from
+ * SYNCDONE to READY state.
+ *
+ * Rarely, the DropSubscription may be issued when a tablesync still
+ * is in SYNCDONE but not yet in READY state. If this happens then
+ * the drop slot could fail because the slot is already dropped.
+ * In this case suppress the drop-slot error.
+ *
+ * FIXME - Is there a better way than this?
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+ }
+ pfree(syncslotname);
+ }
+
+ /* Remove the tablesync's origin tracking if exists. */
+ {
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ {
+ elog(LOG, "!!>> DropSubscription: dropping origin tracking for \"%s\"", originname);
+ replorigin_drop(originid, false);
+ elog(LOG, "!!>> DropSubscription: dropped origin tracking for \"%s\"", originname);
+ }
+ }
+
+ }
+ list_free(rstates);
+ }
+
+ /* Clean up dependencies */
+ deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
+
+ /* Remove any associated relation synchronization states. */
+ RemoveSubscriptionRel(subid, InvalidOid);
+
+ /* Remove the origin tracking if exists. */
+ snprintf(originname, sizeof(originname), "pg_%u", subid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ replorigin_drop(originid, false);
+
+ /*
+ * If there is a slot associated with the subscription, then drop the
+ * replication slot at the publisher node using the replication
+ * connection.
+ */
+ if (slotname)
+ ReplicationSlotDropAtPubNode(wrconn, slotname);
+ }
+ PG_FINALLY();
{
+ if (wrconn)
+ walrcv_disconnect(wrconn);
+
table_close(rel, NoLock);
- return;
}
+ PG_END_TRY();
+}
+
+
+/*
+ * Drop the replication slot at the publisher node
+ * using the replication connection.
+ */
+void
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname)
+{
+ StringInfoData cmd;
+
+ Assert(wrconn);
- /*
- * Otherwise drop the replication slot at the publisher node using the
- * replication connection.
- */
load_file("libpqwalreceiver", false);
initStringInfo(&cmd);
appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
- wrconn = walrcv_connect(conninfo, true, subname, &err);
- if (wrconn == NULL)
- ereport(ERROR,
- (errmsg("could not connect to publisher when attempting to "
- "drop the replication slot \"%s\"", slotname),
- errdetail("The error was: %s", err),
- /* translator: %s is an SQL ALTER command */
- errhint("Use %s to disassociate the subscription from the slot.",
- "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
-
PG_TRY();
{
- WalRcvExecResult *res;
+ WalRcvExecResult *res;
res = walrcv_exec(wrconn, cmd.data, 0, NULL);
@@ -1135,13 +1246,9 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
}
PG_FINALLY();
{
- walrcv_disconnect(wrconn);
+ pfree(cmd.data);
}
PG_END_TRY();
-
- pfree(cmd.data);
-
- table_close(rel, NoLock);
}
/*
diff --git a/src/backend/replication/logical/origin.c b/src/backend/replication/logical/origin.c
index 15ab8e7..6b79dc6 100644
--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -843,7 +843,7 @@ replorigin_redo(XLogReaderState *record)
* that originated at the LSN remote_commit on the remote node was replayed
* successfully and that we don't need to do so again. In combination with
* setting up replorigin_session_origin_lsn and replorigin_session_origin
- * that ensures we won't loose knowledge about that after a crash if the
+ * that ensures we won't lose knowledge about that after a crash if the
* transaction had a persistent effect (think of asynchronous commits).
*
* local_commit needs to be a local LSN of the commit so that we can make sure
@@ -905,7 +905,7 @@ replorigin_advance(RepOriginId node,
LWLockAcquire(&replication_state->lock, LW_EXCLUSIVE);
/* Make sure it's not used by somebody else */
- if (replication_state->acquired_by != 0)
+ if (replication_state->acquired_by != 0 && replication_state->acquired_by != MyProcPid)
{
ereport(ERROR,
(errcode(ERRCODE_OBJECT_IN_USE),
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 1904f34..0d7e4ce 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -102,6 +102,8 @@
#include "replication/logicalrelation.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
@@ -139,6 +141,33 @@ finish_sync_worker(void)
get_rel_name(MyLogicalRepWorker->relid))));
CommitTransactionCommand();
+ /*
+ * Cleanup the tablesync slot.
+ */
+ {
+ /* Calculate the name of the tablesync slot */
+ char *syncslotname = ReplicationSlotNameForTablesync(
+ MySubscription->oid,
+ MyLogicalRepWorker->relid);
+
+ PG_TRY();
+ {
+ elog(LOG, "!!>> finish_sync_worker: dropping the tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ elog(LOG, "!!>> finish_sync_worker: dropped the tablesync slot \"%s\".", syncslotname);
+ }
+ PG_CATCH();
+ {
+ /*
+ * NOP. Suppress any drop slot error because otherwise
+ * it would cause the tablesync to fail and re-launch.
+ */
+ }
+ PG_END_TRY();
+
+ pfree(syncslotname);
+ }
+
/* Find the main apply worker and signal it. */
logicalrep_worker_wakeup(MyLogicalRepWorker->subid, InvalidOid);
@@ -270,8 +299,6 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
-
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
@@ -284,6 +311,15 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn)
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ {
+ StartTransactionCommand();
+ }
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
@@ -407,12 +443,41 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
{
rstate->state = SUBREL_STATE_READY;
rstate->lsn = current_lsn;
+
if (!started_tx)
{
StartTransactionCommand();
started_tx = true;
}
+ /*
+ * Remove the tablesync origin tracking if exists.
+ *
+ * The cleanup is done here instead of in the finish_sync_worker function because
+ * if the tablesync worker process attempted to call replorigin_drop then that will
+ * hang because the replorigin_drop considers the owning tablesync PID as "busy".
+ *
+ * Do this before updating the state, so that DropSubscription can know that all
+ * READY workers have already had their origin tracking removed.
+ */
+ {
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MyLogicalRepWorker->subid, rstate->relid);
+ originid = replorigin_by_name(originname, true);
+ elog(LOG, "!!>> apply worker: find tablesync origin tracking for \"%s\".", originname);
+ if (OidIsValid(originid))
+ {
+ elog(LOG, "!!>> apply worker: dropping tablesync origin tracking for \"%s\".", originname);
+ replorigin_drop(originid, false);
+ elog(LOG, "!!>> apply worker: dropped tablesync origin tracking for \"%s\".", originname);
+ }
+ }
+
+ /*
+ * Update the state only after the origin cleanup.
+ */
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
@@ -808,6 +873,32 @@ copy_table(Relation rel)
logicalrep_rel_close(relmapentry, NoLock);
}
+
+/*
+ * Determine the tablesync slot name.
+ *
+ * The returned slot name is palloc'ed in current memory context.
+ */
+char *
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid)
+{
+ char *syncslotname;
+
+ /*
+ * To build a slot name for the sync work, we are limited to NAMEDATALEN -
+ * 1 characters.
+ *
+ * The name is calculated as pg_%u_sync_%u (3 + 10 + 6 + 10 + '\0').
+ * (It's actually the NAMEDATALEN on the remote that matters, but this
+ * scheme will also work reasonably if that is different.)
+ */
+ StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
+
+ syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
+
+ return syncslotname;
+}
+
/*
* Start syncing the table in the sync worker.
*
@@ -825,6 +916,7 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ bool copied_ok;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -850,17 +942,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
- MySubscription->slotname,
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(
MySubscription->oid,
MyLogicalRepWorker->relid);
@@ -875,7 +958,18 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_COPYDONE);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_COPYDONE)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed/etc
+ * before it was able to finish normally.
+ */
+ elog(LOG, "!!>> LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_COPYDONE.");
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -891,9 +985,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -919,29 +1010,105 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
walrcv_clear_result(res);
/*
- * Create a new temporary logical decoding slot. This slot will be used
+ * Create a new permanent logical decoding slot. This slot will be used
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
- walrcv_create_slot(wrconn, slotname, true,
+ elog(LOG, "!!>> LogicalRepSyncTableStart: walrcv_create_slot for \"%s\".", slotname);
+ walrcv_create_slot(wrconn, slotname, false,
CRS_USE_SNAPSHOT, origin_startpos);
- /* Now do the initial data copy */
- PushActiveSnapshot(GetTransactionSnapshot());
- copy_table(rel);
- PopActiveSnapshot();
+ /*
+ * Be sure to remove the newly created tablesync slot if the COPY fails.
+ */
+ copied_ok = false;
+ PG_TRY();
+ {
+ /* Now do the initial data copy */
+ PushActiveSnapshot(GetTransactionSnapshot());
+ copy_table(rel);
+ PopActiveSnapshot();
+
+ res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
+ if (res->status != WALRCV_OK_COMMAND)
+ ereport(ERROR,
+ (errmsg("table copy could not finish transaction on publisher"),
+ errdetail("The error was: %s", res->err)));
+ walrcv_clear_result(res);
- res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
- (errmsg("table copy could not finish transaction on publisher"),
- errdetail("The error was: %s", res->err)));
- walrcv_clear_result(res);
+ table_close(rel, NoLock);
+
+ /* Make the copy visible. */
+ CommandCounterIncrement();
- table_close(rel, NoLock);
+ copied_ok = true;
+ }
+ PG_FINALLY();
+ {
+ /* If something failed during the table copy then clean up the created slot. */
+ if (!copied_ok)
+ {
+ elog(LOG, "!!>> LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".", slotname);
+ ReplicationSlotDropAtPubNode(wrconn, slotname);
+ elog(LOG, "!!>> LogicalRepSyncTableStart: tablesync copy failed. Dropped the tablesync slot \"%s\".", slotname);
+
+ pfree(slotname);
+ slotname = NULL;
+ }
+ }
+ PG_END_TRY();
- /* Make the copy visible. */
- CommandCounterIncrement();
+ CommitTransactionCommand();
+
+ /* Update the persisted state to indicate the COPY phase is done; make it visible to others. */
+ StartTransactionCommand();
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_COPYDONE,
+ MyLogicalRepWorker->relstate_lsn);
+ CommitTransactionCommand();
+
+copy_table_done:
+
+ /* Setup replication origin tracking. */
+ {
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ StartTransactionCommand();
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MySubscription->oid, MyLogicalRepWorker->relid);
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist. Create it now, and advance it to the LSN obtained from walrcv_create_slot.
+ */
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_create \"%s\".", originname);
+ originid = replorigin_create(originname);
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_session_setup \"%s\".", originname);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_advance \"%s\".", originname);
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */ , true /* WAL log */ );
+ }
+ else
+ {
+ /*
+ * Origin tracking already exists.
+ */
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 2 replorigin_session_setup \"%s\".", originname);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 2 replorigin_session_get_progress \"%s\".", originname);
+ *origin_startpos = replorigin_session_get_progress(false);
+ }
+ elog(LOG, "!!>> LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
+ CommitTransactionCommand();
+ }
/*
* We are done with the initial data synchronization, update the state.
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 9271f87..a60e9fd 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -771,8 +771,7 @@ apply_handle_prepare_txn(LogicalRepPrepareData *prepare_data)
Assert(prepare_data->prepare_lsn == remote_final_lsn);
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* BeginTransactionBlock is necessary to balance the
@@ -1079,12 +1078,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -1161,9 +1156,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -1190,8 +1183,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1350,8 +1342,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData* commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index acc2926..e9f2b3f 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,7 @@ DECLARE_UNIQUE_INDEX(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg_subs
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_COPYDONE 'C' /* tablesync copy phase is completed */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 63bab69..5f19089 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -15,6 +15,7 @@
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "storage/spin.h"
+#include "replication/walreceiver.h"
/*
* Behaviour of replication slots, upon release or crash.
@@ -211,6 +212,8 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(Oid suboid, Oid relid);
+extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
--
1.8.3.1
v7-0001-2PC-change-tablesync-slot-to-use-same-two_phase-m.patch
From 27ecd449901c0b81fe9738b9fb1421c9b0d20d05 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Thu, 10 Dec 2020 16:38:05 +1100
Subject: [PATCH v7] 2PC - change tablesync slot to use same two_phase mode as
apply slot
---
src/backend/replication/logical/worker.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index e14fe62..9271f87 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -2759,7 +2759,7 @@ maybe_reread_subscription(void)
strcmp(newsub->slotname, MySubscription->slotname) != 0 ||
newsub->binary != MySubscription->binary ||
newsub->stream != MySubscription->stream ||
- (!am_tablesync_worker() && newsub->twophase != MySubscription->twophase) ||
+ newsub->twophase != MySubscription->twophase ||
!equal(newsub->publications, MySubscription->publications))
{
ereport(LOG,
@@ -3406,7 +3406,7 @@ ApplyWorkerMain(Datum main_arg)
options.proto.logical.publication_names = MySubscription->publications;
options.proto.logical.binary = MySubscription->binary;
options.proto.logical.streaming = MySubscription->stream;
- options.proto.logical.twophase = MySubscription->twophase && !am_tablesync_worker();
+ options.proto.logical.twophase = MySubscription->twophase;
/* Start normal logical streaming replication. */
walrcv_startstreaming(wrconn, &options);
--
1.8.3.1
Hi Amit.
PSA my v8 WIP patch for the Solution1.
This has the same code changes as the v7 patch, but the v8 patch can
be applied to the current PG OSS master code base.
====
Coded / WIP:
* tablesync slot is now permanent instead of temporary. The tablesync
slot name is no longer tied to the Subscription slot name.
* the tablesync slot cleanup (drop) code is added to the DropSubscription
and finish_sync_worker functions
* tablesync worker now allows multiple tx instead of a single tx
* a new state (SUBREL_STATE_COPYDONE) is persisted after a successful
copy_table in LogicalRepSyncTableStart.
* if a relaunched tablesync finds the state is SUBREL_STATE_COPYDONE
then it will bypass the initial copy_table phase.
* tablesync sets up replication origin tracking in
LogicalRepSyncTableStart (similar as done for the apply worker). The
origin is advanced when first created.
* tablesync replication origin tracking is cleaned up during
DropSubscription and/or process_syncing_tables_for_apply
* The DropSubscription cleanup code was changed a lot in v7. The
subscription TAP tests have been executed 6x now without observing any
of the race problems that were sometimes seen with the v6 patch.
TODO / Known Issues:
* Help / comments
* There is temporary "!!>>" excessive logging scattered around which I
added to help my testing during development
* Address review comments
---
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
v8-0001-WIP-patch-for-the-Solution1.patch
From af346cbdc1d3091af190b65b530cd572255eff7e Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Wed, 23 Dec 2020 20:29:53 +1100
Subject: [PATCH v8] WIP patch for the Solution1.
This has the same code changes as the v7 patch, but it can be applied to the current PG OSS master code base.
====
Coded / WIP:
* tablesync slot is now permanent instead of temporary. The tablesync slot name is no longer tied to the Subscription slot name.
* the tablesync slot cleanup (drop) code is added to the DropSubscription and finish_sync_worker functions
* tablesync worker now allows multiple tx instead of a single tx
* a new state (SUBREL_STATE_COPYDONE) is persisted after a successful copy_table in LogicalRepSyncTableStart.
* if a relaunched tablesync finds the state is SUBREL_STATE_COPYDONE then it will bypass the initial copy_table phase.
* tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar as done for the apply worker). The origin is advanced when first created.
* tablesync replication origin tracking is cleaned up during DropSubscription and/or process_syncing_tables_for_apply
* The DropSubscription cleanup code was changed a lot in v7. The subscription TAP tests have been executed 6x now without observing any of the race problems that were sometimes seen with the v6 patch.
TODO / Known Issues:
* Help / comments
* There is temporary "!!>>" excessive logging scattered around which I added to help my testing during development
* Address review comments
---
src/backend/commands/subscriptioncmds.c | 221 +++++++++++++++++++-------
src/backend/replication/logical/origin.c | 4 +-
src/backend/replication/logical/tablesync.c | 231 ++++++++++++++++++++++++----
src/backend/replication/logical/worker.c | 18 +--
src/include/catalog/pg_subscription_rel.h | 1 +
src/include/replication/slot.h | 3 +
6 files changed, 374 insertions(+), 104 deletions(-)
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 1696454..9472fca 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -37,6 +37,7 @@
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
#include "storage/lmgr.h"
#include "utils/acl.h"
#include "utils/builtins.h"
@@ -928,7 +929,6 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
char *err = NULL;
RepOriginId originid;
WalReceiverConn *wrconn = NULL;
- StringInfoData cmd;
Form_pg_subscription form;
/*
@@ -1016,76 +1016,187 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
ReleaseSysCache(tup);
/*
- * Stop all the subscription workers immediately.
- *
- * This is necessary if we are dropping the replication slot, so that the
- * slot becomes accessible.
+ * Try to acquire the connection necessary for dropping slots.
+ * We do this here so that the same connection may be shared
+ * for dropping the Subscription slot, as well as dropping any
+ * tablesync slots.
*
- * It is also necessary if the subscription is disabled and was disabled
- * in the same transaction. Then the workers haven't seen the disabling
- * yet and will still be running, leading to hangs later when we want to
- * drop the replication origin. If the subscription was disabled before
- * this transaction, then there shouldn't be any workers left, so this
- * won't make a difference.
- *
- * New workers won't be started because we hold an exclusive lock on the
- * subscription till the end of the transaction.
+ * Note: If the slotname is NONE/NULL then connection errors are
+ * suppressed. This is necessary so that the DROP SUBSCRIPTION
+ * can still complete even when the connection to publisher is
+ * broken.
*/
- LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
- subworkers = logicalrep_workers_find(subid, false);
- LWLockRelease(LogicalRepWorkerLock);
- foreach(lc, subworkers)
+ load_file("libpqwalreceiver", false);
+
+ wrconn = walrcv_connect(conninfo, true, subname, &err);
+ if (wrconn == NULL && slotname != NULL)
+ ereport(ERROR,
+ (errmsg("could not connect to publisher when attempting to "
+ "drop the replication slot \"%s\"", slotname),
+ errdetail("The error was: %s", err),
+ /* translator: %s is an SQL ALTER command */
+ errhint("Use %s to disassociate the subscription from the slot.",
+ "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+
+ PG_TRY();
{
- LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
+ /*
+ * Stop all the subscription workers immediately.
+ *
+ * This is necessary if we are dropping the replication slot, so that the
+ * slot becomes accessible.
+ *
+ * It is also necessary if the subscription is disabled and was disabled
+ * in the same transaction. Then the workers haven't seen the disabling
+ * yet and will still be running, leading to hangs later when we want to
+ * drop the replication origin. If the subscription was disabled before
+ * this transaction, then there shouldn't be any workers left, so this
+ * won't make a difference.
+ *
+ * New workers won't be started because we hold an exclusive lock on the
+ * subscription till the end of the transaction.
+ */
+ LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
+ subworkers = logicalrep_workers_find(subid, false);
+ LWLockRelease(LogicalRepWorkerLock);
+ foreach(lc, subworkers)
+ {
+ LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
- logicalrep_worker_stop(w->subid, w->relid);
- }
- list_free(subworkers);
+ logicalrep_worker_stop(w->subid, w->relid);
+ }
+ list_free(subworkers);
- /* Clean up dependencies */
- deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
+ /*
+ * Tablesync resource cleanup (slots and origins).
+ *
+ * Any READY-state relations should already have had their clean-up done.
+ */
+ {
+ List *rstates;
+ ListCell *lc;
- /* Remove any associated relation synchronization states. */
- RemoveSubscriptionRel(subid, InvalidOid);
+ rstates = GetSubscriptionNotReadyRelations(subid);
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
- /* Remove the origin tracking if exists. */
- snprintf(originname, sizeof(originname), "pg_%u", subid);
- originid = replorigin_by_name(originname, true);
- if (originid != InvalidRepOriginId)
- replorigin_drop(originid, false);
+ /* Only cleanup the tablesync worker resources */
+ if (!OidIsValid(relid))
+ continue;
- /*
- * If there is no slot associated with the subscription, we can finish
- * here.
- */
- if (!slotname)
+ /* Drop the tablesync slot. */
+ {
+ char *syncslotname = ReplicationSlotNameForTablesync(subid, relid);
+
+ /*
+ * If the subscription slotname is NONE/NULL and the connection to publisher is
+ * broken, then the DropSubscription should still be allowed to complete.
+ * But without a connection it is not possible to drop any tablesync slots.
+ */
+ if (!wrconn)
+ {
+ /* FIXME - OK to just log a warning? */
+ elog(WARNING, "!!>> DropSubscription: no connection. Cannot drop tablesync slot \"%s\".",
+ syncslotname);
+ }
+ else
+ {
+ PG_TRY();
+ {
+ elog(LOG, "!!>> DropSubscription: dropping the tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ elog(LOG, "!!>> DropSubscription: dropped the tablesync slot \"%s\".", syncslotname);
+ }
+ PG_CATCH();
+ {
+ /*
+ * Typically tablesync will delete its own slot after it reaches
+ * SYNCDONE state. Next the apply worker moves the tablesync from
+ * SYNCDONE to READY state.
+ *
+ * Rarely, the DropSubscription may be issued when a tablesync still
+ * is in SYNCDONE but not yet in READY state. If this happens then
+ * the drop slot could fail because it is already dropped.
+ * In this case suppress the drop slot error.
+ *
+ * FIXME - Is there a better way than this?
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+ }
+ pfree(syncslotname);
+ }
+
+ /* Remove the tablesync's origin tracking if exists. */
+ {
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ {
+ elog(LOG, "!!>> DropSubscription: dropping origin tracking for \"%s\"", originname);
+ replorigin_drop(originid, false);
+ elog(LOG, "!!>> DropSubscription: dropped origin tracking for \"%s\"", originname);
+ }
+ }
+
+ }
+ list_free(rstates);
+ }
+
+ /* Clean up dependencies */
+ deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
+
+ /* Remove any associated relation synchronization states. */
+ RemoveSubscriptionRel(subid, InvalidOid);
+
+ /* Remove the origin tracking if exists. */
+ snprintf(originname, sizeof(originname), "pg_%u", subid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ replorigin_drop(originid, false);
+
+ /*
+ * If there is a slot associated with the subscription, then drop the
+ * replication slot at the publisher node using the replication
+ * connection.
+ */
+ if (slotname)
+ ReplicationSlotDropAtPubNode(wrconn, slotname);
+ }
+ PG_FINALLY();
{
+ if (wrconn)
+ walrcv_disconnect(wrconn);
+
table_close(rel, NoLock);
- return;
}
+ PG_END_TRY();
+}
+
+
+/*
+ * Drop the replication slot at the publisher node
+ * using the replication connection.
+ */
+void
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname)
+{
+ StringInfoData cmd;
+
+ Assert(wrconn);
- /*
- * Otherwise drop the replication slot at the publisher node using the
- * replication connection.
- */
load_file("libpqwalreceiver", false);
initStringInfo(&cmd);
appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
- wrconn = walrcv_connect(conninfo, true, subname, &err);
- if (wrconn == NULL)
- ereport(ERROR,
- (errmsg("could not connect to publisher when attempting to "
- "drop the replication slot \"%s\"", slotname),
- errdetail("The error was: %s", err),
- /* translator: %s is an SQL ALTER command */
- errhint("Use %s to disassociate the subscription from the slot.",
- "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
-
PG_TRY();
{
- WalRcvExecResult *res;
+ WalRcvExecResult *res;
res = walrcv_exec(wrconn, cmd.data, 0, NULL);
@@ -1103,13 +1214,9 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
}
PG_FINALLY();
{
- walrcv_disconnect(wrconn);
+ pfree(cmd.data);
}
PG_END_TRY();
-
- pfree(cmd.data);
-
- table_close(rel, NoLock);
}
/*
diff --git a/src/backend/replication/logical/origin.c b/src/backend/replication/logical/origin.c
index 15ab8e7..6b79dc6 100644
--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -843,7 +843,7 @@ replorigin_redo(XLogReaderState *record)
* that originated at the LSN remote_commit on the remote node was replayed
* successfully and that we don't need to do so again. In combination with
* setting up replorigin_session_origin_lsn and replorigin_session_origin
- * that ensures we won't loose knowledge about that after a crash if the
+ * that ensures we won't lose knowledge about that after a crash if the
* transaction had a persistent effect (think of asynchronous commits).
*
* local_commit needs to be a local LSN of the commit so that we can make sure
@@ -905,7 +905,7 @@ replorigin_advance(RepOriginId node,
LWLockAcquire(&replication_state->lock, LW_EXCLUSIVE);
/* Make sure it's not used by somebody else */
- if (replication_state->acquired_by != 0)
+ if (replication_state->acquired_by != 0 && replication_state->acquired_by != MyProcPid)
{
ereport(ERROR,
(errcode(ERRCODE_OBJECT_IN_USE),
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 6259606..3cb0aad 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -102,6 +102,8 @@
#include "replication/logicalrelation.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
@@ -139,6 +141,33 @@ finish_sync_worker(void)
get_rel_name(MyLogicalRepWorker->relid))));
CommitTransactionCommand();
+ /*
+ * Cleanup the tablesync slot.
+ */
+ {
+ /* Calculate the name of the tablesync slot */
+ char *syncslotname = ReplicationSlotNameForTablesync(
+ MySubscription->oid,
+ MyLogicalRepWorker->relid);
+
+ PG_TRY();
+ {
+ elog(LOG, "!!>> finish_sync_worker: dropping the tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ elog(LOG, "!!>> finish_sync_worker: dropped the tablesync slot \"%s\".", syncslotname);
+ }
+ PG_CATCH();
+ {
+ /*
+ * NOP. Suppress any drop slot error because otherwise
+ * it would cause the tablesync to fail and re-launch.
+ */
+ }
+ PG_END_TRY();
+
+ pfree(syncslotname);
+ }
+
/* Find the main apply worker and signal it. */
logicalrep_worker_wakeup(MyLogicalRepWorker->subid, InvalidOid);
@@ -270,8 +299,6 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
-
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
@@ -284,6 +311,15 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn)
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ {
+ StartTransactionCommand();
+ }
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
@@ -406,12 +442,41 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
{
rstate->state = SUBREL_STATE_READY;
rstate->lsn = current_lsn;
+
if (!started_tx)
{
StartTransactionCommand();
started_tx = true;
}
+ /*
+ * Remove the tablesync origin tracking if exists.
+ *
+ * The cleanup is done here instead of in the finish_sync_worker function because
+ * if the tablesync worker process attempted to call replorigin_drop then it would
+ * hang, because replorigin_drop considers the owning tablesync PID as "busy".
+ *
+ * Do this before updating the state, so that DropSubscription can know that all
+ * READY workers have already had their origin tracking removed.
+ */
+ {
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MyLogicalRepWorker->subid, rstate->relid);
+ originid = replorigin_by_name(originname, true);
+ elog(LOG, "!!>> apply worker: find tablesync origin tracking for \"%s\".", originname);
+ if (OidIsValid(originid))
+ {
+ elog(LOG, "!!>> apply worker: dropping tablesync origin tracking for \"%s\".", originname);
+ replorigin_drop(originid, false);
+ elog(LOG, "!!>> apply worker: dropped tablesync origin tracking for \"%s\".", originname);
+ }
+ }
+
+ /*
+ * Update the state only after the origin cleanup.
+ */
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
@@ -807,6 +872,32 @@ copy_table(Relation rel)
logicalrep_rel_close(relmapentry, NoLock);
}
+
+/*
+ * Determine the tablesync slot name.
+ *
+ * The returned slot name is palloc'ed in current memory context.
+ */
+char *
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid)
+{
+ char *syncslotname;
+
+ /*
+ * To build a slot name for the sync work, we are limited to NAMEDATALEN -
+ * 1 characters.
+ *
+ * The name is calculated as pg_%u_sync_%u (3 + 10 + 6 + 10 + '\0').
+ * (It's actually the NAMEDATALEN on the remote that matters, but this
+ * scheme will also work reasonably if that is different.)
+ */
+ StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
+
+ syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
+
+ return syncslotname;
+}
+
/*
* Start syncing the table in the sync worker.
*
@@ -824,6 +915,7 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ bool copied_ok;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -849,17 +941,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
- MySubscription->slotname,
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(
MySubscription->oid,
MyLogicalRepWorker->relid);
@@ -874,7 +957,18 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_COPYDONE);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_COPYDONE)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed/etc
+ * before it was able to finish normally.
+ */
+ elog(LOG, "!!>> LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_COPYDONE.");
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -890,9 +984,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -918,29 +1009,105 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
walrcv_clear_result(res);
/*
- * Create a new temporary logical decoding slot. This slot will be used
+ * Create a new permanent logical decoding slot. This slot will be used
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
- walrcv_create_slot(wrconn, slotname, true,
+ elog(LOG, "!!>> LogicalRepSyncTableStart: walrcv_create_slot for \"%s\".", slotname);
+ walrcv_create_slot(wrconn, slotname, false,
CRS_USE_SNAPSHOT, origin_startpos);
- /* Now do the initial data copy */
- PushActiveSnapshot(GetTransactionSnapshot());
- copy_table(rel);
- PopActiveSnapshot();
+ /*
+ * Be sure to remove the newly created tablesync slot if the COPY fails.
+ */
+ copied_ok = false;
+ PG_TRY();
+ {
+ /* Now do the initial data copy */
+ PushActiveSnapshot(GetTransactionSnapshot());
+ copy_table(rel);
+ PopActiveSnapshot();
+
+ res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
+ if (res->status != WALRCV_OK_COMMAND)
+ ereport(ERROR,
+ (errmsg("table copy could not finish transaction on publisher"),
+ errdetail("The error was: %s", res->err)));
+ walrcv_clear_result(res);
- res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
- (errmsg("table copy could not finish transaction on publisher"),
- errdetail("The error was: %s", res->err)));
- walrcv_clear_result(res);
+ table_close(rel, NoLock);
+
+ /* Make the copy visible. */
+ CommandCounterIncrement();
- table_close(rel, NoLock);
+ copied_ok = true;
+ }
+ PG_FINALLY();
+ {
+ /* If something failed during copy table then cleanup the created slot. */
+ if (!copied_ok)
+ {
+ elog(LOG, "!!>> LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".", slotname);
+ ReplicationSlotDropAtPubNode(wrconn, slotname);
+ elog(LOG, "!!>> LogicalRepSyncTableStart: tablesync copy failed. Dropped the tablesync slot \"%s\".", slotname);
+
+ pfree(slotname);
+ slotname = NULL;
+ }
+ }
+ PG_END_TRY();
- /* Make the copy visible. */
- CommandCounterIncrement();
+ CommitTransactionCommand();
+
+ /* Update the persisted state to indicate the COPY phase is done; make it visible to others. */
+ StartTransactionCommand();
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_COPYDONE,
+ MyLogicalRepWorker->relstate_lsn);
+ CommitTransactionCommand();
+
+copy_table_done:
+
+ /* Setup replication origin tracking. */
+ {
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ StartTransactionCommand();
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MySubscription->oid, MyLogicalRepWorker->relid);
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist. Create it now, and advance to the LSN obtained from walrcv_create_slot.
+ */
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_create \"%s\".", originname);
+ originid = replorigin_create(originname);
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_session_setup \"%s\".", originname);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_advance \"%s\".", originname);
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */ , true /* WAL log */ );
+ }
+ else
+ {
+ /*
+ * Origin tracking already exists.
+ */
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 2 replorigin_session_setup \"%s\".", originname);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 2 replorigin_session_get_progress \"%s\".", originname);
+ *origin_startpos = replorigin_session_get_progress(false);
+ }
+ elog(LOG, "!!>> LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
+ CommitTransactionCommand();
+ }
/*
* We are done with the initial data synchronization, update the state.
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 3874939..d28cfb8 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -807,12 +807,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -889,9 +885,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -918,8 +912,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1062,8 +1055,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData* commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index acc2926..e9f2b3f 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,7 @@ DECLARE_UNIQUE_INDEX(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg_subs
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_COPYDONE 'C' /* tablesync copy phase is completed */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 63bab69..5f19089 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -15,6 +15,7 @@
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "storage/spin.h"
+#include "replication/walreceiver.h"
/*
* Behaviour of replication slots, upon release or crash.
@@ -211,6 +212,8 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(Oid suboid, Oid relid);
+extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
--
1.8.3.1
On Wed, Dec 23, 2020 at 11:49 AM Peter Smith <smithpb2250@gmail.com> wrote:
Hi Amit.
PSA my v7 WIP patch for the Solution1.
Few comments:
================
1.
+ * Rarely, the DropSubscription may be issued when a tablesync still
+ * is in SYNCDONE but not yet in READY state. If this happens then
+ * the drop slot could fail because it is already dropped.
+ * In this case suppress and drop slot error.
+ *
+ * FIXME - Is there a better way than this?
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ PG_RE_THROW();
So, does this situation happen when we try to drop a subscription after
the state is changed to syncdone but not yet to ready? If so, then can't
we write a function GetSubscriptionNotDoneRelations similar to
GetSubscriptionNotReadyRelations where we get a list of relations that
are not yet in the done state? I think this should be safe because once
we are here we shouldn't be allowed to start a new worker and old
workers are already stopped by this function.
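One possible shape for such a function (an untested sketch only; the
function name and the filter-in-the-scan-loop approach are assumptions,
modeled on the existing GetSubscriptionNotReadyRelations; srsublsn
handling is elided):

#include "access/genam.h"
#include "access/htup_details.h"
#include "access/stratnum.h"
#include "access/table.h"
#include "catalog/pg_subscription_rel.h"
#include "utils/fmgroids.h"

/*
 * Sketch: like GetSubscriptionNotReadyRelations, but also skip
 * SYNCDONE relations, so DropSubscription would only see relations
 * whose tablesync resources may still exist.
 */
List *
GetSubscriptionNotDoneRelations(Oid subid)
{
	List	   *res = NIL;
	Relation	rel;
	HeapTuple	tup;
	ScanKeyData skey[1];
	SysScanDesc scan;

	rel = table_open(SubscriptionRelRelationId, AccessShareLock);

	/* Match all pg_subscription_rel rows for this subscription. */
	ScanKeyInit(&skey[0],
				Anum_pg_subscription_rel_srsubid,
				BTEqualStrategyNumber, F_OIDEQ,
				ObjectIdGetDatum(subid));

	scan = systable_beginscan(rel, InvalidOid, false, NULL, 1, skey);

	while (HeapTupleIsValid(tup = systable_getnext(scan)))
	{
		Form_pg_subscription_rel subrel =
			(Form_pg_subscription_rel) GETSTRUCT(tup);
		SubscriptionRelState *relstate;

		/* Skip relations whose table sync has already finished. */
		if (subrel->srsubstate == SUBREL_STATE_SYNCDONE ||
			subrel->srsubstate == SUBREL_STATE_READY)
			continue;

		relstate = (SubscriptionRelState *) palloc0(sizeof(SubscriptionRelState));
		relstate->relid = subrel->srrelid;
		relstate->state = subrel->srsubstate;
		res = lappend(res, relstate);
	}

	systable_endscan(scan);
	table_close(rel, AccessShareLock);

	return res;
}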
2. Your changes in LogicalRepSyncTableStart() don't seem to be
right. IIUC, you are copying the table in one transaction, then
updating the state to SUBREL_STATE_COPYDONE in another transaction,
and after that doing replorigin_advance. Consider what happens if we
error out after the first txn (in which we have copied the table) is
committed: after the restart, it will again try to copy and lead to an
error. Similarly, consider if we error out after the second
transaction: we won't know where to start decoding from. I think all of
these should be done in a single transaction.
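In other words, roughly this ordering (an illustrative fragment only,
not the patch itself; originid and origin_startpos come from the
surrounding tablesync code):

/*
 * Keep the COPYDONE catalog update and the origin advance inside one
 * transaction, so a crash either loses both (the copy is retried) or
 * keeps both (decoding resumes from the recorded LSN).
 */
StartTransactionCommand();

UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
						   MyLogicalRepWorker->relid,
						   SUBREL_STATE_COPYDONE,
						   MyLogicalRepWorker->relstate_lsn);

replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
				   true /* go backward */ , true /* WAL log */ );

CommitTransactionCommand();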
--
With Regards,
Amit Kapila.
On Tue, Dec 22, 2020 at 4:58 PM Peter Smith <smithpb2250@gmail.com> wrote:
On Mon, Dec 21, 2020 at 11:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Dec 21, 2020 at 3:17 PM Peter Smith <smithpb2250@gmail.com> wrote:
On Mon, Dec 21, 2020 at 4:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Few other comments:
==================

Thanks for your feedback.

1.
+ * FIXME 3 - Crashed tablesync workers may also have remaining slots because I don't think
+ * such workers are even iterated by this loop, and nobody else is removing them.
+ */
+ if (slotname)
+ {

The above FIXME is not clear to me. Actually, the crashed workers
should restart, finish their work, and drop the slots. So not sure
what exactly this FIXME refers to?

Yes, normally if the tablesync can complete it should behave like that.
But I think there are other scenarios where it may be unable to
clean up after itself. For example:

i) Maybe the crashed tablesync worker cannot finish. e.g. A row insert
handled by tablesync can give a PK violation which also will crash
again and again for each re-launched/replacement tablesync worker.
This can be reproduced in the debugger. If the DropSubscription
doesn't clean up the tablesync's slot then nobody will.

ii) Also the DROP SUBSCRIPTION code has locking (see the code comment)
"to ensure that the launcher doesn't restart new worker during dropping
the subscription".

Yeah, I have also read that comment but do you know how it is
preventing relaunch? How does the subscription lock help?

Hmmm. I did see there is a matching lock in get_subscription_list of
launcher.c, which may be what that code comment was referring to. But
I am also currently unsure how this lock prevents anybody (e.g.
process_syncing_tables_for_apply) from executing another
logicalrep_worker_launch.

process_syncing_tables_for_apply will be called by the apply worker
and we are stopping the apply worker. So, after that the launcher won't
start a new apply worker because of get_subscription_list(), and if the
apply worker is not started then it won't be able to start a tablesync
worker. So, we need the handling of crashed tablesync workers here
such that we need to drop any new sync slots.
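Condensed, the cleanup being discussed looks roughly like this
(declarations, error handling, and logging elided; the full loop is in
the attached patch's DropSubscription):

/*
 * Walk the not-yet-READY relations and drop any tablesync slot that a
 * crashed worker may have left behind on the publisher.
 */
rstates = GetSubscriptionNotReadyRelations(subid);
foreach(lc, rstates)
{
	SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
	char	   *syncslotname;

	if (!OidIsValid(rstate->relid))
		continue;				/* not a tablesync entry */

	syncslotname = ReplicationSlotNameForTablesync(subid, rstate->relid);
	ReplicationSlotDropAtPubNode(wrconn, syncslotname);
	pfree(syncslotname);
}
list_free(rstates);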
--
With Regards,
Amit Kapila.
Hi Amit.
PSA my v9 WIP patch for the Solution1, which addresses some recent
review comments and makes other minor changes.
====
Features:
* tablesync slot is now permanent instead of temporary. The tablesync
slot name is no longer tied to the Subscription slot name (see the
usage sketch after this list).
* the tablesync slot cleanup (drop) code is added for DropSubscription
and for finish_sync_worker functions
* tablesync worker now allows multiple transactions instead of a
single transaction
* a new state (SUBREL_STATE_COPYDONE) is persisted after a successful
copy_table in LogicalRepSyncTableStart.
* if a relaunched tablesync finds the state is SUBREL_STATE_COPYDONE
then it will bypass the initial copy_table phase.
* tablesync sets up replication origin tracking in
LogicalRepSyncTableStart (similar to what is done for the apply
worker). The origin is advanced when first created.
* tablesync replication origin tracking is cleaned up during
DropSubscription and/or process_syncing_tables_for_apply.
* The DropSubscription cleanup code was enhanced in v7 to take care of
crashed sync workers.
* Minor updates to PG docs
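As a usage sketch of the new naming scheme (the OIDs here are made up
for illustration):

/*
 * The slot name depends only on the subscription OID and relation OID,
 * never on the subscription's own slot name.
 */
char	   *syncslotname = ReplicationSlotNameForTablesync(16394, 16385);

/* syncslotname is now "pg_16394_sync_16385"; the caller pfree()s it. */
pfree(syncslotname);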
TODO / Known Issues:
* The source includes temporary "!!>>" debug logging which I added to
help testing during development
* Address review comments
---
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
v9-0001-WIP-patch-for-the-Solution1.patch
From b93e330a1ffe5c43d09f72e38726363bcb49d890 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Wed, 30 Dec 2020 15:02:21 +1100
Subject: [PATCH v9] WIP patch for the Solution1.
====
Features:
* tablesync slot is now permanent instead of temporary. The tablesync slot name is no longer tied to the Subscription slot name.
* the tablesync slot cleanup (drop) code is added for DropSubscription and for finish_sync_worker functions
* tablesync worker now allows multiple transactions instead of a single transaction
* a new state (SUBREL_STATE_COPYDONE) is persisted after a successful copy_table in LogicalRepSyncTableStart.
* if a relaunched tablesync finds the state is SUBREL_STATE_COPYDONE then it will bypass the initial copy_table phase.
* tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar to what is done for the apply worker). The origin is advanced when first created.
* tablesync replication origin tracking is cleaned up during DropSubscription and/or process_syncing_tables_for_apply.
* The DropSubscription cleanup code was enhanced in v7 to take care of crashed sync workers.
* Minor updates to PG docs
TODO / Known Issues:
* The source includes temporary "!!>>" debug logging which I added to help testing during development
* Address review comments
---
doc/src/sgml/catalogs.sgml | 1 +
src/backend/commands/subscriptioncmds.c | 222 +++++++++++++++++++-------
src/backend/replication/logical/origin.c | 4 +-
src/backend/replication/logical/tablesync.c | 232 ++++++++++++++++++++++++----
src/backend/replication/logical/worker.c | 18 +--
src/include/catalog/pg_subscription_rel.h | 1 +
src/include/replication/slot.h | 3 +
7 files changed, 376 insertions(+), 105 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index d988636..266615c 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7651,6 +7651,7 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
State code:
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
+ <literal>C</literal> = table data has been copied,
<literal>s</literal> = synchronized,
<literal>r</literal> = ready (normal replication)
</para></entry>
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 1696454..c366614 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -37,6 +37,7 @@
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
#include "storage/lmgr.h"
#include "utils/acl.h"
#include "utils/builtins.h"
@@ -928,7 +929,6 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
char *err = NULL;
RepOriginId originid;
WalReceiverConn *wrconn = NULL;
- StringInfoData cmd;
Form_pg_subscription form;
/*
@@ -1016,76 +1016,188 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
ReleaseSysCache(tup);
/*
- * Stop all the subscription workers immediately.
- *
- * This is necessary if we are dropping the replication slot, so that the
- * slot becomes accessible.
+ * Try to acquire the connection necessary for dropping slots.
+ * We do this here so that the same connection may be shared
+ * for dropping the Subscription slot, as well as dropping any
+ * tablesync slots.
*
- * It is also necessary if the subscription is disabled and was disabled
- * in the same transaction. Then the workers haven't seen the disabling
- * yet and will still be running, leading to hangs later when we want to
- * drop the replication origin. If the subscription was disabled before
- * this transaction, then there shouldn't be any workers left, so this
- * won't make a difference.
- *
- * New workers won't be started because we hold an exclusive lock on the
- * subscription till the end of the transaction.
+ * Note: If the slotname is NONE/NULL then connection errors are
+ * suppressed. This is necessary so that the DROP SUBSCRIPTION
+ * can still complete even when the connection to publisher is
+ * broken.
*/
- LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
- subworkers = logicalrep_workers_find(subid, false);
- LWLockRelease(LogicalRepWorkerLock);
- foreach(lc, subworkers)
+ load_file("libpqwalreceiver", false);
+
+ wrconn = walrcv_connect(conninfo, true, subname, &err);
+ if (wrconn == NULL && slotname != NULL)
+ ereport(ERROR,
+ (errmsg("could not connect to publisher when attempting to "
+ "drop the replication slot \"%s\"", slotname),
+ errdetail("The error was: %s", err),
+ /* translator: %s is an SQL ALTER command */
+ errhint("Use %s to disassociate the subscription from the slot.",
+ "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+
+ PG_TRY();
{
- LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
+ /*
+ * Stop all the subscription workers immediately.
+ *
+ * This is necessary if we are dropping the replication slot, so that the
+ * slot becomes accessible.
+ *
+ * It is also necessary if the subscription is disabled and was disabled
+ * in the same transaction. Then the workers haven't seen the disabling
+ * yet and will still be running, leading to hangs later when we want to
+ * drop the replication origin. If the subscription was disabled before
+ * this transaction, then there shouldn't be any workers left, so this
+ * won't make a difference.
+ *
+ * New workers won't be started because we hold an exclusive lock on the
+ * subscription till the end of the transaction.
+ */
+ LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
+ subworkers = logicalrep_workers_find(subid, false);
+ LWLockRelease(LogicalRepWorkerLock);
+ foreach(lc, subworkers)
+ {
+ LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
- logicalrep_worker_stop(w->subid, w->relid);
- }
- list_free(subworkers);
+ logicalrep_worker_stop(w->subid, w->relid);
+ }
+ list_free(subworkers);
- /* Clean up dependencies */
- deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
+ /*
+ * Tablesync resource cleanup (slots and origins).
+ *
+ * Any READY-state relations would already have had their clean-up done.
+ */
+ {
+ List *rstates;
+ ListCell *lc;
- /* Remove any associated relation synchronization states. */
- RemoveSubscriptionRel(subid, InvalidOid);
+ rstates = GetSubscriptionNotReadyRelations(subid);
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
- /* Remove the origin tracking if exists. */
- snprintf(originname, sizeof(originname), "pg_%u", subid);
- originid = replorigin_by_name(originname, true);
- if (originid != InvalidRepOriginId)
- replorigin_drop(originid, false);
+ /* Only cleanup the tablesync worker resources */
+ if (!OidIsValid(relid))
+ continue;
- /*
- * If there is no slot associated with the subscription, we can finish
- * here.
- */
- if (!slotname)
+ /* Drop the tablesync slot. */
+ {
+ char *syncslotname = ReplicationSlotNameForTablesync(subid, relid);
+
+ /*
+ * If the subscription slotname is NONE/NULL and the connection to publisher is
+ * broken, then the DropSubscription should still be allowed to complete.
+ * But without a connection it is not possible to drop any tablesync slots.
+ */
+ if (!wrconn)
+ {
+ /* FIXME - OK to just log a warning? */
+ elog(WARNING, "!!>> DropSubscription: no connection. Cannot drop tablesync slot \"%s\".",
+ syncslotname);
+ }
+ else
+ {
+ PG_TRY();
+ {
+ elog(LOG, "!!>> DropSubscription: dropping the tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ elog(LOG, "!!>> DropSubscription: dropped the tablesync slot \"%s\".", syncslotname);
+ }
+ PG_CATCH();
+ {
+ /*
+ * Typically tablesync will delete its own slot after it reaches
+ * SYNCDONE state. Then the apply worker moves the tablesync from
+ * SYNCDONE to READY state.
+ *
+ * Rarely, the DropSubscription may be issued when a tablesync is
+ * still in SYNCDONE but has not yet reached READY state.
+ * If this happens then the drop slot could fail since it was
+ * already dropped, so suppress the error.
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ pfree(syncslotname);
+ PG_RE_THROW();
+ }
+ }
+ PG_END_TRY();
+ }
+ pfree(syncslotname);
+ }
+
+ /* Remove the tablesync's origin tracking if exists. */
+ {
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ {
+ elog(LOG, "!!>> DropSubscription: dropping origin tracking for \"%s\"", originname);
+ replorigin_drop(originid, false);
+ elog(LOG, "!!>> DropSubscription: dropped origin tracking for \"%s\"", originname);
+ }
+ }
+
+ }
+ list_free(rstates);
+ }
+
+ /* Clean up dependencies */
+ deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
+
+ /* Remove any associated relation synchronization states. */
+ RemoveSubscriptionRel(subid, InvalidOid);
+
+ /* Remove the origin tracking if exists. */
+ snprintf(originname, sizeof(originname), "pg_%u", subid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ replorigin_drop(originid, false);
+
+ /*
+ * If there is a slot associated with the subscription, then drop the
+ * replication slot at the publisher node using the replication
+ * connection.
+ */
+ if (slotname)
+ ReplicationSlotDropAtPubNode(wrconn, slotname);
+ }
+ PG_FINALLY();
{
+ if (wrconn)
+ walrcv_disconnect(wrconn);
+
table_close(rel, NoLock);
- return;
}
+ PG_END_TRY();
+}
+
+
+/*
+ * Drop the replication slot at the publisher node
+ * using the replication connection.
+ */
+void
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname)
+{
+ StringInfoData cmd;
+
+ Assert(wrconn);
- /*
- * Otherwise drop the replication slot at the publisher node using the
- * replication connection.
- */
load_file("libpqwalreceiver", false);
initStringInfo(&cmd);
appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
- wrconn = walrcv_connect(conninfo, true, subname, &err);
- if (wrconn == NULL)
- ereport(ERROR,
- (errmsg("could not connect to publisher when attempting to "
- "drop the replication slot \"%s\"", slotname),
- errdetail("The error was: %s", err),
- /* translator: %s is an SQL ALTER command */
- errhint("Use %s to disassociate the subscription from the slot.",
- "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
-
PG_TRY();
{
- WalRcvExecResult *res;
+ WalRcvExecResult *res;
res = walrcv_exec(wrconn, cmd.data, 0, NULL);
@@ -1103,13 +1215,9 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
}
PG_FINALLY();
{
- walrcv_disconnect(wrconn);
+ pfree(cmd.data);
}
PG_END_TRY();
-
- pfree(cmd.data);
-
- table_close(rel, NoLock);
}
/*
diff --git a/src/backend/replication/logical/origin.c b/src/backend/replication/logical/origin.c
index 15ab8e7..6b79dc6 100644
--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -843,7 +843,7 @@ replorigin_redo(XLogReaderState *record)
* that originated at the LSN remote_commit on the remote node was replayed
* successfully and that we don't need to do so again. In combination with
* setting up replorigin_session_origin_lsn and replorigin_session_origin
- * that ensures we won't loose knowledge about that after a crash if the
+ * that ensures we won't lose knowledge about that after a crash if the
* transaction had a persistent effect (think of asynchronous commits).
*
* local_commit needs to be a local LSN of the commit so that we can make sure
@@ -905,7 +905,7 @@ replorigin_advance(RepOriginId node,
LWLockAcquire(&replication_state->lock, LW_EXCLUSIVE);
/* Make sure it's not used by somebody else */
- if (replication_state->acquired_by != 0)
+ if (replication_state->acquired_by != 0 && replication_state->acquired_by != MyProcPid)
{
ereport(ERROR,
(errcode(ERRCODE_OBJECT_IN_USE),
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 6259606..8180f49 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -43,13 +43,17 @@
* state to SYNCDONE. There might be zero changes applied between
* CATCHUP and SYNCDONE, because the sync worker might be ahead of the
* apply worker.
+ * - The sync worker has an intermediate state COPYDONE which comes after
+ * CATCHUP and before SYNCDONE. This state indicates that the initial
+ * table copy phase has completed, so if the worker crashes before
+ * reaching SYNCDONE the copy will not be re-attempted.
* - Once the state is set to SYNCDONE, the apply will continue tracking
* the table until it reaches the SYNCDONE stream position, at which
* point it sets state to READY and stops tracking. Again, there might
* be zero changes in between.
*
* So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * CATCHUP -> (sync worker COPYDONE) -> SYNCDONE -> READY.
*
* The catalog pg_subscription_rel is used to keep information about
* subscribed tables and their state. Some transient state during data
@@ -64,6 +68,7 @@
* -> set in memory CATCHUP
* -> enter wait-loop
* sync:10
+ * -> set in catalog COPYDONE
* -> set in catalog SYNCDONE
* -> exit
* apply:10
@@ -79,6 +84,7 @@
* -> set in memory CATCHUP
* -> continue per-table filtering
* sync:10
+ * -> set in catalog COPYDONE
* -> set in catalog SYNCDONE
* -> exit
* apply:10
@@ -102,6 +108,8 @@
#include "replication/logicalrelation.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
@@ -139,6 +147,33 @@ finish_sync_worker(void)
get_rel_name(MyLogicalRepWorker->relid))));
CommitTransactionCommand();
+ /*
+ * Cleanup the tablesync slot.
+ */
+ {
+ /* Calculate the name of the tablesync slot */
+ char *syncslotname = ReplicationSlotNameForTablesync(
+ MySubscription->oid,
+ MyLogicalRepWorker->relid);
+
+ PG_TRY();
+ {
+ elog(LOG, "!!>> finish_sync_worker: dropping the tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ elog(LOG, "!!>> finish_sync_worker: dropped the tablesync slot \"%s\".", syncslotname);
+ }
+ PG_CATCH();
+ {
+ /*
+ * NOP. Suppress any drop slot error because otherwise
+ * it would cause the tablesync to fail and re-launch.
+ */
+ }
+ PG_END_TRY();
+
+ pfree(syncslotname);
+ }
+
/* Find the main apply worker and signal it. */
logicalrep_worker_wakeup(MyLogicalRepWorker->subid, InvalidOid);
@@ -270,8 +305,6 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
-
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
@@ -284,6 +317,15 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn)
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ {
+ StartTransactionCommand();
+ }
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
@@ -406,12 +448,41 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
{
rstate->state = SUBREL_STATE_READY;
rstate->lsn = current_lsn;
+
if (!started_tx)
{
StartTransactionCommand();
started_tx = true;
}
+ /*
+ * Remove the tablesync origin tracking if exists.
+ *
+ * The cleanup is done here instead of in the finish_sync_worker function because
+ * if the tablesync worker process attempted to call replorigin_drop then it would
+ * hang, because replorigin_drop considers the owning tablesync PID as "busy".
+ *
+ * Do this before updating the state, so that DropSubscription can know that all
+ * READY workers have already had their origin tracking removed.
+ */
+ {
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MyLogicalRepWorker->subid, rstate->relid);
+ originid = replorigin_by_name(originname, true);
+ elog(LOG, "!!>> apply worker: find tablesync origin tracking for \"%s\".", originname);
+ if (OidIsValid(originid))
+ {
+ elog(LOG, "!!>> apply worker: dropping tablesync origin tracking for \"%s\".", originname);
+ replorigin_drop(originid, false);
+ elog(LOG, "!!>> apply worker: dropped tablesync origin tracking for \"%s\".", originname);
+ }
+ }
+
+ /*
+ * Update the state only after the origin cleanup.
+ */
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
@@ -807,6 +878,32 @@ copy_table(Relation rel)
logicalrep_rel_close(relmapentry, NoLock);
}
+
+/*
+ * Determine the tablesync slot name.
+ *
+ * The returned slot name is palloc'ed in current memory context.
+ */
+char *
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid)
+{
+ char *syncslotname;
+
+ /*
+ * To build a slot name for the sync work, we are limited to NAMEDATALEN -
+ * 1 characters.
+ *
+ * The name is calculated as pg_%u_sync_%u (3 + 10 + 6 + 10 + '\0').
+ * (It's actually the NAMEDATALEN on the remote that matters, but this
+ * scheme will also work reasonably if that is different.)
+ */
+ StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
+
+ syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
+
+ return syncslotname;
+}
+
/*
* Start syncing the table in the sync worker.
*
@@ -849,17 +946,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
- MySubscription->slotname,
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(
MySubscription->oid,
MyLogicalRepWorker->relid);
@@ -874,7 +962,19 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_COPYDONE);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_COPYDONE)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed/etc
+ * before it was able to finish normally.
+ */
+ elog(LOG, "!!>> LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_COPYDONE.");
+ StartTransactionCommand();
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -890,9 +990,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -918,29 +1015,98 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
walrcv_clear_result(res);
/*
- * Create a new temporary logical decoding slot. This slot will be used
+ * Create a new permanent logical decoding slot. This slot will be used
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
- walrcv_create_slot(wrconn, slotname, true,
+ elog(LOG, "!!>> LogicalRepSyncTableStart: walrcv_create_slot for \"%s\".", slotname);
+ walrcv_create_slot(wrconn, slotname, false,
CRS_USE_SNAPSHOT, origin_startpos);
- /* Now do the initial data copy */
- PushActiveSnapshot(GetTransactionSnapshot());
- copy_table(rel);
- PopActiveSnapshot();
+ /*
+ * Be sure to remove the newly created tablesync slot if the COPY fails.
+ */
+ PG_TRY();
+ {
+ /* Now do the initial data copy */
+ PushActiveSnapshot(GetTransactionSnapshot());
+ copy_table(rel);
+ PopActiveSnapshot();
+
+ res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
+ if (res->status != WALRCV_OK_COMMAND)
+ ereport(ERROR,
+ (errmsg("table copy could not finish transaction on publisher"),
+ errdetail("The error was: %s", res->err)));
+ walrcv_clear_result(res);
+
+ table_close(rel, NoLock);
+
+ /* Make the copy visible. */
+ CommandCounterIncrement();
+ }
+ PG_CATCH();
+ {
+ /* If something failed during copy table then cleanup the created slot. */
+ elog(LOG, "!!>> LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".", slotname);
+ ReplicationSlotDropAtPubNode(wrconn, slotname);
+ elog(LOG, "!!>> LogicalRepSyncTableStart: tablesync copy failed. Dropped the tablesync slot \"%s\".", slotname);
- res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
- (errmsg("table copy could not finish transaction on publisher"),
- errdetail("The error was: %s", res->err)));
- walrcv_clear_result(res);
+ pfree(slotname);
+ slotname = NULL;
+
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+
+ /* Update the persisted state to indicate the COPY phase is done; make it visible to others. */
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_COPYDONE,
+ MyLogicalRepWorker->relstate_lsn);
+
+copy_table_done:
+
+ /* Setup replication origin tracking. */
+ {
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MySubscription->oid, MyLogicalRepWorker->relid);
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist. Create it now, and advance to the LSN obtained from walrcv_create_slot.
+ */
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_create \"%s\".", originname);
+ originid = replorigin_create(originname);
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_session_setup \"%s\".", originname);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_advance \"%s\".", originname);
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */ , true /* WAL log */ );
+ }
+ else
+ {
+ /*
+ * Origin tracking already exists.
+ */
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 2 replorigin_session_setup \"%s\".", originname);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 2 replorigin_session_get_progress \"%s\".", originname);
+ *origin_startpos = replorigin_session_get_progress(false);
+ }
- table_close(rel, NoLock);
+ elog(LOG, "!!>> LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
+ }
- /* Make the copy visible. */
- CommandCounterIncrement();
+ CommitTransactionCommand();
/*
* We are done with the initial data synchronization, update the state.
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 3874939..d28cfb8 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -807,12 +807,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -889,9 +885,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -918,8 +912,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1062,8 +1055,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData* commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index acc2926..e9f2b3f 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,7 @@ DECLARE_UNIQUE_INDEX(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg_subs
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_COPYDONE 'C' /* tablesync copy phase is completed */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 63bab69..5f19089 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -15,6 +15,7 @@
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "storage/spin.h"
+#include "replication/walreceiver.h"
/*
* Behaviour of replication slots, upon release or crash.
@@ -211,6 +212,8 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(Oid suboid, Oid relid);
+extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
--
1.8.3.1
On Wed, Dec 23, 2020 at 9:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Dec 22, 2020 at 4:58 PM Peter Smith <smithpb2250@gmail.com> wrote:
On Mon, Dec 21, 2020 at 11:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Dec 21, 2020 at 3:17 PM Peter Smith <smithpb2250@gmail.com> wrote:
On Mon, Dec 21, 2020 at 4:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
process_syncing_tables_for_apply will be called by the apply worker
and we are stopping the apply worker. So, after that the launcher won't
start a new apply worker because of get_subscription_list(), and if the
apply worker is not started then it won't be able to start a tablesync
worker. So, we need the handling of crashed tablesync workers here
such that we need to drop any new sync slots.
Yes, in the v6 patch code this was a problem in need of handling. But
since the v7 patch the DropSubscription code is now using a separate
GetSubscriptionNotReadyRelations loop to handle the cleanup of
potentially leftover slots from crashed tablesync workers (i.e.
workers that never got to a READY state).
Kind Regards,
Peter Smith.
Fujitsu Australia
On Wed, Dec 23, 2020 at 8:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
1.
+ * Rarely, the DropSubscription may be issued when a tablesync still
+ * is in SYNCDONE but not yet in READY state. If this happens then
+ * the drop slot could fail because it is already dropped.
+ * In this case suppress and drop slot error.
+ *
+ * FIXME - Is there a better way than this?
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ PG_RE_THROW();

So, does this situation happen when we try to drop subscription after
the state is changed to syncdone but not syncready. If so, then can't
we write a function GetSubscriptionNotDoneRelations similar to
GetSubscriptionNotReadyRelations where we get a list of relations that
are not in done stage. I think this should be safe because once we are
here we shouldn't be allowed to start a new worker and old workers are
already stopped by this function.
Yes, but I don't see how adding such a function is an improvement over
the existing code:
e.g.1. GetSubscriptionNotDoneRelations will include the READY state
(which we don't want) just like GetSubscriptionNotReadyRelations
includes the SYNCDONE state.
e.g.2. Or, something like GetSubscriptionNotDoneAndNotReadyRelations
would be an unnecessary overkill replacement for the current simple
"if".
AFAIK the code is OK as-is. That "FIXME" comment was really meant only
to highlight this for review, rather than to imply something needed to
be fixed. I have removed that "FIXME" comment in the latest patch
[v9].
2. Your changes in LogicalRepSyncTableStart() doesn't seem to be
right. IIUC, you are copying the table in one transaction, then
updating the state to SUBREL_STATE_COPYDONE in another transaction,
and after that doing replorigin_advance. Consider what happens if we
error out after the first txn is committed in which we have copied the
table. After the restart, it will again try to copy and lead to an
error. Similarly, consider if we error out after the second
transaction, we won't know where to start decoding from. I think all
these should be done in a single transaction.
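
To illustrate, a minimal sketch of that single-transaction ordering (the
function names are the ones the patch already uses; the exact code here
is only an assumption of the shape, not the patch itself):

StartTransactionCommand();

/* 1. Do the initial data copy. */
PushActiveSnapshot(GetTransactionSnapshot());
copy_table(rel);
PopActiveSnapshot();

/* 2. Persist that the copy phase completed. */
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
                           MyLogicalRepWorker->relid,
                           SUBREL_STATE_COPYDONE,
                           MyLogicalRepWorker->relstate_lsn);

/* 3. Record the decoding start position in the same transaction. */
replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
                   true /* go backward */ , true /* WAL log */ );

CommitTransactionCommand();

This way a crash either rolls back everything (the copy is redone from
scratch) or loses nothing (COPYDONE and the start position are both on
disk).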
Fixed as suggested. Please see latest patch [v9]
---
[v9] /messages/by-id/CAHut+Pv8ShLmrjCriVU+tprk_9b2kwBxYK2oGSn5Eb__kWVc7A@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia
On Wed, Dec 30, 2020 at 11:51 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Wed, Dec 23, 2020 at 8:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
1.
+ * Rarely, the DropSubscription may be issued when a tablesync still
+ * is in SYNCDONE but not yet in READY state. If this happens then
+ * the drop slot could fail because it is already dropped.
+ * In this case suppress and drop slot error.
+ *
+ * FIXME - Is there a better way than this?
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ PG_RE_THROW();

So, does this situation happen when we try to drop subscription after
the state is changed to syncdone but not syncready. If so, then can't
we write a function GetSubscriptionNotDoneRelations similar to
GetSubscriptionNotReadyRelations where we get a list of relations that
are not in done stage. I think this should be safe because once we are
here we shouldn't be allowed to start a new worker and old workers are
already stopped by this function.

Yes, but I don't see how adding such a function is an improvement over
the existing code:
The advantage is that we don't need to use try..catch to deal with
such conditions which I don't think is a good way to deal with such
cases. Also, even after using try...catch, still, we can leak the
slots because the patch drops the slot after changing the state to
syncdone and if there is any error while dropping the slot, it simply
skips it. So, it is possible that the rel state is syncdone but the
slot still exists and we get an error due to some different reason,
and then we will silently skip it again and allow the subscription to
be dropped.
I think instead what we should do is to drop the slot before we change
the rel state to syncdone. Also, if the apply worker fails to drop the
slot, it should try to drop it again after restart. In
DropSubscription, we can then check if the rel state is not SYNC or
READY, and drop the corresponding slots.
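
A rough sketch of that ordering in the sync worker (names taken from
the patch; the exact code is an assumption):

/*
 * Drop the tablesync slot first; only then persist SYNCDONE. If the
 * drop errors out, the state is still not SYNCDONE, so the restarted
 * worker retries the drop instead of leaking the slot.
 */
ReplicationSlotDropAtPubNode(wrconn, syncslotname);

StartTransactionCommand();
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
                           MyLogicalRepWorker->relid,
                           SUBREL_STATE_SYNCDONE,
                           MyLogicalRepWorker->relstate_lsn);
CommitTransactionCommand();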
e.g.1. GetSubscriptionNotDoneRelations will include the READY state
(which we don't want) just like GetSubscriptionNotReadyRelations
includes the SYNCDONE state.
e.g.2. Or, something like GetSubscriptionNotDoneAndNotReadyRelations
would be an unnecessary overkill replacement for the current simple
"if".
or we can probably modify the function as
GetSubscriptionRelationsNotInStates and pass it an array of states
which we don't want.
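
A sketch of what such a helper might look like (the function name comes
from the suggestion above; the skip_states parameter and the
catalog-scan boilerplate, modelled on GetSubscriptionNotReadyRelations,
are assumptions):

/*
 * Hypothetical: return the relations of the given subscription whose
 * state is none of the characters in 'skip_states' (e.g. "sr" to skip
 * both SYNCDONE and READY). srsublsn handling is omitted for brevity.
 */
static List *
GetSubscriptionRelationsNotInStates(Oid subid, const char *skip_states)
{
    List       *res = NIL;
    Relation    rel;
    HeapTuple   tup;
    ScanKeyData skey[1];
    SysScanDesc scan;

    rel = table_open(SubscriptionRelRelationId, AccessShareLock);

    /* Scan only the rows belonging to this subscription. */
    ScanKeyInit(&skey[0],
                Anum_pg_subscription_rel_srsubid,
                BTEqualStrategyNumber, F_OIDEQ,
                ObjectIdGetDatum(subid));

    scan = systable_beginscan(rel, InvalidOid, false, NULL, 1, skey);

    while (HeapTupleIsValid(tup = systable_getnext(scan)))
    {
        Form_pg_subscription_rel subrel =
            (Form_pg_subscription_rel) GETSTRUCT(tup);
        SubscriptionRelState *relstate;

        /* Filter out the states the caller wants to skip. */
        if (strchr(skip_states, subrel->srsubstate) != NULL)
            continue;

        relstate = (SubscriptionRelState *) palloc(sizeof(SubscriptionRelState));
        relstate->relid = subrel->srrelid;
        relstate->state = subrel->srsubstate;

        res = lappend(res, relstate);
    }

    systable_endscan(scan);
    table_close(rel, AccessShareLock);

    return res;
}

DropSubscription could then fetch just the relations whose tablesync
slots may still exist with something like
GetSubscriptionRelationsNotInStates(subid, "sr").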
AFAIK the code is OK as-is.
As described above, there are still race conditions where we can leak
slots and also this doesn't look clean.
Few other comments:
=================
1.
+ elog(LOG, "!!>> DropSubscription: dropping the tablesync slot
\"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ elog(LOG, "!!>> DropSubscription: dropped the tablesync slot
\"%s\".", syncslotname);
...
...
+ elog(LOG, "!!>> finish_sync_worker: dropping the tablesync slot
\"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ elog(LOG, "!!>> finish_sync_worker: dropped the tablesync slot
\"%s\".", syncslotname);
Remove these and other elogs added to aid debugging or testing. If you
need these for development purposes then move these to a separate patch.
2. Remove WIP from the commit message and patch name.
--
With Regards,
Amit Kapila.
Hi Amit.
PSA my v10 patch for the Solution1.
v10 is essentially the same as v9, except now all the temporary "!!>>"
logging has been isolated to a separate (optional) patch 0002.
====
Features:
* tablesync slot is now permanent instead of temporary. The tablesync
slot name is no longer tied to the Subscription slot name.
* the tablesync slot cleanup (drop) code is added for DropSubscription
and for finish_sync_worker functions
* tablesync worker now allows multiple tx instead of a single tx
* a new state (SUBREL_STATE_COPYDONE) is persisted after a successful
copy_table in LogicalRepSyncTableStart.
* if a re-launched tablesync finds the state is SUBREL_STATE_COPYDONE
then it will bypass the initial copy_table phase.
* tablesync sets up replication origin tracking in
LogicalRepSyncTableStart (similar to what is done for the apply worker). The
origin is advanced when first created.
* tablesync replication origin tracking is cleaned up during
DropSubscription and/or process_syncing_tables_for_apply.
* the DropSubscription cleanup code was enhanced (v7+) to take care of
crashed sync workers.
* minor updates to PG docs
TODO / Known Issues:
* address review comments
---
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
v10-0002-Tablesync-extra-logging.patch
From 568246ff3d2e77b9010a6bd48188bbf4ccdf8cff Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Mon, 4 Jan 2021 19:30:35 +1100
Subject: [PATCH v10] Tablesync extra logging.
This patch only adds some extra logging which may be helpful for testing, but is not for committing.
---
src/backend/commands/subscriptioncmds.c | 6 ++++--
src/backend/replication/logical/tablesync.c | 20 +++++++++++++++-----
2 files changed, 19 insertions(+), 7 deletions(-)
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index f829e5e..6e05407 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -1105,8 +1105,9 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
{
PG_TRY();
{
- elog(DEBUG1, "DropSubscription: dropping the tablesync slot \"%s\".", syncslotname);
+ elog(LOG, "!!>> DropSubscription: dropping the tablesync slot \"%s\".", syncslotname);
ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ elog(LOG, "!!>> DropSubscription: dropped the tablesync slot \"%s\".", syncslotname);
}
PG_CATCH();
{
@@ -1137,8 +1138,9 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
originid = replorigin_by_name(originname, true);
if (originid != InvalidRepOriginId)
{
- elog(DEBUG1, "DropSubscription: dropping origin tracking for \"%s\"", originname);
+ elog(LOG, "!!>> DropSubscription: dropping origin tracking for \"%s\"", originname);
replorigin_drop(originid, false);
+ elog(LOG, "!!>> DropSubscription: dropped origin tracking for \"%s\"", originname);
}
}
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 206b2de..7eec7d2 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -158,8 +158,9 @@ finish_sync_worker(void)
PG_TRY();
{
- elog(DEBUG1, "finish_sync_worker: dropping the tablesync slot \"%s\".", syncslotname);
+ elog(LOG, "!!>> finish_sync_worker: dropping the tablesync slot \"%s\".", syncslotname);
ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ elog(LOG, "!!>> finish_sync_worker: dropped the tablesync slot \"%s\".", syncslotname);
}
PG_CATCH();
{
@@ -469,10 +470,12 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
snprintf(originname, sizeof(originname), "pg_%u_%u", MyLogicalRepWorker->subid, rstate->relid);
originid = replorigin_by_name(originname, true);
+ elog(LOG, "!!>> apply worker: find tablesync origin tracking for \"%s\".", originname);
if (OidIsValid(originid))
{
- elog(DEBUG1, "process_syncing_tables_for_apply: dropping tablesync origin tracking for \"%s\".", originname);
+ elog(LOG, "!!>> apply worker: dropping tablesync origin tracking for \"%s\".", originname);
replorigin_drop(originid, false);
+ elog(LOG, "!!>> apply worker: dropped tablesync origin tracking for \"%s\".", originname);
}
}
@@ -966,7 +969,7 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* The COPY phase was previously done, but tablesync then crashed/etc
* before it was able to finish normally.
*/
- elog(DEBUG1, "LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_COPYDONE.");
+ elog(LOG, "!!>> LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_COPYDONE.");
StartTransactionCommand();
goto copy_table_done;
}
@@ -1014,6 +1017,7 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
+ elog(LOG, "!!>> LogicalRepSyncTableStart: walrcv_create_slot for \"%s\".", slotname);
walrcv_create_slot(wrconn, slotname, false,
CRS_USE_SNAPSHOT, origin_startpos);
@@ -1042,8 +1046,9 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
PG_CATCH();
{
/* If something failed during copy table then cleanup the created slot. */
- elog(DEBUG1, "LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".", slotname);
+ elog(LOG, "!!>> LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".", slotname);
ReplicationSlotDropAtPubNode(wrconn, slotname);
+ elog(LOG, "!!>> LogicalRepSyncTableStart: tablesync copy failed. Dropped the tablesync slot \"%s\".", slotname);
pfree(slotname);
slotname = NULL;
@@ -1072,9 +1077,12 @@ copy_table_done:
/*
* Origin tracking does not exist. Create it now, and advance to LSN got from walrcv_create_slot.
*/
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_create \"%s\".", originname);
originid = replorigin_create(originname);
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_session_setup \"%s\".", originname);
replorigin_session_setup(originid);
replorigin_session_origin = originid;
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_advance \"%s\".", originname);
replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
true /* go backward */ , true /* WAL log */ );
}
@@ -1083,12 +1091,14 @@ copy_table_done:
/*
* Origin tracking already exists.
*/
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 2 replorigin_session_setup \"%s\".", originname);
replorigin_session_setup(originid);
replorigin_session_origin = originid;
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 2 replorigin_session_get_progress \"%s\".", originname);
*origin_startpos = replorigin_session_get_progress(false);
}
- elog(DEBUG1, "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ elog(LOG, "!!>> LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
originname,
(uint32) (*origin_startpos >> 32),
(uint32) *origin_startpos);
--
1.8.3.1
v10-0001-Tablesync-Solution1.patch
From ed051282784561a62d1c13789d7269cc0c85461f Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Mon, 4 Jan 2021 19:10:18 +1100
Subject: [PATCH v10] Tablesync Solution1.
This is essentially same as v9 except all the temporary development logging is now isolated to a separate patch.
====
Features:
* tablesync slot is now permanent instead of temporary. The tablesync slot name is no longer tied to the Subscription slot na
* the tablesync slot cleanup (drop) code is added for DropSubscription and for finish_sync_worker functions
* tablesync worked now allowing multiple tx instead of single tx
* a new state (SUBREL_STATE_COPYDONE) is persisted after a successful copy_table in LogicalRepSyncTableStart.
* if a re-launched tablesync finds the state is SUBREL_STATE_COPYDONE then it will bypass the initial copy_table phase.
* tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar as done for the apply worker). The origin is advanced when first created.
* tablesync replication origin tracking is cleaned up during DropSubscription and/or process_syncing_tables_for_apply.
* The DropSubscription cleanup code was enhanced in v7 to take care of crashed sync workers.
* Minor updates to PG docs
TODO / Known Issues:
* Address review comments
---
doc/src/sgml/catalogs.sgml | 1 +
src/backend/commands/subscriptioncmds.c | 218 ++++++++++++++++++++-------
src/backend/replication/logical/origin.c | 4 +-
src/backend/replication/logical/tablesync.c | 220 +++++++++++++++++++++++-----
src/backend/replication/logical/worker.c | 18 +--
src/include/catalog/pg_subscription_rel.h | 1 +
src/include/replication/slot.h | 3 +
7 files changed, 361 insertions(+), 104 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 3a22665..8fcc8b1 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7651,6 +7651,7 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
State code:
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
+ <literal>C</literal> = table data has been copied,
<literal>s</literal> = synchronized,
<literal>r</literal> = ready (normal replication)
</para></entry>
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 490e935..f829e5e 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -37,6 +37,7 @@
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
#include "storage/lmgr.h"
#include "utils/acl.h"
#include "utils/builtins.h"
@@ -928,7 +929,6 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
char *err = NULL;
RepOriginId originid;
WalReceiverConn *wrconn = NULL;
- StringInfoData cmd;
Form_pg_subscription form;
/*
@@ -1016,73 +1016,183 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
ReleaseSysCache(tup);
/*
- * Stop all the subscription workers immediately.
- *
- * This is necessary if we are dropping the replication slot, so that the
- * slot becomes accessible.
+ * Try to acquire the connection necessary for dropping slots.
+ * We do this here so that the same connection may be shared
+ * for dropping the Subscription slot, as well as dropping any
+ * tablesync slots.
*
- * It is also necessary if the subscription is disabled and was disabled
- * in the same transaction. Then the workers haven't seen the disabling
- * yet and will still be running, leading to hangs later when we want to
- * drop the replication origin. If the subscription was disabled before
- * this transaction, then there shouldn't be any workers left, so this
- * won't make a difference.
- *
- * New workers won't be started because we hold an exclusive lock on the
- * subscription till the end of the transaction.
+ * Note: If the slotname is NONE/NULL then connection errors are
+ * suppressed. This is necessary so that the DROP SUBSCRIPTION
+ * can still complete even when the connection to publisher is
+ * broken.
*/
- LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
- subworkers = logicalrep_workers_find(subid, false);
- LWLockRelease(LogicalRepWorkerLock);
- foreach(lc, subworkers)
+ load_file("libpqwalreceiver", false);
+
+ wrconn = walrcv_connect(conninfo, true, subname, &err);
+ if (wrconn == NULL && slotname != NULL)
+ ereport(ERROR,
+ (errmsg("could not connect to publisher when attempting to "
+ "drop the replication slot \"%s\"", slotname),
+ errdetail("The error was: %s", err),
+ /* translator: %s is an SQL ALTER command */
+ errhint("Use %s to disassociate the subscription from the slot.",
+ "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+
+ PG_TRY();
{
- LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
+ /*
+ * Stop all the subscription workers immediately.
+ *
+ * This is necessary if we are dropping the replication slot, so that the
+ * slot becomes accessible.
+ *
+ * It is also necessary if the subscription is disabled and was disabled
+ * in the same transaction. Then the workers haven't seen the disabling
+ * yet and will still be running, leading to hangs later when we want to
+ * drop the replication origin. If the subscription was disabled before
+ * this transaction, then there shouldn't be any workers left, so this
+ * won't make a difference.
+ *
+ * New workers won't be started because we hold an exclusive lock on the
+ * subscription till the end of the transaction.
+ */
+ LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
+ subworkers = logicalrep_workers_find(subid, false);
+ LWLockRelease(LogicalRepWorkerLock);
+ foreach(lc, subworkers)
+ {
+ LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
- logicalrep_worker_stop(w->subid, w->relid);
- }
- list_free(subworkers);
+ logicalrep_worker_stop(w->subid, w->relid);
+ }
+ list_free(subworkers);
- /* Clean up dependencies */
- deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
+ /*
+ * Tablesync resource cleanup (slots and origins).
+ *
+ * Any READY-state relations would already have dealt with clean-ups.
+ */
+ {
+ List *rstates;
+ ListCell *lc;
- /* Remove any associated relation synchronization states. */
- RemoveSubscriptionRel(subid, InvalidOid);
+ rstates = GetSubscriptionNotReadyRelations(subid);
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
- /* Remove the origin tracking if exists. */
- snprintf(originname, sizeof(originname), "pg_%u", subid);
- originid = replorigin_by_name(originname, true);
- if (originid != InvalidRepOriginId)
- replorigin_drop(originid, false);
+ /* Only cleanup the tablesync worker resources */
+ if (!OidIsValid(relid))
+ continue;
- /*
- * If there is no slot associated with the subscription, we can finish
- * here.
- */
- if (!slotname)
+ /* Drop the tablesync slot. */
+ {
+ char *syncslotname = ReplicationSlotNameForTablesync(subid, relid);
+
+ /*
+ * If the subscription slotname is NONE/NULL and the connection to publisher is
+ * broken, but the DropSubscription should still be allowed to complete.
+ * But without a connection it is not possible to drop any tablesync slots.
+ */
+ if (!wrconn)
+ {
+ /* XXX - OK to just log? */
+ elog(LOG, "DROP SUBSCRIPTION: no connection; cannot drop tablesync slot \"%s\".",
+ syncslotname);
+ }
+ else
+ {
+ PG_TRY();
+ {
+ elog(DEBUG1, "DropSubscription: dropping the tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ }
+ PG_CATCH();
+ {
+ /*
+ * Typically tablesync will delete its own slot after it reaches
+ * SYNCDONE state. Then the apply worker moves the tablesync from
+ * SYNCDONE to READY state.
+ *
+ * Rarely, the DropSubscription may be issued in between when a
+ * tablesync still is in SYNCDONE, but not yet reached READY state.
+ * If this happens then the drop slot could fail since it was
+ * already dropped, so suppress the error.
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ pfree(syncslotname);
+ PG_RE_THROW();
+ }
+ }
+ PG_END_TRY();
+ }
+ pfree(syncslotname);
+ }
+
+ /* Remove the tablesync's origin tracking if exists. */
+ {
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ {
+ elog(DEBUG1, "DropSubscription: dropping origin tracking for \"%s\"", originname);
+ replorigin_drop(originid, false);
+ }
+ }
+
+ }
+ list_free(rstates);
+ }
+
+ /* Clean up dependencies. */
+ deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
+
+ /* Remove any associated relation synchronization states. */
+ RemoveSubscriptionRel(subid, InvalidOid);
+
+ /* Remove the origin tracking if exists. */
+ snprintf(originname, sizeof(originname), "pg_%u", subid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ replorigin_drop(originid, false);
+
+ /*
+ * If there is a slot associated with the subscription, then drop the
+ * replication slot at the publisher node using the replication
+ * connection.
+ */
+ if (slotname)
+ ReplicationSlotDropAtPubNode(wrconn, slotname);
+ }
+ PG_FINALLY();
{
+ if (wrconn)
+ walrcv_disconnect(wrconn);
+
table_close(rel, NoLock);
- return;
}
+ PG_END_TRY();
+}
+
+
+/*
+ * Drop the replication slot at the publisher node
+ * using the replication connection.
+ */
+void
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname)
+{
+ StringInfoData cmd;
+
+ Assert(wrconn);
- /*
- * Otherwise drop the replication slot at the publisher node using the
- * replication connection.
- */
load_file("libpqwalreceiver", false);
initStringInfo(&cmd);
appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
- wrconn = walrcv_connect(conninfo, true, subname, &err);
- if (wrconn == NULL)
- ereport(ERROR,
- (errmsg("could not connect to publisher when attempting to "
- "drop the replication slot \"%s\"", slotname),
- errdetail("The error was: %s", err),
- /* translator: %s is an SQL ALTER command */
- errhint("Use %s to disassociate the subscription from the slot.",
- "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
-
PG_TRY();
{
WalRcvExecResult *res;
@@ -1103,13 +1213,9 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
}
PG_FINALLY();
{
- walrcv_disconnect(wrconn);
+ pfree(cmd.data);
}
PG_END_TRY();
-
- pfree(cmd.data);
-
- table_close(rel, NoLock);
}
/*
diff --git a/src/backend/replication/logical/origin.c b/src/backend/replication/logical/origin.c
index 0b01cce..b4b4830 100644
--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -843,7 +843,7 @@ replorigin_redo(XLogReaderState *record)
* that originated at the LSN remote_commit on the remote node was replayed
* successfully and that we don't need to do so again. In combination with
* setting up replorigin_session_origin_lsn and replorigin_session_origin
- * that ensures we won't loose knowledge about that after a crash if the
+ * that ensures we won't lose knowledge about that after a crash if the
* transaction had a persistent effect (think of asynchronous commits).
*
* local_commit needs to be a local LSN of the commit so that we can make sure
@@ -905,7 +905,7 @@ replorigin_advance(RepOriginId node,
LWLockAcquire(&replication_state->lock, LW_EXCLUSIVE);
/* Make sure it's not used by somebody else */
- if (replication_state->acquired_by != 0)
+ if (replication_state->acquired_by != 0 && replication_state->acquired_by != MyProcPid)
{
ereport(ERROR,
(errcode(ERRCODE_OBJECT_IN_USE),
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 863d196..206b2de 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -43,13 +43,17 @@
* state to SYNCDONE. There might be zero changes applied between
* CATCHUP and SYNCDONE, because the sync worker might be ahead of the
* apply worker.
+ * - The sync worker has a intermediary state COPYDONE which comes after
+ * CATCHUP and before SYNCDONE. This state indicates that the initial
+ * table copy phase has completed, so if the worker crashes before
+ * reaching SYNCDONE the copy will not be re-attempted.
* - Once the state is set to SYNCDONE, the apply will continue tracking
* the table until it reaches the SYNCDONE stream position, at which
* point it sets state to READY and stops tracking. Again, there might
* be zero changes in between.
*
* So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * CATCHUP -> (sync worker COPYDONE) -> SYNCDONE -> READY.
*
* The catalog pg_subscription_rel is used to keep information about
* subscribed tables and their state. Some transient state during data
@@ -64,6 +68,7 @@
* -> set in memory CATCHUP
* -> enter wait-loop
* sync:10
+ * -> set in catalog COPYDONE
* -> set in catalog SYNCDONE
* -> exit
* apply:10
@@ -79,6 +84,7 @@
* -> set in memory CATCHUP
* -> continue per-table filtering
* sync:10
+ * -> set in catalog COPYDONE
* -> set in catalog SYNCDONE
* -> exit
* apply:10
@@ -102,6 +108,8 @@
#include "replication/logicalrelation.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
@@ -139,6 +147,32 @@ finish_sync_worker(void)
get_rel_name(MyLogicalRepWorker->relid))));
CommitTransactionCommand();
+ /*
+ * Cleanup the tablesync slot.
+ */
+ {
+ /* Calculate the name of the tablesync slot */
+ char *syncslotname = ReplicationSlotNameForTablesync(
+ MySubscription->oid,
+ MyLogicalRepWorker->relid);
+
+ PG_TRY();
+ {
+ elog(DEBUG1, "finish_sync_worker: dropping the tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ }
+ PG_CATCH();
+ {
+ /*
+ * NOP. Suppress any drop slot error because otherwise
+ * it would cause the tablesync to fail and re-launch.
+ */
+ }
+ PG_END_TRY();
+
+ pfree(syncslotname);
+ }
+
/* Find the main apply worker and signal it. */
logicalrep_worker_wakeup(MyLogicalRepWorker->subid, InvalidOid);
@@ -270,8 +304,6 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
-
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
@@ -284,6 +316,15 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn)
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ {
+ StartTransactionCommand();
+ }
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
@@ -412,6 +453,32 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
started_tx = true;
}
+ /*
+ * Remove the tablesync origin tracking if exists.
+ *
+ * The cleanup is done here instead of in the finish_sync_worker function because
+ * if the tablesync worker process attempted to call replorigin_drop then that will
+ * hang because replorigin_drop logic considers the owning tablesync PID as "busy".
+ *
+ * Do this before updating the state, so that DropSubscription can know that all
+ * READY workers have already had their origin tracking removed.
+ */
+ {
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MyLogicalRepWorker->subid, rstate->relid);
+ originid = replorigin_by_name(originname, true);
+ if (OidIsValid(originid))
+ {
+ elog(DEBUG1, "process_syncing_tables_for_apply: dropping tablesync origin tracking for \"%s\".", originname);
+ replorigin_drop(originid, false);
+ }
+ }
+
+ /*
+ * Update the state only after the origin cleanup.
+ */
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
@@ -808,6 +875,31 @@ copy_table(Relation rel)
}
/*
+ * Determine the tablesync slot name.
+ *
+ * The returned slot name is palloc'ed in current memory context.
+ */
+char *
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid)
+{
+ char *syncslotname;
+
+ /*
+ * To build a slot name for the sync work, we are limited to NAMEDATALEN -
+ * 1 characters.
+ *
+ * The name is calculated as pg_%u_sync_%u (3 + 10 + 6 + 10 + '\0').
+ * (It's actually the NAMEDATALEN on the remote that matters, but this
+ * scheme will also work reasonably if that is different.)
+ */
+ StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
+
+ syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
+
+ return syncslotname;
+}
+
+/*
* Start syncing the table in the sync worker.
*
* If nothing needs to be done to sync the table, we exit the worker without
@@ -849,17 +941,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
- MySubscription->slotname,
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(
MySubscription->oid,
MyLogicalRepWorker->relid);
@@ -874,7 +957,19 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_COPYDONE);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_COPYDONE)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed/etc
+ * before it was able to finish normally.
+ */
+ elog(DEBUG1, "LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_COPYDONE.");
+ StartTransactionCommand();
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -890,9 +985,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -918,29 +1010,91 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
walrcv_clear_result(res);
/*
- * Create a new temporary logical decoding slot. This slot will be used
+ * Create a new permanent logical decoding slot. This slot will be used
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
- walrcv_create_slot(wrconn, slotname, true,
+ walrcv_create_slot(wrconn, slotname, false,
CRS_USE_SNAPSHOT, origin_startpos);
- /* Now do the initial data copy */
- PushActiveSnapshot(GetTransactionSnapshot());
- copy_table(rel);
- PopActiveSnapshot();
+ /*
+ * Be sure to remove the newly created tablesync slot if the COPY fails.
+ */
+ PG_TRY();
+ {
+ /* Now do the initial data copy */
+ PushActiveSnapshot(GetTransactionSnapshot());
+ copy_table(rel);
+ PopActiveSnapshot();
+
+ res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
+ if (res->status != WALRCV_OK_COMMAND)
+ ereport(ERROR,
+ (errmsg("table copy could not finish transaction on publisher"),
+ errdetail("The error was: %s", res->err)));
+ walrcv_clear_result(res);
+
+ table_close(rel, NoLock);
+
+ /* Make the copy visible. */
+ CommandCounterIncrement();
+ }
+ PG_CATCH();
+ {
+ /* If something failed during copy table then cleanup the created slot. */
+ elog(DEBUG1, "LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".", slotname);
+ ReplicationSlotDropAtPubNode(wrconn, slotname);
- res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
- (errmsg("table copy could not finish transaction on publisher"),
- errdetail("The error was: %s", res->err)));
- walrcv_clear_result(res);
+ pfree(slotname);
+ slotname = NULL;
- table_close(rel, NoLock);
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+
+ /* Update the persisted state to indicate the COPY phase is done; make it visible to others. */
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_COPYDONE,
+ MyLogicalRepWorker->relstate_lsn);
+
+copy_table_done:
- /* Make the copy visible. */
- CommandCounterIncrement();
+ /* Setup replication origin tracking. */
+ {
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MySubscription->oid, MyLogicalRepWorker->relid);
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist. Create it now, and advance to LSN got from walrcv_create_slot.
+ */
+ originid = replorigin_create(originname);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */ , true /* WAL log */ );
+ }
+ else
+ {
+ /*
+ * Origin tracking already exists.
+ */
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ *origin_startpos = replorigin_session_get_progress(false);
+ }
+
+ elog(DEBUG1, "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
+ }
+
+ CommitTransactionCommand();
/*
* We are done with the initial data synchronization, update the state.
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 1b1d70e..4bd4030 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -807,12 +807,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -889,9 +885,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -918,8 +912,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1062,8 +1055,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData* commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index 06663b9..4bd93e2 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,7 @@ DECLARE_UNIQUE_INDEX(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg_subs
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_COPYDONE 'C' /* tablesync copy phase is completed */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 53f636c..e617602 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -15,6 +15,7 @@
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "storage/spin.h"
+#include "replication/walreceiver.h"
/*
* Behaviour of replication slots, upon release or crash.
@@ -211,6 +212,8 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(Oid suboid, Oid relid);
+extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
--
1.8.3.1
On Mon, Jan 4, 2021 at 8:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Few other comments:
=================
1.
+ elog(LOG, "!!>> DropSubscription: dropping the tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ elog(LOG, "!!>> DropSubscription: dropped the tablesync slot \"%s\".", syncslotname);
...
...
+ elog(LOG, "!!>> finish_sync_worker: dropping the tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ elog(LOG, "!!>> finish_sync_worker: dropped the tablesync slot \"%s\".", syncslotname);

Remove these and other elogs added to aid debugging or testing. If you
need these for development purposes then move these to a separate patch.

Fixed in latest patch (v10).

2. Remove WIP from the commit message and patch name.

Fixed in latest patch (v10)
---
v10 = /messages/by-id/CAHut+PuzPmFzk3p4oL9H3nkiY6utFryV9c5dW6kRhCe_RY=gnA@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia
On Mon, Jan 4, 2021 at 2:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Few other comments:
=================
Few more comments on v9:
======================
1.
+ /* Drop the tablesync slot. */
+ {
+ char *syncslotname = ReplicationSlotNameForTablesync(subid, relid);
+
+ /*
+ * If the subscription slotname is NONE/NULL and the connection to publisher is
+ * broken, but the DropSubscription should still be allowed to complete.
+ * But without a connection it is not possible to drop any tablesync slots.
+ */
+ if (!wrconn)
+ {
+ /* FIXME - OK to just log a warning? */
+ elog(WARNING, "!!>> DropSubscription: no connection. Cannot drop
tablesync slot \"%s\".",
+ syncslotname);
+ }
Why is this not an ERROR? We don't want to keep the table slots
lingering after DropSubscription. If there is any tablesync slot that
needs to be dropped and the publisher is not available then we should
raise an error.
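
i.e. something along these lines (a sketch; the message wording is an
assumption):

if (!wrconn)
    ereport(ERROR,
            (errmsg("could not connect to publisher when attempting to "
                    "drop the tablesync slot \"%s\"", syncslotname)));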
2.
+ /*
+ * Tablesync resource cleanup (slots and origins).
+ *
+ * Any READY-state relations would already have dealt with clean-ups.
+ */
+ {
There is no need to start a separate block '{' here.
3.
+#define SUBREL_STATE_COPYDONE 'C' /* tablesync copy phase is completed */
You can mention in the comments that sublsn will be NULL for this
state as it is mentioned for other similar states. Can we think of
using any letter in lower case for this as all other states are in
lower-case except for this which makes it look a bit odd? We can use
'f' or 'e' and describe it as 'copy finished' or 'copy end'. I am fine
if you have any better ideas.
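
For example, one possible rendering of that suggestion (not settled):

#define SUBREL_STATE_COPYDONE 'f' /* tablesync copy phase is completed
                                   * (sublsn NULL) */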
4.
LogicalRepSyncTableStart()
{
..
..
+copy_table_done:
+
+ /* Setup replication origin tracking. */
+ {
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ snprintf(originname, sizeof(originname), "pg_%u_%u",
MySubscription->oid, MyLogicalRepWorker->relid);
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist. Create it now, and advance to LSN
got from walrcv_create_slot.
+ */
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_create
\"%s\".", originname);
+ originid = replorigin_create(originname);
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_session_setup
\"%s\".", originname);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_advance
\"%s\".", originname);
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */ , true /* WAL log */ );
+ }
+ else
+ {
+ /*
+ * Origin tracking already exists.
+ */
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 2 replorigin_session_setup
\"%s\".", originname);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 2
replorigin_session_get_progress \"%s\".", originname);
+ *origin_startpos = replorigin_session_get_progress(false);
+ }
..
..
}
I am not sure if this code is correct because, for the very first time
when the copy is done, we don't expect replication origin to exist
whereas this code will silently use already existing replication
origin which can lead to a wrong start position for the slot. In such
a case it should error out. I guess we should create the replication
origin before making the state as copydone. I feel we should even have
a test case for this as it is not difficult to have a pre-existing
replication origin.
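
In other words, the first-time (just-copied) path could look roughly
like this (a sketch, reusing the patch's originname variable):

/* The tablesync origin must not pre-exist on the first-time path. */
originid = replorigin_by_name(originname, true /* missing_ok */ );
if (OidIsValid(originid))
    ereport(ERROR,
            (errcode(ERRCODE_DUPLICATE_OBJECT),
             errmsg("replication origin \"%s\" already exists",
                    originname)));
originid = replorigin_create(originname);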
5. Is it possible to write a testcase where we fail (say due to pk
violation or some other error) after the initial copy is done, then
remove the conflicting row and allow a copy to be completed? If we
already have any such test then it is fine.
6.
+/*
+ * Drop the replication slot at the publisher node
+ * using the replication connection.
+ */
This comment looks a bit odd. The first line appears to be too short.
We have a limit of 80 chars but this is much less than that.
7.
@@ -905,7 +905,7 @@ replorigin_advance(RepOriginId node,
LWLockAcquire(&replication_state->lock, LW_EXCLUSIVE);
/* Make sure it's not used by somebody else */
- if (replication_state->acquired_by != 0)
+ if (replication_state->acquired_by != 0 &&
replication_state->acquired_by != MyProcPid)
{
I think you won't need this change if you do replorigin_advance before
replorigin_session_setup in your patch.
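
i.e. a sketch of that reordering:

/*
 * Advance the origin before this session acquires it; then the
 * acquired_by check in replorigin_advance can stay as it is.
 */
replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
                   true /* go backward */ , true /* WAL log */ );
replorigin_session_setup(originid);
replorigin_session_origin = originid;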
8.
- * that ensures we won't loose knowledge about that after a crash if the
+ * that ensures we won't lose knowledge about that after a crash if the
It is better to submit this as a separate patch.
--
With Regards,
Amit Kapila.
On Wed, Dec 30, 2020 at 5:08 PM Peter Smith <smithpb2250@gmail.com> wrote:
PSA my v9 WIP patch for the Solution1 which addresses some recent
review comments, and other minor changes.
I did some tests using the test suite prepared by Erik Rijkers in [1]/messages/by-id/93d02794068482f96d31b002e0eb248d@xs4all.nl
during the initial design of tablesync.
Back then, they had seen some errors while doing multiple commits in
initial tablesync. So I've rerun the test script on the v9 patch
applied on HEAD and found no errors.
The script runs pgbench, creates a pub/sub on a standby server, and
all of the pgbench tables are replicated to the standby. The contents
of the tables are compared at
the end of each run to make sure they are identical.
I have run it for around 12 hours, and it worked without any errors.
Attaching the script I used.
regards,
Ajin Cherian
Fujitsu Australia
[1]: /messages/by-id/93d02794068482f96d31b002e0eb248d@xs4all.nl
Attachments:
Hi Amit.
PSA the v11 patch for the Tablesync Solution1.
Difference from v10:
- Addresses several recent review comments.
- pg_indent has been run
====
Features:
* tablesync slot is now permanent instead of temporary. The tablesync
slot name is no longer tied to the Subscription slot name.
* the tablesync slot cleanup (drop) code is added for DropSubscription
and for finish_sync_worker functions
* tablesync worker now allows multiple tx instead of a single tx
* a new state (SUBREL_STATE_COPYDONE) is persisted after a successful
copy_table in LogicalRepSyncTableStart.
* if a re-launched tablesync finds the state is SUBREL_STATE_COPYDONE
then it will bypass the initial copy_table phase.
* tablesync sets up replication origin tracking in
LogicalRepSyncTableStart (similar to what is done for the apply worker). The
origin is advanced when first created.
* tablesync replication origin tracking is cleaned up during
DropSubscription and/or process_syncing_tables_for_apply.
* the DropSubscription cleanup code was enhanced (v7+) to take care of
crashed sync workers.
* minor updates to PG docs
TODO / Known Issues:
* address review comments
---
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
v11-0001-Tablesync-Solution1.patch
From 17a861c973a44fba0286a0879ca8f73e22fc924f Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Tue, 5 Jan 2021 20:07:53 +1100
Subject: [PATCH v11] Tablesync Solution1.
====
Features:
* tablesync slot is now permanent instead of temporary. The tablesync slot name is no longer tied to the Subscription slot na
* the tablesync slot cleanup (drop) code is added for DropSubscription and for finish_sync_worker functions
* tablesync worked now allowing multiple tx instead of single tx
* a new state (SUBREL_STATE_COPYDONE) is persisted after a successful copy_table in LogicalRepSyncTableStart.
* if a re-launched tablesync finds the state is SUBREL_STATE_COPYDONE then it will bypass the initial copy_table phase.
* tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar as done for the apply worker). The origin is advanced when first created.
* tablesync replication origin tracking is cleaned up during DropSubscription and/or process_syncing_tables_for_apply.
* the DropSubscription cleanup code was enhanced (v7+) to take care of crashed sync workers.
* minor updates to PG docs
TODO / Known Issues:
* address review comments
---
doc/src/sgml/catalogs.sgml | 1 +
src/backend/commands/subscriptioncmds.c | 219 ++++++++++++++++++-------
src/backend/replication/logical/origin.c | 2 +-
src/backend/replication/logical/tablesync.c | 238 ++++++++++++++++++++++++----
src/backend/replication/logical/worker.c | 18 +--
src/include/catalog/pg_subscription_rel.h | 2 +
src/include/replication/slot.h | 3 +
7 files changed, 379 insertions(+), 104 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 3a22665..8fcc8b1 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7651,6 +7651,7 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
State code:
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
+ <literal>C</literal> = table data has been copied,
<literal>s</literal> = synchronized,
<literal>r</literal> = ready (normal replication)
</para></entry>
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 490e935..16b2bee 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -37,6 +37,7 @@
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
#include "storage/lmgr.h"
#include "utils/acl.h"
#include "utils/builtins.h"
@@ -928,7 +929,6 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
char *err = NULL;
RepOriginId originid;
WalReceiverConn *wrconn = NULL;
- StringInfoData cmd;
Form_pg_subscription form;
/*
@@ -1016,73 +1016,184 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
ReleaseSysCache(tup);
/*
- * Stop all the subscription workers immediately.
- *
- * This is necessary if we are dropping the replication slot, so that the
- * slot becomes accessible.
+ * Try to acquire the connection necessary for dropping slots. We do this
+ * here so that the same connection may be shared for dropping the
+ * Subscription slot, as well as dropping any tablesync slots.
*
- * It is also necessary if the subscription is disabled and was disabled
- * in the same transaction. Then the workers haven't seen the disabling
- * yet and will still be running, leading to hangs later when we want to
- * drop the replication origin. If the subscription was disabled before
- * this transaction, then there shouldn't be any workers left, so this
- * won't make a difference.
- *
- * New workers won't be started because we hold an exclusive lock on the
- * subscription till the end of the transaction.
+ * Note: If the slotname is NONE/NULL then connection errors are
+ * suppressed. This is necessary so that the DROP SUBSCRIPTION can still
+ * complete even when the connection to publisher is broken.
*/
- LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
- subworkers = logicalrep_workers_find(subid, false);
- LWLockRelease(LogicalRepWorkerLock);
- foreach(lc, subworkers)
+ load_file("libpqwalreceiver", false);
+
+ wrconn = walrcv_connect(conninfo, true, subname, &err);
+ if (wrconn == NULL && slotname != NULL)
+ ereport(ERROR,
+ (errmsg("could not connect to publisher when attempting to "
+ "drop the replication slot \"%s\"", slotname),
+ errdetail("The error was: %s", err),
+ /* translator: %s is an SQL ALTER command */
+ errhint("Use %s to disassociate the subscription from the slot.",
+ "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+
+ PG_TRY();
{
- LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
+ /*
+ * Stop all the subscription workers immediately.
+ *
+ * This is necessary if we are dropping the replication slot, so that
+ * the slot becomes accessible.
+ *
+ * It is also necessary if the subscription is disabled and was
+ * disabled in the same transaction. Then the workers haven't seen
+ * the disabling yet and will still be running, leading to hangs later
+ * when we want to drop the replication origin. If the subscription
+ * was disabled before this transaction, then there shouldn't be any
+ * workers left, so this won't make a difference.
+ *
+ * New workers won't be started because we hold an exclusive lock on
+ * the subscription till the end of the transaction.
+ */
+ LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
+ subworkers = logicalrep_workers_find(subid, false);
+ LWLockRelease(LogicalRepWorkerLock);
+ foreach(lc, subworkers)
+ {
+ LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
- logicalrep_worker_stop(w->subid, w->relid);
- }
- list_free(subworkers);
+ logicalrep_worker_stop(w->subid, w->relid);
+ }
+ list_free(subworkers);
- /* Clean up dependencies */
- deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
+ /*
+ * Tablesync resource cleanup (slots and origins).
+ *
+ * Any READY-state relations would already have dealt with clean-ups.
+ */
+ {
+ List *rstates;
+ ListCell *lc;
- /* Remove any associated relation synchronization states. */
- RemoveSubscriptionRel(subid, InvalidOid);
+ rstates = GetSubscriptionNotReadyRelations(subid);
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
- /* Remove the origin tracking if exists. */
- snprintf(originname, sizeof(originname), "pg_%u", subid);
- originid = replorigin_by_name(originname, true);
- if (originid != InvalidRepOriginId)
- replorigin_drop(originid, false);
+ /* Only cleanup the tablesync worker resources */
+ if (!OidIsValid(relid))
+ continue;
- /*
- * If there is no slot associated with the subscription, we can finish
- * here.
- */
- if (!slotname)
+ /* Drop the tablesync slot. */
+ {
+ char *syncslotname = ReplicationSlotNameForTablesync(subid, relid);
+
+ /*
+ * If the subscription slotname is NONE/NULL and the
+ * connection to publisher is broken, but the
+ * DropSubscription should still be allowed to complete.
+ * But without a connection it is not possible to drop any
+ * tablesync slots.
+ */
+ if (!wrconn)
+ {
+ /* XXX - OK to just log? */
+ elog(LOG, "DROP SUBSCRIPTION: no connection; cannot drop tablesync slot \"%s\".",
+ syncslotname);
+ }
+ else
+ {
+ PG_TRY();
+ {
+ elog(DEBUG1, "DropSubscription: dropping the tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ }
+ PG_CATCH();
+ {
+ /*
+ * Typically tablesync will delete its own slot
+ * after it reaches SYNCDONE state. Then the apply
+ * worker moves the tablesync from SYNCDONE to
+ * READY state.
+ *
+ * Rarely, the DropSubscription may be issued in
+ * between when a tablesync still is in SYNCDONE,
+ * but not yet reached READY state. If this
+ * happens then the drop slot could fail since it
+ * was already dropped, so suppress the error.
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ pfree(syncslotname);
+ PG_RE_THROW();
+ }
+ }
+ PG_END_TRY();
+ }
+ pfree(syncslotname);
+ }
+
+ /* Remove the tablesync's origin tracking if exists. */
+ {
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ {
+ elog(DEBUG1, "DropSubscription: dropping origin tracking for \"%s\"", originname);
+ replorigin_drop(originid, false);
+ }
+ }
+
+ }
+ list_free(rstates);
+ }
+
+ /* Clean up dependencies. */
+ deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
+
+ /* Remove any associated relation synchronization states. */
+ RemoveSubscriptionRel(subid, InvalidOid);
+
+ /* Remove the origin tracking if exists. */
+ snprintf(originname, sizeof(originname), "pg_%u", subid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ replorigin_drop(originid, false);
+
+ /*
+ * If there is a slot associated with the subscription, then drop the
+ * replication slot at the publisher node using the replication
+ * connection.
+ */
+ if (slotname)
+ ReplicationSlotDropAtPubNode(wrconn, slotname);
+ }
+ PG_FINALLY();
{
+ if (wrconn)
+ walrcv_disconnect(wrconn);
+
table_close(rel, NoLock);
- return;
}
+ PG_END_TRY();
+}
+
+
+/*
+ * Drop the replication slot at the publisher node using the replication connection.
+ */
+void
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname)
+{
+ StringInfoData cmd;
+
+ Assert(wrconn);
- /*
- * Otherwise drop the replication slot at the publisher node using the
- * replication connection.
- */
load_file("libpqwalreceiver", false);
initStringInfo(&cmd);
appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
- wrconn = walrcv_connect(conninfo, true, subname, &err);
- if (wrconn == NULL)
- ereport(ERROR,
- (errmsg("could not connect to publisher when attempting to "
- "drop the replication slot \"%s\"", slotname),
- errdetail("The error was: %s", err),
- /* translator: %s is an SQL ALTER command */
- errhint("Use %s to disassociate the subscription from the slot.",
- "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
-
PG_TRY();
{
WalRcvExecResult *res;
@@ -1103,13 +1214,9 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
}
PG_FINALLY();
{
- walrcv_disconnect(wrconn);
+ pfree(cmd.data);
}
PG_END_TRY();
-
- pfree(cmd.data);
-
- table_close(rel, NoLock);
}
/*
diff --git a/src/backend/replication/logical/origin.c b/src/backend/replication/logical/origin.c
index 0b01cce..b299285 100644
--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -905,7 +905,7 @@ replorigin_advance(RepOriginId node,
LWLockAcquire(&replication_state->lock, LW_EXCLUSIVE);
/* Make sure it's not used by somebody else */
- if (replication_state->acquired_by != 0)
+ if (replication_state->acquired_by != 0 && replication_state->acquired_by != MyProcPid)
{
ereport(ERROR,
(errcode(ERRCODE_OBJECT_IN_USE),
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 863d196..f83f06f 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -43,13 +43,17 @@
* state to SYNCDONE. There might be zero changes applied between
* CATCHUP and SYNCDONE, because the sync worker might be ahead of the
* apply worker.
+ * - The sync worker has an intermediary state TCOPYDONE which comes after
+ * CATCHUP and before SYNCDONE. This state indicates that the initial
+ * table copy phase has completed, so if the worker crashes before
+ * reaching SYNCDONE the copy will not be re-attempted.
* - Once the state is set to SYNCDONE, the apply will continue tracking
* the table until it reaches the SYNCDONE stream position, at which
* point it sets state to READY and stops tracking. Again, there might
* be zero changes in between.
*
* So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * CATCHUP -> (sync worker TCOPYDONE) -> SYNCDONE -> READY.
*
* The catalog pg_subscription_rel is used to keep information about
* subscribed tables and their state. Some transient state during data
@@ -64,6 +68,7 @@
* -> set in memory CATCHUP
* -> enter wait-loop
* sync:10
+ * -> set in catalog TCOPYDONE
* -> set in catalog SYNCDONE
* -> exit
* apply:10
@@ -79,6 +84,7 @@
* -> set in memory CATCHUP
* -> continue per-table filtering
* sync:10
+ * -> set in catalog TCOPYDONE
* -> set in catalog SYNCDONE
* -> exit
* apply:10
@@ -102,6 +108,8 @@
#include "replication/logicalrelation.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
@@ -139,6 +147,32 @@ finish_sync_worker(void)
get_rel_name(MyLogicalRepWorker->relid))));
CommitTransactionCommand();
+ /*
+ * Cleanup the tablesync slot.
+ */
+ {
+ /* Calculate the name of the tablesync slot */
+ char *syncslotname = ReplicationSlotNameForTablesync(
+ MySubscription->oid,
+ MyLogicalRepWorker->relid);
+
+ PG_TRY();
+ {
+ elog(DEBUG1, "finish_sync_worker: dropping the tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ }
+ PG_CATCH();
+ {
+ /*
+ * NOP. Suppress any drop slot error because otherwise it would
+ * cause the tablesync to fail and re-launch.
+ */
+ }
+ PG_END_TRY();
+
+ pfree(syncslotname);
+ }
+
/* Find the main apply worker and signal it. */
logicalrep_worker_wakeup(MyLogicalRepWorker->subid, InvalidOid);
@@ -270,8 +304,6 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
-
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
@@ -284,6 +316,15 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn)
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ {
+ StartTransactionCommand();
+ }
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
@@ -412,6 +453,35 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
started_tx = true;
}
+ /*
+ * Remove the tablesync origin tracking if exists.
+ *
+ * The cleanup is done here instead of in the
+ * finish_sync_worker function because if the tablesync worker
+ * process attempted to call replorigin_drop then that will
+ * hang because replorigin_drop logic considers the owning
+ * tablesync PID as "busy".
+ *
+ * Do this before updating the state, so that DropSubscription
+ * can know that all READY workers have already had their
+ * origin tracking removed.
+ */
+ {
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MyLogicalRepWorker->subid, rstate->relid);
+ originid = replorigin_by_name(originname, true);
+ if (OidIsValid(originid))
+ {
+ elog(DEBUG1, "process_syncing_tables_for_apply: dropping tablesync origin tracking for \"%s\".", originname);
+ replorigin_drop(originid, false);
+ }
+ }
+
+ /*
+ * Update the state only after the origin cleanup.
+ */
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
@@ -808,6 +878,31 @@ copy_table(Relation rel)
}
/*
+ * Determine the tablesync slot name.
+ *
+ * The returned slot name is palloc'ed in current memory context.
+ */
+char *
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid)
+{
+ char *syncslotname;
+
+ /*
+ * To build a slot name for the sync work, we are limited to NAMEDATALEN -
+ * 1 characters.
+ *
+ * The name is calculated as pg_%u_sync_%u (3 + 10 + 6 + 10 + '\0'). (It's
+ * actually the NAMEDATALEN on the remote that matters, but this scheme
+ * will also work reasonably if that is different.)
+ */
+ StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
+
+ syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
+
+ return syncslotname;
+}
+
+/*
* Start syncing the table in the sync worker.
*
* If nothing needs to be done to sync the table, we exit the worker without
@@ -824,6 +919,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -849,19 +946,10 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
- MySubscription->slotname,
- MySubscription->oid,
- MyLogicalRepWorker->relid);
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(
+ MySubscription->oid,
+ MyLogicalRepWorker->relid);
/*
* Here we use the slot name instead of the subscription name as the
@@ -874,7 +962,31 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_TCOPYDONE);
+
+ /* Assign the origin tracking record name. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MySubscription->oid, MyLogicalRepWorker->relid);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_TCOPYDONE)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed/etc
+ * before it was able to finish normally.
+ */
+ elog(DEBUG1, "LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_TCOPYDONE.");
+ StartTransactionCommand();
+
+ /*
+ * The origin tracking name must already exist (missing_ok=false).
+ */
+ originid = replorigin_by_name(originname, false);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ *origin_startpos = replorigin_session_get_progress(false);
+
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -890,9 +1002,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -918,29 +1027,90 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
walrcv_clear_result(res);
/*
- * Create a new temporary logical decoding slot. This slot will be used
+ * Create a new permanent logical decoding slot. This slot will be used
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
- walrcv_create_slot(wrconn, slotname, true,
+ walrcv_create_slot(wrconn, slotname, false,
CRS_USE_SNAPSHOT, origin_startpos);
- /* Now do the initial data copy */
- PushActiveSnapshot(GetTransactionSnapshot());
- copy_table(rel);
- PopActiveSnapshot();
+ /*
+ * Be sure to remove the newly created tablesync slot if the COPY fails.
+ */
+ PG_TRY();
+ {
+ /* Now do the initial data copy */
+ PushActiveSnapshot(GetTransactionSnapshot());
+ copy_table(rel);
+ PopActiveSnapshot();
+
+ res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
+ if (res->status != WALRCV_OK_COMMAND)
+ ereport(ERROR,
+ (errmsg("table copy could not finish transaction on publisher"),
+ errdetail("The error was: %s", res->err)));
+ walrcv_clear_result(res);
+
+ table_close(rel, NoLock);
+
+ /* Make the copy visible. */
+ CommandCounterIncrement();
+ }
+ PG_CATCH();
+ {
+ /*
+ * If something failed during copy table then cleanup the created
+ * slot.
+ */
+ elog(DEBUG1, "LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".", slotname);
+ ReplicationSlotDropAtPubNode(wrconn, slotname);
+
+ pfree(slotname);
+ slotname = NULL;
- res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+
+ /* Setup replication origin tracking. */
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist. Create it now, and advance to LSN
+ * got from walrcv_create_slot.
+ */
+ originid = replorigin_create(originname);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */ , true /* WAL log */ );
+ }
+ else
+ {
ereport(ERROR,
- (errmsg("table copy could not finish transaction on publisher"),
- errdetail("The error was: %s", res->err)));
- walrcv_clear_result(res);
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("replication origin \"%s\" already exists",
+ originname)));
+ }
- table_close(rel, NoLock);
+ /*
+ * Update the persisted state to indicate the COPY phase is done; make it
+ * visible to others.
+ */
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_TCOPYDONE,
+ MyLogicalRepWorker->relstate_lsn);
+
+copy_table_done:
- /* Make the copy visible. */
- CommandCounterIncrement();
+ elog(DEBUG1, "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
+
+ CommitTransactionCommand();
/*
* We are done with the initial data synchronization, update the state.
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 1b1d70e..4bd4030 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -807,12 +807,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -889,9 +885,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -918,8 +912,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1062,8 +1055,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData* commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index 06663b9..2c80405 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,8 @@ DECLARE_UNIQUE_INDEX(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg_subs
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_TCOPYDONE 't' /* tablesync copy phase is completed
+ * (sublsn NULL) */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 53f636c..e617602 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -15,6 +15,7 @@
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "storage/spin.h"
+#include "replication/walreceiver.h"
/*
* Behaviour of replication slots, upon release or crash.
@@ -211,6 +212,8 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(Oid suboid, Oid relid);
+extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
--
1.8.3.1
v11-0002-Tablesync-extra-logging.patch (application/octet-stream)
From 5209a449efca9a7e503274cedcd76c3831df72a5 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Tue, 5 Jan 2021 20:40:48 +1100
Subject: [PATCH v11] Tablesync extra logging.
This patch only adds some extra logging which may be helpful for testing, but is not for committing.
---
src/backend/commands/subscriptioncmds.c | 6 ++++--
src/backend/replication/logical/tablesync.c | 20 +++++++++++++++-----
2 files changed, 19 insertions(+), 7 deletions(-)
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 16b2bee..a1881cc 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -1105,8 +1105,9 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
{
PG_TRY();
{
- elog(DEBUG1, "DropSubscription: dropping the tablesync slot \"%s\".", syncslotname);
+ elog(LOG, "!!>> DropSubscription: dropping the tablesync slot \"%s\".", syncslotname);
ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ elog(LOG, "!!>> DropSubscription: dropped the tablesync slot \"%s\".", syncslotname);
}
PG_CATCH();
{
@@ -1139,8 +1140,9 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
originid = replorigin_by_name(originname, true);
if (originid != InvalidRepOriginId)
{
- elog(DEBUG1, "DropSubscription: dropping origin tracking for \"%s\"", originname);
+ elog(LOG, "!!>> DropSubscription: dropping origin tracking for \"%s\"", originname);
replorigin_drop(originid, false);
+ elog(LOG, "!!>> DropSubscription: dropped origin tracking for \"%s\"", originname);
}
}
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index f83f06f..afc77dd 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -158,8 +158,9 @@ finish_sync_worker(void)
PG_TRY();
{
- elog(DEBUG1, "finish_sync_worker: dropping the tablesync slot \"%s\".", syncslotname);
+ elog(LOG, "!!>> finish_sync_worker: dropping the tablesync slot \"%s\".", syncslotname);
ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ elog(LOG, "!!>> finish_sync_worker: dropped the tablesync slot \"%s\".", syncslotname);
}
PG_CATCH();
{
@@ -472,10 +473,12 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
snprintf(originname, sizeof(originname), "pg_%u_%u", MyLogicalRepWorker->subid, rstate->relid);
originid = replorigin_by_name(originname, true);
+ elog(LOG, "!!>> apply worker: find tablesync origin tracking for \"%s\".", originname);
if (OidIsValid(originid))
{
- elog(DEBUG1, "process_syncing_tables_for_apply: dropping tablesync origin tracking for \"%s\".", originname);
+ elog(LOG, "!!>> process_syncing_tables_for_apply: dropping tablesync origin tracking for \"%s\".", originname);
replorigin_drop(originid, false);
+ elog(LOG, "!!>> apply worker: dropped tablesync origin tracking for \"%s\".", originname);
}
}
@@ -974,15 +977,17 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* The COPY phase was previously done, but tablesync then crashed/etc
* before it was able to finish normally.
*/
- elog(DEBUG1, "LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_TCOPYDONE.");
+ elog(LOG, "!!>> LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_COPYDONE.");
StartTransactionCommand();
/*
* The origin tracking name must already exist (missing_ok=false).
*/
originid = replorigin_by_name(originname, false);
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 2 replorigin_session_setup \"%s\".", originname);
replorigin_session_setup(originid);
replorigin_session_origin = originid;
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 2 replorigin_session_get_progress \"%s\".", originname);
*origin_startpos = replorigin_session_get_progress(false);
goto copy_table_done;
@@ -1031,6 +1036,7 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
+ elog(LOG, "!!>> LogicalRepSyncTableStart: walrcv_create_slot for \"%s\".", slotname);
walrcv_create_slot(wrconn, slotname, false,
CRS_USE_SNAPSHOT, origin_startpos);
@@ -1062,8 +1068,9 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* If something failed during copy table then cleanup the created
* slot.
*/
- elog(DEBUG1, "LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".", slotname);
+ elog(LOG, "!!>> LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".", slotname);
ReplicationSlotDropAtPubNode(wrconn, slotname);
+ elog(LOG, "!!>> LogicalRepSyncTableStart: tablesync copy failed. Dropped the tablesync slot \"%s\".", slotname);
pfree(slotname);
slotname = NULL;
@@ -1080,9 +1087,12 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* Origin tracking does not exist. Create it now, and advance to LSN
* got from walrcv_create_slot.
*/
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_create \"%s\".", originname);
originid = replorigin_create(originname);
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_session_setup \"%s\".", originname);
replorigin_session_setup(originid);
replorigin_session_origin = originid;
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_advance \"%s\".", originname);
replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
true /* go backward */ , true /* WAL log */ );
}
@@ -1105,7 +1115,7 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
copy_table_done:
- elog(DEBUG1, "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ elog(LOG, "!!>> LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
originname,
(uint32) (*origin_startpos >> 32),
(uint32) *origin_startpos);
--
1.8.3.1
On Mon, Jan 4, 2021 at 10:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Few more comments on v9:
======================
1.
+ /* Drop the tablesync slot. */
+ {
+ char *syncslotname = ReplicationSlotNameForTablesync(subid, relid);
+
+ /*
+ * If the subscription slotname is NONE/NULL and the connection to publisher is
+ * broken, but the DropSubscription should still be allowed to complete.
+ * But without a connection it is not possible to drop any tablesync slots.
+ */
+ if (!wrconn)
+ {
+ /* FIXME - OK to just log a warning? */
+ elog(WARNING, "!!>> DropSubscription: no connection. Cannot drop tablesync slot \"%s\".",
+ syncslotname);
+ }

Why is this not an ERROR? We don't want to keep the table slots
lingering after DropSubscription. If there is any tablesync slot that
needs to be dropped and the publisher is not available then we should
raise an error.
Previously there was only the subscription slot. If the connection was
broken and caused an error then it was still possible for the user to
disassociate the subscription from the slot using ALTER SUBSCRIPTION
... SET (slot_name = NONE). And then (when the slotname is NULL) the
DropSubscription could complete OK. I expect in that case the Admin
still had some slot clean-up they would need to do on the Publisher
machine.
But now we have the tablesync slots so if I caused them to give ERROR
when the connection is broken then the subscription would become
un-droppable. If you think that having ERROR and an undroppable
subscription is better than the current WARNING then please let me
know - there is no problem to change it.
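Condensed, the behaviour being defended here is roughly the following (a sketch only; in the actual patch the ERROR is raised earlier, at connect time, and the LOG happens per tablesync slot):

if (!wrconn)
{
	if (slotname)
		ereport(ERROR,
				(errmsg("could not connect to publisher when attempting to "
						"drop the replication slot \"%s\"", slotname)));

	/*
	 * slot_name = NONE: the user deliberately disassociated the
	 * subscription, so only LOG and let DROP SUBSCRIPTION proceed.
	 */
	elog(LOG, "DROP SUBSCRIPTION: no connection; cannot drop tablesync slot \"%s\".",
		 syncslotname);
}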
2.
+ /*
+ * Tablesync resource cleanup (slots and origins).
+ *
+ * Any READY-state relations would already have dealt with clean-ups.
+ */
+ {

There is no need to start a separate block '{' here.
Written this way so I can declare variables only in the scope where they
are needed. I didn’t see anything in the PG coding conventions discouraging
this practice: https://www.postgresql.org/docs/devel/source.html
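For illustration, the pattern in question is just this (trimmed from the patch; the real hunk also wraps the drop in PG_TRY):

{
	char	   *syncslotname = ReplicationSlotNameForTablesync(subid, relid);

	/* syncslotname is only needed within this block */
	ReplicationSlotDropAtPubNode(wrconn, syncslotname);
	pfree(syncslotname);
}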
3.
+#define SUBREL_STATE_COPYDONE 'C' /* tablesync copy phase is completed */

You can mention in the comments that sublsn will be NULL for this
state as it is mentioned for other similar states. Can we think of
using any letter in lower case for this as all other states are in
lower-case except for this, which makes it look a bit odd? We can use
'f' or 'e' and describe it as 'copy finished' or 'copy end'. I am fine
if you have any better ideas.
Fixed in latest patch [v11]
4.
LogicalRepSyncTableStart()
{
..
..
+copy_table_done:
+
+ /* Setup replication origin tracking. */
+ {
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MySubscription->oid, MyLogicalRepWorker->relid);
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist. Create it now, and advance to LSN got from walrcv_create_slot.
+ */
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_create \"%s\".", originname);
+ originid = replorigin_create(originname);
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_session_setup \"%s\".", originname);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_advance \"%s\".", originname);
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */ , true /* WAL log */ );
+ }
+ else
+ {
+ /*
+ * Origin tracking already exists.
+ */
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 2 replorigin_session_setup \"%s\".", originname);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 2 replorigin_session_get_progress \"%s\".", originname);
+ *origin_startpos = replorigin_session_get_progress(false);
+ }
..
..
}

I am not sure if this code is correct because, for the very first time
when the copy is done, we don't expect replication origin to exist
whereas this code will silently use already existing replication
origin which can lead to a wrong start position for the slot. In such
a case it should error out. I guess we should create the replication
origin before making the state as copydone. I feel we should even have
a test case for this as it is not difficult to have a pre-existing
replication origin.
Fixed as suggested in latest patch [v11]
5. Is it possible to write a testcase where we fail (say due to pk
violation or some other error) after the initial copy is done, then
remove the conflicting row and allow a copy to be completed? If we
already have any such test then it is fine.
Causing a PK violation during the initial copy is not a problem to
test, but doing it after the initial copy is difficult. I have done
exactly this test scenario before but I thought it cannot be
automated. E.g. to cause a PK violation error somewhere between
COPYDONE and SYNCDONE means that the offending insert (the one which
tablesync will fail to replicate) has to be sent while the tablesync
is in CATCHUP mode. But AFAIK that can only be achieved using the
debugger to get the timing right.
6.
+/*
+ * Drop the replication slot at the publisher node
+ * using the replication connection.
+ */

This comment looks a bit odd. The first line appears to be too short.
We have a limit of 80 chars but this is much less than that.
Fixed in latest patch [v11]
7.
@@ -905,7 +905,7 @@ replorigin_advance(RepOriginId node,
LWLockAcquire(&replication_state->lock, LW_EXCLUSIVE);

/* Make sure it's not used by somebody else */
- if (replication_state->acquired_by != 0)
+ if (replication_state->acquired_by != 0 && replication_state->acquired_by != MyProcPid)
{

I think you won't need this change if you do replorigin_advance before
replorigin_session_setup in your patch.

TODO

8.
- * that ensures we won't loose knowledge about that after a crash if the
+ * that ensures we won't lose knowledge about that after a crash if the

It is better to submit this as a separate patch.
Done. Please see CF entry. https://commitfest.postgresql.org/32/2926/
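For reference, the reordering suggested in comment 7 would look roughly like this (a sketch built from the patch's existing calls, untested):

/* Create and advance the origin before acquiring it for the session. */
originid = replorigin_create(originname);
replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
				   true /* go backward */ , true /* WAL log */ );
replorigin_session_setup(originid);
replorigin_session_origin = originid;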
----
[v11] = /messages/by-id/CAHut+Pu0A6TUPgYC-L3BKYQfa_ScL31kOV_3RsB3ActdkL1iBQ@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia.
On Tue, Jan 5, 2021 at 3:32 PM Peter Smith <smithpb2250@gmail.com> wrote:
On Mon, Jan 4, 2021 at 10:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Few more comments on v9:
======================
1.
+ /* Drop the tablesync slot. */
+ {
+ char *syncslotname = ReplicationSlotNameForTablesync(subid, relid);
+
+ /*
+ * If the subscription slotname is NONE/NULL and the connection to publisher is
+ * broken, but the DropSubscription should still be allowed to complete.
+ * But without a connection it is not possible to drop any tablesync slots.
+ */
+ if (!wrconn)
+ {
+ /* FIXME - OK to just log a warning? */
+ elog(WARNING, "!!>> DropSubscription: no connection. Cannot drop tablesync slot \"%s\".",
+ syncslotname);
+ }

Why is this not an ERROR? We don't want to keep the table slots
lingering after DropSubscription. If there is any tablesync slot that
needs to be dropped and the publisher is not available then we should
raise an error.Previously there was only the subscription slot. If the connection was
broken and caused an error then it was still possible for the user to
disassociate the subscription from the slot using ALTER SUBSCRIPTION
... SET (slot_name = NONE). And then (when the slotname is NULL) the
DropSubscription could complete OK. I expect in that case the Admin
still had some slot clean-up they would need to do on the Publisher
machine.
I think such an option could probably be used for user-created slots
but it would be difficult for even Admin to know about these
internally created slots associated with the particular subscription.
I would say it is better to ERROR out.
2.
+ /*
+ * Tablesync resource cleanup (slots and origins).
+ *
+ * Any READY-state relations would already have dealt with clean-ups.
+ */
+ {

There is no need to start a separate block '{' here.
Written this way so I can declare variables only in the scope where they
are needed. I didn’t see anything in the PG coding conventions discouraging
this practice: https://www.postgresql.org/docs/devel/source.html
But do we encourage such a coding convention for declaring variables? I
find it difficult to read such code. I guess as a one-off we can do
this, but I don't see a compelling need here.
3.
+#define SUBREL_STATE_COPYDONE 'C' /* tablesync copy phase is completed */

You can mention in the comments that sublsn will be NULL for this
state as it is mentioned for other similar states. Can we think of
using any letter in lower case for this as all other states are in
lower-case except for this which makes it a look bit odd? We can use
'f' or 'e' and describe it as 'copy finished' or 'copy end'. I am fine
if you have any better ideas.Fixed in latest patch [v11]
It is still not reflected in the docs. See below:
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7651,6 +7651,7 @@ SCRAM-SHA-256$<replaceable><iteration
count></replaceable>:<replaceable>&l
State code:
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
+ <literal>C</literal> = table data has been copied,
<literal>s</literal> = synchronized,
--
With Regards,
Amit Kapila.
On Tue, Jan 5, 2021 at 10:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
1.
+ /* Drop the tablesync slot. */
+ {
+ char *syncslotname = ReplicationSlotNameForTablesync(subid, relid);
+
+ /*
+ * If the subscription slotname is NONE/NULL and the connection to publisher is
+ * broken, but the DropSubscription should still be allowed to complete.
+ * But without a connection it is not possible to drop any tablesync slots.
+ */
+ if (!wrconn)
+ {
+ /* FIXME - OK to just log a warning? */
+ elog(WARNING, "!!>> DropSubscription: no connection. Cannot drop tablesync slot \"%s\".",
+ syncslotname);
+ }

Why is this not an ERROR? We don't want to keep the table slots
lingering after DropSubscription. If there is any tablesync slot that
needs to be dropped and the publisher is not available then we should
raise an error.Previously there was only the subscription slot. If the connection was
broken and caused an error then it was still possible for the user to
disassociate the subscription from the slot using ALTER SUBSCRIPTION
... SET (slot_name = NONE). And then (when the slotname is NULL) the
DropSubscription could complete OK. I expect in that case the Admin
still had some slot clean-up they would need to do on the Publisher
machine.I think such an option could probably be used for user-created slots
but it would be difficult for even Admin to know about these
internally created slots associated with the particular subscription.
I would say it is better to ERROR out.
I am having doubts that ERROR is the best choice here. There is a long
note in https://www.postgresql.org/docs/devel/sql-dropsubscription.html
which describes this problem for the subscription slot and how to
disassociate the name to give a workaround “To proceed in this
situation”.
OTOH if we make the tablesync slot unconditionally ERROR for a broken
connection then there is no way to proceed, and the whole (slot_name =
NONE) workaround becomes ineffectual. Note - the current patch code is
only logging when the user has already disassociated the slot name; of
course normally (when the slot name was not disassociated) table slots
will give ERROR for broken connections.
IMO, if the user has disassociated the slot name then they have
already made their decision that they REALLY DO want to “proceed in
this situation”. So I thought we should let them proceed.
What do you think?
----
Kind Regards,
Peter Smith.
Fujitsu Australia.
On Wed, Jan 6, 2021 at 4:32 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Tue, Jan 5, 2021 at 10:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
1.
+ /* Drop the tablesync slot. */
+ {
+ char *syncslotname = ReplicationSlotNameForTablesync(subid, relid);
+
+ /*
+ * If the subscription slotname is NONE/NULL and the connection to publisher is
+ * broken, but the DropSubscription should still be allowed to complete.
+ * But without a connection it is not possible to drop any tablesync slots.
+ */
+ if (!wrconn)
+ {
+ /* FIXME - OK to just log a warning? */
+ elog(WARNING, "!!>> DropSubscription: no connection. Cannot drop tablesync slot \"%s\".",
+ syncslotname);
+ }

Why is this not an ERROR? We don't want to keep the table slots
lingering after DropSubscription. If there is any tablesync slot that
needs to be dropped and the publisher is not available then we should
raise an error.Previously there was only the subscription slot. If the connection was
broken and caused an error then it was still possible for the user to
disassociate the subscription from the slot using ALTER SUBSCRIPTION
... SET (slot_name = NONE). And then (when the slotname is NULL) the
DropSubscription could complete OK. I expect in that case the Admin
still had some slot clean-up they would need to do on the Publisher
machine.I think such an option could probably be used for user-created slots
but it would be difficult for even Admin to know about these
internally created slots associated with the particular subscription.
I would say it is better to ERROR out.I am having doubts that ERROR is the best choice here. There is a long
note in https://www.postgresql.org/docs/devel/sql-dropsubscription.html
which describes this problem for the subscription slot and how to
disassociate the name to give a workaround “To proceed in this
situation”.OTOH if we make the tablesync slot unconditionally ERROR for a broken
connection then there is no way to proceed, and the whole (slot_name =
NONE) workaround becomes ineffectual. Note - the current patch code is
only logging when the user has already disassociated the slot name; of
course normally (when the slot name was not disassociated) table slots
will give ERROR for broken connections.IMO, if the user has disassociated the slot name then they have
already made their decision that they REALLY DO want to “proceed in
this situation”. So I thought we should let them proceed.
Okay, if we want to go that way then we should add some documentation
about it. Currently, the slot name used by the apply worker is known to
the user because either it is specified by the user or it defaults to the
subscription name, so the user can manually remove that slot later, but
that is not true for tablesync slots. I think we need to update both
the Drop Subscription page [1] and the logical-replication-subscription
page [2] where we have mentioned temporary slots, and in the end "Here
are some scenarios: .." to mention these slots and probably how
their names are generated so that in such special situations users can
drop them manually.
[1]: https://www.postgresql.org/docs/devel/sql-dropsubscription.html
[2]: https://www.postgresql.org/docs/devel/logical-replication-subscription.html
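For example (the OIDs here are purely illustrative), the generated names follow the patch's printf formats:

char	   *slotname = psprintf("pg_%u_sync_%u", 16394, 16490);	/* "pg_16394_sync_16490" */
char		originname[NAMEDATALEN];

snprintf(originname, sizeof(originname), "pg_%u_%u", 16394, 16490);	/* "pg_16394_16490" */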
--
With Regards,
Amit Kapila.
On Tue, Jan 5, 2021 at 3:32 PM Peter Smith <smithpb2250@gmail.com> wrote:
On Mon, Jan 4, 2021 at 10:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
5. Is it possible to write a testcase where we fail (say due to pk
violation or some other error) after the initial copy is done, then
remove the conflicting row and allow a copy to be completed? If we
already have any such test then it is fine.

Causing a PK violation during the initial copy is not a problem to
test, but doing it after the initial copy is difficult. I have done
exactly this test scenario before but I thought it cannot be
automated. E.g. to cause a PK violation error somewhere between
COPYDONE and SYNCDONE means that the offending insert (the one which
tablesync will fail to replicate) has to be sent while the tablesync
is in CATCHUP mode. But AFAIK that can only be achieved using the
debugger to get the timing right.
Yeah, I am also not able to think of any way to automate such a test.
I was thinking about what could go wrong if we error out in that
stage. The only thing that could be problematic is if we somehow make
the slot and replication origin used during copy dangling. I think if
the tablesync is restarted after an error then we will clean those up,
which will normally be the case, but what if the tablesync worker is not
started again? I think the only possibility of the tablesync worker not
started again is if during Alter Subscription ... Refresh Publication,
we remove the corresponding subscription rel (see
AlterSubscription_refresh, I guess it could happen if one has dropped
the relation from publication). I haven't tested this with your patch
but if such a possibility exists then we need to think of cleaning up
slot and origin when we remove subscription rel. What do you think?
--
With Regards,
Amit Kapila.
On Wed, Jan 6, 2021 at 4:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Jan 5, 2021 at 3:32 PM Peter Smith <smithpb2250@gmail.com> wrote:
On Mon, Jan 4, 2021 at 10:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
5. Is it possible to write a testcase where we fail (say due to pk
violation or some other error) after the initial copy is done, then
remove the conflicting row and allow a copy to be completed? If we
already have any such test then it is fine.

Causing a PK violation during the initial copy is not a problem to
test, but doing it after the initial copy is difficult. I have done
exactly this test scenario before but I thought it cannot be
automated. E.g. to cause a PK violation error somewhere between
COPYDONE and SYNCDONE means that the offending insert (the one which
tablesync will fail to replicate) has to be sent while the tablesync
is in CATCHUP mode. But AFAIK that can only be achieved using the
debugger to get the timing right.Yeah, I am also not able to think of any way to automate such a test.
I was thinking about what could go wrong if we error out in that
stage. The only thing that could be problematic is if we somehow make
the slot and replication origin used during copy dangling. I think if
the tablesync is restarted after an error then we will clean those up,
which will normally be the case, but what if the tablesync worker is not
started again? I think the only possibility of the tablesync worker not
started again is if during Alter Subscription ... Refresh Publication,
we remove the corresponding subscription rel (see
AlterSubscription_refresh, I guess it could happen if one has dropped
the relation from publication). I haven't tested this with your patch
but if such a possibility exists then we need to think of cleaning up
slot and origin when we remove subscription rel. What do you think?
I think it makes sense. If there can be a race between the tablesync
re-launching (after error), and the AlterSubscription_refresh removing
some table’s relid from the subscription then there could be lurking
slot/origin tablesync resources (of the removed table) which a
subsequent DROP SUBSCRIPTION cannot discover. I will think more about
how/if it is possible to make this happen. Anyway, I suppose I ought
to refactor/isolate some of the tablesync cleanup code in case it
needs to be commonly called from DropSubscription and/or from
AlterSubscription_refresh.
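As a rough sketch, the shared helper might look something like this (the function name and placement are hypothetical, only to show the intended shape of the refactor):

/* Hypothetical common cleanup, callable once per removed relation. */
static void
tablesync_cleanup_for_rel(WalReceiverConn *wrconn, Oid subid, Oid relid)
{
	char	   *syncslotname = ReplicationSlotNameForTablesync(subid, relid);
	char		originname[NAMEDATALEN];
	RepOriginId originid;

	/* Drop the tablesync slot on the publisher, when a connection exists. */
	if (wrconn)
		ReplicationSlotDropAtPubNode(wrconn, syncslotname);

	/* Drop the tablesync origin tracking, if it exists. */
	snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
	originid = replorigin_by_name(originname, true);
	if (originid != InvalidRepOriginId)
		replorigin_drop(originid, false);

	pfree(syncslotname);
}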
----
Kind Regards,
Peter Smith.
Fujitsu Australia.
On Wed, Jan 6, 2021 at 2:13 PM Peter Smith <smithpb2250@gmail.com> wrote:
I think it makes sense. If there can be a race between the tablesync
re-launching (after error), and the AlterSubscription_refresh removing
some table’s relid from the subscription then there could be lurking
slot/origin tablesync resources (of the removed table) which a
subsequent DROP SUBSCRIPTION cannot discover. I will think more about
how/if it is possible to make this happen. Anyway, I suppose I ought
to refactor/isolate some of the tablesync cleanup code in case it
needs to be commonly called from DropSubscription and/or from
AlterSubscription_refresh.
Fair enough. BTW, I have analyzed whether we need any modifications to
pg_dump/restore for this patch as this changes the state of one of the
fields in the system table and concluded that we don't need any
change. For subscriptions, we don't dump any of the information from
pg_subscription_rel, rather we just dump subscriptions with the
connect option as false which means users need to enable the
subscription and refresh publication after restore. I have checked
this in the code and tested it as well. The related information is
present in pg_dump doc page [1], see from "When dumping logical
replication subscriptions ....".
[1]: https://www.postgresql.org/docs/devel/app-pgdump.html
--
With Regards,
Amit Kapila.
PSA the v11 patch for the Tablesync Solution1.
Difference from v10:
- Addresses several recent review comments.
- pg_indent has been run
Hi
I took a look into the patch and have some comments.
1.
* So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * CATCHUP -> (sync worker TCOPYDONE) -> SYNCDONE -> READY.
I noticed the new state TCOPYDONE is commented between CATCHUP and SYNCDONE,
But it seems the SUBREL_STATE_TCOPYDONE is actually set before SUBREL_STATE_SYNCWAIT [1].
Did I miss something here?
[1]-----------------
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_TCOPYDONE,
+ MyLogicalRepWorker->relstate_lsn);
...
/*
* We are done with the initial data synchronization, update the state.
*/
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCWAIT;
------------------
2.
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
+ <literal>C</literal> = table data has been copied,
<literal>s</literal> = synchronized,
<literal>r</literal> = ready (normal replication)
+#define SUBREL_STATE_TCOPYDONE 't' /* tablesync copy phase is completed
+ * (sublsn NULL) */
The character representing 'data has been copied' in the catalog seems different from the macro define.
Best regards,
houzj
Thank you for the feedback.
On Thu, Jan 7, 2021 at 12:45 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
PSA the v11 patch for the Tablesync Solution1.
Difference from v10:
- Addresses several recent review comments.
- pg_indent has been runHi
I took a look into the patch and have some comments.
1.
* So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * CATCHUP -> (sync worker TCOPYDONE) -> SYNCDONE -> READY.

I noticed the new state TCOPYDONE is commented between CATCHUP and SYNCDONE,
but it seems the SUBREL_STATE_TCOPYDONE is actually set before
SUBREL_STATE_SYNCWAIT [1]. Did I miss something here?

[1]
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_TCOPYDONE,
+ MyLogicalRepWorker->relstate_lsn);
...
/*
* We are done with the initial data synchronization, update the state.
*/
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCWAIT;
Thanks for reporting this mistake. I will correct the comment for the
next patch (v12)
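(Presumably the corrected comment will read something like: INIT -> DATASYNC -> (tablesync TCOPYDONE) -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY.)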
2.
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
+ <literal>C</literal> = table data has been copied,
<literal>s</literal> = synchronized,
<literal>r</literal> = ready (normal replication)+#define SUBREL_STATE_TCOPYDONE 't' /* tablesync copy phase is completed + * (sublsn NULL) */ The character representing 'data has been copied' in the catalog seems different from the macro define.
Yes, same was already previously reported [1]. It will be fixed in the next patch (v12).
[1]: /messages/by-id/CAA4eK1Kyi037XZzyrLE71MS2KoMmNSqa6RrQLdSCeeL27gnL+A@mail.gmail.com
----
Kind Regards,
Peter Smith.
Fujitsu Australia.
On Wed, Jan 6, 2021 at 3:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Jan 6, 2021 at 2:13 PM Peter Smith <smithpb2250@gmail.com> wrote:
I think it makes sense. If there can be a race between the tablesync
re-launching (after error), and the AlterSubscription_refresh removing
some table’s relid from the subscription then there could be lurking
slot/origin tablesync resources (of the removed table) which a
subsequent DROP SUBSCRIPTION cannot discover. I will think more about
how/if it is possible to make this happen. Anyway, I suppose I ought
to refactor/isolate some of the tablesync cleanup code in case it
needs to be commonly called from DropSubscription and/or from
AlterSubscription_refresh.Fair enough.
I think before implementing, we should once try to reproduce this
case. I understand this is a timing issue and can be reproduced only
with the help of debugger but we should do that.
BTW, I have analyzed whether we need any modifications to
pg_dump/restore for this patch as this changes the state of one of the
fields in the system table and concluded that we don't need any
change. For subscriptions, we don't dump any of the information from
pg_subscription_rel, rather we just dump subscriptions with the
connect option as false which means users need to enable the
subscription and refresh publication after restore. I have checked
this in the code and tested it as well. The related information is
present in pg_dump doc page [1], see from "When dumping logical
replication subscriptions ....".
I have further analyzed that we don't need to do anything w.r.t
pg_upgrade as well because it uses pg_dump/pg_dumpall to dump the
schema info of the old cluster and then restore it to the new cluster.
And, we know that pg_dump ignores the info in pg_subscription_rel, so
we don't need to change anything as our changes are specific to the
state of one of the columns in pg_subscription_rel. I have not tested
this but we should test it by having some relations in not_ready state
and then allow the old cluster (<=PG13) to be upgraded to new (pg14)
both with and without this patch and see if there is any change in
behavior.
--
With Regards,
Amit Kapila.
Hi Amit.
PSA the v12 patch for the Tablesync Solution1.
Differences from v11:
+ Added PG docs to mention the tablesync slot
+ Refactored tablesync slot drop (done by
DropSubscription/process_syncing_tables_for_sync)
+ Fixed PG docs mentioning wrong state code
+ Fixed wrong code comment describing TCOPYDONE state
====
Features:
* The tablesync slot is now permanent instead of temporary. The
tablesync slot name is no longer tied to the Subscription slot name.
* The tablesync slot cleanup (drop) code is added for DropSubscription
and for process_syncing_tables_for_sync functions
* The tablesync worker is now allowing multiple tx instead of single tx
* A new state (SUBREL_STATE_TCOPYDONE) is persisted after a successful
copy_table in LogicalRepSyncTableStart.
* If a re-launched tablesync finds state SUBREL_STATE_TCOPYDONE then
it will bypass the initial copy_table phase.
* Now tablesync sets up replication origin tracking in
LogicalRepSyncTableStart (similar as done for the apply worker). The
origin is advanced when first created.
* The tablesync replication origin tracking is cleaned up during
DropSubscription and/or process_syncing_tables_for_apply.
* The DropSubscription cleanup code was enhanced (v7+) to take care of
any crashed tablesync workers.
* Updates to PG docs
TODO / Known Issues:
* Address review comments
* Patch applies with whitespace warning
---
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
v12-0001-Tablesync-Solution1.patch (application/octet-stream)
From e5b823cb2260b9701fd13dd90c5d1681ba5b4831 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Thu, 7 Jan 2021 16:52:25 +1100
Subject: [PATCH v12] Tablesync Solution1.
====
Features:
* The tablesync slot is now permanent instead of temporary. The tablesync slot name is no longer tied to the Subscription slot name.
* The tablesync slot cleanup (drop) code is added for DropSubscription and for process_syncing_tables_for_sync functions
* The tablesync worker is now allowing multiple tx instead of single tx
* A new state (SUBREL_STATE_TCOPYDONE) is persisted after a successful copy_table in LogicalRepSyncTableStart.
* If a re-launched tablesync finds state SUBREL_STATE_TCOPYDONE then it will bypass the initial copy_table phase.
* Now tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar as done for the apply worker). The origin is advanced when first created.
* The tablesync replication origin tracking is cleaned up during DropSubscription and/or process_syncing_tables_for_apply.
* The DropSubscription cleanup code was enhanced (v7+) to take care of any crashed tablesync workers.
* Updates to PG docs
TODO / Known Issues:
* Address review comments
---
doc/src/sgml/catalogs.sgml | 1 +
doc/src/sgml/logical-replication.sgml | 16 +-
doc/src/sgml/ref/drop_subscription.sgml | 6 +-
src/backend/commands/subscriptioncmds.c | 212 ++++++++++++++++-------
src/backend/replication/logical/origin.c | 2 +-
src/backend/replication/logical/tablesync.c | 249 +++++++++++++++++++++++-----
src/backend/replication/logical/worker.c | 18 +-
src/include/catalog/pg_subscription_rel.h | 2 +
src/include/replication/slot.h | 3 +
9 files changed, 394 insertions(+), 115 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 3a22665..6d294c8 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7651,6 +7651,7 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
State code:
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
+ <literal>t</literal> = table data has been copied,
<literal>s</literal> = synchronized,
<literal>r</literal> = ready (normal replication)
</para></entry>
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index a560ad6..66290d6 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -248,7 +248,16 @@
<para>
As mentioned earlier, each (active) subscription receives changes from a
- replication slot on the remote (publishing) side. Normally, the remote
+ replication slot on the remote (publishing) side.
+ </para>
+ <para>
+ Additional table synchronization slots are normally transient, created
+ internally and dropped automatically when they are no longer needed.
+ These table synchronization slots have generated names:
+ <quote><literal>pg_%u_sync_%u</literal></quote> (parameters: Subscription <parameter>oid</parameter>, Table <parameter>relid</parameter>)
+ </para>
+ <para>
+ Normally, the remote
replication slot is created automatically when the subscription is created
using <command>CREATE SUBSCRIPTION</command> and it is dropped
automatically when the subscription is dropped using <command>DROP
@@ -294,8 +303,9 @@
using <command>ALTER SUBSCRIPTION</command> before attempting to drop
the subscription. If the remote database instance no longer exists, no
further action is then necessary. If, however, the remote database
- instance is just unreachable, the replication slot should then be
- dropped manually; otherwise it would continue to reserve WAL and might
+ instance is just unreachable, the replication slot (and any still
+ remaining table synchronization slots) should then be
+ dropped manually; otherwise it/they would continue to reserve WAL and might
eventually cause the disk to fill up. Such cases should be carefully
investigated.
</para>
diff --git a/doc/src/sgml/ref/drop_subscription.sgml b/doc/src/sgml/ref/drop_subscription.sgml
index adbdeaf..aee9615 100644
--- a/doc/src/sgml/ref/drop_subscription.sgml
+++ b/doc/src/sgml/ref/drop_subscription.sgml
@@ -79,7 +79,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
<para>
When dropping a subscription that is associated with a replication slot on
the remote host (the normal state), <command>DROP SUBSCRIPTION</command>
- will connect to the remote host and try to drop the replication slot as
+ will connect to the remote host and try to drop the replication slot (and
+ any remaining table synchronization slots) as
part of its operation. This is necessary so that the resources allocated
for the subscription on the remote host are released. If this fails,
either because the remote host is not reachable or because the remote
@@ -89,7 +90,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
executing <literal>ALTER SUBSCRIPTION ... SET (slot_name = NONE)</literal>.
After that, <command>DROP SUBSCRIPTION</command> will no longer attempt any
actions on a remote host. Note that if the remote replication slot still
- exists, it should then be dropped manually; otherwise it will continue to
+ exists, it (and any related table synchronization slots) should then be
+ dropped manually; otherwise it/they will continue to
reserve WAL and might eventually cause the disk to fill up. See
also <xref linkend="logical-replication-subscription-slot"/>.
</para>
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 490e935..0142278 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -37,6 +37,7 @@
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
#include "storage/lmgr.h"
#include "utils/acl.h"
#include "utils/builtins.h"
@@ -928,8 +929,8 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
char *err = NULL;
RepOriginId originid;
WalReceiverConn *wrconn = NULL;
- StringInfoData cmd;
Form_pg_subscription form;
+ List *rstates;
/*
* Lock pg_subscription with AccessExclusiveLock to ensure that the
@@ -1016,73 +1017,176 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
ReleaseSysCache(tup);
/*
- * Stop all the subscription workers immediately.
- *
- * This is necessary if we are dropping the replication slot, so that the
- * slot becomes accessible.
+ * Try to acquire the connection necessary for dropping slots. We do this
+ * here so that the same connection may be shared for dropping the
+ * Subscription slot, as well as dropping any tablesync slots.
*
- * It is also necessary if the subscription is disabled and was disabled
- * in the same transaction. Then the workers haven't seen the disabling
- * yet and will still be running, leading to hangs later when we want to
- * drop the replication origin. If the subscription was disabled before
- * this transaction, then there shouldn't be any workers left, so this
- * won't make a difference.
- *
- * New workers won't be started because we hold an exclusive lock on the
- * subscription till the end of the transaction.
+ * Note: If the slotname is NONE/NULL then connection errors are
+ * suppressed. This is necessary so that the DROP SUBSCRIPTION can still
+ * complete even when the connection to publisher is broken.
*/
- LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
- subworkers = logicalrep_workers_find(subid, false);
- LWLockRelease(LogicalRepWorkerLock);
- foreach(lc, subworkers)
+ load_file("libpqwalreceiver", false);
+
+ wrconn = walrcv_connect(conninfo, true, subname, &err);
+ if (wrconn == NULL && slotname != NULL)
+ ereport(ERROR,
+ (errmsg("could not connect to publisher when attempting to "
+ "drop the replication slot \"%s\"", slotname),
+ errdetail("The error was: %s", err),
+ /* translator: %s is an SQL ALTER command */
+ errhint("Use %s to disassociate the subscription from the slot.",
+ "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+
+ PG_TRY();
{
- LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
+ /*
+ * Stop all the subscription workers immediately.
+ *
+ * This is necessary if we are dropping the replication slot, so that
+ * the slot becomes accessible.
+ *
+ * It is also necessary if the subscription is disabled and was
+ * disabled in the same transaction. Then the workers haven't seen
+ * the disabling yet and will still be running, leading to hangs later
+ * when we want to drop the replication origin. If the subscription
+ * was disabled before this transaction, then there shouldn't be any
+ * workers left, so this won't make a difference.
+ *
+ * New workers won't be started because we hold an exclusive lock on
+ * the subscription till the end of the transaction.
+ */
+ LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
+ subworkers = logicalrep_workers_find(subid, false);
+ LWLockRelease(LogicalRepWorkerLock);
+ foreach(lc, subworkers)
+ {
+ LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
- logicalrep_worker_stop(w->subid, w->relid);
- }
- list_free(subworkers);
+ logicalrep_worker_stop(w->subid, w->relid);
+ }
+ list_free(subworkers);
+
+ /*
+ * Tablesync resource cleanup (slots and origins).
+ *
+ * Any READY-state relations would already have dealt with clean-ups.
+ */
+ rstates = GetSubscriptionNotReadyRelations(subid);
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
- /* Clean up dependencies */
- deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
- /* Remove any associated relation synchronization states. */
- RemoveSubscriptionRel(subid, InvalidOid);
+ /*
+ * Drop the tablesync slot.
+ *
+ * (For SYNCDONE/READY states the tablesync slot will already be
+ * dropped).
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ char *syncslotname = ReplicationSlotNameForTablesync(subid, relid);
- /* Remove the origin tracking if exists. */
- snprintf(originname, sizeof(originname), "pg_%u", subid);
- originid = replorigin_by_name(originname, true);
- if (originid != InvalidRepOriginId)
- replorigin_drop(originid, false);
+ PG_TRY();
+ {
+ if (!wrconn)
+ {
+ /*
+ * It is only possible to reach here without ERROR for
+ * a broken publisher connection if the subscription
+ * slotname is already NONE/NULL.
+ *
+ * This means the user has disassociated the
+ * subscription from the replication slot deliberately
+ * so that the DROP SUBSCRIPTION can proceed to
+ * completion. See
+ * https://www.postgresql.org/docs/current/sql-dropsubscription.html
+ *
+ * For this reason we only LOG a message that the
+ * tablesync slots cannot be dropped, rather than
+ * throw ERROR (which would prevent the DROP
+ * SUBSCRIPTION from proceeding).
+ *
+ * In such a case the user must take steps to manually
+ * cleanup these remaining tablesync slots.
+ */
+ elog(LOG, "DROP SUBSCRIPTION: no connection; cannot drop tablesync slot \"%s\".",
+ syncslotname);
+ }
+ else
+ {
+ elog(DEBUG1, "DropSubscription: dropping the tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ }
+ }
+ PG_FINALLY();
+ {
+ pfree(syncslotname);
+ }
+ PG_END_TRY();
+ }
- /*
- * If there is no slot associated with the subscription, we can finish
- * here.
- */
- if (!slotname)
+ /* Remove the tablesync's origin tracking if exists. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ {
+ elog(DEBUG1, "DropSubscription: dropping origin tracking for \"%s\"", originname);
+ replorigin_drop(originid, false);
+ }
+ }
+ list_free(rstates);
+
+ /* Clean up dependencies. */
+ deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
+
+ /* Remove any associated relation synchronization states. */
+ RemoveSubscriptionRel(subid, InvalidOid);
+
+ /* Remove the origin tracking if exists. */
+ snprintf(originname, sizeof(originname), "pg_%u", subid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ replorigin_drop(originid, false);
+
+ /*
+ * If there is a slot associated with the subscription, then drop the
+ * replication slot at the publisher node using the replication
+ * connection.
+ */
+ if (slotname)
+ ReplicationSlotDropAtPubNode(wrconn, slotname);
+ }
+ PG_FINALLY();
{
+ if (wrconn)
+ walrcv_disconnect(wrconn);
+
table_close(rel, NoLock);
- return;
}
+ PG_END_TRY();
+}
+
+
+/*
+ * Drop the replication slot at the publisher node using the replication connection.
+ */
+void
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname)
+{
+ StringInfoData cmd;
+
+ Assert(wrconn);
- /*
- * Otherwise drop the replication slot at the publisher node using the
- * replication connection.
- */
load_file("libpqwalreceiver", false);
initStringInfo(&cmd);
appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
- wrconn = walrcv_connect(conninfo, true, subname, &err);
- if (wrconn == NULL)
- ereport(ERROR,
- (errmsg("could not connect to publisher when attempting to "
- "drop the replication slot \"%s\"", slotname),
- errdetail("The error was: %s", err),
- /* translator: %s is an SQL ALTER command */
- errhint("Use %s to disassociate the subscription from the slot.",
- "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
-
PG_TRY();
{
WalRcvExecResult *res;
@@ -1103,13 +1207,9 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
}
PG_FINALLY();
{
- walrcv_disconnect(wrconn);
+ pfree(cmd.data);
}
PG_END_TRY();
-
- pfree(cmd.data);
-
- table_close(rel, NoLock);
}
/*
diff --git a/src/backend/replication/logical/origin.c b/src/backend/replication/logical/origin.c
index 0b01cce..b299285 100644
--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -905,7 +905,7 @@ replorigin_advance(RepOriginId node,
LWLockAcquire(&replication_state->lock, LW_EXCLUSIVE);
/* Make sure it's not used by somebody else */
- if (replication_state->acquired_by != 0)
+ if (replication_state->acquired_by != 0 && replication_state->acquired_by != MyProcPid)
{
ereport(ERROR,
(errcode(ERRCODE_OBJECT_IN_USE),
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 863d196..979a2ac 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -43,13 +43,17 @@
* state to SYNCDONE. There might be zero changes applied between
* CATCHUP and SYNCDONE, because the sync worker might be ahead of the
* apply worker.
+ * - The sync worker has an intermediary state TCOPYDONE which comes after
+ * DATASYNC and before SYNCWAIT. This state indicates that the initial
+ * table copy phase has completed, so if the worker crashes before
+ * reaching SYNCDONE the copy will not be re-attempted.
* - Once the state is set to SYNCDONE, the apply will continue tracking
* the table until it reaches the SYNCDONE stream position, at which
* point it sets state to READY and stops tracking. Again, there might
* be zero changes in between.
*
- * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * So the state progression is always: INIT -> DATASYNC ->
+ * (sync worker TCOPYDONE) -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY.
*
* The catalog pg_subscription_rel is used to keep information about
* subscribed tables and their state. Some transient state during data
@@ -59,6 +63,7 @@
* Example flows look like this:
* - Apply is in front:
* sync:8
+ * -> set in catalog TCOPYDONE
* -> set in memory SYNCWAIT
* apply:10
* -> set in memory CATCHUP
@@ -74,6 +79,7 @@
*
* - Sync is in front:
* sync:10
+ * -> set in catalog TCOPYDONE
* -> set in memory SYNCWAIT
* apply:8
* -> set in memory CATCHUP
@@ -102,6 +108,8 @@
#include "replication/logicalrelation.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
@@ -270,30 +278,62 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
+ bool sync_done = false;
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
+ sync_done = MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
+ current_lsn >= MyLogicalRepWorker->relstate_lsn;
+ SpinLockRelease(&MyLogicalRepWorker->relmutex);
- if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
- current_lsn >= MyLogicalRepWorker->relstate_lsn)
+ if (sync_done)
{
TimeLineID tli;
+ char *syncslotname;
+
+ /* End wal streaming so wrconn can be re-used to drop the slot. */
+ walrcv_endstreaming(wrconn, &tli);
+
+ /*
+ * Cleanup the tablesync slot.
+ */
+ syncslotname = ReplicationSlotNameForTablesync(
+ MySubscription->oid,
+ MyLogicalRepWorker->relid);
+ PG_TRY();
+ {
+ elog(DEBUG1, "process_syncing_tables_for_sync: dropping the tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ }
+ PG_FINALLY();
+ {
+ pfree(syncslotname);
+ }
+ PG_END_TRY();
+ /*
+ * Change state to SYNCDONE.
+ */
+ SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCDONE;
MyLogicalRepWorker->relstate_lsn = current_lsn;
-
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ {
+ StartTransactionCommand();
+ }
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
MyLogicalRepWorker->relstate_lsn);
- walrcv_endstreaming(wrconn, &tli);
finish_sync_worker();
}
- else
- SpinLockRelease(&MyLogicalRepWorker->relmutex);
}
/*
@@ -412,6 +452,35 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
started_tx = true;
}
+ /*
+ * Remove the tablesync origin tracking if exists.
+ *
+ * The cleanup is done here instead of in the
+ * finish_sync_worker function because if the tablesync worker
+ * process attempted to call replorigin_drop then that will
+ * hang because replorigin_drop logic considers the owning
+ * tablesync PID as "busy".
+ *
+ * Do this before updating the state, so that DropSubscription
+ * can know that all READY workers have already had their
+ * origin tracking removed.
+ */
+ {
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MyLogicalRepWorker->subid, rstate->relid);
+ originid = replorigin_by_name(originname, true);
+ if (OidIsValid(originid))
+ {
+ elog(DEBUG1, "process_syncing_tables_for_apply: dropping tablesync origin tracking for \"%s\".", originname);
+ replorigin_drop(originid, false);
+ }
+ }
+
+ /*
+ * Update the state only after the origin cleanup.
+ */
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
@@ -808,6 +877,31 @@ copy_table(Relation rel)
}
/*
+ * Determine the tablesync slot name.
+ *
+ * The returned slot name is palloc'ed in current memory context.
+ */
+char *
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid)
+{
+ char *syncslotname;
+
+ /*
+ * To build a slot name for the sync work, we are limited to NAMEDATALEN -
+ * 1 characters.
+ *
+ * The name is calculated as pg_%u_sync_%u (3 + 10 + 6 + 10 + '\0'). (It's
+ * actually the NAMEDATALEN on the remote that matters, but this scheme
+ * will also work reasonably if that is different.)
+ */
+ StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
+
+ syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
+
+ return syncslotname;
+}
+
+/*
* Start syncing the table in the sync worker.
*
* If nothing needs to be done to sync the table, we exit the worker without
@@ -824,6 +918,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -849,19 +945,10 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
- MySubscription->slotname,
- MySubscription->oid,
- MyLogicalRepWorker->relid);
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(
+ MySubscription->oid,
+ MyLogicalRepWorker->relid);
/*
* Here we use the slot name instead of the subscription name as the
@@ -874,7 +961,31 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_TCOPYDONE);
+
+ /* Assign the origin tracking record name. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MySubscription->oid, MyLogicalRepWorker->relid);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_TCOPYDONE)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed/etc
+ * before it was able to finish normally.
+ */
+ elog(DEBUG1, "LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_TCOPYDONE.");
+ StartTransactionCommand();
+
+ /*
+ * The origin tracking name must already exist (missing_ok=false).
+ */
+ originid = replorigin_by_name(originname, false);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ *origin_startpos = replorigin_session_get_progress(false);
+
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -890,9 +1001,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -918,29 +1026,90 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
walrcv_clear_result(res);
/*
- * Create a new temporary logical decoding slot. This slot will be used
+ * Create a new permanent logical decoding slot. This slot will be used
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
- walrcv_create_slot(wrconn, slotname, true,
+ walrcv_create_slot(wrconn, slotname, false,
CRS_USE_SNAPSHOT, origin_startpos);
- /* Now do the initial data copy */
- PushActiveSnapshot(GetTransactionSnapshot());
- copy_table(rel);
- PopActiveSnapshot();
+ /*
+ * Be sure to remove the newly created tablesync slot if the COPY fails.
+ */
+ PG_TRY();
+ {
+ /* Now do the initial data copy */
+ PushActiveSnapshot(GetTransactionSnapshot());
+ copy_table(rel);
+ PopActiveSnapshot();
+
+ res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
+ if (res->status != WALRCV_OK_COMMAND)
+ ereport(ERROR,
+ (errmsg("table copy could not finish transaction on publisher"),
+ errdetail("The error was: %s", res->err)));
+ walrcv_clear_result(res);
+
+ table_close(rel, NoLock);
+
+ /* Make the copy visible. */
+ CommandCounterIncrement();
+ }
+ PG_CATCH();
+ {
+ /*
+ * If something failed during the table copy then clean up the
+ * created slot.
+ */
+ elog(DEBUG1, "LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".", slotname);
+ ReplicationSlotDropAtPubNode(wrconn, slotname);
- res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
+ pfree(slotname);
+ slotname = NULL;
+
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+
+ /* Setup replication origin tracking. */
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist. Create it now, and advance to LSN
+ * got from walrcv_create_slot.
+ */
+ originid = replorigin_create(originname);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */ , true /* WAL log */ );
+ }
+ else
+ {
ereport(ERROR,
- (errmsg("table copy could not finish transaction on publisher"),
- errdetail("The error was: %s", res->err)));
- walrcv_clear_result(res);
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("replication origin \"%s\" already exists",
+ originname)));
+ }
- table_close(rel, NoLock);
+ /*
+ * Update the persisted state to indicate the COPY phase is done; make it
+ * visible to others.
+ */
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_TCOPYDONE,
+ MyLogicalRepWorker->relstate_lsn);
+
+copy_table_done:
- /* Make the copy visible. */
- CommandCounterIncrement();
+ elog(DEBUG1, "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
+
+ CommitTransactionCommand();
/*
* We are done with the initial data synchronization, update the state.
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 1b1d70e..4bd4030 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -807,12 +807,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -889,9 +885,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -918,8 +912,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1062,8 +1055,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData* commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index 06663b9..2c80405 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,8 @@ DECLARE_UNIQUE_INDEX(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg_subs
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_TCOPYDONE 't' /* tablesync copy phase is completed
+ * (sublsn NULL) */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 53f636c..e617602 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -15,6 +15,7 @@
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "storage/spin.h"
+#include "replication/walreceiver.h"
/*
* Behaviour of replication slots, upon release or crash.
@@ -211,6 +212,8 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(Oid suboid, Oid relid);
+extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
--
1.8.3.1
v12-0002-Tablesync-extra-logging.patchapplication/octet-stream; name=v12-0002-Tablesync-extra-logging.patchDownload
From 1fc128ec7c828bf569c1aff64872368d8a46f47b Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Thu, 7 Jan 2021 17:08:11 +1100
Subject: [PATCH v12] Tablesync extra logging.
This patch only adds some extra logging which may be helpful for testing, but is not for committing.
---
src/backend/commands/subscriptioncmds.c | 6 ++++--
src/backend/replication/logical/tablesync.c | 19 ++++++++++++++-----
2 files changed, 18 insertions(+), 7 deletions(-)
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 0142278..a627271 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -1119,8 +1119,9 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
}
else
{
- elog(DEBUG1, "DropSubscription: dropping the tablesync slot \"%s\".", syncslotname);
+ elog(LOG, "!!>> DropSubscription: dropping the tablesync slot \"%s\".", syncslotname);
ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ elog(LOG, "!!>> DropSubscription: dropped the tablesync slot \"%s\".", syncslotname);
}
}
PG_FINALLY();
@@ -1135,8 +1136,9 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
originid = replorigin_by_name(originname, true);
if (originid != InvalidRepOriginId)
{
- elog(DEBUG1, "DropSubscription: dropping origin tracking for \"%s\"", originname);
+ elog(LOG, "!!>> DropSubscription: dropping origin tracking for \"%s\"", originname);
replorigin_drop(originid, false);
+ elog(LOG, "!!>> DropSubscription: dropped origin tracking for \"%s\"", originname);
}
}
list_free(rstates);
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 979a2ac..306be98 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -301,8 +301,9 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn)
MyLogicalRepWorker->relid);
PG_TRY();
{
- elog(DEBUG1, "process_syncing_tables_for_sync: dropping the tablesync slot \"%s\".", syncslotname);
+ elog(LOG, "!!>> process_syncing_tables_for_sync: dropping the tablesync slot \"%s\".", syncslotname);
ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ elog(LOG, "!!>> process_syncing_tables_for_sync: dropped the tablesync slot \"%s\".", syncslotname);
}
PG_FINALLY();
{
@@ -473,8 +474,9 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
originid = replorigin_by_name(originname, true);
if (OidIsValid(originid))
{
- elog(DEBUG1, "process_syncing_tables_for_apply: dropping tablesync origin tracking for \"%s\".", originname);
+ elog(LOG, "!!>> process_syncing_tables_for_apply: dropping tablesync origin tracking for \"%s\".", originname);
replorigin_drop(originid, false);
+ elog(LOG, "!!>> process_syncing_tables_for_apply: dropped tablesync origin tracking for \"%s\".", originname);
}
}
@@ -973,15 +975,17 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* The COPY phase was previously done, but tablesync then crashed/etc
* before it was able to finish normally.
*/
- elog(DEBUG1, "LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_TCOPYDONE.");
+ elog(LOG, "!!>> LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_TCOPYDONE.");
StartTransactionCommand();
/*
* The origin tracking name must already exist (missing_ok=false).
*/
originid = replorigin_by_name(originname, false);
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 2 replorigin_session_setup \"%s\".", originname);
replorigin_session_setup(originid);
replorigin_session_origin = originid;
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 2 replorigin_session_get_progress \"%s\".", originname);
*origin_startpos = replorigin_session_get_progress(false);
goto copy_table_done;
@@ -1030,6 +1034,7 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
+ elog(LOG, "!!>> LogicalRepSyncTableStart: walrcv_create_slot for \"%s\".", slotname);
walrcv_create_slot(wrconn, slotname, false,
CRS_USE_SNAPSHOT, origin_startpos);
@@ -1061,8 +1066,9 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* If something failed during the table copy then clean up the
* created slot.
*/
- elog(DEBUG1, "LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".", slotname);
+ elog(LOG, "!!>> LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".", slotname);
ReplicationSlotDropAtPubNode(wrconn, slotname);
+ elog(LOG, "!!>> LogicalRepSyncTableStart: tablesync copy failed. Dropped the tablesync slot \"%s\".", slotname);
pfree(slotname);
slotname = NULL;
@@ -1079,9 +1085,12 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* Origin tracking does not exist. Create it now, and advance to LSN
* got from walrcv_create_slot.
*/
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_create \"%s\".", originname);
originid = replorigin_create(originname);
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_session_setup \"%s\".", originname);
replorigin_session_setup(originid);
replorigin_session_origin = originid;
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_advance \"%s\".", originname);
replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
true /* go backward */ , true /* WAL log */ );
}
@@ -1104,7 +1113,7 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
copy_table_done:
- elog(DEBUG1, "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ elog(LOG, "!!>> LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
originname,
(uint32) (*origin_startpos >> 32),
(uint32) *origin_startpos);
--
1.8.3.1
On Mon, Jan 4, 2021 at 8:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Dec 30, 2020 at 11:51 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Wed, Dec 23, 2020 at 8:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
1.
+ * Rarely, the DropSubscription may be issued when a tablesync still
+ * is in SYNCDONE but not yet in READY state. If this happens then
+ * the drop slot could fail because it is already dropped.
+ * In this case suppress the drop slot error.
+ *
+ * FIXME - Is there a better way than this?
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ PG_RE_THROW();
So, does this situation happen when we try to drop the subscription after
the state is changed to syncdone but not yet to ready? If so, then can't
we write a function GetSubscriptionNotDoneRelations similar to
GetSubscriptionNotReadyRelations where we get a list of relations that
are not in the done stage? I think this should be safe because once we
are here we shouldn't be allowed to start a new worker and old workers
are already stopped by this function.
Yes, but I don't see how adding such a function is an improvement over
the existing code.
The advantage is that we don't need to use try..catch to deal with
such conditions, which I don't think is a good way to deal with such
cases. Also, even after using try...catch, we can still leak the
slots because the patch drops the slot after changing the state to
syncdone, and if there is any error while dropping the slot, it simply
skips it. So, it is possible that the rel state is syncdone but the
slot still exists and we get an error due to some different reason,
and then we will silently skip it again and allow the subscription to
be dropped.
I think instead what we should do is to drop the slot before we change
the rel state to syncdone. Also, if the tablesync worker fails to drop
the slot, it should try to drop it again after restart. In
DropSubscription, we can then check if the rel state is not SYNCDONE or
READY, and drop the corresponding slots.
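To spell out the crash-safety argument for that ordering, here is a toy
model (standalone C with made-up stand-ins such as drop_slot and
persist_state; not the actual PostgreSQL code). Because SYNCDONE is
persisted only after the slot drop, any crash in between leaves a state
from which a restarted worker knows the drop must be retried:

#include <stdbool.h>
#include <stdio.h>

/* Toy stand-ins, hypothetical names for illustration only. */
static bool slot_exists = true; /* tablesync slot on the publisher */
static char relstate = 'c';     /* 'c' = catchup, 's' = SYNCDONE */

static void drop_slot(void)       { slot_exists = false; }
static void persist_state(char s) { relstate = s; }

/* Sync worker finishing up; may crash between the two steps. */
static void finish_sync(bool crash_in_between)
{
	drop_slot();            /* 1. drop the tablesync slot first */
	if (crash_in_between)
		return;             /* crash: relstate is still not SYNCDONE */
	persist_state('s');     /* 2. only then persist SYNCDONE */
}

/*
 * Restarted worker / DropSubscription: a relstate other than SYNCDONE
 * means the slot drop may still be pending, so retry it. SYNCDONE
 * guarantees the slot is already gone.
 */
static void recover(void)
{
	if (relstate != 's' && slot_exists)
		drop_slot();
}

int main(void)
{
	finish_sync(true);      /* simulate the worst-case crash */
	recover();
	printf("slot_exists=%d relstate=%c\n", slot_exists, relstate);
	return 0;
}

With the opposite ordering (persist SYNCDONE first), a crash in between
would leave relstate = 's' while the slot still exists, and recover()
would wrongly skip the drop, which is exactly the leak described above.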
Fixed as suggested in latest patch [v12]
----
[v12] = /messages/by-id/CAHut+PsonJzarxSBWkOM=MjoEpaq53ShBJoTT9LHJskwP3OvZA@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia
On Tue, Jan 5, 2021 at 10:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
3.
+#define SUBREL_STATE_COPYDONE 'C' /* tablesync copy phase is completed */
You can mention in the comments that sublsn will be NULL for this
state, as it is mentioned for other similar states. Can we think of
using some lower-case letter for this, as all other states are in
lower-case except for this one, which makes it look a bit odd? We can
use 'f' or 'e' and describe it as 'copy finished' or 'copy end'. I am
fine if you have any better ideas.
Fixed in latest patch [v11]
It is still not reflected in the docs. See below:
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7651,6 +7651,7 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
 State code:
 <literal>i</literal> = initialize,
 <literal>d</literal> = data is being copied,
+ <literal>C</literal> = table data has been copied,
 <literal>s</literal> = synchronized,
Fixed in latest patch [v12]
----
[v12] = /messages/by-id/CAHut+PsonJzarxSBWkOM=MjoEpaq53ShBJoTT9LHJskwP3OvZA@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia
On Wed, Jan 6, 2021 at 2:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Okay, if we want to go that way then we should add some documentation
about it. Currently, the slot name used by apply worker is known to
the user because either it is specified by the user or the default is
subscription name, so the user can manually remove that slot later but
that is not true for tablesync slots. I think we need to update both
the Drop Subscription page [1] and logical-replication-subscription
page [2] where we have mentioned temporary slots and in the end "Here
are some scenarios: .." to mention these slots and probably how
their names are generated so that in such special situations users can
drop them manually.
[1] - https://www.postgresql.org/docs/devel/sql-dropsubscription.html
[2] - https://www.postgresql.org/docs/devel/logical-replication-subscription.html
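As a concrete illustration of that manual cleanup, a small libpq program
like the sketch below could find and drop any leftover tablesync slots on
the publisher. This is only a sketch: the connection string is a
placeholder, and the LIKE pattern simply assumes the generated
pg_%u_sync_%u naming, so the matched slots should be verified as truly
orphaned before dropping them.

#include <stdio.h>
#include <libpq-fe.h>

int main(void)
{
	/* Placeholder connection string; point it at the publisher. */
	PGconn	   *conn = PQconnectdb("host=publisher dbname=postgres");
	PGresult   *res;

	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
		PQfinish(conn);
		return 1;
	}

	/* Drop inactive slots matching the tablesync naming convention. */
	res = PQexec(conn,
				 "SELECT pg_drop_replication_slot(slot_name) "
				 "FROM pg_replication_slots "
				 "WHERE slot_name LIKE 'pg\\_%\\_sync\\_%' AND NOT active");
	if (PQresultStatus(res) != PGRES_TUPLES_OK)
		fprintf(stderr, "cleanup failed: %s", PQerrorMessage(conn));

	PQclear(res);
	PQfinish(conn);
	return 0;
}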
PG docs updated in latest patch [v12]
----
[v12] = /messages/by-id/CAHut+PsonJzarxSBWkOM=MjoEpaq53ShBJoTT9LHJskwP3OvZA@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia
On Thu, Jan 7, 2021 at 3:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Jan 6, 2021 at 3:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Jan 6, 2021 at 2:13 PM Peter Smith <smithpb2250@gmail.com> wrote:
I think it makes sense. If there can be a race between the tablesync
re-launching (after error), and the AlterSubscription_refresh removing
some table’s relid from the subscription then there could be lurking
slot/origin tablesync resources (of the removed table) which a
subsequent DROP SUBSCRIPTION cannot discover. I will think more about
how/if it is possible to make this happen. Anyway, I suppose I ought
to refactor/isolate some of the tablesync cleanup code in case it
needs to be commonly called from DropSubscription and/or from
AlterSubscription_refresh.
Fair enough.
I think before implementing, we should once try to reproduce this
case. I understand this is a timing issue and can be reproduced only
with the help of a debugger, but we should do that.
FYI, I was able to reproduce this case in the debugger. PSA logs showing details.
----
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
PSA the v12 patch for the Tablesync Solution1.
Differences from v11:
+ Added PG docs to mention the tablesync slot
+ Refactored tablesync slot drop (done by DropSubscription/process_syncing_tables_for_sync)
+ Fixed PG docs mentioning wrong state code
+ Fixed wrong code comment describing TCOPYDONE state
Hi
I looked into the new patch and have some comments.
1.
+ /* Setup replication origin tracking. */
①+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
②+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ {
There are two different styles of code which check whether originid is valid.
Both are fine, but do you think it’s better to use the same style here?
2.
* state to SYNCDONE. There might be zero changes applied between
* CATCHUP and SYNCDONE, because the sync worker might be ahead of the
* apply worker.
+ * - The sync worker has an intermediary state TCOPYDONE which comes after
+ * DATASYNC and before SYNCWAIT. This state indicates that the initial
This comment about TCOPYDONE is better placed at [1], which is between
DATASYNC and SYNCWAIT:
* - Tablesync worker starts; changes table state from INIT to DATASYNC while
* copying.
[1]:
* - Tablesync worker finishes the copy and sets table state to SYNCWAIT;
* waits for state change.
3.
+ /*
+ * To build a slot name for the sync work, we are limited to NAMEDATALEN -
+ * 1 characters.
+ *
+ * The name is calculated as pg_%u_sync_%u (3 + 10 + 6 + 10 + '\0'). (It's
+ * actually the NAMEDATALEN on the remote that matters, but this scheme
+ * will also work reasonably if that is different.)
+ */
+ StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
+
+ syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
The comment says syncslotname is limited to NAMEDATALEN - 1 characters.
But its actual maximum size is (3 + 10 + 6 + 10 + '\0') = 30, which is not NAMEDATALEN - 1.
Should we change the comment here?
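The arithmetic can be double-checked with a throwaway snippet (assuming
the default NAMEDATALEN of 64): the worst case is 29 characters plus the
terminator, which is the (3 + 10 + 6 + 10 + '\0') = 30 above, well under
NAMEDATALEN - 1.

#include <assert.h>
#include <stdio.h>

#define NAMEDATALEN 64			/* default build value (an assumption) */

int main(void)
{
	char		buf[NAMEDATALEN];

	/* Worst case: both %u arguments print 10 digits (UINT_MAX). */
	int			len = snprintf(buf, sizeof(buf), "pg_%u_sync_%u",
							   4294967295u, 4294967295u);

	printf("max length = %d chars, %d bytes with the terminator\n",
		   len, len + 1);
	assert(len == 29 && len + 1 <= NAMEDATALEN);
	return 0;
}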
Best regards,
houzj
On Fri, Jan 8, 2021 at 7:14 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Thu, Jan 7, 2021 at 3:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Jan 6, 2021 at 3:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Jan 6, 2021 at 2:13 PM Peter Smith <smithpb2250@gmail.com> wrote:
I think it makes sense. If there can be a race between the tablesync
re-launching (after error), and the AlterSubscription_refresh removing
some table’s relid from the subscription then there could be lurking
slot/origin tablesync resources (of the removed table) which a
subsequent DROP SUBSCRIPTION cannot discover. I will think more about
how/if it is possible to make this happen. Anyway, I suppose I ought
to refactor/isolate some of the tablesync cleanup code in case it
needs to be commonly called from DropSubscription and/or from
AlterSubscription_refresh.
Fair enough.
I think before implementing, we should once try to reproduce this
case. I understand this is a timing issue and can be reproduced only
with the help of a debugger, but we should do that.
FYI, I was able to reproduce this case in the debugger. PSA logs showing details.
Thanks for reproducing this, as I was worried about exactly this case. I
have one question related to logs:
##
## ALTER SUBSCRIPTION to REFRESH the publication
## This blocks on some latch until the tablesync worker dies, then it continues
##
Did you check which exact latch or lock blocks this? It is important
to retain this interlock because otherwise, even if we decide to drop the
slot (and/or origin), the tablesync worker might continue.
--
With Regards,
Amit Kapila.
Hi Amit.
PSA the v13 patch for the Tablesync Solution1.
Differences from v12:
+ Fixed whitespace errors of v12-0001
+ Modify TCOPYDONE state comment (houzj feedback)
+ WIP fix for AlterSubscription_refresh (Amit feedback)
====
Features:
* The tablesync slot is now permanent instead of temporary. The
tablesync slot name is no longer tied to the Subscription slot name.
* The tablesync slot cleanup (drop) code is added for DropSubscription
and for process_syncing_tables_for_sync functions
* The tablesync worker is now allowing multiple tx instead of single tx
* A new state (SUBREL_STATE_TCOPYDONE) is persisted after a successful
copy_table in LogicalRepSyncTableStart.
* If a re-launched tablesync finds state SUBREL_STATE_TCOPYDONE then
it will bypass the initial copy_table phase.
* Now tablesync sets up replication origin tracking in
LogicalRepSyncTableStart (similar as done for the apply worker). The
origin is advanced when first created.
* The tablesync replication origin tracking is cleaned up during
DropSubscription and/or process_syncing_tables_for_apply.
* The DropSubscription cleanup code was enhanced (v7+) to take care of
any crashed tablesync workers.
* Updates to PG docs
TODO / Known Issues:
* Address review comments
* ALTER PUBLICATION DROP TABLE can mean knowledge of tablesyncs gets
lost causing resource cleanup to be missed. There is a WIP fix for
this in the AlterSubscription_refresh, however it is not entirely
correct; there are known race conditions. See FIXME comments.
---
Kind Regards,
Peter Smith.
Fujitsu Australia
On Thu, Jan 7, 2021 at 6:52 PM Peter Smith <smithpb2250@gmail.com> wrote:
Hi Amit.
PSA the v12 patch for the Tablesync Solution1.
Differences from v11:
+ Added PG docs to mention the tablesync slot
+ Refactored tablesync slot drop (done by DropSubscription/process_syncing_tables_for_sync)
+ Fixed PG docs mentioning wrong state code
+ Fixed wrong code comment describing TCOPYDONE state
====
Features:
* The tablesync slot is now permanent instead of temporary. The
tablesync slot name is no longer tied to the Subscription slot name.
* The tablesync slot cleanup (drop) code is added for DropSubscription
and for process_syncing_tables_for_sync functions
* The tablesync worker is now allowing multiple tx instead of single tx
* A new state (SUBREL_STATE_TCOPYDONE) is persisted after a successful
copy_table in LogicalRepSyncTableStart.
* If a re-launched tablesync finds state SUBREL_STATE_TCOPYDONE then
it will bypass the initial copy_table phase.
* Now tablesync sets up replication origin tracking in
LogicalRepSyncTableStart (similar as done for the apply worker). The
origin is advanced when first created.
* The tablesync replication origin tracking is cleaned up during
DropSubscription and/or process_syncing_tables_for_apply.
* The DropSubscription cleanup code was enhanced (v7+) to take care of
any crashed tablesync workers.
* Updates to PG docs
TODO / Known Issues:
* Address review comments
* Patch applies with whitespace warning
---
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
v13-0002-Tablesync-extra-logging.patchapplication/octet-stream; name=v13-0002-Tablesync-extra-logging.patchDownload
From 7dd4e8786314c6f98f9363411e2ff693c6aaad02 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Fri, 8 Jan 2021 20:04:23 +1100
Subject: [PATCH v13] Tablesync extra logging.
This patch only adds some extra logging which may be helpful for testing, but is not for committing.
---
src/backend/commands/subscriptioncmds.c | 17 ++++++++++++-----
src/backend/replication/logical/tablesync.c | 19 ++++++++++++++-----
2 files changed, 26 insertions(+), 10 deletions(-)
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 51f5e40..dec1ae5 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -652,7 +652,9 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
{
Oid subid = sub->oid;
+ elog(LOG, "!!>> AlterSubscription_refresh: before logicalrep_worker_stop_at_commit");
logicalrep_worker_stop_at_commit(subid, relid);
+ elog(LOG, "!!>> AlterSubscription_refresh: after logicalrep_worker_stop_at_commit");
/*
* Cleanup any remaining tablesync resources.
@@ -665,6 +667,7 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
/* Last known rel state. */
state = GetSubscriptionRelState(subid, relid, &statelsn);
+ elog(LOG, "!!>> AlterSubscription_refresh: relid %u had state %c", relid, state);
RemoveSubscriptionRel(sub->oid, relid);
@@ -703,8 +706,9 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
ereport(ERROR,
(errmsg("could not connect to the publisher: %s", err)));
- elog(DEBUG1, "AlterSubscription_refresh: dropping the tablesync slot \"%s\".", syncslotname);
+ elog(LOG, "AlterSubscription_refresh: dropping the tablesync slot \"%s\".", syncslotname);
ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ elog(LOG, "!!>> AlterSubscription_refresh: dropped the tablesync slot \"%s\".", syncslotname);
}
PG_FINALLY();
{
@@ -721,12 +725,13 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
originid = replorigin_by_name(originname, true);
if (OidIsValid(originid))
{
- elog(DEBUG1, "AlterSubscription_refresh: dropping origin tracking for \"%s\"", originname);
+ elog(LOG, "AlterSubscription_refresh: dropping origin tracking for \"%s\"", originname);
replorigin_drop(originid, false);
+ elog(LOG, "!!>> AlterSubscription_refresh: dropped origin tracking for \"%s\"", originname);
}
}
- ereport(DEBUG1,
+ ereport(LOG,
(errmsg("table \"%s.%s\" removed from subscription \"%s\"",
get_namespace_name(get_rel_namespace(relid)),
get_rel_name(relid),
@@ -1196,8 +1201,9 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
}
else
{
- elog(DEBUG1, "DropSubscription: dropping the tablesync slot \"%s\".", syncslotname);
+ elog(LOG, "DropSubscription: dropping the tablesync slot \"%s\".", syncslotname);
ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ elog(LOG, "!!>> DropSubscription: dropped the tablesync slot \"%s\".", syncslotname);
}
}
PG_FINALLY();
@@ -1212,8 +1218,9 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
originid = replorigin_by_name(originname, true);
if (originid != InvalidRepOriginId)
{
- elog(DEBUG1, "DropSubscription: dropping origin tracking for \"%s\"", originname);
+ elog(LOG, "DropSubscription: dropping origin tracking for \"%s\"", originname);
replorigin_drop(originid, false);
+ elog(LOG, "!!>> DropSubscription: dropped origin tracking for \"%s\"", originname);
}
}
list_free(rstates);
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index d5d0840..4d9d3fa 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -299,8 +299,9 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn)
MyLogicalRepWorker->relid);
PG_TRY();
{
- elog(DEBUG1, "process_syncing_tables_for_sync: dropping the tablesync slot \"%s\".", syncslotname);
+ elog(LOG, "process_syncing_tables_for_sync: dropping the tablesync slot \"%s\".", syncslotname);
ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ elog(LOG, "!!>> process_syncing_tables_for_sync: dropped the tablesync slot \"%s\".", syncslotname);
}
PG_FINALLY();
{
@@ -471,8 +472,9 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
originid = replorigin_by_name(originname, true);
if (OidIsValid(originid))
{
- elog(DEBUG1, "process_syncing_tables_for_apply: dropping tablesync origin tracking for \"%s\".", originname);
+ elog(LOG, "process_syncing_tables_for_apply: dropping tablesync origin tracking for \"%s\".", originname);
replorigin_drop(originid, false);
+ elog(LOG, "!!>> process_syncing_tables_for_apply: dropped tablesync origin tracking for \"%s\".", originname);
}
}
@@ -971,15 +973,17 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* The COPY phase was previously done, but tablesync then crashed/etc
* before it was able to finish normally.
*/
- elog(DEBUG1, "LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_TCOPYDONE.");
+ elog(LOG, "LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_TCOPYDONE.");
StartTransactionCommand();
/*
* The origin tracking name must already exist (missing_ok=false).
*/
originid = replorigin_by_name(originname, false);
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 2 replorigin_session_setup \"%s\".", originname);
replorigin_session_setup(originid);
replorigin_session_origin = originid;
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 2 replorigin_session_get_progress \"%s\".", originname);
*origin_startpos = replorigin_session_get_progress(false);
goto copy_table_done;
@@ -1028,6 +1032,7 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
+ elog(LOG, "!!>> LogicalRepSyncTableStart: walrcv_create_slot for \"%s\".", slotname);
walrcv_create_slot(wrconn, slotname, false,
CRS_USE_SNAPSHOT, origin_startpos);
@@ -1059,8 +1064,9 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* If something failed during the table copy then clean up the
* created slot.
*/
- elog(DEBUG1, "LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".", slotname);
+ elog(LOG, "LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".", slotname);
ReplicationSlotDropAtPubNode(wrconn, slotname);
+ elog(LOG, "!!>> LogicalRepSyncTableStart: tablesync copy failed. Dropped the tablesync slot \"%s\".", slotname);
pfree(slotname);
slotname = NULL;
@@ -1077,9 +1083,12 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* Origin tracking does not exist. Create it now, and advance to LSN
* got from walrcv_create_slot.
*/
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_create \"%s\".", originname);
originid = replorigin_create(originname);
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_session_setup \"%s\".", originname);
replorigin_session_setup(originid);
replorigin_session_origin = originid;
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_advance \"%s\".", originname);
replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
true /* go backward */ , true /* WAL log */ );
}
@@ -1102,7 +1111,7 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
copy_table_done:
- elog(DEBUG1, "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ elog(LOG, "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
originname,
(uint32) (*origin_startpos >> 32),
(uint32) *origin_startpos);
--
1.8.3.1
v13-0001-Tablesync-Solution1.patchapplication/octet-stream; name=v13-0001-Tablesync-Solution1.patchDownload
From c4163e158fe2e0b7691062d3a14f5637f707e695 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Fri, 8 Jan 2021 19:09:43 +1100
Subject: [PATCH v13] Tablesync Solution1.
====
Features:
* The tablesync slot is now permanent instead of temporary. The tablesync slot name is no longer tied to the Subscription slot name.
* The tablesync slot cleanup (drop) code is added for DropSubscription and for process_syncing_tables_for_sync functions
* The tablesync worker is now allowing multiple tx instead of single tx
* A new state (SUBREL_STATE_TCOPYDONE) is persisted after a successful copy_table in LogicalRepSyncTableStart.
* If a re-launched tablesync finds state SUBREL_STATE_TCOPYDONE then it will bypass the initial copy_table phase.
* Now tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar as done for the apply worker). The origin is advanced when first created.
* The tablesync replication origin tracking is cleaned up during DropSubscription and/or process_syncing_tables_for_apply.
* The DropSubscription cleanup code was enhanced (v7+) to take care of any crashed tablesync workers.
* Updates to PG docs
TODO / Known Issues:
* Address review comments
* ALTER PUBLICATION DROP TABLE can mean knowledge of tablesyncs gets lost causing resource cleanup to be missed. There is a WIP fix for this in the AlterSubscription_refresh, however it is not entirely correct; there are known race conditions. See FIXME comments.
---
doc/src/sgml/catalogs.sgml | 1 +
doc/src/sgml/logical-replication.sgml | 16 +-
doc/src/sgml/ref/drop_subscription.sgml | 6 +-
src/backend/commands/subscriptioncmds.c | 293 ++++++++++++++++++++++------
src/backend/replication/logical/origin.c | 2 +-
src/backend/replication/logical/tablesync.c | 251 ++++++++++++++++++++----
src/backend/replication/logical/worker.c | 18 +-
src/include/catalog/pg_subscription_rel.h | 2 +
src/include/replication/slot.h | 3 +
9 files changed, 473 insertions(+), 119 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 3a22665..6d294c8 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7651,6 +7651,7 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
State code:
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
+ <literal>t</literal> = table data has been copied,
<literal>s</literal> = synchronized,
<literal>r</literal> = ready (normal replication)
</para></entry>
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index a560ad6..8b23e03 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -248,7 +248,16 @@
<para>
As mentioned earlier, each (active) subscription receives changes from a
- replication slot on the remote (publishing) side. Normally, the remote
+ replication slot on the remote (publishing) side.
+ </para>
+ <para>
+ Additional table synchronization slots are normally transient, created
+ internally and dropped automatically when they are no longer needed.
+ These table synchronization slots have generated names:
+ <quote><literal>pg_%u_sync_%u</literal></quote> (parameters: Subscription <parameter>oid</parameter>, Table <parameter>relid</parameter>)
+ </para>
+ <para>
+ Normally, the remote
replication slot is created automatically when the subscription is created
using <command>CREATE SUBSCRIPTION</command> and it is dropped
automatically when the subscription is dropped using <command>DROP
@@ -294,8 +303,9 @@
using <command>ALTER SUBSCRIPTION</command> before attempting to drop
the subscription. If the remote database instance no longer exists, no
further action is then necessary. If, however, the remote database
- instance is just unreachable, the replication slot should then be
- dropped manually; otherwise it would continue to reserve WAL and might
+ instance is just unreachable, the replication slot (and any still
+ remaining table synchronization slots) should then be
+ dropped manually; otherwise it/they would continue to reserve WAL and might
eventually cause the disk to fill up. Such cases should be carefully
investigated.
</para>
diff --git a/doc/src/sgml/ref/drop_subscription.sgml b/doc/src/sgml/ref/drop_subscription.sgml
index adbdeaf..aee9615 100644
--- a/doc/src/sgml/ref/drop_subscription.sgml
+++ b/doc/src/sgml/ref/drop_subscription.sgml
@@ -79,7 +79,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
<para>
When dropping a subscription that is associated with a replication slot on
the remote host (the normal state), <command>DROP SUBSCRIPTION</command>
- will connect to the remote host and try to drop the replication slot as
+ will connect to the remote host and try to drop the replication slot (and
+ any remaining table synchronization slots) as
part of its operation. This is necessary so that the resources allocated
for the subscription on the remote host are released. If this fails,
either because the remote host is not reachable or because the remote
@@ -89,7 +90,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
executing <literal>ALTER SUBSCRIPTION ... SET (slot_name = NONE)</literal>.
After that, <command>DROP SUBSCRIPTION</command> will no longer attempt any
actions on a remote host. Note that if the remote replication slot still
- exists, it should then be dropped manually; otherwise it will continue to
+ exists, it (and any related table synchronization slots) should then be
+ dropped manually; otherwise it/they will continue to
reserve WAL and might eventually cause the disk to fill up. See
also <xref linkend="logical-replication-subscription-slot"/>.
</para>
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 490e935..51f5e40 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -37,6 +37,7 @@
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
#include "storage/lmgr.h"
#include "utils/acl.h"
#include "utils/builtins.h"
@@ -649,9 +650,81 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
if (!bsearch(&relid, pubrel_local_oids,
list_length(pubrel_names), sizeof(Oid), oid_cmp))
{
- RemoveSubscriptionRel(sub->oid, relid);
+ Oid subid = sub->oid;
- logicalrep_worker_stop_at_commit(sub->oid, relid);
+ logicalrep_worker_stop_at_commit(subid, relid);
+
+ /*
+ * Cleanup any remaining tablesync resources.
+ */
+ {
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+ char state;
+ XLogRecPtr statelsn;
+
+ /* Last known rel state. */
+ state = GetSubscriptionRelState(subid, relid, &statelsn);
+
+ RemoveSubscriptionRel(sub->oid, relid);
+
+ /*
+ * Drop tablesync slot.
+ *
+ * Should only be necessary for TCOPYDONE state (SYNCDONE and
+ * READY would have already dropped their slot)
+ *
+ * FIXME - Usually this cleanup would be OK, but will not
+ * always be OK because the logicalrep_worker_stop_at_commit
+ * only "flags" the worker to be stopped in the near future
+ * but meanwhile it may still be running. In this case there
+ * could be a race between the tablesync worker and this code
+ * to see who will succeed with the tablesync drop (and the
+ * loser will ERROR).
+ *
+ * FIXME - Also, checking the state is not guaranteed
+ * correct because the state might be TCOPYDONE when we checked
+ * but has since progressed to SYNCDONE
+ */
+
+ if (state == SUBREL_STATE_TCOPYDONE)
+ {
+ char *syncslotname = ReplicationSlotNameForTablesync(subid, relid);
+
+ PG_TRY();
+ {
+ /* Try to connect to the publisher. */
+ /*
+ * XXX - Should optimize this to avoid multiple
+ * connect/disconnect.
+ */
+ wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
+ if (!wrconn)
+ ereport(ERROR,
+ (errmsg("could not connect to the publisher: %s", err)));
+
+ elog(DEBUG1, "AlterSubscription_refresh: dropping the tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ }
+ PG_FINALLY();
+ {
+ /* We are done with the remote side, close connection. */
+ walrcv_disconnect(wrconn);
+
+ pfree(syncslotname);
+ }
+ PG_END_TRY();
+ }
+
+ /* Remove the tablesync's origin tracking if exists. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (OidIsValid(originid))
+ {
+ elog(DEBUG1, "AlterSubscription_refresh: dropping origin tracking for \"%s\"", originname);
+ replorigin_drop(originid, false);
+ }
+ }
ereport(DEBUG1,
(errmsg("table \"%s.%s\" removed from subscription \"%s\"",
@@ -928,8 +1001,8 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
char *err = NULL;
RepOriginId originid;
WalReceiverConn *wrconn = NULL;
- StringInfoData cmd;
Form_pg_subscription form;
+ List *rstates;
/*
* Lock pg_subscription with AccessExclusiveLock to ensure that the
@@ -1016,73 +1089,181 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
ReleaseSysCache(tup);
/*
- * Stop all the subscription workers immediately.
- *
- * This is necessary if we are dropping the replication slot, so that the
- * slot becomes accessible.
- *
- * It is also necessary if the subscription is disabled and was disabled
- * in the same transaction. Then the workers haven't seen the disabling
- * yet and will still be running, leading to hangs later when we want to
- * drop the replication origin. If the subscription was disabled before
- * this transaction, then there shouldn't be any workers left, so this
- * won't make a difference.
+ * Try to acquire the connection necessary for dropping slots. We do this
+ * here so that the same connection may be shared for dropping the
+ * Subscription slot, as well as dropping any tablesync slots.
*
- * New workers won't be started because we hold an exclusive lock on the
- * subscription till the end of the transaction.
+ * Note: If the slotname is NONE/NULL then connection errors are
+ * suppressed. This is necessary so that the DROP SUBSCRIPTION can still
+ * complete even when the connection to publisher is broken.
*/
- LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
- subworkers = logicalrep_workers_find(subid, false);
- LWLockRelease(LogicalRepWorkerLock);
- foreach(lc, subworkers)
+ load_file("libpqwalreceiver", false);
+
+ wrconn = walrcv_connect(conninfo, true, subname, &err);
+ if (wrconn == NULL && slotname != NULL)
+ ereport(ERROR,
+ (errmsg("could not connect to publisher when attempting to "
+ "drop the replication slot \"%s\"", slotname),
+ errdetail("The error was: %s", err),
+ /* translator: %s is an SQL ALTER command */
+ errhint("Use %s to disassociate the subscription from the slot.",
+ "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+
+ PG_TRY();
{
- LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
+ /*
+ * Stop all the subscription workers immediately.
+ *
+ * This is necessary if we are dropping the replication slot, so that
+ * the slot becomes accessible.
+ *
+ * It is also necessary if the subscription is disabled and was
+ * disabled in the same transaction. Then the workers haven't seen
+ * the disabling yet and will still be running, leading to hangs later
+ * when we want to drop the replication origin. If the subscription
+ * was disabled before this transaction, then there shouldn't be any
+ * workers left, so this won't make a difference.
+ *
+ * New workers won't be started because we hold an exclusive lock on
+ * the subscription till the end of the transaction.
+ */
+ LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
+ subworkers = logicalrep_workers_find(subid, false);
+ LWLockRelease(LogicalRepWorkerLock);
+ foreach(lc, subworkers)
+ {
+ LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
- logicalrep_worker_stop(w->subid, w->relid);
- }
- list_free(subworkers);
+ logicalrep_worker_stop(w->subid, w->relid);
+ }
+ list_free(subworkers);
+
+ /*
+ * Tablesync resource cleanup (slots and origins).
+ *
+ * Any READY-state relations would already have dealt with clean-ups.
+ */
+ rstates = GetSubscriptionNotReadyRelations(subid);
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
- /* Clean up dependencies */
- deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
- /* Remove any associated relation synchronization states. */
- RemoveSubscriptionRel(subid, InvalidOid);
+ /*
+ * Drop the tablesync slot.
+ *
+ * The tablesync slot was created in the same tx as where the
+ * TCOPYDONE state was set.
+ *
+ * For SYNCDONE/READY states the tablesync slot has already been
+ * dropped by the tablesync worker.
+ *
+ * So this leaves only the TCOPYDONE state to be taken care of.
+ */
+ if (rstate->state == SUBREL_STATE_TCOPYDONE)
+ {
+ char *syncslotname = ReplicationSlotNameForTablesync(subid, relid);
- /* Remove the origin tracking if exists. */
- snprintf(originname, sizeof(originname), "pg_%u", subid);
- originid = replorigin_by_name(originname, true);
- if (originid != InvalidRepOriginId)
- replorigin_drop(originid, false);
+ PG_TRY();
+ {
+ if (!wrconn)
+ {
+ /*
+ * It is only possible to reach here without ERROR for
+ * a broken publisher connection if the subscription
+ * slotname is already NONE/NULL.
+ *
+ * This means the user has disassociated the
+ * subscription from the replication slot deliberately
+ * so that the DROP SUBSCRIPTION can proceed to
+ * completion. See
+ * https://www.postgresql.org/docs/current/sql-dropsubscription.html
+ *
+ * For this reason we only LOG a message that the
+ * tablesync slots cannot be dropped, rather than
+ * throw ERROR (which would prevent the DROP
+ * SUBSCRIPTION from proceeding).
+ *
+ * In such a case the user must take steps to manually
+ * cleanup these remaining tablesync slots.
+ */
+ elog(LOG, "DROP SUBSCRIPTION: no connection; cannot drop tablesync slot \"%s\".",
+ syncslotname);
+ }
+ else
+ {
+ elog(DEBUG1, "DropSubscription: dropping the tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ }
+ }
+ PG_FINALLY();
+ {
+ pfree(syncslotname);
+ }
+ PG_END_TRY();
+ }
- /*
- * If there is no slot associated with the subscription, we can finish
- * here.
- */
- if (!slotname)
+ /* Remove the tablesync's origin tracking if exists. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ {
+ elog(DEBUG1, "DropSubscription: dropping origin tracking for \"%s\"", originname);
+ replorigin_drop(originid, false);
+ }
+ }
+ list_free(rstates);
+
+ /* Clean up dependencies. */
+ deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
+
+ /* Remove any associated relation synchronization states. */
+ RemoveSubscriptionRel(subid, InvalidOid);
+
+ /* Remove the origin tracking if exists. */
+ snprintf(originname, sizeof(originname), "pg_%u", subid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ replorigin_drop(originid, false);
+
+ /*
+ * If there is a slot associated with the subscription, then drop the
+ * replication slot at the publisher node using the replication
+ * connection.
+ */
+ if (slotname)
+ ReplicationSlotDropAtPubNode(wrconn, slotname);
+ }
+ PG_FINALLY();
{
+ if (wrconn)
+ walrcv_disconnect(wrconn);
+
table_close(rel, NoLock);
- return;
}
+ PG_END_TRY();
+}
+
+
+/*
+ * Drop the replication slot at the publisher node using the replication connection.
+ */
+void
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname)
+{
+ StringInfoData cmd;
+
+ Assert(wrconn);
- /*
- * Otherwise drop the replication slot at the publisher node using the
- * replication connection.
- */
load_file("libpqwalreceiver", false);
initStringInfo(&cmd);
appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
- wrconn = walrcv_connect(conninfo, true, subname, &err);
- if (wrconn == NULL)
- ereport(ERROR,
- (errmsg("could not connect to publisher when attempting to "
- "drop the replication slot \"%s\"", slotname),
- errdetail("The error was: %s", err),
- /* translator: %s is an SQL ALTER command */
- errhint("Use %s to disassociate the subscription from the slot.",
- "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
-
PG_TRY();
{
WalRcvExecResult *res;
@@ -1103,13 +1284,9 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
}
PG_FINALLY();
{
- walrcv_disconnect(wrconn);
+ pfree(cmd.data);
}
PG_END_TRY();
-
- pfree(cmd.data);
-
- table_close(rel, NoLock);
}
/*
diff --git a/src/backend/replication/logical/origin.c b/src/backend/replication/logical/origin.c
index 0b01cce..b299285 100644
--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -905,7 +905,7 @@ replorigin_advance(RepOriginId node,
LWLockAcquire(&replication_state->lock, LW_EXCLUSIVE);
/* Make sure it's not used by somebody else */
- if (replication_state->acquired_by != 0)
+ if (replication_state->acquired_by != 0 && replication_state->acquired_by != MyProcPid)
{
ereport(ERROR,
(errcode(ERRCODE_OBJECT_IN_USE),
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 863d196..d5d0840 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -31,8 +31,10 @@
* table state to INIT.
* - Tablesync worker starts; changes table state from INIT to DATASYNC while
* copying.
- * - Tablesync worker finishes the copy and sets table state to SYNCWAIT;
- * waits for state change.
+ * - Tablesync worker does initial table copy; there is a TCOPYDONE state to
+ * indicate when the copy phase has completed, so if the worker crashes
+ * before reaching SYNCDONE the copy will not be re-attempted.
+ * - Tablesync worker then sets table state to SYNCWAIT; waits for state change.
* - Apply worker periodically checks for tables in SYNCWAIT state. When
* any appear, it sets the table state to CATCHUP and starts loop-waiting
* until either the table state is set to SYNCDONE or the sync worker
@@ -48,8 +50,8 @@
* point it sets state to READY and stops tracking. Again, there might
* be zero changes in between.
*
- * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * So the state progression is always: INIT -> DATASYNC ->
+ * (sync worker TCOPYDONE) -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY.
*
* The catalog pg_subscription_rel is used to keep information about
* subscribed tables and their state. Some transient state during data
@@ -59,6 +61,7 @@
* Example flows look like this:
* - Apply is in front:
* sync:8
+ * -> set in catalog TCOPYDONE
* -> set in memory SYNCWAIT
* apply:10
* -> set in memory CATCHUP
@@ -74,6 +77,7 @@
*
* - Sync is in front:
* sync:10
+ * -> set in catalog TCOPYDONE
* -> set in memory SYNCWAIT
* apply:8
* -> set in memory CATCHUP
@@ -102,6 +106,8 @@
#include "replication/logicalrelation.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
@@ -270,30 +276,62 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
+ bool sync_done = false;
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
+ sync_done = MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
+ current_lsn >= MyLogicalRepWorker->relstate_lsn;
+ SpinLockRelease(&MyLogicalRepWorker->relmutex);
- if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
- current_lsn >= MyLogicalRepWorker->relstate_lsn)
+ if (sync_done)
{
TimeLineID tli;
+ char *syncslotname;
+
+ /* End wal streaming so wrconn can be re-used to drop the slot. */
+ walrcv_endstreaming(wrconn, &tli);
+
+ /*
+ * Cleanup the tablesync slot.
+ */
+ syncslotname = ReplicationSlotNameForTablesync(
+ MySubscription->oid,
+ MyLogicalRepWorker->relid);
+ PG_TRY();
+ {
+ elog(DEBUG1, "process_syncing_tables_for_sync: dropping the tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ }
+ PG_FINALLY();
+ {
+ pfree(syncslotname);
+ }
+ PG_END_TRY();
+ /*
+ * Change state to SYNCDONE.
+ */
+ SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCDONE;
MyLogicalRepWorker->relstate_lsn = current_lsn;
-
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ {
+ StartTransactionCommand();
+ }
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
MyLogicalRepWorker->relstate_lsn);
- walrcv_endstreaming(wrconn, &tli);
finish_sync_worker();
}
- else
- SpinLockRelease(&MyLogicalRepWorker->relmutex);
}
/*
@@ -412,6 +450,35 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
started_tx = true;
}
+ /*
+ * Remove the tablesync origin tracking if exists.
+ *
+ * The cleanup is done here instead of in the
+ * finish_sync_worker function because if the tablesync worker
+ * process attempted to call replorigin_drop then that will
+ * hang because replorigin_drop logic considers the owning
+ * tablesync PID as "busy".
+ *
+ * Do this before updating the state, so that DropSubscription
+ * can know that all READY workers have already had their
+ * origin tracking removed.
+ */
+ {
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MyLogicalRepWorker->subid, rstate->relid);
+ originid = replorigin_by_name(originname, true);
+ if (OidIsValid(originid))
+ {
+ elog(DEBUG1, "process_syncing_tables_for_apply: dropping tablesync origin tracking for \"%s\".", originname);
+ replorigin_drop(originid, false);
+ }
+ }
+
+ /*
+ * Update the state only after the origin cleanup.
+ */
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
@@ -808,6 +875,31 @@ copy_table(Relation rel)
}
/*
+ * Determine the tablesync slot name.
+ *
+ * The returned slot name is palloc'ed in current memory context.
+ */
+char *
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid)
+{
+ char *syncslotname;
+
+ /*
+ * To build a slot name for the sync work, we are limited to NAMEDATALEN -
+ * 1 characters.
+ *
+ * The name is calculated as pg_%u_sync_%u (3 + 10 + 6 + 10 + '\0'). (It's
+ * actually the NAMEDATALEN on the remote that matters, but this scheme
+ * will also work reasonably if that is different.)
+ */
+ StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
+
+ syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
+
+ return syncslotname;
+}
+
+/*
* Start syncing the table in the sync worker.
*
* If nothing needs to be done to sync the table, we exit the worker without
@@ -824,6 +916,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -849,19 +943,10 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
- MySubscription->slotname,
- MySubscription->oid,
- MyLogicalRepWorker->relid);
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(
+ MySubscription->oid,
+ MyLogicalRepWorker->relid);
/*
* Here we use the slot name instead of the subscription name as the
@@ -874,7 +959,31 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_TCOPYDONE);
+
+ /* Assign the origin tracking record name. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MySubscription->oid, MyLogicalRepWorker->relid);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_TCOPYDONE)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed/etc
+ * before it was able to finish normally.
+ */
+ elog(DEBUG1, "LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_TCOPYDONE.");
+ StartTransactionCommand();
+
+ /*
+ * The origin tracking name must already exist (missing_ok=false).
+ */
+ originid = replorigin_by_name(originname, false);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ *origin_startpos = replorigin_session_get_progress(false);
+
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -890,9 +999,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -918,29 +1024,90 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
walrcv_clear_result(res);
/*
- * Create a new temporary logical decoding slot. This slot will be used
+ * Create a new permanent logical decoding slot. This slot will be used
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
- walrcv_create_slot(wrconn, slotname, true,
+ walrcv_create_slot(wrconn, slotname, false,
CRS_USE_SNAPSHOT, origin_startpos);
- /* Now do the initial data copy */
- PushActiveSnapshot(GetTransactionSnapshot());
- copy_table(rel);
- PopActiveSnapshot();
+ /*
+ * Be sure to remove the newly created tablesync slot if the COPY fails.
+ */
+ PG_TRY();
+ {
+ /* Now do the initial data copy */
+ PushActiveSnapshot(GetTransactionSnapshot());
+ copy_table(rel);
+ PopActiveSnapshot();
+
+ res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
+ if (res->status != WALRCV_OK_COMMAND)
+ ereport(ERROR,
+ (errmsg("table copy could not finish transaction on publisher"),
+ errdetail("The error was: %s", res->err)));
+ walrcv_clear_result(res);
+
+ table_close(rel, NoLock);
+
+ /* Make the copy visible. */
+ CommandCounterIncrement();
+ }
+ PG_CATCH();
+ {
+ /*
+ * If something failed during copy table then cleanup the created
+ * slot.
+ */
+ elog(DEBUG1, "LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".", slotname);
+ ReplicationSlotDropAtPubNode(wrconn, slotname);
- res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
+ pfree(slotname);
+ slotname = NULL;
+
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+
+ /* Setup replication origin tracking. */
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist. Create it now, and advance to LSN
+ * got from walrcv_create_slot.
+ */
+ originid = replorigin_create(originname);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */ , true /* WAL log */ );
+ }
+ else
+ {
ereport(ERROR,
- (errmsg("table copy could not finish transaction on publisher"),
- errdetail("The error was: %s", res->err)));
- walrcv_clear_result(res);
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("replication origin \"%s\" already exists",
+ originname)));
+ }
- table_close(rel, NoLock);
+ /*
+ * Update the persisted state to indicate the COPY phase is done; make it
+ * visible to others.
+ */
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_TCOPYDONE,
+ MyLogicalRepWorker->relstate_lsn);
+
+copy_table_done:
- /* Make the copy visible. */
- CommandCounterIncrement();
+ elog(DEBUG1, "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
+
+ CommitTransactionCommand();
/*
* We are done with the initial data synchronization, update the state.
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 1b1d70e..4bd4030 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -807,12 +807,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -889,9 +885,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -918,8 +912,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1062,8 +1055,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData* commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index 06663b9..2c80405 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,8 @@ DECLARE_UNIQUE_INDEX(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg_subs
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_TCOPYDONE 't' /* tablesync copy phase is completed
+ * (sublsn NULL) */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 53f636c..e617602 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -15,6 +15,7 @@
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "storage/spin.h"
+#include "replication/walreceiver.h"
/*
* Behaviour of replication slots, upon release or crash.
@@ -211,6 +212,8 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(Oid suboid, Oid relid);
+extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
--
1.8.3.1
On Fri, Jan 8, 2021 at 1:02 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
PSA the v12 patch for the Tablesync Solution1.
Differences from v11:
+ Added PG docs to mention the tablesync slot
+ Refactored tablesync slot drop (done by DropSubscription/process_syncing_tables_for_sync)
+ Fixed PG docs mentioning wrong state code
+ Fixed wrong code comment describing TCOPYDONE state

Hi
I looked into the new patch and have some comments.
1.
+ /* Setup replication origin tracking. */
①
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
②
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ {

There are two different styles of code here which check whether originid is valid.
Both are fine, but do you think it's better to use the same style in both places?
Yes. I think the 1st style is better, so I used OidIsValid for all
the new code of the patch.
But the check in DropSubscription is an exception; there I used the 2nd
style, but ONLY to be consistent with another originid check which
already existed in that same function.
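For context, here is a minimal illustration (not from the patch) of why the two styles are interchangeable: InvalidRepOriginId is 0, which is also the value that OidIsValid() tests against.

    RepOriginId originid = replorigin_by_name(originname, true /* missing_ok */);

    /* style ①, used by most of the new code */
    if (OidIsValid(originid))
        replorigin_drop(originid, false);

    /* style ②, equivalent since InvalidRepOriginId is 0 */
    if (originid != InvalidRepOriginId)
        replorigin_drop(originid, false);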
2.
* state to SYNCDONE. There might be zero changes applied between
* CATCHUP and SYNCDONE, because the sync worker might be ahead of the
* apply worker.
+ * - The sync worker has an intermediary state TCOPYDONE which comes after
+ * DATASYNC and before SYNCWAIT. This state indicates that the initial

This comment about TCOPYDONE would be better placed at [1]*, which is between DATASYNC and SYNCWAIT:
* - Tablesync worker starts; changes table state from INIT to DATASYNC while
* copying.
[1]*
* - Tablesync worker finishes the copy and sets table state to SYNCWAIT;
* waits for state change.
Agreed. I have moved the comment per your suggestion (and I also
re-worded it again).
Fixed in latest patch [v13]
3.
+ /*
+ * To build a slot name for the sync work, we are limited to NAMEDATALEN -
+ * 1 characters.
+ *
+ * The name is calculated as pg_%u_sync_%u (3 + 10 + 6 + 10 + '\0'). (It's
+ * actually the NAMEDATALEN on the remote that matters, but this scheme
+ * will also work reasonably if that is different.)
+ */
+ StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
+
+ syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);

The comment says syncslotname is limited to NAMEDATALEN - 1 characters.
But the actual size of it is (3 + 10 + 6 + 10 + '\0') = 30, which is not NAMEDATALEN - 1.
Should we change the comment here?
The comment wording is a remnant from older code which had a
differently formatted slot name.
I think the comment is still valid, albeit maybe unnecessary, since in
the current code the tablesync slot name length is fixed. But I left
the older comment here as a safety reminder in case some future change
wants to modify the slot name. What do you think?
----
[v13] = /messages/by-id/CAHut+Pvby4zg6kM1RoGd_j-xs9OtPqZPPVhbiC53gCCRWdNSrw@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia.
On Fri, Jan 8, 2021 at 2:55 PM Peter Smith <smithpb2250@gmail.com> wrote:
On Fri, Jan 8, 2021 at 1:02 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
3.
+ /*
+ * To build a slot name for the sync work, we are limited to NAMEDATALEN -
+ * 1 characters.
+ *
+ * The name is calculated as pg_%u_sync_%u (3 + 10 + 6 + 10 + '\0'). (It's
+ * actually the NAMEDATALEN on the remote that matters, but this scheme
+ * will also work reasonably if that is different.)
+ */
+ StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
+
+ syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);

The comment says syncslotname is limited to NAMEDATALEN - 1 characters.
But the actual size of it is (3 + 10 + 6 + 10 + '\0') = 30, which is not NAMEDATALEN - 1.
Should we change the comment here?

The comment wording is a remnant from older code which had a
differently formatted slot name.
I think the comment is still valid, albeit maybe unnecessary, since in
the current code the tablesync slot name length is fixed. But I left
the older comment here as a safety reminder in case some future change
wants to modify the slot name. What do you think?
I find it quite confusing. The comments should reflect the latest
code. You can probably say in some form that the length of slotname
shouldn't exceed NAMEDATALEN because of remote node constraints on
slot name length. Also, probably the StaticAssert on NAMEDATALEN is
not required.
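For reference, the bound is easy to work out (a hedged aside, not part of the patch; it assumes OIDs are unsigned 32-bit and so print as at most 10 decimal digits):

    /*
     * Worst case for "pg_%u_sync_%u": both OIDs at UINT32_MAX.
     * "pg_" (3) + "4294967295" (10) + "_sync_" (6) + "4294967295" (10)
     * = 29 characters, plus the terminating '\0' = 30 bytes, which is
     * comfortably under the default NAMEDATALEN of 64.
     */
    char *syncslotname = psprintf("pg_%u_sync_%u", 4294967295U, 4294967295U);

    Assert(strlen(syncslotname) == 29);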
1.
+ <para>
+ Additional table synchronization slots are normally transient, created
+ internally and dropped automatically when they are no longer needed.
+ These table synchronization slots have generated names:
+ <quote><literal>pg_%u_sync_%u</literal></quote> (parameters: Subscription
+ <parameter>oid</parameter>, Table <parameter>relid</parameter>)
+ </para>
The last line seems too long. I think we are not strict about the 80-char
limit in docs, but it is good to stay close to it; however, this line
appears quite long.
2.
+ /*
+ * Cleanup any remaining tablesync resources.
+ */
+ {
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+ char state;
+ XLogRecPtr statelsn;
I have already mentioned previously that we should not use this new style
of code (using { to localize the scope of variables). I don't
know about others, but I find such code difficult to read. You
might want to consider moving this whole block to a separate function.
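One possible shape for that (a rough sketch only; the helper name and exact signature are made up, but the calls inside are the ones the patch already uses):

    /*
     * Hypothetical helper, sketched from the existing cleanup block:
     * drop the tablesync slot (if the copy finished but the sync did
     * not) and the tablesync origin tracking for one relation.
     */
    static void
    DropTablesyncResources(WalReceiverConn *wrconn, Oid subid, Oid relid,
                           char state)
    {
        char        originname[NAMEDATALEN];
        RepOriginId originid;

        if (state == SUBREL_STATE_TCOPYDONE)
        {
            char *syncslotname = ReplicationSlotNameForTablesync(subid, relid);

            ReplicationSlotDropAtPubNode(wrconn, syncslotname);
            pfree(syncslotname);
        }

        /* Remove the tablesync's origin tracking if it exists. */
        snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
        originid = replorigin_by_name(originname, true);
        if (OidIsValid(originid))
            replorigin_drop(originid, false);
    }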
3.
/*
+ * XXX - Should optimize this to avoid multiple
+ * connect/disconnect.
+ */
+ wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
I think it is better to avoid multiple connect/disconnect here. In
this same function, we have already connected to the publisher, so we
should be able to use the same connection.
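A rough sketch of that arrangement (the names are from the patch, the structure is an assumption; the v14 patch further below adopts essentially this shape):

    /* Connect once, near the top of AlterSubscription_refresh. */
    wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
    if (!wrconn)
        ereport(ERROR,
                (errmsg("could not connect to the publisher: %s", err)));

    PG_TRY();
    {
        /* ... fetch the table list and work out the removed relations ... */

        /* ... per-relation slot/origin cleanup reuses the same wrconn ... */
    }
    PG_FINALLY();
    {
        /* Disconnect exactly once, whatever happens above. */
        walrcv_disconnect(wrconn);
    }
    PG_END_TRY();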
4.
process_syncing_tables_for_sync()
{
..
+ /*
+ * Cleanup the tablesync slot.
+ */
+ syncslotname = ReplicationSlotNameForTablesync(
+ MySubscription->oid,
+ MyLogicalRepWorker->relid);
+ PG_TRY();
+ {
+ elog(DEBUG1, "process_syncing_tables_for_sync: dropping the
tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ }
+ PG_FINALLY();
+ {
+ pfree(syncslotname);
+ }
+ PG_END_TRY();
..
}
Both here and in DropSubscription(), it seems we are using
PG_TRY..PG_FINALLY just to free the memory even though
ReplicationSlotDropAtPubNode already has try..finally. Can we arrange
the code to move the allocation of syncslotname inside
ReplicationSlotDropAtPubNode to avoid the additional try..finally? BTW,
if the usage of try..finally here is only to free the memory, I am not
sure it is required, because I think we will anyway reset the memory
context where this memory is allocated as part of error handling.
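For illustration, one way to fold the allocation in (just a sketch under the current v13 signatures; the wrapper name is invented):

    /*
     * Hypothetical wrapper: compute and free the slot name here so
     * callers need no PG_TRY just for the pfree. If the drop ERRORs,
     * the name is reclaimed anyway when the containing memory context
     * is reset during error handling, per the point above.
     */
    void
    ReplicationSlotDropAtPubNodeForTablesync(WalReceiverConn *wrconn,
                                             Oid suboid, Oid relid)
    {
        char *syncslotname = ReplicationSlotNameForTablesync(suboid, relid);

        ReplicationSlotDropAtPubNode(wrconn, syncslotname);
        pfree(syncslotname);
    }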
5.
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_TCOPYDONE 't' /* tablesync copy phase is completed
+ * (sublsn NULL) */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
I am not very happy with the new state name SUBREL_STATE_TCOPYDONE as
it is quite different from the other adjoining state names and somehow
does not go well with the code. How about SUBREL_STATE_ENDCOPY 'e' or
SUBREL_STATE_FINISHEDCOPY 'f'?
--
With Regards,
Amit Kapila.
On Fri, Jan 8, 2021 at 8:20 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Jan 8, 2021 at 7:14 AM Peter Smith <smithpb2250@gmail.com> wrote:
FYI, I was able to reproduce this case in the debugger. PSA logs showing details.
Thanks for reproducing as I was worried about exactly this case. I
have one question related to logs:

##
## ALTER SUBSCRIPTION to REFRESH the publication
## This blocks on some latch until the tablesync worker dies, then it continues
##

Did you check which exact latch or lock blocks this?
I have checked this myself and the command is waiting on the drop of
origin till the tablesync worker is finished because replorigin_drop()
requires state->acquired_by to be 0 which will only be true once the
tablesync worker exits. I think this is the reason you might have
noticed that the command can't be finished until the tablesync worker
died. So this can't be an interlock between the ALTER SUBSCRIPTION ..
REFRESH command and the tablesync worker, and to that end it seems you
have the below FIXMEs in the patch:
+ * FIXME - Usually this cleanup would be OK, but will not
+ * always be OK because the logicalrep_worker_stop_at_commit
+ * only "flags" the worker to be stopped in the near future
+ * but meanwhile it may still be running. In this case there
+ * could be a race between the tablesync worker and this code
+ * to see who will succeed with the tablesync drop (and the
+ * loser will ERROR).
+ *
+ * FIXME - Also, checking the state is not guaranteed to be
+ * correct because the state might have been TCOPYDONE when we
+ * checked but has since progressed to SYNCDONE
+ */
+
+ if (state == SUBREL_STATE_TCOPYDONE)
+ {
I feel this was okay for an earlier version of the code, but now we
need to stop the tablesync workers before trying to drop the slot, as
we do in DropSubscription. Now, if we do that then it would fix the
race conditions mentioned in the FIXMEs, but still there are a few
more things I am worried about: (a) What if the launcher starts the
tablesync worker again? One idea could be to acquire
AccessExclusiveLock on SubscriptionRelationId as we do in
DropSubscription, which is not a very good idea, but I can't think of
any other good way. (b) The patch is only checking
SUBREL_STATE_TCOPYDONE before dropping the replication slot, but the
slot could have been created even before that (in the
SUBREL_STATE_DATASYNC state). One idea could be to try to drop the
slot and, if we are unable to, simply continue on the assumption that
it didn't exist.
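A rough sketch of idea (b), hedged; the v14 patch below does in fact add a missing_ok parameter along these lines:

    /*
     * Tolerate an absent slot: it may never have been created, e.g. if
     * the worker crashed in DATASYNC before walrcv_create_slot ran.
     */
    ReplicationSlotDropAtPubNode(wrconn, syncslotname, true /* missing_ok */);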
One minor comment:
1.
+ SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCDONE;
MyLogicalRepWorker->relstate_lsn = current_lsn;
-
Spurious line removal.
--
With Regards,
Amit Kapila.
On Thu, Jan 7, 2021 at 3:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
BTW, I have analyzed whether we need any modifications to
pg_dump/restore for this patch as this changes the state of one of the
fields in the system table and concluded that we don't need any
change. For subscriptions, we don't dump any of the information from
pg_subscription_rel, rather we just dump subscriptions with the
connect option as false which means users need to enable the
subscription and refresh publication after restore. I have checked
this in the code and tested it as well. The related information is
present in pg_dump doc page [1], see from "When dumping logical
replication subscriptions ....".

I have further analyzed that we don't need to do anything w.r.t.
pg_upgrade as well because it uses pg_dump/pg_dumpall to dump the
schema info of the old cluster and then restore it to the new cluster.
And, we know that pg_dump ignores the info in pg_subscription_rel, so
we don't need to change anything as our changes are specific to the
state of one of the columns in pg_subscription_rel. I have not tested
this but we should test it by having some relations in not_ready state
and then allow the old cluster (<=PG13) to be upgraded to new (pg14)
both with and without this patch and see if there is any change in
behavior.
I have tested this scenario: I stopped a server running PG_13 while the
subscription table sync was in progress.
One of the tables in pg_subscription_rel was still in 'd' state (DATASYNC)
postgres=# select * from pg_subscription_rel;
srsubid | srrelid | srsubstate | srsublsn
---------+---------+------------+------------
16424 | 16384 | d |
16424 | 16390 | r | 0/247A63D8
16424 | 16395 | r | 0/247A6410
16424 | 16387 | r | 0/247A6448
(4 rows)
Then I initiated the pg_upgrade to PG_14, both with the patch and without the patch.
I see that the subscription exists but is not enabled:
postgres=# select * from pg_subscription;
oid | subdbid | subname | subowner | subenabled | subbinary |
substream | subconninfo | subslotname |
subsynccommit | subpublications
-------+---------+---------+----------+------------+-----------+-----------+------------------------------------------+-------------+---------------+-----------------
16407 | 16401 | tap_sub | 10 | f | f | f
| host=localhost port=6972 dbname=postgres | tap_sub | off
| {tap_pub}
(1 row)
and looking at the pg_subscription_rel:
postgres=# select * from pg_subscription_rel;
srsubid | srrelid | srsubstate | srsublsn
---------+---------+------------+----------
(0 rows)
As can be seen, none of the data in pg_subscription_rel has been
copied over. The same behaviour is seen both with the patch and
without the patch.
regards,
Ajin Cherian
Fujitsu Australia
On Mon, Jan 11, 2021 at 3:53 PM Ajin Cherian <itsajin@gmail.com> wrote:
On Thu, Jan 7, 2021 at 3:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
BTW, I have analyzed whether we need any modifications to
pg_dump/restore for this patch as this changes the state of one of the
fields in the system table and concluded that we don't need any
change. For subscriptions, we don't dump any of the information from
pg_subscription_rel, rather we just dump subscriptions with the
connect option as false which means users need to enable the
subscription and refresh publication after restore. I have checked
this in the code and tested it as well. The related information is
present in pg_dump doc page [1], see from "When dumping logical
replication subscriptions ....".

I have further analyzed that we don't need to do anything w.r.t.
pg_upgrade as well because it uses pg_dump/pg_dumpall to dump the
schema info of the old cluster and then restore it to the new cluster.
And, we know that pg_dump ignores the info in pg_subscription_rel, so
we don't need to change anything as our changes are specific to the
state of one of the columns in pg_subscription_rel. I have not tested
this but we should test it by having some relations in not_ready state
and then allow the old cluster (<=PG13) to be upgraded to new (pg14)
both with and without this patch and see if there is any change in
behavior.

I have tested this scenario: I stopped a server running PG_13 while the
subscription table sync was in progress.
Thanks for the test. This confirms my analysis and we don't need any
change in pg_dump or pg_upgrade for this patch.
--
With Regards,
Amit Kapila.
Hi Amit.
PSA the v14 patch for the Tablesync Solution1.
Main differences from v13:
+ Addresses all review comments 1-5, posted 9/Jan [ak9]
+ Addresses review comment 1, posted 11/Jan [ak11]
+ Modifications per suggestion [ak11] to handle race scenarios during
Drop/AlterSubscription
+ Changed LOG to WARNING if DropSubscription unable to drop tablesync slot
[ak9] = /messages/by-id/CAA4eK1+gUBxKcYWg+MCC6Qbw-My+2wKUct+iFtr-_HgundUUBQ@mail.gmail.com
[ak11] = /messages/by-id/CAA4eK1KGUt86A7CfuQW6OeDvAhEbVk8VOBJmcoZjrYBn965kOA@mail.gmail.com
====
Features:
* The tablesync slot is now permanent instead of temporary.
* The tablesync slot name is no longer tied to the Subscription slot name.
* The tablesync slot cleanup (drop) code is added for
DropSubscription, AlterSubscription_refresh and for
process_syncing_tables_for_sync functions. Drop/AlterSubscription will
issue WARNING instead of ERROR in case the slot drop fails.
* The tablesync worker now allows multiple transactions instead of a single transaction.
* A new state (SUBREL_STATE_FINISHEDCOPY) is persisted after a
successful copy_table in tablesync's LogicalRepSyncTableStart.
* If a re-launched tablesync finds state SUBREL_STATE_FINISHEDCOPY
then it will bypass the initial copy_table phase.
* Now tablesync sets up replication origin tracking in
LogicalRepSyncTableStart (similar to what is done for the apply
worker). The origin is advanced when first created.
* The tablesync replication origin tracking is cleaned up during
DropSubscription and/or process_syncing_tables_for_apply.
* The DropSubscription cleanup code was enhanced (v7+) to take care of
any crashed tablesync workers.
* The AlterSubscription_refresh (v14+) is now more similar to
DropSubscription w.r.t. stopping workers for any "removed" tables.
* Updates to PG docs.
TODO / Known Issues:
* Minor review comments
===
Also PSA some detailed logging evidence of some test scenarios
involving Drop/AlterSubscription:
+ Test-20210112-AlterSubscriptionRefresh-ok.txt =
AlterSubscription_refresh which successfully drops a tablesync slot
+ Test-20210112-AlterSubscriptionRefresh-warning.txt =
AlterSubscription_refresh gives WARNING that it cannot drop the
tablesync slot (which no longer exists)
+ Test-20210112-DropSubscription-warning.txt = DropSubscription with a
disassociated slot_name gives a WARNING that it cannot drop the
tablesync slot (due to broken connection)
---
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
v14-0002-Tablesync-extra-logging.patch (application/octet-stream)
From 77cc1f7bc334ebb2b6405e42a465c6253d958925 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Tue, 12 Jan 2021 22:30:59 +1100
Subject: [PATCH v14] Tablesync extra logging.
This patch only adds some extra logging which may be helpful for testing, but is not for committing.
---
src/backend/commands/subscriptioncmds.c | 29 ++++++++++++++++++----
src/backend/replication/logical/tablesync.c | 37 +++++++++++++++++++++++++----
2 files changed, 56 insertions(+), 10 deletions(-)
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index f94243b..b5f9d56 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -665,11 +665,18 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
XLogRecPtr statelsn;
/* Immediately stop the worker. */
+ elog(LOG,
+ "!!>> AlterSubscription_refresh: before logicalrep_worker_stop");
logicalrep_worker_stop_at_commit(subid, relid); /* prevent re-launching */
logicalrep_worker_stop(subid, relid); /* stop immediately */
+ elog(LOG,
+ "!!>> AlterSubscription_refresh: after logicalrep_worker_stop");
/* Last known rel state. */
state = GetSubscriptionRelState(subid, relid, &statelsn);
+ elog(LOG,
+ "!!>> AlterSubscription_refresh: relid %u had state %c",
+ relid, state);
RemoveSubscriptionRel(sub->oid, relid);
@@ -692,10 +699,13 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
ReplicationSlotNameForTablesync(subid, relid, syncslotname);
- elog(DEBUG1,
+ elog(LOG,
"AlterSubscription_refresh: dropping the tablesync slot \"%s\".",
syncslotname);
ReplicationSlotDropAtPubNode(wrconn, syncslotname, missing_ok);
+ elog(LOG,
+ "!!>> AlterSubscription_refresh: dropped the tablesync slot \"%s\".",
+ syncslotname);
}
/* Remove the tablesync's origin tracking if exists. */
@@ -703,13 +713,16 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
originid = replorigin_by_name(originname, true);
if (OidIsValid(originid))
{
- elog(DEBUG1,
+ elog(LOG,
"AlterSubscription_refresh: dropping origin tracking for \"%s\"",
originname);
replorigin_drop(originid, false);
+ elog(LOG,
+ "!!>> AlterSubscription_refresh: dropped origin tracking for \"%s\"",
+ originname);
}
- ereport(DEBUG1,
+ ereport(LOG,
(errmsg("table \"%s.%s\" removed from subscription \"%s\"",
get_namespace_name(get_rel_namespace(relid)),
get_rel_name(relid),
@@ -1191,10 +1204,13 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
}
else
{
- elog(DEBUG1,
+ elog(LOG,
"DropSubscription: dropping the tablesync slot \"%s\".",
syncslotname);
ReplicationSlotDropAtPubNode(wrconn, syncslotname, missing_ok);
+ elog(LOG,
+ "!!>> DropSubscription: dropped the tablesync slot \"%s\".",
+ syncslotname);
}
}
@@ -1203,10 +1219,13 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
originid = replorigin_by_name(originname, true);
if (originid != InvalidRepOriginId)
{
- elog(DEBUG1,
+ elog(LOG,
"DropSubscription: dropping origin tracking for \"%s\"",
originname);
replorigin_drop(originid, false);
+ elog(LOG,
+ "!!>> DropSubscription: dropped origin tracking for \"%s\"",
+ originname);
}
}
list_free(rstates);
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index c5a95fc..e4ccd8b 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -299,8 +299,11 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn)
MyLogicalRepWorker->relid,
syncslotname);
- elog(DEBUG1, "process_syncing_tables_for_sync: dropping the tablesync slot \"%s\".", syncslotname);
+ elog(LOG, "process_syncing_tables_for_sync: dropping the tablesync slot \"%s\".", syncslotname);
ReplicationSlotDropAtPubNode(wrconn, syncslotname, false);
+ elog(LOG,
+ "!!>> process_syncing_tables_for_sync: dropped the tablesync slot \"%s\".",
+ syncslotname);
/*
* Change state to SYNCDONE.
@@ -466,8 +469,11 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
originid = replorigin_by_name(originname, true);
if (OidIsValid(originid))
{
- elog(DEBUG1, "process_syncing_tables_for_apply: dropping tablesync origin tracking for \"%s\".", originname);
+ elog(LOG, "process_syncing_tables_for_apply: dropping tablesync origin tracking for \"%s\".", originname);
replorigin_drop(originid, false);
+ elog(LOG,
+ "!!>> process_syncing_tables_for_apply: dropped tablesync origin tracking for \"%s\".",
+ originname);
}
}
@@ -966,15 +972,21 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* The COPY phase was previously done, but tablesync then crashed/etc
* before it was able to finish normally.
*/
- elog(DEBUG1, "LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_FINISHEDCOPY.");
+ elog(LOG, "LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_FINISHEDCOPY.");
StartTransactionCommand();
/*
* The origin tracking name must already exist (missing_ok=false).
*/
originid = replorigin_by_name(originname, false);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 2 replorigin_session_setup \"%s\".",
+ originname);
replorigin_session_setup(originid);
replorigin_session_origin = originid;
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 2 replorigin_session_get_progress \"%s\".",
+ originname);
*origin_startpos = replorigin_session_get_progress(false);
goto copy_table_done;
@@ -1023,6 +1035,9 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: walrcv_create_slot for \"%s\".",
+ slotname);
walrcv_create_slot(wrconn, slotname, false,
CRS_USE_SNAPSHOT, origin_startpos);
@@ -1054,8 +1069,11 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* If something failed during copy table then cleanup the created
* slot.
*/
- elog(DEBUG1, "LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".", slotname);
+ elog(LOG, "LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".", slotname);
ReplicationSlotDropAtPubNode(wrconn, slotname, false);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: tablesync copy failed. Dropped the tablesync slot \"%s\".",
+ slotname);
pfree(slotname);
slotname = NULL;
@@ -1072,9 +1090,18 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* Origin tracking does not exist. Create it now, and advance to LSN
* got from walrcv_create_slot.
*/
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 replorigin_create \"%s\".",
+ originname);
originid = replorigin_create(originname);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 replorigin_session_setup \"%s\".",
+ originname);
replorigin_session_setup(originid);
replorigin_session_origin = originid;
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 replorigin_advance \"%s\".",
+ originname);
replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
true /* go backward */ , true /* WAL log */ );
}
@@ -1097,7 +1124,7 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
copy_table_done:
- elog(DEBUG1, "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ elog(LOG, "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
originname,
(uint32) (*origin_startpos >> 32),
(uint32) *origin_startpos);
--
1.8.3.1
v14-0001-Tablesync-Solution1.patch (application/octet-stream)
From 6a1e27cb36c72c5fe9a29550118ac845f876c200 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Tue, 12 Jan 2021 22:23:54 +1100
Subject: [PATCH v14] Tablesync Solution1.
====
Features:
* The tablesync slot is now permanent instead of temporary.
* The tablesync slot name is no longer tied to the Subscription slot name.
* The tablesync slot cleanup (drop) code is added for DropSubscription, AlterSubscription_refresh and for process_syncing_tables_for_sync functions. Drop/AlterSubscription will issue WARNING instead of ERROR in case the slot drop fails.
* The tablesync worker now allows multiple transactions instead of a single transaction.
* A new state (SUBREL_STATE_FINISHEDCOPY) is persisted after a successful copy_table in tablesync's LogicalRepSyncTableStart.
* If a re-launched tablesync finds state SUBREL_STATE_FINISHEDCOPY then it will bypass the initial copy_table phase.
* Now tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar to what is done for the apply worker). The origin is advanced when first created.
* The tablesync replication origin tracking is cleaned up during DropSubscription and/or process_syncing_tables_for_apply.
* The DropSubscription cleanup code was enhanced (v7+) to take care of any crashed tablesync workers.
* The AlterSubscription_refresh (v14+) is now more similar to DropSubscription w.r.t. stopping workers for any "removed" tables.
* Updates to PG docs.
TODO / Known Issues:
* Minor review comments
---
doc/src/sgml/catalogs.sgml | 1 +
doc/src/sgml/logical-replication.sgml | 17 +-
doc/src/sgml/ref/drop_subscription.sgml | 6 +-
src/backend/commands/subscriptioncmds.c | 454 ++++++++++++++++++++--------
src/backend/replication/logical/origin.c | 2 +-
src/backend/replication/logical/tablesync.c | 244 ++++++++++++---
src/backend/replication/logical/worker.c | 18 +-
src/include/catalog/pg_subscription_rel.h | 2 +
src/include/replication/slot.h | 3 +
9 files changed, 553 insertions(+), 194 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 3a22665..2e46a49 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7651,6 +7651,7 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
State code:
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
+ <literal>f</literal> = finished table copy,
<literal>s</literal> = synchronized,
<literal>r</literal> = ready (normal replication)
</para></entry>
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index a560ad6..20cdd57 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -248,7 +248,17 @@
<para>
As mentioned earlier, each (active) subscription receives changes from a
- replication slot on the remote (publishing) side. Normally, the remote
+ replication slot on the remote (publishing) side.
+ </para>
+ <para>
+ Additional table synchronization slots are normally transient, created
+ internally and dropped automatically when they are no longer needed.
+ These table synchronization slots have generated names:
+ <quote><literal>pg_%u_sync_%u</literal></quote> (parameters: Subscription
+ <parameter>oid</parameter>, Table <parameter>relid</parameter>)
+ </para>
+ <para>
+ Normally, the remote
replication slot is created automatically when the subscription is created
using <command>CREATE SUBSCRIPTION</command> and it is dropped
automatically when the subscription is dropped using <command>DROP
@@ -294,8 +304,9 @@
using <command>ALTER SUBSCRIPTION</command> before attempting to drop
the subscription. If the remote database instance no longer exists, no
further action is then necessary. If, however, the remote database
- instance is just unreachable, the replication slot should then be
- dropped manually; otherwise it would continue to reserve WAL and might
+ instance is just unreachable, the replication slot (and any still
+ remaining table synchronization slots) should then be
+ dropped manually; otherwise it/they would continue to reserve WAL and might
eventually cause the disk to fill up. Such cases should be carefully
investigated.
</para>
diff --git a/doc/src/sgml/ref/drop_subscription.sgml b/doc/src/sgml/ref/drop_subscription.sgml
index adbdeaf..aee9615 100644
--- a/doc/src/sgml/ref/drop_subscription.sgml
+++ b/doc/src/sgml/ref/drop_subscription.sgml
@@ -79,7 +79,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
<para>
When dropping a subscription that is associated with a replication slot on
the remote host (the normal state), <command>DROP SUBSCRIPTION</command>
- will connect to the remote host and try to drop the replication slot as
+ will connect to the remote host and try to drop the replication slot (and
+ any remaining table synchronization slots) as
part of its operation. This is necessary so that the resources allocated
for the subscription on the remote host are released. If this fails,
either because the remote host is not reachable or because the remote
@@ -89,7 +90,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
executing <literal>ALTER SUBSCRIPTION ... SET (slot_name = NONE)</literal>.
After that, <command>DROP SUBSCRIPTION</command> will no longer attempt any
actions on a remote host. Note that if the remote replication slot still
- exists, it should then be dropped manually; otherwise it will continue to
+ exists, it (and any related table synchronization slots) should then be
+ dropped manually; otherwise it/they will continue to
reserve WAL and might eventually cause the disk to fill up. See
also <xref linkend="logical-replication-subscription-slot"/>.
</para>
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 490e935..f94243b 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -37,6 +37,7 @@
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
#include "storage/lmgr.h"
#include "utils/acl.h"
#include "utils/builtins.h"
@@ -566,100 +567,165 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
Oid *pubrel_local_oids;
ListCell *lc;
int off;
+ Relation rel;
/* Load the library providing us libpq calls. */
load_file("libpqwalreceiver", false);
- /* Try to connect to the publisher. */
- wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
- if (!wrconn)
- ereport(ERROR,
- (errmsg("could not connect to the publisher: %s", err)));
-
- /* Get the table list from publisher. */
- pubrel_names = fetch_table_list(wrconn, sub->publications);
-
- /* We are done with the remote side, close connection. */
- walrcv_disconnect(wrconn);
+ PG_TRY();
+ {
+ /* Try to connect to the publisher. */
+ wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
+ if (!wrconn)
+ ereport(ERROR,
+ (errmsg("could not connect to the publisher: %s", err)));
- /* Get local table list. */
- subrel_states = GetSubscriptionRelations(sub->oid);
+ /* Get the table list from publisher. */
+ pubrel_names = fetch_table_list(wrconn, sub->publications);
- /*
- * Build qsorted array of local table oids for faster lookup. This can
- * potentially contain all tables in the database so speed of lookup is
- * important.
- */
- subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
- off = 0;
- foreach(lc, subrel_states)
- {
- SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
+ /* Get local table list. */
+ subrel_states = GetSubscriptionRelations(sub->oid);
- subrel_local_oids[off++] = relstate->relid;
- }
- qsort(subrel_local_oids, list_length(subrel_states),
- sizeof(Oid), oid_cmp);
+ /*
+ * Build qsorted array of local table oids for faster lookup. This can
+ * potentially contain all tables in the database so speed of lookup
+ * is important.
+ */
+ subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
+ off = 0;
+ foreach(lc, subrel_states)
+ {
+ SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
- /*
- * Walk over the remote tables and try to match them to locally known
- * tables. If the table is not known locally create a new state for it.
- *
- * Also builds array of local oids of remote tables for the next step.
- */
- off = 0;
- pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
+ subrel_local_oids[off++] = relstate->relid;
+ }
+ qsort(subrel_local_oids, list_length(subrel_states),
+ sizeof(Oid), oid_cmp);
+
+ /*
+ * Walk over the remote tables and try to match them to locally known
+ * tables. If the table is not known locally create a new state for
+ * it.
+ *
+ * Also builds array of local oids of remote tables for the next step.
+ */
+ off = 0;
+ pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
+
+ foreach(lc, pubrel_names)
+ {
+ RangeVar *rv = (RangeVar *) lfirst(lc);
+ Oid relid;
- foreach(lc, pubrel_names)
- {
- RangeVar *rv = (RangeVar *) lfirst(lc);
- Oid relid;
+ relid = RangeVarGetRelid(rv, AccessShareLock, false);
- relid = RangeVarGetRelid(rv, AccessShareLock, false);
+ /* Check for supported relkind. */
+ CheckSubscriptionRelkind(get_rel_relkind(relid),
+ rv->schemaname, rv->relname);
- /* Check for supported relkind. */
- CheckSubscriptionRelkind(get_rel_relkind(relid),
- rv->schemaname, rv->relname);
+ pubrel_local_oids[off++] = relid;
- pubrel_local_oids[off++] = relid;
+ if (!bsearch(&relid, subrel_local_oids,
+ list_length(subrel_states), sizeof(Oid), oid_cmp))
+ {
+ AddSubscriptionRelState(sub->oid, relid,
+ copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
+ InvalidXLogRecPtr);
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" added to subscription \"%s\"",
+ rv->schemaname, rv->relname, sub->name)));
+ }
+ }
- if (!bsearch(&relid, subrel_local_oids,
- list_length(subrel_states), sizeof(Oid), oid_cmp))
+ /*
+ * Next remove state for tables we should not care about anymore using
+ * the data we collected above
+ */
+ qsort(pubrel_local_oids, list_length(pubrel_names),
+ sizeof(Oid), oid_cmp);
+
+ /*
+ * Lock pg_subscription with AccessExclusiveLock to ensure that the
+ * launcher doesn't restart new worker for the ones we are about to
+ * stop.
+ */
+ rel = table_open(SubscriptionRelationId, AccessExclusiveLock);
+
+ for (off = 0; off < list_length(subrel_states); off++)
{
- AddSubscriptionRelState(sub->oid, relid,
- copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
- InvalidXLogRecPtr);
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" added to subscription \"%s\"",
- rv->schemaname, rv->relname, sub->name)));
- }
- }
+ Oid relid = subrel_local_oids[off];
- /*
- * Next remove state for tables we should not care about anymore using the
- * data we collected above
- */
- qsort(pubrel_local_oids, list_length(pubrel_names),
- sizeof(Oid), oid_cmp);
+ if (!bsearch(&relid, pubrel_local_oids,
+ list_length(pubrel_names), sizeof(Oid), oid_cmp))
+ {
+ Oid subid = sub->oid;
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+ char state;
+ XLogRecPtr statelsn;
+
+ /* Immediately stop the worker. */
+ logicalrep_worker_stop_at_commit(subid, relid); /* prevent re-launching */
+ logicalrep_worker_stop(subid, relid); /* stop immediately */
+
+ /* Last known rel state. */
+ state = GetSubscriptionRelState(subid, relid, &statelsn);
+
+ RemoveSubscriptionRel(sub->oid, relid);
+
+ /*
+ * Drop the tablesync slot.
+ *
+ * For SYNCDONE/READY states the tablesync slot is known to
+ * have already been dropped by the tablesync worker.
+ *
+ * For other states, there is no certainty. Maybe the slot
+ * does not exist yet; Maybe the slot is already deleted but
+ * SYNCDONE is not yet set. For this reason we allow
+ * missing_ok = true for the drop.
+ */
+ if (state != SUBREL_STATE_SYNCDONE && state != SUBREL_STATE_READY)
+ {
+ char syncslotname[NAMEDATALEN] = {0};
+ bool missing_ok = true; /* no ERROR if slot is
+ * missing. */
- for (off = 0; off < list_length(subrel_states); off++)
- {
- Oid relid = subrel_local_oids[off];
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
- if (!bsearch(&relid, pubrel_local_oids,
- list_length(pubrel_names), sizeof(Oid), oid_cmp))
- {
- RemoveSubscriptionRel(sub->oid, relid);
+ elog(DEBUG1,
+ "AlterSubscription_refresh: dropping the tablesync slot \"%s\".",
+ syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, missing_ok);
+ }
- logicalrep_worker_stop_at_commit(sub->oid, relid);
+ /* Remove the tablesync's origin tracking if exists. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (OidIsValid(originid))
+ {
+ elog(DEBUG1,
+ "AlterSubscription_refresh: dropping origin tracking for \"%s\"",
+ originname);
+ replorigin_drop(originid, false);
+ }
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
- get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid),
- sub->name)));
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
+ get_namespace_name(get_rel_namespace(relid)),
+ get_rel_name(relid),
+ sub->name)));
+ }
}
+
+ table_close(rel, NoLock);
+
}
+ PG_FINALLY();
+ {
+ if (wrconn)
+ walrcv_disconnect(wrconn);
+ }
+ PG_END_TRY();
}
/*
@@ -928,8 +994,8 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
char *err = NULL;
RepOriginId originid;
WalReceiverConn *wrconn = NULL;
- StringInfoData cmd;
Form_pg_subscription form;
+ List *rstates;
/*
* Lock pg_subscription with AccessExclusiveLock to ensure that the
@@ -1016,100 +1082,220 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
ReleaseSysCache(tup);
/*
- * Stop all the subscription workers immediately.
- *
- * This is necessary if we are dropping the replication slot, so that the
- * slot becomes accessible.
- *
- * It is also necessary if the subscription is disabled and was disabled
- * in the same transaction. Then the workers haven't seen the disabling
- * yet and will still be running, leading to hangs later when we want to
- * drop the replication origin. If the subscription was disabled before
- * this transaction, then there shouldn't be any workers left, so this
- * won't make a difference.
+ * Try to acquire the connection necessary for dropping slots. We do this
+ * here so that the same connection may be shared for dropping the
+ * Subscription slot, as well as dropping any tablesync slots.
*
- * New workers won't be started because we hold an exclusive lock on the
- * subscription till the end of the transaction.
+ * Note: If the slotname is NONE/NULL then connection errors are
+ * suppressed. This is necessary so that the DROP SUBSCRIPTION can still
+ * complete even when the connection to publisher is broken.
*/
- LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
- subworkers = logicalrep_workers_find(subid, false);
- LWLockRelease(LogicalRepWorkerLock);
- foreach(lc, subworkers)
+ load_file("libpqwalreceiver", false);
+
+ wrconn = walrcv_connect(conninfo, true, subname, &err);
+ if (wrconn == NULL && slotname != NULL)
+ ereport(ERROR,
+ (errmsg("could not connect to publisher when attempting to "
+ "drop the replication slot \"%s\"", slotname),
+ errdetail("The error was: %s", err),
+ /* translator: %s is an SQL ALTER command */
+ errhint("Use %s to disassociate the subscription from the slot.",
+ "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+
+ PG_TRY();
{
- LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
+ /*
+ * Stop all the subscription workers immediately.
+ *
+ * This is necessary if we are dropping the replication slot, so that
+ * the slot becomes accessible.
+ *
+ * It is also necessary if the subscription is disabled and was
+ * disabled in the same transaction. Then the workers haven't seen
+ * the disabling yet and will still be running, leading to hangs later
+ * when we want to drop the replication origin. If the subscription
+ * was disabled before this transaction, then there shouldn't be any
+ * workers left, so this won't make a difference.
+ *
+ * New workers won't be started because we hold an exclusive lock on
+ * the subscription till the end of the transaction.
+ */
+ LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
+ subworkers = logicalrep_workers_find(subid, false);
+ LWLockRelease(LogicalRepWorkerLock);
+ foreach(lc, subworkers)
+ {
+ LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
- logicalrep_worker_stop(w->subid, w->relid);
- }
- list_free(subworkers);
+ logicalrep_worker_stop(w->subid, w->relid);
+ }
+ list_free(subworkers);
+
+ /*
+ * Tablesync resource cleanup (slots and origins).
+ *
+ * Any READY-state relations would already have dealt with clean-ups.
+ */
+ rstates = GetSubscriptionNotReadyRelations(subid);
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
- /* Clean up dependencies */
- deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
- /* Remove any associated relation synchronization states. */
- RemoveSubscriptionRel(subid, InvalidOid);
+ /*
+ * Drop the tablesync slot.
+ *
+ * For SYNCDONE/READY states the tablesync slot is known to have
+ * already been dropped by the tablesync worker.
+ *
+ * For other states, there is no certainty. Maybe the slot does
+ * not exist yet; Maybe the slot is already deleted but SYNCDONE
+ * is not yet set. For this reason we allow missing_ok = true for
+ * the drop.
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = {0};
+ bool missing_ok = true; /* no ERROR if slot is
+ * missing. */
- /* Remove the origin tracking if exists. */
- snprintf(originname, sizeof(originname), "pg_%u", subid);
- originid = replorigin_by_name(originname, true);
- if (originid != InvalidRepOriginId)
- replorigin_drop(originid, false);
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
- /*
- * If there is no slot associated with the subscription, we can finish
- * here.
- */
- if (!slotname)
+ if (!wrconn)
+ {
+ /*
+ * It is only possible to reach here without ERROR for a
+ * broken publisher connection if the subscription
+ * slotname is already NONE/NULL.
+ *
+ * This means the user has disassociated the subscription
+ * from the replication slot deliberately so that the DROP
+ * SUBSCRIPTION can proceed to completion. See
+ * https://www.postgresql.org/docs/current/sql-dropsubscription.html
+ *
+ * For this reason we only give a WARNING message that
+ * the tablesync slots cannot be dropped, rather than
+ * throw ERROR (which would prevent the DROP SUBSCRIPTION
+ * from proceeding).
+ *
+ * In such a case the user must take steps to manually
+ * cleanup these remaining tablesync slots.
+ */
+ elog(WARNING,
+ "no connection; cannot drop tablesync slot \"%s\".",
+ syncslotname);
+ }
+ else
+ {
+ elog(DEBUG1,
+ "DropSubscription: dropping the tablesync slot \"%s\".",
+ syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, missing_ok);
+ }
+ }
+
+ /* Remove the tablesync's origin tracking if exists. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ {
+ elog(DEBUG1,
+ "DropSubscription: dropping origin tracking for \"%s\"",
+ originname);
+ replorigin_drop(originid, false);
+ }
+ }
+ list_free(rstates);
+
+ /* Clean up dependencies. */
+ deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
+
+ /* Remove any associated relation synchronization states. */
+ RemoveSubscriptionRel(subid, InvalidOid);
+
+ /* Remove the origin tracking if exists. */
+ snprintf(originname, sizeof(originname), "pg_%u", subid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ replorigin_drop(originid, false);
+
+ /*
+ * If there is a slot associated with the subscription, then drop the
+ * replication slot at the publisher node using the replication
+ * connection.
+ */
+ if (slotname)
+ ReplicationSlotDropAtPubNode(wrconn, slotname, false);
+ }
+ PG_FINALLY();
{
+ if (wrconn)
+ walrcv_disconnect(wrconn);
+
table_close(rel, NoLock);
- return;
}
+ PG_END_TRY();
+}
+
+
+/*
+ * Drop the replication slot at the publisher node using the replication connection.
+ *
+ * missing_ok - if true then only issue WARNING message if the slot cannot be deleted.
+ */
+void
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok)
+{
+ StringInfoData cmd;
+
+ Assert(wrconn);
- /*
- * Otherwise drop the replication slot at the publisher node using the
- * replication connection.
- */
load_file("libpqwalreceiver", false);
initStringInfo(&cmd);
appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
- wrconn = walrcv_connect(conninfo, true, subname, &err);
- if (wrconn == NULL)
- ereport(ERROR,
- (errmsg("could not connect to publisher when attempting to "
- "drop the replication slot \"%s\"", slotname),
- errdetail("The error was: %s", err),
- /* translator: %s is an SQL ALTER command */
- errhint("Use %s to disassociate the subscription from the slot.",
- "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
-
PG_TRY();
{
WalRcvExecResult *res;
res = walrcv_exec(wrconn, cmd.data, 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
+ if (res->status == WALRCV_OK_COMMAND)
+ {
+ /* NOTICE. Success. */
+ ereport(NOTICE,
+ (errmsg("dropped replication slot \"%s\" on publisher",
+ slotname)));
+ }
+ else if (res->status == WALRCV_ERROR && missing_ok)
+ {
+ /* WARNING. Error, but missing_ok = true. */
+ ereport(WARNING,
(errmsg("could not drop the replication slot \"%s\" on publisher",
slotname),
errdetail("The error was: %s", res->err)));
+ }
else
- ereport(NOTICE,
- (errmsg("dropped replication slot \"%s\" on publisher",
- slotname)));
+ {
+ /* ERROR. */
+ ereport(ERROR,
+ (errmsg("could not drop the replication slot \"%s\" on publisher",
+ slotname),
+ errdetail("The error was: %s", res->err)));
+ }
walrcv_clear_result(res);
}
PG_FINALLY();
{
- walrcv_disconnect(wrconn);
+ pfree(cmd.data);
}
PG_END_TRY();
-
- pfree(cmd.data);
-
- table_close(rel, NoLock);
}
/*
diff --git a/src/backend/replication/logical/origin.c b/src/backend/replication/logical/origin.c
index 0b01cce..b299285 100644
--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -905,7 +905,7 @@ replorigin_advance(RepOriginId node,
LWLockAcquire(&replication_state->lock, LW_EXCLUSIVE);
/* Make sure it's not used by somebody else */
- if (replication_state->acquired_by != 0)
+ if (replication_state->acquired_by != 0 && replication_state->acquired_by != MyProcPid)
{
ereport(ERROR,
(errcode(ERRCODE_OBJECT_IN_USE),
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 863d196..c5a95fc 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -31,8 +31,10 @@
* table state to INIT.
* - Tablesync worker starts; changes table state from INIT to DATASYNC while
* copying.
- * - Tablesync worker finishes the copy and sets table state to SYNCWAIT;
- * waits for state change.
+ * - Tablesync worker does initial table copy; there is a FINISHEDCOPY state to
+ * indicate when the copy phase has completed, so if the worker crashes
+ * before reaching SYNCDONE the copy will not be re-attempted.
+ * - Tablesync worker then sets table state to SYNCWAIT; waits for state change.
* - Apply worker periodically checks for tables in SYNCWAIT state. When
* any appear, it sets the table state to CATCHUP and starts loop-waiting
* until either the table state is set to SYNCDONE or the sync worker
@@ -48,8 +50,8 @@
* point it sets state to READY and stops tracking. Again, there might
* be zero changes in between.
*
- * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * So the state progression is always: INIT -> DATASYNC ->
+ * (sync worker FINISHEDCOPY) -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY.
*
* The catalog pg_subscription_rel is used to keep information about
* subscribed tables and their state. Some transient state during data
@@ -59,6 +61,7 @@
* Example flows look like this:
* - Apply is in front:
* sync:8
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:10
* -> set in memory CATCHUP
@@ -74,6 +77,7 @@
*
* - Sync is in front:
* sync:10
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:8
* -> set in memory CATCHUP
@@ -102,6 +106,8 @@
#include "replication/logicalrelation.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
@@ -270,30 +276,57 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
+ bool sync_done = false;
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
+ sync_done = MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
+ current_lsn >= MyLogicalRepWorker->relstate_lsn;
+ SpinLockRelease(&MyLogicalRepWorker->relmutex);
- if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
- current_lsn >= MyLogicalRepWorker->relstate_lsn)
+ if (sync_done)
{
TimeLineID tli;
+ char syncslotname[NAMEDATALEN] = {0};
+
+ /* End wal streaming so wrconn can be re-used to drop the slot. */
+ walrcv_endstreaming(wrconn, &tli);
+
+ /*
+ * Cleanup the tablesync slot.
+ */
+ ReplicationSlotNameForTablesync(
+ MySubscription->oid,
+ MyLogicalRepWorker->relid,
+ syncslotname);
+ elog(DEBUG1, "process_syncing_tables_for_sync: dropping the tablesync slot \"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, false);
+
+ /*
+ * Change state to SYNCDONE.
+ */
+ SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCDONE;
MyLogicalRepWorker->relstate_lsn = current_lsn;
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ {
+ StartTransactionCommand();
+ }
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
MyLogicalRepWorker->relstate_lsn);
- walrcv_endstreaming(wrconn, &tli);
finish_sync_worker();
}
- else
- SpinLockRelease(&MyLogicalRepWorker->relmutex);
}
/*
@@ -412,6 +445,35 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
started_tx = true;
}
+ /*
+ * Remove the tablesync origin tracking if exists.
+ *
+ * The cleanup is done here instead of in the
+ * finish_sync_worker function because if the tablesync worker
+ * process attempted to call replorigin_drop then that will
+ * hang because replorigin_drop logic considers the owning
+ * tablesync PID as "busy".
+ *
+ * Do this before updating the state, so that DropSubscription
+ * can know that all READY workers have already had their
+ * origin tracking removed.
+ */
+ {
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MyLogicalRepWorker->subid, rstate->relid);
+ originid = replorigin_by_name(originname, true);
+ if (OidIsValid(originid))
+ {
+ elog(DEBUG1, "process_syncing_tables_for_apply: dropping tablesync origin tracking for \"%s\".", originname);
+ replorigin_drop(originid, false);
+ }
+ }
+
+ /*
+ * Update the state only after the origin cleanup.
+ */
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
@@ -808,6 +870,30 @@ copy_table(Relation rel)
}
/*
+ * Determine the tablesync slot name.
+ *
+ * The name must not exceed NAMEDATALEN -1 because of remote node constraints on
+ * slot name length.
+ *
+ * The returned slot name is either returned in the supplied buffer or
+ * palloc'ed in current memory context (if NULL buffer).
+ */
+char *
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname)
+{
+ if (syncslotname)
+ {
+ sprintf(syncslotname, "pg_%u_sync_%u", suboid, relid);
+ }
+ else
+ {
+ syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
+ }
+
+ return syncslotname;
+}
+
+/*
* Start syncing the table in the sync worker.
*
* If nothing needs to be done to sync the table, we exit the worker without
@@ -824,6 +910,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -849,19 +937,11 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
- MySubscription->slotname,
- MySubscription->oid,
- MyLogicalRepWorker->relid);
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(
+ MySubscription->oid,
+ MyLogicalRepWorker->relid,
+ NULL); /* use palloc */
/*
* Here we use the slot name instead of the subscription name as the
@@ -874,7 +954,31 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY);
+
+ /* Assign the origin tracking record name. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MySubscription->oid, MyLogicalRepWorker->relid);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed/etc
+ * before it was able to finish normally.
+ */
+ elog(DEBUG1, "LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_FINISHEDCOPY.");
+ StartTransactionCommand();
+
+ /*
+ * The origin tracking name must already exist (missing_ok=false).
+ */
+ originid = replorigin_by_name(originname, false);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ *origin_startpos = replorigin_session_get_progress(false);
+
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -890,9 +994,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -918,29 +1019,90 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
walrcv_clear_result(res);
/*
- * Create a new temporary logical decoding slot. This slot will be used
+ * Create a new permanent logical decoding slot. This slot will be used
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
- walrcv_create_slot(wrconn, slotname, true,
+ walrcv_create_slot(wrconn, slotname, false,
CRS_USE_SNAPSHOT, origin_startpos);
- /* Now do the initial data copy */
- PushActiveSnapshot(GetTransactionSnapshot());
- copy_table(rel);
- PopActiveSnapshot();
+ /*
+ * Be sure to remove the newly created tablesync slot if the COPY fails.
+ */
+ PG_TRY();
+ {
+ /* Now do the initial data copy */
+ PushActiveSnapshot(GetTransactionSnapshot());
+ copy_table(rel);
+ PopActiveSnapshot();
+
+ res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
+ if (res->status != WALRCV_OK_COMMAND)
+ ereport(ERROR,
+ (errmsg("table copy could not finish transaction on publisher"),
+ errdetail("The error was: %s", res->err)));
+ walrcv_clear_result(res);
+
+ table_close(rel, NoLock);
+
+ /* Make the copy visible. */
+ CommandCounterIncrement();
+ }
+ PG_CATCH();
+ {
+ /*
+ * If something failed during copy table then cleanup the created
+ * slot.
+ */
+ elog(DEBUG1, "LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".", slotname);
+ ReplicationSlotDropAtPubNode(wrconn, slotname, false);
+
+ pfree(slotname);
+ slotname = NULL;
- res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+
+ /* Setup replication origin tracking. */
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist. Create it now, and advance to LSN
+ * got from walrcv_create_slot.
+ */
+ originid = replorigin_create(originname);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */ , true /* WAL log */ );
+ }
+ else
+ {
ereport(ERROR,
- (errmsg("table copy could not finish transaction on publisher"),
- errdetail("The error was: %s", res->err)));
- walrcv_clear_result(res);
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("replication origin \"%s\" already exists",
+ originname)));
+ }
+
+ /*
+ * Update the persisted state to indicate the COPY phase is done; make it
+ * visible to others.
+ */
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_FINISHEDCOPY,
+ MyLogicalRepWorker->relstate_lsn);
- table_close(rel, NoLock);
+copy_table_done:
- /* Make the copy visible. */
- CommandCounterIncrement();
+ elog(DEBUG1, "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
+
+ CommitTransactionCommand();
/*
* We are done with the initial data synchronization, update the state.
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 1b1d70e..4bd4030 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -807,12 +807,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -889,9 +885,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -918,8 +912,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1062,8 +1055,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData* commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index 06663b9..9027c42 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,8 @@ DECLARE_UNIQUE_INDEX(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg_subs
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_FINISHEDCOPY 'f' /* tablesync copy phase is completed
+ * (sublsn NULL) */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 53f636c..5f52335 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -15,6 +15,7 @@
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "storage/spin.h"
+#include "replication/walreceiver.h"
/*
* Behaviour of replication slots, upon release or crash.
@@ -211,6 +212,8 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname);
+extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
--
1.8.3.1
On Sat, Jan 9, 2021 at 5:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Jan 8, 2021 at 2:55 PM Peter Smith <smithpb2250@gmail.com> wrote:
On Fri, Jan 8, 2021 at 1:02 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
3.
+ /*
+ * To build a slot name for the sync work, we are limited to NAMEDATALEN -
+ * 1 characters.
+ *
+ * The name is calculated as pg_%u_sync_%u (3 + 10 + 6 + 10 + '\0'). (It's
+ * actually the NAMEDATALEN on the remote that matters, but this scheme
+ * will also work reasonably if that is different.)
+ */
+ StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
+
+ syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);

The comment says syncslotname is limited to NAMEDATALEN - 1 characters.
But the actual size of it is (3 + 10 + 6 + 10 + '\0') = 30, which is not NAMEDATALEN - 1.
Should we change the comment here?

The comment wording is a remnant from older code which had a
differently formatted slot name.
I think the comment is still valid, albeit maybe unnecessary since in
the current code the tablesync slot
name length is fixed. But I left the older comment here as a safety reminder
in case some future change would want to modify the slot name. What do
you think?

I find it quite confusing. The comments should reflect the latest
code. You can probably say in some form that the length of slotname
shouldn't exceed NAMEDATALEN because of remote node constraints on
slot name length. Also, probably the StaticAssert on NAMEDATALEN is
not required.
Modified comment in latest patch [v14]
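As a quick sanity check of that arithmetic, here is a minimal standalone sketch (my own illustration, not patch code; NAMEDATALEN assumed at its default of 64) showing the generated name maxes out at 30 bytes:

#include <stdio.h>

#define NAMEDATALEN 64			/* PostgreSQL default */

int
main(void)
{
	char		buf[NAMEDATALEN];
	/* Worst case: both OIDs at UINT32_MAX, i.e. 10 digits each. */
	unsigned int suboid = 4294967295u;
	unsigned int relid = 4294967295u;
	int			len = snprintf(buf, sizeof(buf), "pg_%u_sync_%u", suboid, relid);

	/* "pg_" (3) + 10 + "_sync_" (6) + 10 = 29 chars, 30 bytes with '\0' */
	printf("%s -> %d chars\n", buf, len);
	return 0;
}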
1.
+ <para>
+  Additional table synchronization slots are normally transient, created
+  internally and dropped automatically when they are no longer needed.
+  These table synchronization slots have generated names:
+  <quote><literal>pg_%u_sync_%u</literal></quote> (parameters: Subscription <parameter>oid</parameter>, Table <parameter>relid</parameter>)
+ </para>

The last line seems too long. I think we are not strict about the 80-char
limit in docs but it is good to be close to that; however, this
appears quite long.
Fixed in latest patch [v14]
2.
+ /*
+  * Cleanup any remaining tablesync resources.
+  */
+ {
+  char originname[NAMEDATALEN];
+  RepOriginId originid;
+  char state;
+  XLogRecPtr statelsn;

I have already mentioned previously that let's not use this new style
of code (start using { to localize the scope of variables). I don't
know about others, but I find such code difficult to read. You
might want to consider moving this whole block to a separate function.
Removed extra code block in latest patch [v14]
3.
+ /*
+  * XXX - Should optimize this to avoid multiple
+  * connect/disconnect.
+  */
+ wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);

I think it is better to avoid multiple connect/disconnect here. In
this same function, we have connected to the publisher, we should be
able to use the same connection.
Fixed in latest patch [v14]
4.
process_syncing_tables_for_sync()
{
..
+ /*
+  * Cleanup the tablesync slot.
+  */
+ syncslotname = ReplicationSlotNameForTablesync(
+  MySubscription->oid,
+  MyLogicalRepWorker->relid);
+ PG_TRY();
+ {
+  elog(DEBUG1, "process_syncing_tables_for_sync: dropping the tablesync slot \"%s\".", syncslotname);
+  ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ }
+ PG_FINALLY();
+ {
+  pfree(syncslotname);
+ }
+ PG_END_TRY();
..
}

Both here and in DropSubscription(), it seems we are using
PG_TRY..PG_FINALLY just to free the memory even though
ReplicationSlotDropAtPubNode already has try..finally. Can we arrange
code to move allocation of syncslotname inside
ReplicationSlotDropAtPubNode to avoid the additional try..finally? BTW, if
the usage of try..finally here is only to free the memory, I am not
sure if it is required because I think we will anyway reset the memory
context where this memory is allocated as part of error handling.
Eliminated need for TRY/FINALLY to free syncslotname in latest patch [v14]
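For illustration, the simplification amounts to something like the following sketch (the wrapper name here is hypothetical, not something the patch actually adds):

/*
 * Hypothetical wrapper: build the slot name into a stack buffer and
 * drop it, so nothing palloc'd escapes to the caller and no
 * PG_TRY/PG_FINALLY is needed merely for the pfree.
 */
static void
DropTablesyncSlotAtPubNode(WalReceiverConn *wrconn, Oid suboid, Oid relid)
{
	char		syncslotname[NAMEDATALEN] = {0};

	ReplicationSlotNameForTablesync(suboid, relid, syncslotname);
	ReplicationSlotDropAtPubNode(wrconn, syncslotname, false /* missing_ok */ );
}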
5.
 #define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
                                    * NULL) */
+#define SUBREL_STATE_TCOPYDONE 't' /* tablesync copy phase is completed
+                                    * (sublsn NULL) */
 #define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
                                    * apply (sublsn set) */

I am not very happy with the new state name SUBREL_STATE_TCOPYDONE as
it is quite different from the other adjoining state names and does not
go well with the code. How about SUBREL_STATE_ENDCOPY 'e' or
SUBREL_STATE_FINISHEDCOPY 'f'?
Using SUBREL_STATE_FINISHEDCOPY in latest patch [v14]
---
[v14] = /messages/by-id/CAHut+PsPO2vOp+P7U2szROMy15PKKGanhUrCYQ0ffpy9zG1V1A@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia
On Mon, Jan 11, 2021 at 3:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Jan 8, 2021 at 8:20 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Jan 8, 2021 at 7:14 AM Peter Smith <smithpb2250@gmail.com> wrote:
FYI, I was able to reproduce this case in debugger. PSA logs showing details.
Thanks for reproducing as I was worried about exactly this case. I
have one question related to logs:

##
## ALTER SUBSCRIPTION to REFRESH the publication
## This blocks on some latch until the tablesync worker dies, then it continues
##

Did you check which exact latch or lock blocks this?
I have checked this myself and the command is waiting on the drop of
origin till the tablesync worker is finished because replorigin_drop()
requires state->acquired_by to be 0 which will only be true once the
tablesync worker exits. I think this is the reason you might have
noticed that the command can't be finished until the tablesync worker
died. So this can't be an interlock between ALTER SUBSCRIPTION ..
REFRESH command and the tablesync worker, and to that end it seems you have
the below FIXMEs in the patch:

I have also seen this same blocking reason before in the replorigin_drop().
However, back when I first tested/reproduced the refresh issue
[test-refresh], the AlterSubscription_refresh was still the *original*
unchanged code, so at that time it did not have any replorigin_drop()
in it at all. In any case, in the latest code [v14] the AlterSubscription is
immediately stopping the workers so this question may be moot.
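For reference, the blocking logic being discussed is the guard in replorigin_drop(); roughly (my paraphrase of origin.c, not code from this patch):

/* found our origin, is it busy? */
if (state->acquired_by != 0)
{
	/*
	 * The origin cannot be dropped while some process (here, the
	 * tablesync worker) still has it acquired; the caller either
	 * errors out or sleeps until the owning worker releases it.
	 */
	if (nowait)
		ereport(ERROR,
				(errcode(ERRCODE_OBJECT_IN_USE),
				 errmsg("could not drop replication origin with OID %d, in use by PID %d",
						state->roident,
						state->acquired_by)));
	/* ... otherwise wait for the owner to release it and retry ... */
}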
+ * FIXME - Usually this cleanup would be OK, but will not
+ * always be OK because the logicalrep_worker_stop_at_commit
+ * only "flags" the worker to be stopped in the near future
+ * but meanwhile it may still be running. In this case there
+ * could be a race between the tablesync worker and this code
+ * to see who will succeed with the tablesync drop (and the
+ * loser will ERROR).
+ *
+ * FIXME - Also, checking the state is also not guaranteed
+ * correct because state might be TCOPYDONE when we checked
+ * but has since progressed to SYNDONE
+ */
+
+ if (state == SUBREL_STATE_TCOPYDONE)
+ {

I feel this was okay for the earlier code, but now we need to stop the
tablesync workers before trying to drop the slot as we do in
DropSubscription. Now, if we do that then that would fix the race
conditions mentioned in Fixme but still, there are few more things I
am worried about: (a) What if the launcher again starts the tablesync
worker? One idea could be to acquire AccessExclusiveLock on
SubscriptionRelationId as we do in DropSubscription which is not a
very good idea but I can't think of any other good way. (b) the patch
is just checking SUBREL_STATE_TCOPYDONE before dropping the
replication slot but the slot could be created even before that (in
SUBREL_STATE_DATASYNC state). One idea could be we can try to drop the
slot and if we are not able to drop then we can simply continue
assuming it didn't exist.
The code was modified in the latest patch [v14] along the lines suggested.
The workers for removed tables are now immediately stopped (like
DropSubscription does). Although I did include the AccessExclusiveLock
as (a) suggested, AFAIK this was actually ineffective at preventing
the workers from relaunching. Instead, I am using
logicalrep_worker_stop_at_commit to do this - testing shows it as
working ok. Please see the code and latest test logs [v14] for
details.
Also, now the Drop/AlterSubscription will only give a WARNING if unable
to drop the slots, as per suggestion (b). This is also tested [v14].
One minor comment:
1.
+ SpinLockAcquire(&MyLogicalRepWorker->relmutex);
  MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCDONE;
  MyLogicalRepWorker->relstate_lsn = current_lsn;
-

Spurious line removal.
Fixed in latest patch [v14]
----
[v14] = /messages/by-id/CAHut+PsPO2vOp+P7U2szROMy15PKKGanhUrCYQ0ffpy9zG1V1A@mail.gmail.com
[test-refresh] = /messages/by-id/CAHut+Pv7YW7AyO_Q_nf9kzogcJcDFQNe8FBP6yXdzowMz3dY_Q@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia
Also PSA some detailed logging evidence of some test scenarios involving Drop/AlterSubscription:
+ Test-20210112-AlterSubscriptionRefresh-ok.txt = AlterSubscription_refresh which successfully drops a tablesync slot
+ Test-20210112-AlterSubscriptionRefresh-warning.txt = AlterSubscription_refresh gives WARNING that it cannot drop the tablesync slot (which no longer exists)
+ Test-20210112-DropSubscription-warning.txt = DropSubscription with a disassociated slot_name gives a WARNING that it cannot drop the tablesync slot (due to broken connection)
Hi
* The AlterSubscription_refresh (v14+) is now more similar to DropSubscription w.r.t. stopping workers for any "removed" tables.
I have a question about the above feature.
With the patch, it seems the worker is not stopped in the case of [1].
I probably missed something; should we stop the worker in such a case?
[1]: /messages/by-id/CALj2ACV+0UFpcZs5czYgBpujM9p0Hg1qdOZai_43OU7bqHU_xw@mail.gmail.com
Best regards,
houzj
On Mon, Jan 4, 2021 at 10:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
7.
@@ -905,7 +905,7 @@ replorigin_advance(RepOriginId node,
LWLockAcquire(&replication_state->lock, LW_EXCLUSIVE);

 /* Make sure it's not used by somebody else */
- if (replication_state->acquired_by != 0)
+ if (replication_state->acquired_by != 0 && replication_state->acquired_by != MyProcPid)
 {

I think you won't need this change if you do replorigin_advance before
replorigin_session_setup in your patch.
As you know the replorigin_session_setup sets the
replication_state->acquired_by to be the current PID. So without this
change the replorigin_advance rejects that same slot state thinking
that it is already active for a different process. The root problem is
that the same process/PID calling both functions would hang. So this
patch change allows replorigin_advance to be called by the same process.
IIUC that acquired_by check condition is like a sanity check for the
originid passed. The patched code only does just what the comment
says:
"/* Make sure it's not used by somebody else */"
Doesn't "somebody else" mean "anyone but me" (i.e. anyone but MyProcPid)?
Also, "setup" of a thing generally comes before usage of that thing,
so won't it seem strange (as the suggestion implies) to deliberately
call the "setup" function 2nd instead of 1st?
Can you please explain why it is better to do it the suggested way
(switch the calls around) than keep the patch code? Probably there is
a good reason but I am just not understanding it.
----
Kind Regards,
Peter Smith.
Fujitsu Australia
On Wed, Jan 13, 2021 at 1:07 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
Also PSA some detailed logging evidence of some test scenarios involving Drop/AlterSubscription:
+ Test-20210112-AlterSubscriptionRefresh-ok.txt = AlterSubscription_refresh which successfully drops a tablesync slot
+ Test-20210112-AlterSubscriptionRefresh-warning.txt = AlterSubscription_refresh gives WARNING that it cannot drop the tablesync slot (which no longer exists)
+ Test-20210112-DropSubscription-warning.txt = DropSubscription with a disassociated slot_name gives a WARNING that it cannot drop the tablesync slot (due to broken connection)

Hi

* The AlterSubscription_refresh (v14+) is now more similar to DropSubscription w.r.t. stopping workers for any "removed" tables.

I have a question about the above feature.
With the patch, it seems the worker is not stopped in the case of [1].
I probably missed something; should we stop the worker in such a case?

[1] = /messages/by-id/CALj2ACV+0UFpcZs5czYgBpujM9p0Hg1qdOZai_43OU7bqHU_xw@mail.gmail.com
I am not exactly sure of the concern. (If the extra info below does
not help, can you please describe your concern in more detail?)
This [v14] patch code/feature is only referring to the immediate
stopping of only the *** "tablesync" *** worker (if any) for any/each
table being removed from the subscription. It has nothing to say about
the "apply" worker of the subscription, which continues replicating as
before.
OTOH, I think the other mail problem is not really related to the
"tablesync" workers. As you can see (e.g. steps 7,8,9,10 of [2]= /messages/by-id/CALj2ACV+0UFpcZs5czYgBpujM9p0Hg1qdOZai_43OU7bqHU_xw@mail.gmail.com), that
problem is described as continuing over multiple transactions to
replicate unexpected rows - I think this could only be done by the
subscription "apply" worker, and is after the "tablesync" worker has
gone away.
So AFAIK these are 2 quite unrelated problems, and would be solved
independently.
It just happens that they are both exposed using ALTER SUBSCRIPTION
... REFRESH PUBLICATION;
----
[v14] = /messages/by-id/CAHut+PsPO2vOp+P7U2szROMy15PKKGanhUrCYQ0ffpy9zG1V1A@mail.gmail.com
[2] = /messages/by-id/CALj2ACV+0UFpcZs5czYgBpujM9p0Hg1qdOZai_43OU7bqHU_xw@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia
I am not exactly sure of the concern. (If the extra info below does
not help, can you please describe your concern in more detail?)

This [v14] patch code/feature is only referring to the immediate stopping
of only the *** "tablesync" *** worker (if any) for any/each table being
removed from the subscription. It has nothing to say about the "apply" worker
of the subscription, which continues replicating as before.

OTOH, I think the other mail problem is not really related to the "tablesync"
workers. As you can see (e.g. steps 7,8,9,10 of [2]), that problem is
described as continuing over multiple transactions to replicate unexpected
rows - I think this could only be done by the subscription "apply" worker,
and is after the "tablesync" worker has gone away.

So AFAIK these are 2 quite unrelated problems, and would be solved
independently.

It just happens that they are both exposed using ALTER SUBSCRIPTION ...
REFRESH PUBLICATION;
So sorry for the confusion, you are right that these are 2 quite unrelated problems.
I misunderstood the 'stop the worker' here.
+ /* Immediately stop the worker. */
+ logicalrep_worker_stop_at_commit(subid, relid); /* prevent re-launching */
+ logicalrep_worker_stop(subid, relid); /* stop immediately */
Do you think we can add some comments to describe what type of "worker" is stopped here? (the sync worker)
And should we add some more comments about the reason for the "Immediately stop" here? It may be easier to understand.
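For example, maybe something like this (just a suggested wording, not what the patch currently says):

+ /*
+  * Immediately stop the tablesync worker (if any) for this relation,
+  * and also flag it to be stopped again at commit time so that the
+  * launcher cannot re-launch it before our transaction ends; otherwise
+  * a re-launched sync worker could race with the tablesync slot drop.
+  */
+ logicalrep_worker_stop_at_commit(subid, relid);
+ logicalrep_worker_stop(subid, relid);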
Best regards,
Houzj
On Wed, Jan 13, 2021 at 1:30 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
I am not exactly sure of the concern. (If the extra info below does
not help, can you please describe your concern in more detail?)

This [v14] patch code/feature is only referring to the immediate stopping
of only the *** "tablesync" *** worker (if any) for any/each table being
removed from the subscription. It has nothing to say about the "apply" worker
of the subscription, which continues replicating as before.

OTOH, I think the other mail problem is not really related to the "tablesync"
workers. As you can see (e.g. steps 7,8,9,10 of [2]), that problem is
described as continuing over multiple transactions to replicate unexpected
rows - I think this could only be done by the subscription "apply" worker,
and is after the "tablesync" worker has gone away.

So AFAIK these are 2 quite unrelated problems, and would be solved
independently.

It just happens that they are both exposed using ALTER SUBSCRIPTION ...
REFRESH PUBLICATION;

So sorry for the confusion, you are right that these are 2 quite unrelated problems.
I misunderstood the 'stop the worker' here.

+ /* Immediately stop the worker. */
+ logicalrep_worker_stop_at_commit(subid, relid); /* prevent re-launching */
+ logicalrep_worker_stop(subid, relid); /* stop immediately */

Do you think we can add some comments to describe what type of "worker" is stopped here? (the sync worker)
And should we add some more comments about the reason for the "Immediately stop" here? It may be easier to understand.
Another thing related to this: why do we need to call both
logicalrep_worker_stop_at_commit() and logicalrep_worker_stop()?
--
With Regards,
Amit Kapila.
On Wed, Jan 13, 2021 at 11:18 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Mon, Jan 4, 2021 at 10:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
7.
@@ -905,7 +905,7 @@ replorigin_advance(RepOriginId node,
LWLockAcquire(&replication_state->lock, LW_EXCLUSIVE);

 /* Make sure it's not used by somebody else */
- if (replication_state->acquired_by != 0)
+ if (replication_state->acquired_by != 0 && replication_state->acquired_by != MyProcPid)
 {

I think you won't need this change if you do replorigin_advance before
replorigin_session_setup in your patch.

As you know, the replorigin_session_setup sets the
replication_state->acquired_by to be the current PID. So without this
change the replorigin_advance rejects that same slot state thinking
that it is already active for a different process. The root problem is
that the same process/PID calling both functions would hang.
I think the hang happens only if we call the unchanged replorigin_advance
after the session_setup API, right?
So this
patch change allows replorigin_advance to be called by the same process.

IIUC that acquired_by check condition is like a sanity check for the
originid passed. The patched code only does just what the comment
says:
"/* Make sure it's not used by somebody else */"
Doesn't "somebody else" mean "anyone but me" (i.e. anyone but MyProcPid)?

Also, "setup" of a thing generally comes before usage of that thing,
so won't it seem strange (as the suggestion implies) to deliberately
call the "setup" function 2nd instead of 1st?

Can you please explain why it is better to do it the suggested way
(switch the calls around) than keep the patch code? Probably there is
a good reason but I am just not understanding it.
Because there is no requirement for origin_advance API to be called
after session setup. Session setup is required to mark the node as
replaying from a remote node, see [1], whereas origin_advance is used
for setting up the initial location or setting a new location, see [2]
(pg_replication_origin_advance).
Now here, after creating the origin, we need to set up the initial
location and it seems fine to call origin_advance before
session_setup. In short, as such, I don't see any problem with your
change in replorigin_advance but OTOH, I don't see the need for the
same as well. So, let's try to avoid that change unless we can't do
without it.
Also, another thing is we need to take RowExclusiveLock on
pg_replication_origin as written in comments atop replorigin_advance
before calling it. See its usage in pg_replication_origin_advance.
Also, write comments on why we need to use replorigin_advance here
(... something, like we need to WAL log this for the purpose of
recovery...).
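To illustrate the ordering and locking I am suggesting, it would look roughly like this (a sketch only, mirroring what pg_replication_origin_advance does for the lock; not committed code):

/* Create the origin and set its initial location *before* acquiring it
 * for this session, so the unmodified acquired_by != 0 check in
 * replorigin_advance() is never tripped by our own PID. */
originid = replorigin_create(originname);

/* Per the comments atop replorigin_advance(), callers must hold
 * RowExclusiveLock on pg_replication_origin. */
LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);

/* WAL-log the starting location so it survives recovery. */
replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
				   true /* go backward */ , true /* WAL log */ );

/* Only now mark the session as replaying from this origin. */
replorigin_session_setup(originid);
replorigin_session_origin = originid;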
[1]: https://www.postgresql.org/docs/devel/replication-origins.html
[2]: https://www.postgresql.org/docs/devel/functions-admin.html#FUNCTIONS-REPLICATION
--
With Regards,
Amit Kapila.
On Tue, Jan 12, 2021 at 6:17 PM Peter Smith <smithpb2250@gmail.com> wrote:
On Mon, Jan 11, 2021 at 3:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Jan 8, 2021 at 8:20 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Jan 8, 2021 at 7:14 AM Peter Smith <smithpb2250@gmail.com> wrote:
FYI, I was able to reproduce this case in debugger. PSA logs showing details.
Thanks for reproducing as I was worried about exactly this case. I
have one question related to logs:

##
## ALTER SUBSCRIPTION to REFRESH the publication
## This blocks on some latch until the tablesync worker dies, then it continues
##

Did you check which exact latch or lock blocks this?
I have checked this myself and the command is waiting on the drop of
origin till the tablesync worker is finished because replorigin_drop()
requires state->acquired_by to be 0 which will only be true once the
tablesync worker exits. I think this is the reason you might have
noticed that the command can't be finished until the tablesync worker
died. So this can't be an interlock between ALTER SUBSCRIPTION ..
REFRESH command and the tablesync worker, and to that end it seems you have
the below FIXMEs in the patch:

I have also seen this same blocking reason before in the replorigin_drop().
However, back when I first tested/reproduced the refresh issue
[test-refresh], the AlterSubscription_refresh was still the *original*
unchanged code, so at that time it did not have any replorigin_drop()
in it at all. In any case, in the latest code [v14] the AlterSubscription is
immediately stopping the workers so this question may be moot.

+ * FIXME - Usually this cleanup would be OK, but will not
+ * always be OK because the logicalrep_worker_stop_at_commit
+ * only "flags" the worker to be stopped in the near future
+ * but meanwhile it may still be running. In this case there
+ * could be a race between the tablesync worker and this code
+ * to see who will succeed with the tablesync drop (and the
+ * loser will ERROR).
+ *
+ * FIXME - Also, checking the state is also not guaranteed
+ * correct because state might be TCOPYDONE when we checked
+ * but has since progressed to SYNDONE
+ */
+
+ if (state == SUBREL_STATE_TCOPYDONE)
+ {

I feel this was okay for the earlier code, but now we need to stop the
tablesync workers before trying to drop the slot as we do in
DropSubscription. Now, if we do that then that would fix the race
conditions mentioned in Fixme but still, there are few more things I
am worried about: (a) What if the launcher again starts the tablesync
worker? One idea could be to acquire AccessExclusiveLock on
SubscriptionRelationId as we do in DropSubscription which is not a
very good idea but I can't think of any other good way. (b) the patch
is just checking SUBREL_STATE_TCOPYDONE before dropping the
replication slot but the slot could be created even before that (in
SUBREL_STATE_DATASYNC state). One idea could be we can try to drop the
slot and if we are not able to drop then we can simply continue
assuming it didn't exist.The code was modified in the latest patch [v14] something like as suggested.
The workers for removed tables are now immediately stopped (like
DropSubscription does). Although I did include the AccessExclusiveLock
as (a) suggested, AFAIK this was actually ineffective at preventing
the workers from relaunching.
The reason why it was ineffective is that you are locking
SubscriptionRelationId which is to protect relaunch of apply workers
not tablesync workers. But in its current form, even acquiring the
SubscriptionRelRelationId lock won't serve the purpose because
process_syncing_tables_for_apply() doesn't always acquire it before
relaunching the tablesync workers. However, if we acquire
SubscriptionRelRelationId in process_syncing_tables_for_apply() then
it would prevent relaunch of workers but not sure if that is a good
idea. Can you think of some other way?
Instead, I am using
logicalrep_worker_stop_at_commit to do this - testing shows it as
working ok. Please see the code and latest test logs [v14] for
details.
There is still a window where it can relaunch. Basically, after you
stop the worker in AlterSubscription_refresh and until the commit
happens, the apply worker can relaunch the tablesync workers. I don't see
code-wise how we can protect that. And if the tablesync workers are
restarted after we stopped them, the purpose won't be achieved because
it can recreate or try to reuse the slot which we have dropped.
The other issue with the current code could be that after we drop the
slot and origin, what if the transaction (in which we are doing Alter
Subscription) is rolled back? Basically, the workers will be relaunched
and it would assume that slot should be there but the slot won't be
present. I have thought of dropping the slot at commit time after we
stop the workers but again not sure if that is a good idea because at
that point we don't want to establish the connection with the
publisher.
I think this needs some more thought.
--
With Regards,
Amit Kapila.
Hi Amit.
PSA the v15 patch for the Tablesync Solution1.
Main differences from v14:
+ Addresses review comment, posted 13/Jan [ak13]
[ak13] = /messages/by-id/CAA4eK1KzNbudfwmJD-ureYigX6sNyCU6YgHscg29xWoZG6osvA@mail.gmail.com
====
Features:
* The tablesync slot is now permanent instead of temporary.
* The tablesync slot name is no longer tied to the Subscription slot name.
* The tablesync slot cleanup (drop) code is added for
DropSubscription, AlterSubscription_refresh and for
process_syncing_tables_for_sync functions. Drop/AlterSubscription will
issue WARNING instead of ERROR in case the slot drop fails.
* The tablesync worker is now allowing multiple tx instead of single tx
* A new state (SUBREL_STATE_FINISHEDCOPY) is persisted after a
successful copy_table in tablesync's LogicalRepSyncTableStart.
* If a re-launched tablesync finds state SUBREL_STATE_FINISHEDCOPY
then it will bypass the initial copy_table phase.
* Now tablesync sets up replication origin tracking in
LogicalRepSyncTableStart (similar as done for the apply worker). The
origin is advanced when first created.
* The tablesync replication origin tracking is cleaned up during
DropSubscription and/or process_syncing_tables_for_apply.
* The DropSubscription cleanup code was enhanced (v7+) to take care of
any crashed tablesync workers.
* The AlterSubscription_refresh (v14+) is now more similar to
DropSubscription w.r.t. stopping tablesync workers for any "removed"
tables.
* Updates to PG docs.
TODO / Known Issues:
* The AlterSubscription_refresh tablesync cleanup code still has some
problems [1].

[1] = /messages/by-id/CAA4eK1JuwZF7FHM+EPjWdVh=Xaz-7Eo-G0TByMjWeUU32Xue3w@mail.gmail.com
---
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
v15-0001-Tablesync-Solution1.patchapplication/octet-stream; name=v15-0001-Tablesync-Solution1.patchDownload
From 3e69d48bf311e763fc1e9fd04afd09b9dea358a7 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Thu, 14 Jan 2021 15:07:23 +1100
Subject: [PATCH v15] Tablesync Solution1.
====
Features:
* The tablesync slot is now permanent instead of temporary.
* The tablesync slot name is no longer tied to the Subscription slot name.
* The tablesync slot cleanup (drop) code is added for DropSubscription, AlterSubscription_refresh and for process_syncing_tables_for_sync functions. Drop/AlterSubscription will issue WARNING instead of ERROR in case the slot drop fails.
* The tablesync worker is now allowing multiple tx instead of single tx
* A new state (SUBREL_STATE_FINISHEDCOPY) is persisted after a successful copy_table in tablesync's LogicalRepSyncTableStart.
* If a re-launched tablesync finds state SUBREL_STATE_FINISHEDCOPY then it will bypass the initial copy_table phase.
* Now tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar to what is done for the apply worker). The origin is advanced when first created.
* The tablesync replication origin tracking is cleaned up during DropSubscription and/or process_syncing_tables_for_apply.
* The DropSubscription cleanup code was enhanced (v7+) to take care of any crashed tablesync workers.
* The AlterSubscription_refresh (v14+) is now more similar to DropSubscription w.r.t. stopping tablesync workers for any "removed" tables.
* Updates to PG docs.
TODO / Known Issues:
* The AlterSubscription tablesync cleanup code still has problems [1]
[1] = https://www.postgresql.org/message-id/CAA4eK1JuwZF7FHM%2BEPjWdVh%3DXaz-7Eo-G0TByMjWeUU32Xue3w%40mail.gmail.com
---
doc/src/sgml/catalogs.sgml | 1 +
doc/src/sgml/logical-replication.sgml | 17 +-
doc/src/sgml/ref/drop_subscription.sgml | 6 +-
src/backend/commands/subscriptioncmds.c | 454 ++++++++++++++++++++--------
src/backend/replication/logical/tablesync.c | 259 +++++++++++++---
src/backend/replication/logical/worker.c | 18 +-
src/include/catalog/pg_subscription_rel.h | 2 +
src/include/replication/slot.h | 3 +
8 files changed, 567 insertions(+), 193 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 3a22665..2e46a49 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7651,6 +7651,7 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
State code:
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
+ <literal>f</literal> = finished table copy,
<literal>s</literal> = synchronized,
<literal>r</literal> = ready (normal replication)
</para></entry>
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index a560ad6..20cdd57 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -248,7 +248,17 @@
<para>
As mentioned earlier, each (active) subscription receives changes from a
- replication slot on the remote (publishing) side. Normally, the remote
+ replication slot on the remote (publishing) side.
+ </para>
+ <para>
+ Additional table synchronization slots are normally transient, created
+ internally and dropped automatically when they are no longer needed.
+ These table synchronization slots have generated names:
+ <quote><literal>pg_%u_sync_%u</literal></quote> (parameters: Subscription
+ <parameter>oid</parameter>, Table <parameter>relid</parameter>)
+ </para>
+ <para>
+ Normally, the remote
replication slot is created automatically when the subscription is created
using <command>CREATE SUBSCRIPTION</command> and it is dropped
automatically when the subscription is dropped using <command>DROP
@@ -294,8 +304,9 @@
using <command>ALTER SUBSCRIPTION</command> before attempting to drop
the subscription. If the remote database instance no longer exists, no
further action is then necessary. If, however, the remote database
- instance is just unreachable, the replication slot should then be
- dropped manually; otherwise it would continue to reserve WAL and might
+ instance is just unreachable, the replication slot (and any still
+ remaining table synchronization slots) should then be
+ dropped manually; otherwise it/they would continue to reserve WAL and might
eventually cause the disk to fill up. Such cases should be carefully
investigated.
</para>
diff --git a/doc/src/sgml/ref/drop_subscription.sgml b/doc/src/sgml/ref/drop_subscription.sgml
index adbdeaf..aee9615 100644
--- a/doc/src/sgml/ref/drop_subscription.sgml
+++ b/doc/src/sgml/ref/drop_subscription.sgml
@@ -79,7 +79,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
<para>
When dropping a subscription that is associated with a replication slot on
the remote host (the normal state), <command>DROP SUBSCRIPTION</command>
- will connect to the remote host and try to drop the replication slot as
+ will connect to the remote host and try to drop the replication slot (and
+ any remaining table synchronization slots) as
part of its operation. This is necessary so that the resources allocated
for the subscription on the remote host are released. If this fails,
either because the remote host is not reachable or because the remote
@@ -89,7 +90,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
executing <literal>ALTER SUBSCRIPTION ... SET (slot_name = NONE)</literal>.
After that, <command>DROP SUBSCRIPTION</command> will no longer attempt any
actions on a remote host. Note that if the remote replication slot still
- exists, it should then be dropped manually; otherwise it will continue to
+ exists, it (and any related table synchronization slots) should then be
+ dropped manually; otherwise it/they will continue to
reserve WAL and might eventually cause the disk to fill up. See
also <xref linkend="logical-replication-subscription-slot"/>.
</para>
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 490e935..f94243b 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -37,6 +37,7 @@
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
#include "storage/lmgr.h"
#include "utils/acl.h"
#include "utils/builtins.h"
@@ -566,100 +567,165 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
Oid *pubrel_local_oids;
ListCell *lc;
int off;
+ Relation rel;
/* Load the library providing us libpq calls. */
load_file("libpqwalreceiver", false);
- /* Try to connect to the publisher. */
- wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
- if (!wrconn)
- ereport(ERROR,
- (errmsg("could not connect to the publisher: %s", err)));
-
- /* Get the table list from publisher. */
- pubrel_names = fetch_table_list(wrconn, sub->publications);
-
- /* We are done with the remote side, close connection. */
- walrcv_disconnect(wrconn);
+ PG_TRY();
+ {
+ /* Try to connect to the publisher. */
+ wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
+ if (!wrconn)
+ ereport(ERROR,
+ (errmsg("could not connect to the publisher: %s", err)));
- /* Get local table list. */
- subrel_states = GetSubscriptionRelations(sub->oid);
+ /* Get the table list from publisher. */
+ pubrel_names = fetch_table_list(wrconn, sub->publications);
- /*
- * Build qsorted array of local table oids for faster lookup. This can
- * potentially contain all tables in the database so speed of lookup is
- * important.
- */
- subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
- off = 0;
- foreach(lc, subrel_states)
- {
- SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
+ /* Get local table list. */
+ subrel_states = GetSubscriptionRelations(sub->oid);
- subrel_local_oids[off++] = relstate->relid;
- }
- qsort(subrel_local_oids, list_length(subrel_states),
- sizeof(Oid), oid_cmp);
+ /*
+ * Build qsorted array of local table oids for faster lookup. This can
+ * potentially contain all tables in the database so speed of lookup
+ * is important.
+ */
+ subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
+ off = 0;
+ foreach(lc, subrel_states)
+ {
+ SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
- /*
- * Walk over the remote tables and try to match them to locally known
- * tables. If the table is not known locally create a new state for it.
- *
- * Also builds array of local oids of remote tables for the next step.
- */
- off = 0;
- pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
+ subrel_local_oids[off++] = relstate->relid;
+ }
+ qsort(subrel_local_oids, list_length(subrel_states),
+ sizeof(Oid), oid_cmp);
+
+ /*
+ * Walk over the remote tables and try to match them to locally known
+ * tables. If the table is not known locally create a new state for
+ * it.
+ *
+ * Also builds array of local oids of remote tables for the next step.
+ */
+ off = 0;
+ pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
+
+ foreach(lc, pubrel_names)
+ {
+ RangeVar *rv = (RangeVar *) lfirst(lc);
+ Oid relid;
- foreach(lc, pubrel_names)
- {
- RangeVar *rv = (RangeVar *) lfirst(lc);
- Oid relid;
+ relid = RangeVarGetRelid(rv, AccessShareLock, false);
- relid = RangeVarGetRelid(rv, AccessShareLock, false);
+ /* Check for supported relkind. */
+ CheckSubscriptionRelkind(get_rel_relkind(relid),
+ rv->schemaname, rv->relname);
- /* Check for supported relkind. */
- CheckSubscriptionRelkind(get_rel_relkind(relid),
- rv->schemaname, rv->relname);
+ pubrel_local_oids[off++] = relid;
- pubrel_local_oids[off++] = relid;
+ if (!bsearch(&relid, subrel_local_oids,
+ list_length(subrel_states), sizeof(Oid), oid_cmp))
+ {
+ AddSubscriptionRelState(sub->oid, relid,
+ copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
+ InvalidXLogRecPtr);
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" added to subscription \"%s\"",
+ rv->schemaname, rv->relname, sub->name)));
+ }
+ }
- if (!bsearch(&relid, subrel_local_oids,
- list_length(subrel_states), sizeof(Oid), oid_cmp))
+ /*
+ * Next remove state for tables we should not care about anymore using
+ * the data we collected above
+ */
+ qsort(pubrel_local_oids, list_length(pubrel_names),
+ sizeof(Oid), oid_cmp);
+
+ /*
+ * Lock pg_subscription with AccessExclusiveLock to ensure that the
+ * launcher doesn't restart new worker for the ones we are about to
+ * stop.
+ */
+ rel = table_open(SubscriptionRelationId, AccessExclusiveLock);
+
+ for (off = 0; off < list_length(subrel_states); off++)
{
- AddSubscriptionRelState(sub->oid, relid,
- copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
- InvalidXLogRecPtr);
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" added to subscription \"%s\"",
- rv->schemaname, rv->relname, sub->name)));
- }
- }
+ Oid relid = subrel_local_oids[off];
- /*
- * Next remove state for tables we should not care about anymore using the
- * data we collected above
- */
- qsort(pubrel_local_oids, list_length(pubrel_names),
- sizeof(Oid), oid_cmp);
+ if (!bsearch(&relid, pubrel_local_oids,
+ list_length(pubrel_names), sizeof(Oid), oid_cmp))
+ {
+ Oid subid = sub->oid;
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+ char state;
+ XLogRecPtr statelsn;
+
+ /* Immediately stop the worker. */
+ logicalrep_worker_stop_at_commit(subid, relid); /* prevent re-launching */
+ logicalrep_worker_stop(subid, relid); /* stop immediately */
+
+ /* Last known rel state. */
+ state = GetSubscriptionRelState(subid, relid, &statelsn);
+
+ RemoveSubscriptionRel(sub->oid, relid);
+
+ /*
+ * Drop the tablesync slot.
+ *
+ * For SYNCDONE/READY states the tablesync slot is known to
+ * have already been dropped by the tablesync worker.
+ *
+ * For other states, there is no certainty. Maybe the slot
+ * does not exist yet; Maybe the slot is already deleted but
+ * SYNCDONE is not yet set. For this reason we allow
+ * missing_ok = true for the drop.
+ */
+ if (state != SUBREL_STATE_SYNCDONE && state != SUBREL_STATE_READY)
+ {
+ char syncslotname[NAMEDATALEN] = {0};
+ bool missing_ok = true; /* no ERROR if slot is
+ * missing. */
- for (off = 0; off < list_length(subrel_states); off++)
- {
- Oid relid = subrel_local_oids[off];
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
- if (!bsearch(&relid, pubrel_local_oids,
- list_length(pubrel_names), sizeof(Oid), oid_cmp))
- {
- RemoveSubscriptionRel(sub->oid, relid);
+ elog(DEBUG1,
+ "AlterSubscription_refresh: dropping the tablesync slot \"%s\".",
+ syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, missing_ok);
+ }
- logicalrep_worker_stop_at_commit(sub->oid, relid);
+ /* Remove the tablesync's origin tracking if exists. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (OidIsValid(originid))
+ {
+ elog(DEBUG1,
+ "AlterSubscription_refresh: dropping origin tracking for \"%s\"",
+ originname);
+ replorigin_drop(originid, false);
+ }
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
- get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid),
- sub->name)));
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
+ get_namespace_name(get_rel_namespace(relid)),
+ get_rel_name(relid),
+ sub->name)));
+ }
}
+
+ table_close(rel, NoLock);
+
}
+ PG_FINALLY();
+ {
+ if (wrconn)
+ walrcv_disconnect(wrconn);
+ }
+ PG_END_TRY();
}
/*
@@ -928,8 +994,8 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
char *err = NULL;
RepOriginId originid;
WalReceiverConn *wrconn = NULL;
- StringInfoData cmd;
Form_pg_subscription form;
+ List *rstates;
/*
* Lock pg_subscription with AccessExclusiveLock to ensure that the
@@ -1016,100 +1082,220 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
ReleaseSysCache(tup);
/*
- * Stop all the subscription workers immediately.
- *
- * This is necessary if we are dropping the replication slot, so that the
- * slot becomes accessible.
- *
- * It is also necessary if the subscription is disabled and was disabled
- * in the same transaction. Then the workers haven't seen the disabling
- * yet and will still be running, leading to hangs later when we want to
- * drop the replication origin. If the subscription was disabled before
- * this transaction, then there shouldn't be any workers left, so this
- * won't make a difference.
+ * Try to acquire the connection necessary for dropping slots. We do this
+ * here so that the same connection may be shared for dropping the
+ * Subscription slot, as well as dropping any tablesync slots.
*
- * New workers won't be started because we hold an exclusive lock on the
- * subscription till the end of the transaction.
+ * Note: If the slotname is NONE/NULL then connection errors are
+ * suppressed. This is necessary so that the DROP SUBSCRIPTION can still
+ * complete even when the connection to publisher is broken.
*/
- LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
- subworkers = logicalrep_workers_find(subid, false);
- LWLockRelease(LogicalRepWorkerLock);
- foreach(lc, subworkers)
+ load_file("libpqwalreceiver", false);
+
+ wrconn = walrcv_connect(conninfo, true, subname, &err);
+ if (wrconn == NULL && slotname != NULL)
+ ereport(ERROR,
+ (errmsg("could not connect to publisher when attempting to "
+ "drop the replication slot \"%s\"", slotname),
+ errdetail("The error was: %s", err),
+ /* translator: %s is an SQL ALTER command */
+ errhint("Use %s to disassociate the subscription from the slot.",
+ "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+
+ PG_TRY();
{
- LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
+ /*
+ * Stop all the subscription workers immediately.
+ *
+ * This is necessary if we are dropping the replication slot, so that
+ * the slot becomes accessible.
+ *
+ * It is also necessary if the subscription is disabled and was
+ * disabled in the same transaction. Then the workers haven't seen
+ * the disabling yet and will still be running, leading to hangs later
+ * when we want to drop the replication origin. If the subscription
+ * was disabled before this transaction, then there shouldn't be any
+ * workers left, so this won't make a difference.
+ *
+ * New workers won't be started because we hold an exclusive lock on
+ * the subscription till the end of the transaction.
+ */
+ LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
+ subworkers = logicalrep_workers_find(subid, false);
+ LWLockRelease(LogicalRepWorkerLock);
+ foreach(lc, subworkers)
+ {
+ LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
- logicalrep_worker_stop(w->subid, w->relid);
- }
- list_free(subworkers);
+ logicalrep_worker_stop(w->subid, w->relid);
+ }
+ list_free(subworkers);
+
+ /*
+ * Tablesync resource cleanup (slots and origins).
+ *
+ * Any READY-state relations would already have dealt with clean-ups.
+ */
+ rstates = GetSubscriptionNotReadyRelations(subid);
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
- /* Clean up dependencies */
- deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
- /* Remove any associated relation synchronization states. */
- RemoveSubscriptionRel(subid, InvalidOid);
+ /*
+ * Drop the tablesync slot.
+ *
+ * For SYNCDONE/READY states the tablesync slot is known to have
+ * already been dropped by the tablesync worker.
+ *
+ * For other states, there is no certainty. Maybe the slot does
+ * not exist yet; Maybe the slot is already deleted but SYNCDONE
+ * is not yet set. For this reason we allow missing_ok = true for
+ * the drop.
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = {0};
+ bool missing_ok = true; /* no ERROR if slot is
+ * missing. */
- /* Remove the origin tracking if exists. */
- snprintf(originname, sizeof(originname), "pg_%u", subid);
- originid = replorigin_by_name(originname, true);
- if (originid != InvalidRepOriginId)
- replorigin_drop(originid, false);
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
- /*
- * If there is no slot associated with the subscription, we can finish
- * here.
- */
- if (!slotname)
+ if (!wrconn)
+ {
+ /*
+ * It is only possible to reach here without ERROR for a
+ * broken publisher connection if the subscription
+ * slotname is already NONE/NULL.
+ *
+ * This means the user has disassociated the subscription
+ * from the replication slot deliberately so that the DROP
+ * SUBSCRIPTION can proceed to completion. See
+ * https://www.postgresql.org/docs/current/sql-dropsubscription.html
+ *
+ * For this reason we only give a WARNING message that
+ * the tablesync slots cannot be dropped, rather than
+ * throw ERROR (which would prevent the DROP SUBSCRIPTION
+ * from proceeding).
+ *
+ * In such a case the user must take steps to manually
+ * cleanup these remaining tablesync slots.
+ */
+ elog(WARNING,
+ "no connection; cannot drop tablesync slot \"%s\".",
+ syncslotname);
+ }
+ else
+ {
+ elog(DEBUG1,
+ "DropSubscription: dropping the tablesync slot \"%s\".",
+ syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, missing_ok);
+ }
+ }
+
+ /* Remove the tablesync's origin tracking if exists. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ {
+ elog(DEBUG1,
+ "DropSubscription: dropping origin tracking for \"%s\"",
+ originname);
+ replorigin_drop(originid, false);
+ }
+ }
+ list_free(rstates);
+
+ /* Clean up dependencies. */
+ deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
+
+ /* Remove any associated relation synchronization states. */
+ RemoveSubscriptionRel(subid, InvalidOid);
+
+ /* Remove the origin tracking if exists. */
+ snprintf(originname, sizeof(originname), "pg_%u", subid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ replorigin_drop(originid, false);
+
+ /*
+ * If there is a slot associated with the subscription, then drop the
+ * replication slot at the publisher node using the replication
+ * connection.
+ */
+ if (slotname)
+ ReplicationSlotDropAtPubNode(wrconn, slotname, false);
+ }
+ PG_FINALLY();
{
+ if (wrconn)
+ walrcv_disconnect(wrconn);
+
table_close(rel, NoLock);
- return;
}
+ PG_END_TRY();
+}
+
+
+/*
+ * Drop the replication slot at the publisher node using the replication connection.
+ *
+ * missing_ok - if true then only issue WARNING message if the slot cannot be deleted.
+ */
+void
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok)
+{
+ StringInfoData cmd;
+
+ Assert(wrconn);
- /*
- * Otherwise drop the replication slot at the publisher node using the
- * replication connection.
- */
load_file("libpqwalreceiver", false);
initStringInfo(&cmd);
appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
- wrconn = walrcv_connect(conninfo, true, subname, &err);
- if (wrconn == NULL)
- ereport(ERROR,
- (errmsg("could not connect to publisher when attempting to "
- "drop the replication slot \"%s\"", slotname),
- errdetail("The error was: %s", err),
- /* translator: %s is an SQL ALTER command */
- errhint("Use %s to disassociate the subscription from the slot.",
- "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
-
PG_TRY();
{
WalRcvExecResult *res;
res = walrcv_exec(wrconn, cmd.data, 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
+ if (res->status == WALRCV_OK_COMMAND)
+ {
+ /* NOTICE. Success. */
+ ereport(NOTICE,
+ (errmsg("dropped replication slot \"%s\" on publisher",
+ slotname)));
+ }
+ else if (res->status == WALRCV_ERROR && missing_ok)
+ {
+ /* WARNING. Error, but missing_ok = true. */
+ ereport(WARNING,
(errmsg("could not drop the replication slot \"%s\" on publisher",
slotname),
errdetail("The error was: %s", res->err)));
+ }
else
- ereport(NOTICE,
- (errmsg("dropped replication slot \"%s\" on publisher",
- slotname)));
+ {
+ /* ERROR. */
+ ereport(ERROR,
+ (errmsg("could not drop the replication slot \"%s\" on publisher",
+ slotname),
+ errdetail("The error was: %s", res->err)));
+ }
walrcv_clear_result(res);
}
PG_FINALLY();
{
- walrcv_disconnect(wrconn);
+ pfree(cmd.data);
}
PG_END_TRY();
-
- pfree(cmd.data);
-
- table_close(rel, NoLock);
}
/*
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 863d196..33e11a1 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -31,8 +31,10 @@
* table state to INIT.
* - Tablesync worker starts; changes table state from INIT to DATASYNC while
* copying.
- * - Tablesync worker finishes the copy and sets table state to SYNCWAIT;
- * waits for state change.
+ * - Tablesync worker does initial table copy; there is a FINISHEDCOPY state to
+ * indicate when the copy phase has completed, so if the worker crashes
+ * before reaching SYNCDONE the copy will not be re-attempted.
+ * - Tablesync worker then sets table state to SYNCWAIT; waits for state change.
* - Apply worker periodically checks for tables in SYNCWAIT state. When
* any appear, it sets the table state to CATCHUP and starts loop-waiting
* until either the table state is set to SYNCDONE or the sync worker
@@ -48,8 +50,8 @@
* point it sets state to READY and stops tracking. Again, there might
* be zero changes in between.
*
- * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * So the state progression is always: INIT -> DATASYNC ->
+ * (sync worker FINISHEDCOPY) -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY.
*
* The catalog pg_subscription_rel is used to keep information about
* subscribed tables and their state. Some transient state during data
@@ -59,6 +61,7 @@
* Example flows look like this:
* - Apply is in front:
* sync:8
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:10
* -> set in memory CATCHUP
@@ -74,6 +77,7 @@
*
* - Sync is in front:
* sync:10
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:8
* -> set in memory CATCHUP
@@ -102,7 +106,10 @@
#include "replication/logicalrelation.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
+#include "storage/lmgr.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -270,30 +277,59 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
+ bool sync_done = false;
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
+ sync_done = MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
+ current_lsn >= MyLogicalRepWorker->relstate_lsn;
+ SpinLockRelease(&MyLogicalRepWorker->relmutex);
- if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
- current_lsn >= MyLogicalRepWorker->relstate_lsn)
+ if (sync_done)
{
TimeLineID tli;
+ char syncslotname[NAMEDATALEN] = {0};
+
+ /* End wal streaming so wrconn can be re-used to drop the slot. */
+ walrcv_endstreaming(wrconn, &tli);
+
+ /*
+ * Cleanup the tablesync slot.
+ */
+ ReplicationSlotNameForTablesync(
+ MySubscription->oid,
+ MyLogicalRepWorker->relid,
+ syncslotname);
+ elog(DEBUG1,
+ "process_syncing_tables_for_sync: dropping the tablesync slot \"%s\".",
+ syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, false);
+
+ /*
+ * Change state to SYNCDONE.
+ */
+ SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCDONE;
MyLogicalRepWorker->relstate_lsn = current_lsn;
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ {
+ StartTransactionCommand();
+ }
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
MyLogicalRepWorker->relstate_lsn);
- walrcv_endstreaming(wrconn, &tli);
finish_sync_worker();
}
- else
- SpinLockRelease(&MyLogicalRepWorker->relmutex);
}
/*
@@ -412,6 +448,37 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
started_tx = true;
}
+ /*
+ * Remove the tablesync origin tracking if exists.
+ *
+ * The cleanup is done here instead of in the
+ * finish_sync_worker function because if the tablesync worker
+ * process attempted to call replorigin_drop then that will
+ * hang because replorigin_drop logic considers the owning
+ * tablesync PID as "busy".
+ *
+ * Do this before updating the state, so that DropSubscription
+ * can know that all READY workers have already had their
+ * origin tracking removed.
+ */
+ {
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MyLogicalRepWorker->subid, rstate->relid);
+ originid = replorigin_by_name(originname, true);
+ if (OidIsValid(originid))
+ {
+ elog(DEBUG1,
+ "process_syncing_tables_for_apply: dropping tablesync origin tracking for \"%s\".",
+ originname);
+ replorigin_drop(originid, false);
+ }
+ }
+
+ /*
+ * Update the state only after the origin cleanup.
+ */
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
@@ -808,6 +875,30 @@ copy_table(Relation rel)
}
/*
+ * Determine the tablesync slot name.
+ *
+ * The name must not exceed NAMEDATALEN -1 because of remote node constraints on
+ * slot name length.
+ *
+ * The returned slot name is either returned in the supplied buffer or
+ * palloc'ed in current memory context (if NULL buffer).
+ */
+char *
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname)
+{
+ if (syncslotname)
+ {
+ sprintf(syncslotname, "pg_%u_sync_%u", suboid, relid);
+ }
+ else
+ {
+ syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
+ }
+
+ return syncslotname;
+}
+
+/*
* Start syncing the table in the sync worker.
*
* If nothing needs to be done to sync the table, we exit the worker without
@@ -824,6 +915,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -849,19 +942,11 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
- MySubscription->slotname,
- MySubscription->oid,
- MyLogicalRepWorker->relid);
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(
+ MySubscription->oid,
+ MyLogicalRepWorker->relid,
+ NULL); /* use palloc */
/*
* Here we use the slot name instead of the subscription name as the
@@ -874,7 +959,32 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY);
+
+ /* Assign the origin tracking record name. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MySubscription->oid, MyLogicalRepWorker->relid);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed/etc
+ * before it was able to finish normally.
+ */
+ elog(DEBUG1,
+ "LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_FINISHEDCOPY.");
+ StartTransactionCommand();
+
+ /*
+ * The origin tracking name must already exist (missing_ok=false).
+ */
+ originid = replorigin_by_name(originname, false);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ *origin_startpos = replorigin_session_get_progress(false);
+
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -890,9 +1000,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -918,29 +1025,99 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
walrcv_clear_result(res);
/*
- * Create a new temporary logical decoding slot. This slot will be used
+ * Create a new permanent logical decoding slot. This slot will be used
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
- walrcv_create_slot(wrconn, slotname, true,
+ walrcv_create_slot(wrconn, slotname, false,
CRS_USE_SNAPSHOT, origin_startpos);
- /* Now do the initial data copy */
- PushActiveSnapshot(GetTransactionSnapshot());
- copy_table(rel);
- PopActiveSnapshot();
+ /*
+ * Be sure to remove the newly created tablesync slot if the COPY fails.
+ */
+ PG_TRY();
+ {
+ /* Now do the initial data copy */
+ PushActiveSnapshot(GetTransactionSnapshot());
+ copy_table(rel);
+ PopActiveSnapshot();
+
+ res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
+ if (res->status != WALRCV_OK_COMMAND)
+ ereport(ERROR,
+ (errmsg("table copy could not finish transaction on publisher"),
+ errdetail("The error was: %s", res->err)));
+ walrcv_clear_result(res);
+
+ table_close(rel, NoLock);
+
+ /* Make the copy visible. */
+ CommandCounterIncrement();
+ }
+ PG_CATCH();
+ {
+ /*
+ * If something failed during copy table then cleanup the created
+ * slot.
+ */
+ elog(DEBUG1,
+ "LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".",
+ slotname);
+ ReplicationSlotDropAtPubNode(wrconn, slotname, false);
+
+ pfree(slotname);
+ slotname = NULL;
- res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+
+ /* Setup replication origin tracking. */
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist, so create it now.
+ *
+ * Then advance to the LSN got from walrcv_create_slot. This is WAL
+ * logged for the purpose of recovery. Locks are to prevent the
+ * replication origin from vanishing while advancing.
+ */
+ LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ originid = replorigin_create(originname);
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */ , true /* WAL log */ );
+ UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ }
+ else
+ {
ereport(ERROR,
- (errmsg("table copy could not finish transaction on publisher"),
- errdetail("The error was: %s", res->err)));
- walrcv_clear_result(res);
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("replication origin \"%s\" already exists",
+ originname)));
+ }
+
+ /*
+ * Update the persisted state to indicate the COPY phase is done; make it
+ * visible to others.
+ */
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_FINISHEDCOPY,
+ MyLogicalRepWorker->relstate_lsn);
- table_close(rel, NoLock);
+copy_table_done:
- /* Make the copy visible. */
- CommandCounterIncrement();
+ elog(DEBUG1,
+ "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
+
+ CommitTransactionCommand();
/*
* We are done with the initial data synchronization, update the state.
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 1b1d70e..4bd4030 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -807,12 +807,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -889,9 +885,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -918,8 +912,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1062,8 +1055,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData* commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index 06663b9..9027c42 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,8 @@ DECLARE_UNIQUE_INDEX(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg_subs
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_FINISHEDCOPY 'f' /* tablesync copy phase is completed
+ * (sublsn NULL) */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 53f636c..5f52335 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -15,6 +15,7 @@
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "storage/spin.h"
+#include "replication/walreceiver.h"
/*
* Behaviour of replication slots, upon release or crash.
@@ -211,6 +212,8 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname);
+extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
--
1.8.3.1
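As an aside for reviewers: the crash-restart behaviour added above can be summarized by the following standalone sketch. The state codes come from the pg_subscription_rel.h hunk in this patch; the printfs merely stand in for the real slot/COPY/origin logic in tablesync.c, so this is an illustration, not patch code:

#include <stdio.h>

#define SUBREL_STATE_INIT			'i'
#define SUBREL_STATE_DATASYNC		'd'
#define SUBREL_STATE_FINISHEDCOPY	'f' /* new in this patch */

static void
start_tablesync(char relstate)
{
	if (relstate == SUBREL_STATE_FINISHEDCOPY)
	{
		/*
		 * A previous attempt crashed after the COPY: skip the copy and
		 * resume streaming from the origin's recorded progress.
		 */
		printf("state '%c': bypass copy, resume from origin progress\n",
			   relstate);
		return;
	}

	/*
	 * First attempt: create the permanent slot, do the COPY, create and
	 * advance the origin, then persist SUBREL_STATE_FINISHEDCOPY.
	 */
	printf("state '%c': create slot, copy table, then set state 'f'\n",
		   relstate);
}

int
main(void)
{
	start_tablesync(SUBREL_STATE_INIT);			/* fresh table */
	start_tablesync(SUBREL_STATE_FINISHEDCOPY);	/* relaunch after a crash */
	return 0;
}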
v15-0002-Tablesync-extra-logging.patch (application/octet-stream)
From 6fa2654b1f3c3a1ba963db4e385afcc8c9fa47d6 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Thu, 14 Jan 2021 15:55:23 +1100
Subject: [PATCH v15] Tablesync extra logging.
This patch only adds some extra logging which may be helpful for testing, but is not for committing.
---
src/backend/commands/subscriptioncmds.c | 29 ++++++++++++++++++----
src/backend/replication/logical/tablesync.c | 37 +++++++++++++++++++++++++----
2 files changed, 56 insertions(+), 10 deletions(-)
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index f94243b..b5f9d56 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -665,11 +665,18 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
XLogRecPtr statelsn;
/* Immediately stop the worker. */
+ elog(LOG,
+ "!!>> AlterSubscription_refresh: before logicalrep_worker_stop");
logicalrep_worker_stop_at_commit(subid, relid); /* prevent re-launching */
logicalrep_worker_stop(subid, relid); /* stop immediately */
+ elog(LOG,
+ "!!>> AlterSubscription_refresh: after logicalrep_worker_stop");
/* Last known rel state. */
state = GetSubscriptionRelState(subid, relid, &statelsn);
+ elog(LOG,
+ "!!>> AlterSubscription_refresh: relid %u had state %c",
+ relid, state);
RemoveSubscriptionRel(sub->oid, relid);
@@ -692,10 +699,13 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
ReplicationSlotNameForTablesync(subid, relid, syncslotname);
- elog(DEBUG1,
+ elog(LOG,
"AlterSubscription_refresh: dropping the tablesync slot \"%s\".",
syncslotname);
ReplicationSlotDropAtPubNode(wrconn, syncslotname, missing_ok);
+ elog(LOG,
+ "!!>> AlterSubscription_refresh: dropped the tablesync slot \"%s\".",
+ syncslotname);
}
/* Remove the tablesync's origin tracking if exists. */
@@ -703,13 +713,16 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
originid = replorigin_by_name(originname, true);
if (OidIsValid(originid))
{
- elog(DEBUG1,
+ elog(LOG,
"AlterSubscription_refresh: dropping origin tracking for \"%s\"",
originname);
replorigin_drop(originid, false);
+ elog(LOG,
+ "!!>> AlterSubscription_refresh: dropped origin tracking for \"%s\"",
+ originname);
}
- ereport(DEBUG1,
+ ereport(LOG,
(errmsg("table \"%s.%s\" removed from subscription \"%s\"",
get_namespace_name(get_rel_namespace(relid)),
get_rel_name(relid),
@@ -1191,10 +1204,13 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
}
else
{
- elog(DEBUG1,
+ elog(LOG,
"DropSubscription: dropping the tablesync slot \"%s\".",
syncslotname);
ReplicationSlotDropAtPubNode(wrconn, syncslotname, missing_ok);
+ elog(LOG,
+ "!!>> DropSubscription: dropped the tablesync slot \"%s\".",
+ syncslotname);
}
}
@@ -1203,10 +1219,13 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
originid = replorigin_by_name(originname, true);
if (originid != InvalidRepOriginId)
{
- elog(DEBUG1,
+ elog(LOG,
"DropSubscription: dropping origin tracking for \"%s\"",
originname);
replorigin_drop(originid, false);
+ elog(LOG,
+ "!!>> DropSubscription: dropped origin tracking for \"%s\"",
+ originname);
}
}
list_free(rstates);
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 33e11a1..80750ad 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -300,10 +300,13 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn)
MyLogicalRepWorker->relid,
syncslotname);
- elog(DEBUG1,
+ elog(LOG,
"process_syncing_tables_for_sync: dropping the tablesync slot \"%s\".",
syncslotname);
ReplicationSlotDropAtPubNode(wrconn, syncslotname, false);
+ elog(LOG,
+ "!!>> process_syncing_tables_for_sync: dropped the tablesync slot \"%s\".",
+ syncslotname);
/*
* Change state to SYNCDONE.
@@ -469,10 +472,13 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
originid = replorigin_by_name(originname, true);
if (OidIsValid(originid))
{
- elog(DEBUG1,
+ elog(LOG,
"process_syncing_tables_for_apply: dropping tablesync origin tracking for \"%s\".",
originname);
replorigin_drop(originid, false);
+ elog(LOG,
+ "!!>> process_syncing_tables_for_apply: dropped tablesync origin tracking for \"%s\".",
+ originname);
}
}
@@ -971,7 +977,7 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* The COPY phase was previously done, but tablesync then crashed/etc
* before it was able to finish normally.
*/
- elog(DEBUG1,
+ elog(LOG,
"LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_FINISHEDCOPY.");
StartTransactionCommand();
@@ -979,8 +985,14 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* The origin tracking name must already exist (missing_ok=false).
*/
originid = replorigin_by_name(originname, false);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 2 replorigin_session_setup \"%s\".",
+ originname);
replorigin_session_setup(originid);
replorigin_session_origin = originid;
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 2 replorigin_session_get_progress \"%s\".",
+ originname);
*origin_startpos = replorigin_session_get_progress(false);
goto copy_table_done;
@@ -1029,6 +1041,9 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: walrcv_create_slot for \"%s\".",
+ slotname);
walrcv_create_slot(wrconn, slotname, false,
CRS_USE_SNAPSHOT, origin_startpos);
@@ -1060,10 +1075,13 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* If something failed during copy table then cleanup the created
* slot.
*/
- elog(DEBUG1,
+ elog(LOG,
"LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".",
slotname);
ReplicationSlotDropAtPubNode(wrconn, slotname, false);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: tablesync copy failed. Dropped the tablesync slot \"%s\".",
+ slotname);
pfree(slotname);
slotname = NULL;
@@ -1084,11 +1102,20 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* replication origin from vanishing while advancing.
*/
LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 replorigin_create \"%s\".",
+ originname);
originid = replorigin_create(originname);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 replorigin_advance \"%s\".",
+ originname);
replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
true /* go backward */ , true /* WAL log */ );
UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 replorigin_session_setup \"%s\".",
+ originname);
replorigin_session_setup(originid);
replorigin_session_origin = originid;
}
@@ -1111,7 +1138,7 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
copy_table_done:
- elog(DEBUG1,
+ elog(LOG,
"LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
originname,
(uint32) (*origin_startpos >> 32),
--
1.8.3.1
On Wed, Jan 13, 2021 at 9:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Jan 13, 2021 at 11:18 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Mon, Jan 4, 2021 at 10:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
7.
@@ -905,7 +905,7 @@ replorigin_advance(RepOriginId node,
 	LWLockAcquire(&replication_state->lock, LW_EXCLUSIVE);

 	/* Make sure it's not used by somebody else */
- 	if (replication_state->acquired_by != 0)
+ 	if (replication_state->acquired_by != 0 &&
+ 		replication_state->acquired_by != MyProcPid)
 	{

I think you won't need this change if you do replorigin_advance before
replorigin_session_setup in your patch.

As you know, the replorigin_session_setup sets the
replication_state->acquired_by to be the current PID. So without this
change the replorigin_advance rejects that same slot state, thinking
that it is already active for a different process. The root problem is
that the same process/PID calling both functions would hang.

I think the hang happens only if we call the unchanged replorigin_advance
after the session_setup API, right?

So this patch change allows the replorigin_advance code to be called by
the same process.

IIUC that acquired_by check condition is like a sanity check for the
originid passed. The patched code only does what the comment says:
"/* Make sure it's not used by somebody else */"
Doesn't "somebody else" mean "anyone but me" (i.e. anyone but MyProcPid)?

Also, "setup" of a thing generally comes before usage of that thing,
so won't it seem strange to (like the suggestion) deliberately call the
"setup" function 2nd instead of 1st?

Can you please explain why it is better to do it the suggested way
(switch the calls around) than keep the patch code? Probably there is
a good reason but I am just not understanding it.

Because there is no requirement for the origin_advance API to be called
after session setup. Session setup is required to mark the node as
replaying from a remote node, see [1], whereas origin_advance is used
for setting up the initial location or setting a new location, see [2]
(pg_replication_origin_advance).

Now here, after creating the origin, we need to set up the initial
location and it seems fine to call origin_advance before
session_setup. In short, as such, I don't see any problem with your
change in replorigin_advance but OTOH, I don't see the need for it
either. So, let's try to avoid that change unless we can't do
without it.

Also, another thing: we need to take RowExclusiveLock on
pg_replication_origin, as written in the comments atop replorigin_advance,
before calling it. See its usage in pg_replication_origin_advance.
Also, write comments on why we need to use replorigin_advance here
(... something like: we need to WAL log this for the purpose of
recovery ...).
Modified in latest patch [v15].
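For clarity, the ordering that v15 ends up with is sketched below. The replorigin_* stubs just print, so this compiles and runs outside the backend; in the real patch they are the functions in src/backend/replication/logical/origin.c, and the create/advance pair is done while holding RowExclusiveLock on pg_replication_origin. The origin name and LSN are invented examples:

#include <stdio.h>

typedef unsigned int RepOriginId;

/* Stand-ins for the backend functions, for illustration only. */
static RepOriginId
replorigin_create(const char *name)
{
	printf("create origin %s\n", name);
	return 1;
}

static void
replorigin_advance(RepOriginId id, unsigned long startpos)
{
	printf("advance origin %u to %lX (WAL logged, for recovery)\n",
		   id, startpos);
}

static void
replorigin_session_setup(RepOriginId id)
{
	printf("session setup: acquired_by becomes my PID\n");
}

int
main(void)
{
	unsigned long origin_startpos = 0x16B3748;	/* from walrcv_create_slot */
	RepOriginId originid;

	/*
	 * Advance BEFORE session setup: once session setup has marked the
	 * origin as acquired by this PID, the unchanged replorigin_advance
	 * would reject it as "used by somebody else".
	 */
	originid = replorigin_create("pg_16394_16385");
	replorigin_advance(originid, origin_startpos);
	replorigin_session_setup(originid);
	return 0;
}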
----
[v15] = /messages/by-id/CAHut+Pu3he2rOWjbXcNUO6z3aH2LYzW03KV+fiMWim49qW9etQ@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia
On Wed, Jan 13, 2021 at 5:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Jan 12, 2021 at 6:17 PM Peter Smith <smithpb2250@gmail.com> wrote:
On Mon, Jan 11, 2021 at 3:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
The workers for removed tables are now immediately stopped (like
DropSubscription does). Although I did include the AccessExclusiveLock
as (a) suggested, AFAIK this was actually ineffective at preventing
the workers relaunching.

The reason why it was ineffective is that you are locking
SubscriptionRelationId, which is to protect relaunch of apply workers,
not tablesync workers. But in its current form even acquiring the
SubscriptionRelRelationId lock won't serve the purpose because
process_syncing_tables_for_apply() doesn't always acquire it before
relaunching the tablesync workers. However, if we acquire
SubscriptionRelRelationId in process_syncing_tables_for_apply() then
it would prevent relaunch of workers, but I am not sure if that is a
good idea. Can you think of some other way?

Instead, I am using logicalrep_worker_stop_at_commit to do this -
testing shows it as working ok. Please see the code and latest test
logs [v14] for details.

There is still a window where it can relaunch. Basically, after you
stop the worker in AlterSubscription_refresh and till the commit
happens, the apply worker can relaunch the tablesync workers. I don't
see, code-wise, how we can protect that. And if the tablesync workers
are restarted after we stopped them, the purpose won't be achieved
because they can recreate or try to reuse the slot which we have
dropped.

The other issue with the current code could be: after we drop the slot
and origin, what if the transaction (in which we are doing the Alter
Subscription) is rolled back? Basically, the workers will be relaunched
and would assume that the slot should be there, but the slot won't be
present. I have thought of dropping the slot at commit time after we
stop the workers, but again I am not sure if that is a good idea
because at that point we don't want to establish the connection with
the publisher.

I think this needs some more thought.
I have another idea to solve this problem. Instead of Alter
Subscription drop the slot/origin, we can let tablesync worker do it.
Basically, we need to register SignalHandlerForShutdownRequest as
SIGTERM handler and then later need to check ShutdownRequestPending
flag in the tablesync worker. If the flag is set, then we can drop the
slot/origin and allow the process to exit cleanly.
This will obviate the need to take the lock and avoid all sorts of
rollback problems. If this works out well then I think we can use this
for DropSubscription as well, but that is a matter for a separate patch.
Thoughts?
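(To make the suggestion concrete, here is a minimal standalone demo of the flag pattern. The handler and flag mimic SignalHandlerForShutdownRequest / ShutdownRequestPending from src/backend/postmaster/interrupt.c, and the printf stands in for the slot/origin drop the tablesync worker would do; this is a sketch of the idea, not the eventual patch code:)

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static volatile sig_atomic_t ShutdownRequestPending = 0;

static void
handle_sigterm(int signo)
{
	/* Like SignalHandlerForShutdownRequest: just set a flag. */
	ShutdownRequestPending = 1;
}

int
main(void)
{
	signal(SIGTERM, handle_sigterm);

	for (;;)
	{
		if (ShutdownRequestPending)
		{
			/* The real worker would drop its own slot and origin here. */
			printf("cleanup slot/origin, exit cleanly\n");
			return 0;
		}
		pause();				/* wait for the next signal */
	}
}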
--
With Regards,
Amit Kapila.
Hi Amit.
PSA the v16 patch for the Tablesync Solution1.
Main differences from v15:
+ Tablesync cleanups of DropSubscription/AlterSubscription_refresh are
re-implemented via a ProcessInterrupts cleanup function
====
Features:
* The tablesync slot is now permanent instead of temporary.
* The tablesync slot name is no longer tied to the Subscription slot name.
* The tablesync worker is now allowing multiple tx instead of single tx
* A new state (SUBREL_STATE_FINISHEDCOPY) is persisted after a
successful copy_table in tablesync's LogicalRepSyncTableStart.
* If a re-launched tablesync finds state SUBREL_STATE_FINISHEDCOPY
then it will bypass the initial copy_table phase.
* Now tablesync sets up replication origin tracking in
LogicalRepSyncTableStart (similar to what is done for the apply worker).
The origin is advanced when first created.
* Cleanup of tablesync resources:
- The tablesync slot cleanup (drop) code is added to the
process_syncing_tables_for_sync function.
- The tablesync replication origin tracking is cleaned up in
process_syncing_tables_for_apply.
- A tablesync function to clean up its own slot/origin is called from
ProcessInterrupts. This is indirectly invoked by
DropSubscription/AlterSubscription when they signal the tablesync
worker to stop (see the sketch below).
* Updates to PG docs.
TODO / Known Issues:
* Race condition observed in "make check" may be related to this patch.
* Add test cases.
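As a rough standalone illustration of that stop-and-self-cleanup flow (plain POSIX, not backend code: fork/kill stand in for the launcher and logicalrep_worker_stop, and the printfs for the slot/origin drop):

#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t shutdown_requested = 0;

static void
on_sigterm(int signo)
{
	shutdown_requested = 1;
}

int
main(void)
{
	pid_t		child = fork();

	if (child == 0)
	{
		/* "tablesync worker": run until told to stop. */
		signal(SIGTERM, on_sigterm);
		while (!shutdown_requested)
			usleep(10 * 1000);	/* pretend to apply changes */
		printf("worker: dropping own slot/origin, exiting cleanly\n");
		_exit(0);
	}

	/* "DROP/ALTER SUBSCRIPTION side": signal the worker, then wait. */
	sleep(1);					/* let the child install its handler */
	kill(child, SIGTERM);
	waitpid(child, NULL, 0);
	printf("ddl side: worker stopped after its self-cleanup\n");
	return 0;
}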
---
Please also see some test scenario logging which shows the new
tablesync cleanup function getting called as a result of
Drop/AlterSubscription.
---
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
v16-0002-Tablesync-extra-logging.patch (application/octet-stream)
From f17de30cdf84aacd8975d1b3c2ed102a605887fa Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Mon, 18 Jan 2021 21:36:27 +1100
Subject: [PATCH v16] Tablesync extra logging.
This patch only adds some extra logging which may be helpful for testing, but is not for committing.
---
src/backend/replication/logical/tablesync.c | 53 +++++++++++++++++++++++++----
src/backend/tcop/postgres.c | 2 ++
2 files changed, 49 insertions(+), 6 deletions(-)
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index cde3297..21f686c 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -282,8 +282,8 @@ tablesync_cleanup_at_interrupt(void)
Oid subid = MySubscription->oid;
Oid relid = MyLogicalRepWorker->relid;
- elog(DEBUG1,
- "tablesync_cleanup_at_interrupt for relid = %d",
+ elog(LOG,
+ "!!>> tablesync_cleanup_at_interrupt for relid = %d",
MyLogicalRepWorker->relid);
@@ -324,7 +324,13 @@ tablesync_cleanup_at_interrupt(void)
ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ elog(LOG,
+ "!!>> tablesync_cleanup_at_interrupt: dropping the tablesync slot \"%s\".",
+ syncslotname);
ReplicationSlotDropAtPubNode(wrconn, syncslotname, missing_ok);
+ elog(LOG,
+ "!!>> tablesync_cleanup_at_interrupt: dropped the tablesync slot \"%s\".",
+ syncslotname);
}
/*
@@ -341,7 +347,13 @@ tablesync_cleanup_at_interrupt(void)
originid = replorigin_by_name(originname, true);
if (originid != InvalidRepOriginId)
{
+ elog(LOG,
+ "!!>> tablesync_cleanup_at_interrupt: dropping origin tracking for \"%s\"",
+ originname);
replorigin_drop(originid, false);
+ elog(LOG,
+ "!!>> tablesync_cleanup_at_interrupt: dropped origin tracking for \"%s\"",
+ originname);
/*
* CommitTransactionCommand would normally attempt to advance the origin
* but now that the origin is dropped that would fail, so we need to reset
@@ -387,10 +399,13 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn)
*/
ReplicationSlotNameForTablesync(subid, relid, syncslotname);
- elog(DEBUG1,
+ elog(LOG,
"process_syncing_tables_for_sync: dropping the tablesync slot \"%s\".",
syncslotname);
ReplicationSlotDropAtPubNode(wrconn, syncslotname, false);
+ elog(LOG,
+ "!!>> process_syncing_tables_for_sync: dropped the tablesync slot \"%s\".",
+ syncslotname);
/*
* Change state to SYNCDONE.
@@ -559,10 +574,13 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
originid = replorigin_by_name(originname, true);
if (OidIsValid(originid))
{
- elog(DEBUG1,
+ elog(LOG,
"process_syncing_tables_for_apply: dropping tablesync origin tracking for \"%s\".",
originname);
replorigin_drop(originid, false);
+ elog(LOG,
+ "!!>> process_syncing_tables_for_apply: dropped tablesync origin tracking for \"%s\".",
+ originname);
}
}
@@ -1061,14 +1079,22 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* The COPY phase was previously done, but tablesync then crashed/etc
* before it was able to finish normally.
*/
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_FINISHEDCOPY.");
StartTransactionCommand();
/*
* The origin tracking name must already exist (missing_ok=false).
*/
originid = replorigin_by_name(originname, false);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 2 replorigin_session_setup \"%s\".",
+ originname);
replorigin_session_setup(originid);
replorigin_session_origin = originid;
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 2 replorigin_session_get_progress \"%s\".",
+ originname);
*origin_startpos = replorigin_session_get_progress(false);
goto copy_table_done;
@@ -1117,6 +1143,9 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: walrcv_create_slot for \"%s\".",
+ slotname);
walrcv_create_slot(wrconn, slotname, false,
CRS_USE_SNAPSHOT, origin_startpos);
@@ -1148,10 +1177,13 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* If something failed during copy table then cleanup the created
* slot.
*/
- elog(DEBUG1,
+ elog(LOG,
"LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".",
slotname);
ReplicationSlotDropAtPubNode(wrconn, slotname, false);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: tablesync copy failed. Dropped the tablesync slot \"%s\".",
+ slotname);
pfree(slotname);
slotname = NULL;
@@ -1172,11 +1204,20 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* replication origin from vanishing while advancing.
*/
LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 replorigin_create \"%s\".",
+ originname);
originid = replorigin_create(originname);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 replorigin_advance \"%s\".",
+ originname);
replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
true /* go backward */ , true /* WAL log */ );
UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 replorigin_session_setup \"%s\".",
+ originname);
replorigin_session_setup(originid);
replorigin_session_origin = originid;
}
@@ -1199,7 +1240,7 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
copy_table_done:
- elog(DEBUG1,
+ elog(LOG,
"LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
originname,
(uint32) (*origin_startpos >> 32),
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index ede6c2c..d267eee 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3084,6 +3084,8 @@ ProcessInterrupts(void)
errmsg("terminating autovacuum process due to administrator command")));
else if (IsLogicalWorker())
{
+ elog(LOG, "!!>> ProcessInterrupts: Hello, I am a LogicalWorker");
+
/* Tablesync workers do their own cleanups. */
if (IsLogicalWorkerTablesync())
tablesync_cleanup_at_interrupt(); /* does not return. */
--
1.8.3.1
v16-0001-Tablesync-Solution1.patch (application/octet-stream)
From 879dce01d634ee488a4985c42ef4e392d0d0bfda Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Mon, 18 Jan 2021 21:23:59 +1100
Subject: [PATCH v16] Tablesync Solution1.
====
Features:
* The tablesync slot is now permanent instead of temporary.
* The tablesync slot name is no longer tied to the Subscription slot name.
* The tablesync worker now allows multiple transactions instead of a single transaction
* A new state (SUBREL_STATE_FINISHEDCOPY) is persisted after a successful copy_table in tablesync's LogicalRepSyncTableStart.
* If a re-launched tablesync finds state SUBREL_STATE_FINISHEDCOPY then it will bypass the initial copy_table phase.
* Now tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar to what is done for the apply worker). The origin is advanced when first created.
* Cleanup of tablesync resources:
- The tablesync slot cleanup (drop) code is added to the process_syncing_tables_for_sync function.
- The tablesync replication origin tracking is cleaned up in process_syncing_tables_for_apply.
- A tablesync function to clean up its own slot/origin is called from ProcessInterrupts. This is indirectly invoked by DropSubscription/AlterSubscription when they signal the tablesync worker to stop.
* Updates to PG docs.
TODO / Known Issues:
* Race condition observed in "make check" may be related to this patch.
* Add test cases.
---
doc/src/sgml/catalogs.sgml | 1 +
doc/src/sgml/logical-replication.sgml | 17 +-
doc/src/sgml/ref/drop_subscription.sgml | 6 +-
src/backend/commands/subscriptioncmds.c | 321 ++++++++++++++-----------
src/backend/replication/logical/origin.c | 2 +-
src/backend/replication/logical/tablesync.c | 347 ++++++++++++++++++++++++----
src/backend/replication/logical/worker.c | 27 +--
src/backend/tcop/postgres.c | 6 +
src/include/catalog/pg_subscription_rel.h | 2 +
src/include/replication/logicalworker.h | 2 +
src/include/replication/slot.h | 3 +
11 files changed, 537 insertions(+), 197 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 3a22665..2e46a49 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7651,6 +7651,7 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
State code:
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
+ <literal>f</literal> = finished table copy,
<literal>s</literal> = synchronized,
<literal>r</literal> = ready (normal replication)
</para></entry>
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index a560ad6..20cdd57 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -248,7 +248,17 @@
<para>
As mentioned earlier, each (active) subscription receives changes from a
- replication slot on the remote (publishing) side. Normally, the remote
+ replication slot on the remote (publishing) side.
+ </para>
+ <para>
+ Additional table synchronization slots are normally transient, created
+ internally and dropped automatically when they are no longer needed.
+ These table synchronization slots have generated names:
+ <quote><literal>pg_%u_sync_%u</literal></quote> (parameters: Subscription
+ <parameter>oid</parameter>, Table <parameter>relid</parameter>)
+ </para>
+ <para>
+ Normally, the remote
replication slot is created automatically when the subscription is created
using <command>CREATE SUBSCRIPTION</command> and it is dropped
automatically when the subscription is dropped using <command>DROP
@@ -294,8 +304,9 @@
using <command>ALTER SUBSCRIPTION</command> before attempting to drop
the subscription. If the remote database instance no longer exists, no
further action is then necessary. If, however, the remote database
- instance is just unreachable, the replication slot should then be
- dropped manually; otherwise it would continue to reserve WAL and might
+ instance is just unreachable, the replication slot (and any still
+ remaining table synchronization slots) should then be
+ dropped manually; otherwise it/they would continue to reserve WAL and might
eventually cause the disk to fill up. Such cases should be carefully
investigated.
</para>
diff --git a/doc/src/sgml/ref/drop_subscription.sgml b/doc/src/sgml/ref/drop_subscription.sgml
index adbdeaf..aee9615 100644
--- a/doc/src/sgml/ref/drop_subscription.sgml
+++ b/doc/src/sgml/ref/drop_subscription.sgml
@@ -79,7 +79,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
<para>
When dropping a subscription that is associated with a replication slot on
the remote host (the normal state), <command>DROP SUBSCRIPTION</command>
- will connect to the remote host and try to drop the replication slot as
+ will connect to the remote host and try to drop the replication slot (and
+ any remaining table synchronization slots) as
part of its operation. This is necessary so that the resources allocated
for the subscription on the remote host are released. If this fails,
either because the remote host is not reachable or because the remote
@@ -89,7 +90,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
executing <literal>ALTER SUBSCRIPTION ... SET (slot_name = NONE)</literal>.
After that, <command>DROP SUBSCRIPTION</command> will no longer attempt any
actions on a remote host. Note that if the remote replication slot still
- exists, it should then be dropped manually; otherwise it will continue to
+ exists, it (and any related table synchronization slots) should then be
+ dropped manually; otherwise it/they will continue to
reserve WAL and might eventually cause the disk to fill up. See
also <xref linkend="logical-replication-subscription-slot"/>.
</para>
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 490e935..0363ad1 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -37,6 +37,7 @@
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
#include "storage/lmgr.h"
#include "utils/acl.h"
#include "utils/builtins.h"
@@ -570,96 +571,103 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
/* Load the library providing us libpq calls. */
load_file("libpqwalreceiver", false);
- /* Try to connect to the publisher. */
- wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
- if (!wrconn)
- ereport(ERROR,
- (errmsg("could not connect to the publisher: %s", err)));
-
- /* Get the table list from publisher. */
- pubrel_names = fetch_table_list(wrconn, sub->publications);
-
- /* We are done with the remote side, close connection. */
- walrcv_disconnect(wrconn);
-
- /* Get local table list. */
- subrel_states = GetSubscriptionRelations(sub->oid);
-
- /*
- * Build qsorted array of local table oids for faster lookup. This can
- * potentially contain all tables in the database so speed of lookup is
- * important.
- */
- subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
- off = 0;
- foreach(lc, subrel_states)
+ PG_TRY();
{
- SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
+ /* Try to connect to the publisher. */
+ wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
+ if (!wrconn)
+ ereport(ERROR,
+ (errmsg("could not connect to the publisher: %s", err)));
- subrel_local_oids[off++] = relstate->relid;
- }
- qsort(subrel_local_oids, list_length(subrel_states),
- sizeof(Oid), oid_cmp);
+ /* Get the table list from publisher. */
+ pubrel_names = fetch_table_list(wrconn, sub->publications);
- /*
- * Walk over the remote tables and try to match them to locally known
- * tables. If the table is not known locally create a new state for it.
- *
- * Also builds array of local oids of remote tables for the next step.
- */
- off = 0;
- pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
+ /* Get local table list. */
+ subrel_states = GetSubscriptionRelations(sub->oid);
- foreach(lc, pubrel_names)
- {
- RangeVar *rv = (RangeVar *) lfirst(lc);
- Oid relid;
+ /*
+ * Build qsorted array of local table oids for faster lookup. This can
+ * potentially contain all tables in the database so speed of lookup
+ * is important.
+ */
+ subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
+ off = 0;
+ foreach(lc, subrel_states)
+ {
+ SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
- relid = RangeVarGetRelid(rv, AccessShareLock, false);
+ subrel_local_oids[off++] = relstate->relid;
+ }
+ qsort(subrel_local_oids, list_length(subrel_states),
+ sizeof(Oid), oid_cmp);
+
+ /*
+ * Walk over the remote tables and try to match them to locally known
+ * tables. If the table is not known locally create a new state for
+ * it.
+ *
+ * Also builds array of local oids of remote tables for the next step.
+ */
+ off = 0;
+ pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
+
+ foreach(lc, pubrel_names)
+ {
+ RangeVar *rv = (RangeVar *) lfirst(lc);
+ Oid relid;
- /* Check for supported relkind. */
- CheckSubscriptionRelkind(get_rel_relkind(relid),
- rv->schemaname, rv->relname);
+ relid = RangeVarGetRelid(rv, AccessShareLock, false);
- pubrel_local_oids[off++] = relid;
+ /* Check for supported relkind. */
+ CheckSubscriptionRelkind(get_rel_relkind(relid),
+ rv->schemaname, rv->relname);
- if (!bsearch(&relid, subrel_local_oids,
- list_length(subrel_states), sizeof(Oid), oid_cmp))
- {
- AddSubscriptionRelState(sub->oid, relid,
- copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
- InvalidXLogRecPtr);
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" added to subscription \"%s\"",
- rv->schemaname, rv->relname, sub->name)));
- }
- }
+ pubrel_local_oids[off++] = relid;
- /*
- * Next remove state for tables we should not care about anymore using the
- * data we collected above
- */
- qsort(pubrel_local_oids, list_length(pubrel_names),
- sizeof(Oid), oid_cmp);
+ if (!bsearch(&relid, subrel_local_oids,
+ list_length(subrel_states), sizeof(Oid), oid_cmp))
+ {
+ AddSubscriptionRelState(sub->oid, relid,
+ copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
+ InvalidXLogRecPtr);
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" added to subscription \"%s\"",
+ rv->schemaname, rv->relname, sub->name)));
+ }
+ }
- for (off = 0; off < list_length(subrel_states); off++)
- {
- Oid relid = subrel_local_oids[off];
+ /*
+ * Next remove state for tables we should not care about anymore using
+ * the data we collected above
+ */
+ qsort(pubrel_local_oids, list_length(pubrel_names),
+ sizeof(Oid), oid_cmp);
- if (!bsearch(&relid, pubrel_local_oids,
- list_length(pubrel_names), sizeof(Oid), oid_cmp))
+ for (off = 0; off < list_length(subrel_states); off++)
{
- RemoveSubscriptionRel(sub->oid, relid);
+ Oid relid = subrel_local_oids[off];
- logicalrep_worker_stop_at_commit(sub->oid, relid);
+ if (!bsearch(&relid, pubrel_local_oids,
+ list_length(pubrel_names), sizeof(Oid), oid_cmp))
+ {
+ RemoveSubscriptionRel(sub->oid, relid);
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
- get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid),
- sub->name)));
+ logicalrep_worker_stop_at_commit(sub->oid, relid);
+
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
+ get_namespace_name(get_rel_namespace(relid)),
+ get_rel_name(relid),
+ sub->name)));
+ }
}
}
+ PG_FINALLY();
+ {
+ if (wrconn)
+ walrcv_disconnect(wrconn);
+ }
+ PG_END_TRY();
}
/*
@@ -928,7 +936,6 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
char *err = NULL;
RepOriginId originid;
WalReceiverConn *wrconn = NULL;
- StringInfoData cmd;
Form_pg_subscription form;
/*
@@ -1016,100 +1023,140 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
ReleaseSysCache(tup);
/*
- * Stop all the subscription workers immediately.
- *
- * This is necessary if we are dropping the replication slot, so that the
- * slot becomes accessible.
+ * Try to acquire the connection necessary for dropping slots. We do this
+ * here so that the same connection may be shared for dropping the
+ * Subscription slot, as well as dropping any tablesync slots.
*
- * It is also necessary if the subscription is disabled and was disabled
- * in the same transaction. Then the workers haven't seen the disabling
- * yet and will still be running, leading to hangs later when we want to
- * drop the replication origin. If the subscription was disabled before
- * this transaction, then there shouldn't be any workers left, so this
- * won't make a difference.
- *
- * New workers won't be started because we hold an exclusive lock on the
- * subscription till the end of the transaction.
+ * Note: If the slotname is NONE/NULL then connection errors are
+ * suppressed. This is necessary so that the DROP SUBSCRIPTION can still
+ * complete even when the connection to publisher is broken.
*/
- LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
- subworkers = logicalrep_workers_find(subid, false);
- LWLockRelease(LogicalRepWorkerLock);
- foreach(lc, subworkers)
+ load_file("libpqwalreceiver", false);
+
+ wrconn = walrcv_connect(conninfo, true, subname, &err);
+ if (wrconn == NULL && slotname != NULL)
+ ereport(ERROR,
+ (errmsg("could not connect to publisher when attempting to "
+ "drop the replication slot \"%s\"", slotname),
+ errdetail("The error was: %s", err),
+ /* translator: %s is an SQL ALTER command */
+ errhint("Use %s to disassociate the subscription from the slot.",
+ "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+
+ PG_TRY();
{
- LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
+ /*
+ * Stop all the subscription workers immediately.
+ *
+ * This is necessary if we are dropping the replication slot, so that
+ * the slot becomes accessible.
+ *
+ * It is also necessary if the subscription is disabled and was
+ * disabled in the same transaction. Then the workers haven't seen
+ * the disabling yet and will still be running, leading to hangs later
+ * when we want to drop the replication origin. If the subscription
+ * was disabled before this transaction, then there shouldn't be any
+ * workers left, so this won't make a difference.
+ *
+ * New workers won't be started because we hold an exclusive lock on
+ * the subscription till the end of the transaction.
+ */
+ LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
+ subworkers = logicalrep_workers_find(subid, false);
+ LWLockRelease(LogicalRepWorkerLock);
+ foreach(lc, subworkers)
+ {
+ LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
- logicalrep_worker_stop(w->subid, w->relid);
+ logicalrep_worker_stop(w->subid, w->relid);
+ }
+ list_free(subworkers);
+
+ /* Clean up dependencies. */
+ deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
+
+ /* Remove any associated relation synchronization states. */
+ RemoveSubscriptionRel(subid, InvalidOid);
+
+ /* Remove the origin tracking if exists. */
+ snprintf(originname, sizeof(originname), "pg_%u", subid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ replorigin_drop(originid, false);
+
+ /*
+ * If there is a slot associated with the subscription, then drop the
+ * replication slot at the publisher node using the replication
+ * connection.
+ */
+ if (slotname)
+ ReplicationSlotDropAtPubNode(wrconn, slotname, false);
}
- list_free(subworkers);
+ PG_FINALLY();
+ {
+ if (wrconn)
+ walrcv_disconnect(wrconn);
- /* Clean up dependencies */
- deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
+ table_close(rel, NoLock);
+ }
+ PG_END_TRY();
+}
- /* Remove any associated relation synchronization states. */
- RemoveSubscriptionRel(subid, InvalidOid);
- /* Remove the origin tracking if exists. */
- snprintf(originname, sizeof(originname), "pg_%u", subid);
- originid = replorigin_by_name(originname, true);
- if (originid != InvalidRepOriginId)
- replorigin_drop(originid, false);
+/*
+ * Drop the replication slot at the publisher node using the replication connection.
+ *
+ * missing_ok - if true then only issue a WARNING message if the slot cannot be deleted.
+ */
+void
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok)
+{
+ StringInfoData cmd;
- /*
- * If there is no slot associated with the subscription, we can finish
- * here.
- */
- if (!slotname)
- {
- table_close(rel, NoLock);
- return;
- }
+ Assert(wrconn);
- /*
- * Otherwise drop the replication slot at the publisher node using the
- * replication connection.
- */
load_file("libpqwalreceiver", false);
initStringInfo(&cmd);
appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
- wrconn = walrcv_connect(conninfo, true, subname, &err);
- if (wrconn == NULL)
- ereport(ERROR,
- (errmsg("could not connect to publisher when attempting to "
- "drop the replication slot \"%s\"", slotname),
- errdetail("The error was: %s", err),
- /* translator: %s is an SQL ALTER command */
- errhint("Use %s to disassociate the subscription from the slot.",
- "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
-
PG_TRY();
{
WalRcvExecResult *res;
res = walrcv_exec(wrconn, cmd.data, 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
+ if (res->status == WALRCV_OK_COMMAND)
+ {
+ /* NOTICE. Success. */
+ ereport(NOTICE,
+ (errmsg("dropped replication slot \"%s\" on publisher",
+ slotname)));
+ }
+ else if (res->status == WALRCV_ERROR && missing_ok)
+ {
+ /* WARNING. Error, but missing_ok = true. */
+ ereport(WARNING,
(errmsg("could not drop the replication slot \"%s\" on publisher",
slotname),
errdetail("The error was: %s", res->err)));
+ }
else
- ereport(NOTICE,
- (errmsg("dropped replication slot \"%s\" on publisher",
- slotname)));
+ {
+ /* ERROR. */
+ ereport(ERROR,
+ (errmsg("could not drop the replication slot \"%s\" on publisher",
+ slotname),
+ errdetail("The error was: %s", res->err)));
+ }
walrcv_clear_result(res);
}
PG_FINALLY();
{
- walrcv_disconnect(wrconn);
+ pfree(cmd.data);
}
PG_END_TRY();
-
- pfree(cmd.data);
-
- table_close(rel, NoLock);
}
/*
diff --git a/src/backend/replication/logical/origin.c b/src/backend/replication/logical/origin.c
index 77781d0..304c879 100644
--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -357,7 +357,7 @@ restart:
if (state->roident == roident)
{
/* found our slot, is it busy? */
- if (state->acquired_by != 0)
+ if (state->acquired_by != 0 && state->acquired_by != MyProcPid)
{
ConditionVariable *cv;
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 863d196..cde3297 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -31,8 +31,10 @@
* table state to INIT.
* - Tablesync worker starts; changes table state from INIT to DATASYNC while
* copying.
- * - Tablesync worker finishes the copy and sets table state to SYNCWAIT;
- * waits for state change.
+ * - Tablesync worker does initial table copy; there is a FINISHEDCOPY state to
+ * indicate when the copy phase has completed, so if the worker crashes
+ * before reaching SYNCDONE the copy will not be re-attempted.
+ * - Tablesync worker then sets table state to SYNCWAIT; waits for state change.
* - Apply worker periodically checks for tables in SYNCWAIT state. When
* any appear, it sets the table state to CATCHUP and starts loop-waiting
* until either the table state is set to SYNCDONE or the sync worker
@@ -48,8 +50,8 @@
* point it sets state to READY and stops tracking. Again, there might
* be zero changes in between.
*
- * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * So the state progression is always: INIT -> DATASYNC ->
+ * (sync worker FINISHEDCOPY) -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY.
*
* The catalog pg_subscription_rel is used to keep information about
* subscribed tables and their state. Some transient state during data
@@ -59,6 +61,7 @@
* Example flows look like this:
* - Apply is in front:
* sync:8
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:10
* -> set in memory CATCHUP
@@ -74,6 +77,7 @@
*
* - Sync is in front:
* sync:10
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:8
* -> set in memory CATCHUP
@@ -98,11 +102,15 @@
#include "miscadmin.h"
#include "parser/parse_relation.h"
#include "pgstat.h"
+#include "postmaster/interrupt.h"
#include "replication/logicallauncher.h"
#include "replication/logicalrelation.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
+#include "storage/lmgr.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -260,6 +268,93 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
}
/*
+ * The sync worker cleans up any slot / origin resources it may have created.
+ * This function is called from ProcessInterrupts() as a result of the
+ * tablesync worker being signalled.
+ */
+void
+tablesync_cleanup_at_interrupt(void)
+{
+ bool drop_slot_needed;
+ char originname[NAMEDATALEN] = {0};
+ RepOriginId originid;
+ TimeLineID tli;
+ Oid subid = MySubscription->oid;
+ Oid relid = MyLogicalRepWorker->relid;
+
+ elog(DEBUG1,
+ "tablesync_cleanup_at_interrupt for relid = %d",
+ MyLogicalRepWorker->relid);
+
+
+ /*
+ * Cleanup the tablesync slot, if needed.
+ *
+ * If state is SYNCDONE or READY then the slot has already been dropped.
+ */
+ drop_slot_needed =
+ wrconn != NULL &&
+ MyLogicalRepWorker->relstate != SUBREL_STATE_SYNCDONE &&
+ MyLogicalRepWorker->relstate != SUBREL_STATE_READY;
+
+ if (drop_slot_needed)
+ {
+ char syncslotname[NAMEDATALEN] = {0};
+ bool missing_ok = true; /* no ERROR if slot is
+ * missing. */
+
+ /*
+ * End wal streaming so that wrconn can be re-used to drop the
+ * slot.
+ */
+ PG_TRY();
+ {
+ walrcv_endstreaming(wrconn, &tli);
+ }
+ PG_CATCH();
+ {
+ /*
+ * It is possible that the walrcv_startstreaming was not yet
+ * called (e.g. the interrupt initiating this cleanup may have
+ * happened during the table COPY phase). So suppress any error
+ * here to cope with that scenario.
+ */
+ }
+ PG_END_TRY();
+
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, missing_ok);
+ }
+
+ /*
+ * Remove the tablesync's origin tracking if exists.
+ *
+ * The origin APIs must be called within a transaction.
+ * The transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ {
+ StartTransactionCommand();
+ }
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ {
+ replorigin_drop(originid, false);
+ /*
+ * CommitTransactionCommand would normally attempt to advance the origin
+ * but now that the origin is dropped that would fail, so we need to reset
+ * the replorigin_session as well to prevent that.
+ */
+ replorigin_session_reset();
+ replorigin_session_origin = InvalidRepOriginId;
+ }
+
+ finish_sync_worker(); /* doesn't return. */
+}
+
+/*
* Handle table synchronization cooperation from the synchronization
* worker.
*
@@ -270,30 +365,58 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
+ bool sync_done = false;
+ Oid subid = MySubscription->oid;
+ Oid relid = MyLogicalRepWorker->relid;
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
+ sync_done = MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
+ current_lsn >= MyLogicalRepWorker->relstate_lsn;
+ SpinLockRelease(&MyLogicalRepWorker->relmutex);
- if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
- current_lsn >= MyLogicalRepWorker->relstate_lsn)
+ if (sync_done)
{
TimeLineID tli;
+ char syncslotname[NAMEDATALEN] = {0};
+ /* End wal streaming so wrconn can be re-used to drop the slot. */
+ walrcv_endstreaming(wrconn, &tli);
+
+ /*
+ * Cleanup the tablesync slot.
+ */
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+
+ elog(DEBUG1,
+ "process_syncing_tables_for_sync: dropping the tablesync slot \"%s\".",
+ syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, false);
+
+ /*
+ * Change state to SYNCDONE.
+ */
+ SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCDONE;
MyLogicalRepWorker->relstate_lsn = current_lsn;
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ {
+ StartTransactionCommand();
+ }
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
MyLogicalRepWorker->relstate_lsn);
- walrcv_endstreaming(wrconn, &tli);
finish_sync_worker();
}
- else
- SpinLockRelease(&MyLogicalRepWorker->relmutex);
}
/*
@@ -412,6 +535,40 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
started_tx = true;
}
+ /*
+ * Remove the tablesync origin tracking if exists.
+ *
+ * The cleanup is done here instead of in the
+ * finish_sync_worker function because if the tablesync worker
+ * process attempted to call replorigin_drop then that would
+ * hang because replorigin_drop logic considers the owning
+ * tablesync PID as "busy".
+ *
+ * FIXME - The above comment is no longer correct because a fix was
+ * needed to allow the drop during shutdown, so maybe this should be
+ * moved back into the tablesync code.
+ *
+ * Do this before updating the state, so that DropSubscription
+ * can know that all READY workers have already had their
+ * origin tracking removed.
+ */
+ {
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MyLogicalRepWorker->subid, rstate->relid);
+ originid = replorigin_by_name(originname, true);
+ if (OidIsValid(originid))
+ {
+ elog(DEBUG1,
+ "process_syncing_tables_for_apply: dropping tablesync origin tracking for \"%s\".",
+ originname);
+ replorigin_drop(originid, false);
+ }
+ }
+
+ /*
+ * Update the state only after the origin cleanup.
+ */
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
@@ -808,6 +965,30 @@ copy_table(Relation rel)
}
/*
+ * Determine the tablesync slot name.
+ *
+ * The name must not exceed NAMEDATALEN - 1 because of remote node constraints
+ * on slot name length.
+ *
+ * The returned slot name is either copied into the supplied buffer or
+ * palloc'ed in the current memory context (if the buffer is NULL).
+ */
+char *
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname)
+{
+ if (syncslotname)
+ {
+ sprintf(syncslotname, "pg_%u_sync_%u", suboid, relid);
+ }
+ else
+ {
+ syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
+ }
+
+ return syncslotname;
+}
+
+/*
* Start syncing the table in the sync worker.
*
* If nothing needs to be done to sync the table, we exit the worker without
@@ -824,6 +1005,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -849,19 +1032,11 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
- MySubscription->slotname,
- MySubscription->oid,
- MyLogicalRepWorker->relid);
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(
+ MySubscription->oid,
+ MyLogicalRepWorker->relid,
+ NULL); /* use palloc */
/*
* Here we use the slot name instead of the subscription name as the
@@ -874,7 +1049,30 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY);
+
+ /* Assign the origin tracking record name. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MySubscription->oid, MyLogicalRepWorker->relid);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed/etc
+ * before it was able to finish normally.
+ */
+ StartTransactionCommand();
+
+ /*
+ * The origin tracking name must already exist (missing_ok=false).
+ */
+ originid = replorigin_by_name(originname, false);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ *origin_startpos = replorigin_session_get_progress(false);
+
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -890,9 +1088,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -918,29 +1113,99 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
walrcv_clear_result(res);
/*
- * Create a new temporary logical decoding slot. This slot will be used
+ * Create a new permanent logical decoding slot. This slot will be used
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
- walrcv_create_slot(wrconn, slotname, true,
+ walrcv_create_slot(wrconn, slotname, false,
CRS_USE_SNAPSHOT, origin_startpos);
- /* Now do the initial data copy */
- PushActiveSnapshot(GetTransactionSnapshot());
- copy_table(rel);
- PopActiveSnapshot();
+ /*
+ * Be sure to remove the newly created tablesync slot if the COPY fails.
+ */
+ PG_TRY();
+ {
+ /* Now do the initial data copy */
+ PushActiveSnapshot(GetTransactionSnapshot());
+ copy_table(rel);
+ PopActiveSnapshot();
+
+ res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
+ if (res->status != WALRCV_OK_COMMAND)
+ ereport(ERROR,
+ (errmsg("table copy could not finish transaction on publisher"),
+ errdetail("The error was: %s", res->err)));
+ walrcv_clear_result(res);
+
+ table_close(rel, NoLock);
+
+ /* Make the copy visible. */
+ CommandCounterIncrement();
+ }
+ PG_CATCH();
+ {
+ /*
+ * If something failed during copy table then cleanup the created
+ * slot.
+ */
+ elog(DEBUG1,
+ "LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".",
+ slotname);
+ ReplicationSlotDropAtPubNode(wrconn, slotname, false);
- res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
+ pfree(slotname);
+ slotname = NULL;
+
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+
+ /* Setup replication origin tracking. */
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist, so create it now.
+ *
+ * Then advance to the LSN obtained from walrcv_create_slot. This is
+ * WAL logged for the purpose of recovery. Locks are to prevent the
+ * replication origin from vanishing while advancing.
+ */
+ LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ originid = replorigin_create(originname);
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */ , true /* WAL log */ );
+ UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ }
+ else
+ {
ereport(ERROR,
- (errmsg("table copy could not finish transaction on publisher"),
- errdetail("The error was: %s", res->err)));
- walrcv_clear_result(res);
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("replication origin \"%s\" already exists",
+ originname)));
+ }
- table_close(rel, NoLock);
+ /*
+ * Update the persisted state to indicate the COPY phase is done; make it
+ * visible to others.
+ */
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_FINISHEDCOPY,
+ MyLogicalRepWorker->relstate_lsn);
- /* Make the copy visible. */
- CommandCounterIncrement();
+copy_table_done:
+
+ elog(DEBUG1,
+ "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
+
+ CommitTransactionCommand();
/*
* We are done with the initial data synchronization, update the state.
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 1b1d70e..ff95f09 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -807,12 +807,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -889,9 +885,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -918,8 +912,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1062,8 +1055,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData* commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
@@ -3111,3 +3103,12 @@ IsLogicalWorker(void)
{
return MyLogicalRepWorker != NULL;
}
+
+/*
+ * Is current process a logical replication tablesync worker?
+ */
+bool
+IsLogicalWorkerTablesync(void)
+{
+ return am_tablesync_worker();
+}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 2805568..ede6c2c 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3083,9 +3083,15 @@ ProcessInterrupts(void)
(errcode(ERRCODE_ADMIN_SHUTDOWN),
errmsg("terminating autovacuum process due to administrator command")));
else if (IsLogicalWorker())
+ {
+ /* Tablesync workers do their own cleanups. */
+ if (IsLogicalWorkerTablesync())
+ tablesync_cleanup_at_interrupt(); /* does not return. */
+
ereport(FATAL,
(errcode(ERRCODE_ADMIN_SHUTDOWN),
errmsg("terminating logical replication worker due to administrator command")));
+ }
else if (IsLogicalLauncher())
{
ereport(DEBUG1,
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index 06663b9..9027c42 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,8 @@ DECLARE_UNIQUE_INDEX(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg_subs
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_FINISHEDCOPY 'f' /* tablesync copy phase is completed
+ * (sublsn NULL) */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/replication/logicalworker.h b/src/include/replication/logicalworker.h
index 2ad61a0..085916c 100644
--- a/src/include/replication/logicalworker.h
+++ b/src/include/replication/logicalworker.h
@@ -15,5 +15,7 @@
extern void ApplyWorkerMain(Datum main_arg);
extern bool IsLogicalWorker(void);
+extern bool IsLogicalWorkerTablesync(void);
+extern void tablesync_cleanup_at_interrupt(void);
#endif /* LOGICALWORKER_H */
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 53f636c..5f52335 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -15,6 +15,7 @@
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "storage/spin.h"
+#include "replication/walreceiver.h"
/*
* Behaviour of replication slots, upon release or crash.
@@ -211,6 +212,8 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname);
+extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
--
1.8.3.1
Hi Amit.
PSA the v17 patch for the Tablesync Solution1.
Main differences from v16:
+ Small refactor for DropSubscription to correct the "make check" deadlock
+ Added test case
+ Some comment wording
====
Features:
* The tablesync slot is now permanent instead of temporary.
* The tablesync slot name is no longer tied to the Subscription slot name.
* The tablesync worker now allows multiple transactions instead of a
single transaction.
* A new state (SUBREL_STATE_FINISHEDCOPY) is persisted after a
successful copy_table in tablesync's LogicalRepSyncTableStart.
* If a re-launched tablesync finds state SUBREL_STATE_FINISHEDCOPY
then it will bypass the initial copy_table phase (see the
state-progression sketch after this list).
* Now tablesync sets up replication origin tracking in
LogicalRepSyncTableStart (similar to what is done for the apply
worker). The origin is advanced when first created.
* Cleanup of tablesync resources:
- The tablesync slot cleanup (drop) code is added for the
process_syncing_tables_for_sync function.
- The tablesync replication origin tracking is cleaned up in
process_syncing_tables_for_apply.
- A tablesync function to clean up its own slot/origin is called from
ProcessInterrupts. This is indirectly invoked by
DropSubscription/AlterSubscription when they signal the tablesync
worker to stop.
* Updates to PG docs.
* New TAP test case
TODO / Known Issues:
* None known.
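As a reading aid, here is a tiny standalone C sketch (not PostgreSQL code) of the relstate progression described in the tablesync.c header comment, including the new FINISHEDCOPY step. The 'w' and 'c' letter codes for the in-memory SYNCWAIT and CATCHUP states are my assumption; the catalog codes i/d/f/s/r come from the patch:

#include <stdio.h>

typedef enum
{
	STATE_INIT = 'i',
	STATE_DATASYNC = 'd',
	STATE_FINISHEDCOPY = 'f',	/* new in this patch */
	STATE_SYNCWAIT = 'w',		/* in-memory only */
	STATE_CATCHUP = 'c',		/* in-memory only */
	STATE_SYNCDONE = 's',
	STATE_READY = 'r'
} RelState;

static RelState
next_state(RelState s)
{
	switch (s)
	{
		case STATE_INIT: return STATE_DATASYNC;
		case STATE_DATASYNC: return STATE_FINISHEDCOPY;
		case STATE_FINISHEDCOPY: return STATE_SYNCWAIT;
		case STATE_SYNCWAIT: return STATE_CATCHUP;
		case STATE_CATCHUP: return STATE_SYNCDONE;
		default: return STATE_READY;
	}
}

int
main(void)
{
	RelState	s = STATE_INIT;

	/* Prints: i -> d -> f -> w -> c -> s -> r */
	while (s != STATE_READY)
	{
		printf("%c -> ", (char) s);
		s = next_state(s);
	}
	printf("%c\n", (char) s);
	return 0;
}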
---
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
Attachment: v17-0002-Tablesync-extra-logging.patch (application/octet-stream)
From b6ee12ca3d2c606bc4c6294a35bbe6ba53f3a74f Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Tue, 19 Jan 2021 19:23:35 +1100
Subject: [PATCH v17] Tablesync extra logging.
This patch only adds some extra logging which may be helpful for testing, but is not intended for committing.
---
src/backend/replication/logical/tablesync.c | 62 ++++++++++++++++++++++++-----
src/backend/tcop/postgres.c | 2 +
2 files changed, 53 insertions(+), 11 deletions(-)
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index ec85c08..388f0da 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -283,8 +283,8 @@ tablesync_cleanup_at_interrupt(void)
Oid subid = MySubscription->oid;
Oid relid = MyLogicalRepWorker->relid;
- elog(DEBUG1,
- "tablesync_cleanup_at_interrupt for relid = %d",
+ elog(LOG,
+ "!!>> tablesync_cleanup_at_interrupt for relid = %d",
MyLogicalRepWorker->relid);
/*
@@ -322,7 +322,13 @@ tablesync_cleanup_at_interrupt(void)
ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ elog(LOG,
+ "!!>> tablesync_cleanup_at_interrupt: dropping the tablesync slot \"%s\".",
+ syncslotname);
ReplicationSlotDropAtPubNode(wrconn, syncslotname, missing_ok);
+ elog(LOG,
+ "!!>> tablesync_cleanup_at_interrupt: dropped the tablesync slot \"%s\".",
+ syncslotname);
}
/*
@@ -339,8 +345,13 @@ tablesync_cleanup_at_interrupt(void)
originid = replorigin_by_name(originname, true);
if (originid != InvalidRepOriginId)
{
+ elog(LOG,
+ "!!>> tablesync_cleanup_at_interrupt: dropping origin tracking for \"%s\"",
+ originname);
replorigin_drop(originid, false);
-
+ elog(LOG,
+ "!!>> tablesync_cleanup_at_interrupt: dropped origin tracking for \"%s\"",
+ originname);
/*
* CommitTransactionCommand would normally attempt to advance the
* origin, but now that the origin has been dropped that would fail,
@@ -387,10 +398,13 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn)
*/
ReplicationSlotNameForTablesync(subid, relid, syncslotname);
- elog(DEBUG1,
- "process_syncing_tables_for_sync: dropping the tablesync slot \"%s\".",
+ elog(LOG,
+ "!!>> process_syncing_tables_for_sync: dropping the tablesync slot \"%s\".",
syncslotname);
ReplicationSlotDropAtPubNode(wrconn, syncslotname, false);
+ elog(LOG,
+ "!!>> process_syncing_tables_for_sync: dropped the tablesync slot \"%s\".",
+ syncslotname);
/*
* Change state to SYNCDONE.
@@ -552,10 +566,13 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
originid = replorigin_by_name(originname, true);
if (OidIsValid(originid))
{
- elog(DEBUG1,
- "process_syncing_tables_for_apply: dropping tablesync origin tracking for \"%s\".",
+ elog(LOG,
+ "!!>> process_syncing_tables_for_apply: dropping tablesync origin tracking for \"%s\".",
originname);
replorigin_drop(originid, false);
+ elog(LOG,
+ "!!>> process_syncing_tables_for_apply: dropped tablesync origin tracking for \"%s\".",
+ originname);
}
}
@@ -1054,14 +1071,22 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* The COPY phase was previously done, but tablesync then crashed/etc
* before it was able to finish normally.
*/
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_FINISHEDCOPY.");
StartTransactionCommand();
/*
* The origin tracking name must already exist (missing_ok=false).
*/
originid = replorigin_by_name(originname, false);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 2 replorigin_session_setup \"%s\".",
+ originname);
replorigin_session_setup(originid);
replorigin_session_origin = originid;
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 2 replorigin_session_get_progress \"%s\".",
+ originname);
*origin_startpos = replorigin_session_get_progress(false);
goto copy_table_done;
@@ -1110,6 +1135,9 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: walrcv_create_slot for \"%s\".",
+ slotname);
walrcv_create_slot(wrconn, slotname, false,
CRS_USE_SNAPSHOT, origin_startpos);
@@ -1141,10 +1169,13 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* If something failed during copy table then cleanup the created
* slot.
*/
- elog(DEBUG1,
- "LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".",
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".",
slotname);
ReplicationSlotDropAtPubNode(wrconn, slotname, false);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: tablesync copy failed. Dropped the tablesync slot \"%s\".",
+ slotname);
pfree(slotname);
slotname = NULL;
@@ -1165,11 +1196,20 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* replication origin from vanishing while advancing.
*/
LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 replorigin_create \"%s\".",
+ originname);
originid = replorigin_create(originname);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 replorigin_advance \"%s\".",
+ originname);
replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
true /* go backward */ , true /* WAL log */ );
UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 replorigin_session_setup \"%s\".",
+ originname);
replorigin_session_setup(originid);
replorigin_session_origin = originid;
}
@@ -1192,8 +1232,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
copy_table_done:
- elog(DEBUG1,
- "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
originname,
(uint32) (*origin_startpos >> 32),
(uint32) *origin_startpos);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 2a0565e..0f3e32a 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3087,6 +3087,8 @@ ProcessInterrupts(void)
errmsg("terminating autovacuum process due to administrator command")));
else if (IsLogicalWorker())
{
+ elog(LOG, "!!>> ProcessInterrupts: Hello, I am a LogicalWorker");
+
/* Tablesync workers do their own cleanups. */
if (IsLogicalWorkerTablesync())
tablesync_cleanup_at_interrupt(); /* does not return. */
--
1.8.3.1
Attachment: v17-0001-Tablesync-Solution1.patch (application/octet-stream)
From 0bee8be5428253da2b7f6a6808d321b6976b4204 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Tue, 19 Jan 2021 19:07:35 +1100
Subject: [PATCH v17] Tablesync Solution1.
====
Features:
* The tablesync slot is now permanent instead of temporary.
* The tablesync slot name is no longer tied to the Subscription slot name.
* The tablesync worker now allows multiple transactions instead of a single transaction.
* A new state (SUBREL_STATE_FINISHEDCOPY) is persisted after a successful copy_table in tablesync's LogicalRepSyncTableStart.
* If a re-launched tablesync finds state SUBREL_STATE_FINISHEDCOPY then it will bypass the initial copy_table phase.
* Now tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar to what is done for the apply worker). The origin is advanced when first created.
* Cleanup of tablesync resources:
- The tablesync slot cleanup (drop) code is added for the process_syncing_tables_for_sync function.
- The tablesync replication origin tracking is cleaned up in process_syncing_tables_for_apply.
- A tablesync function to clean up its own slot/origin is called from ProcessInterrupts. This is indirectly invoked by DropSubscription/AlterSubscription when they signal the tablesync worker to stop.
* Updates to PG docs.
* New TAP test case
TODO / Known Issues:
* None known.
---
doc/src/sgml/catalogs.sgml | 1 +
doc/src/sgml/logical-replication.sgml | 17 +-
doc/src/sgml/ref/drop_subscription.sgml | 6 +-
src/backend/commands/subscriptioncmds.c | 166 ++++++++------
src/backend/replication/logical/origin.c | 2 +-
src/backend/replication/logical/tablesync.c | 340 ++++++++++++++++++++++++----
src/backend/replication/logical/worker.c | 27 +--
src/backend/tcop/postgres.c | 6 +
src/include/catalog/pg_subscription_rel.h | 2 +
src/include/replication/logicalworker.h | 2 +
src/include/replication/slot.h | 3 +
src/test/subscription/t/004_sync.pl | 96 +++++++-
12 files changed, 540 insertions(+), 128 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 43d7a1a..82e74e1 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7662,6 +7662,7 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
State code:
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
+ <literal>f</literal> = finished table copy,
<literal>s</literal> = synchronized,
<literal>r</literal> = ready (normal replication)
</para></entry>
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index a560ad6..20cdd57 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -248,7 +248,17 @@
<para>
As mentioned earlier, each (active) subscription receives changes from a
- replication slot on the remote (publishing) side. Normally, the remote
+ replication slot on the remote (publishing) side.
+ </para>
+ <para>
+ Additional table synchronization slots are normally transient, created
+ internally and dropped automatically when they are no longer needed.
+ These table synchronization slots have generated names:
+ <quote><literal>pg_%u_sync_%u</literal></quote> (parameters: Subscription
+ <parameter>oid</parameter>, Table <parameter>relid</parameter>)
+ </para>
+ <para>
+ Normally, the remote
replication slot is created automatically when the subscription is created
using <command>CREATE SUBSCRIPTION</command> and it is dropped
automatically when the subscription is dropped using <command>DROP
@@ -294,8 +304,9 @@
using <command>ALTER SUBSCRIPTION</command> before attempting to drop
the subscription. If the remote database instance no longer exists, no
further action is then necessary. If, however, the remote database
- instance is just unreachable, the replication slot should then be
- dropped manually; otherwise it would continue to reserve WAL and might
+ instance is just unreachable, the replication slot (and any still
+ remaining table synchronization slots) should then be
+ dropped manually; otherwise it/they would continue to reserve WAL and might
eventually cause the disk to fill up. Such cases should be carefully
investigated.
</para>
diff --git a/doc/src/sgml/ref/drop_subscription.sgml b/doc/src/sgml/ref/drop_subscription.sgml
index adbdeaf..aee9615 100644
--- a/doc/src/sgml/ref/drop_subscription.sgml
+++ b/doc/src/sgml/ref/drop_subscription.sgml
@@ -79,7 +79,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
<para>
When dropping a subscription that is associated with a replication slot on
the remote host (the normal state), <command>DROP SUBSCRIPTION</command>
- will connect to the remote host and try to drop the replication slot as
+ will connect to the remote host and try to drop the replication slot (and
+ any remaining table synchronization slots) as
part of its operation. This is necessary so that the resources allocated
for the subscription on the remote host are released. If this fails,
either because the remote host is not reachable or because the remote
@@ -89,7 +90,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
executing <literal>ALTER SUBSCRIPTION ... SET (slot_name = NONE)</literal>.
After that, <command>DROP SUBSCRIPTION</command> will no longer attempt any
actions on a remote host. Note that if the remote replication slot still
- exists, it should then be dropped manually; otherwise it will continue to
+ exists, it (and any related table synchronization slots) should then be
+ dropped manually; otherwise it/they will continue to
reserve WAL and might eventually cause the disk to fill up. See
also <xref linkend="logical-replication-subscription-slot"/>.
</para>
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 082f785..03cf91e 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -34,6 +34,7 @@
#include "nodes/makefuncs.h"
#include "replication/logicallauncher.h"
#include "replication/origin.h"
+#include "replication/slot.h"
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "replication/worker_internal.h"
@@ -928,7 +929,6 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
char *err = NULL;
RepOriginId originid;
WalReceiverConn *wrconn = NULL;
- StringInfoData cmd;
Form_pg_subscription form;
/*
@@ -1015,101 +1015,133 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
ReleaseSysCache(tup);
- /*
- * Stop all the subscription workers immediately.
- *
- * This is necessary if we are dropping the replication slot, so that the
- * slot becomes accessible.
- *
- * It is also necessary if the subscription is disabled and was disabled
- * in the same transaction. Then the workers haven't seen the disabling
- * yet and will still be running, leading to hangs later when we want to
- * drop the replication origin. If the subscription was disabled before
- * this transaction, then there shouldn't be any workers left, so this
- * won't make a difference.
- *
- * New workers won't be started because we hold an exclusive lock on the
- * subscription till the end of the transaction.
- */
- LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
- subworkers = logicalrep_workers_find(subid, false);
- LWLockRelease(LogicalRepWorkerLock);
- foreach(lc, subworkers)
+ PG_TRY();
{
- LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
-
- logicalrep_worker_stop(w->subid, w->relid);
- }
- list_free(subworkers);
-
- /* Clean up dependencies */
- deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
-
- /* Remove any associated relation synchronization states. */
- RemoveSubscriptionRel(subid, InvalidOid);
+ /*
+ * Stop all the subscription workers immediately.
+ *
+ * This is necessary if we are dropping the replication slot, so that
+ * the slot becomes accessible.
+ *
+ * It is also necessary if the subscription is disabled and was
+ * disabled in the same transaction. Then the workers haven't seen
+ * the disabling yet and will still be running, leading to hangs later
+ * when we want to drop the replication origin. If the subscription
+ * was disabled before this transaction, then there shouldn't be any
+ * workers left, so this won't make a difference.
+ *
+ * New workers won't be started because we hold an exclusive lock on
+ * the subscription till the end of the transaction.
+ */
+ LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
+ subworkers = logicalrep_workers_find(subid, false);
+ LWLockRelease(LogicalRepWorkerLock);
+ foreach(lc, subworkers)
+ {
+ LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
- /* Remove the origin tracking if exists. */
- snprintf(originname, sizeof(originname), "pg_%u", subid);
- originid = replorigin_by_name(originname, true);
- if (originid != InvalidRepOriginId)
- replorigin_drop(originid, false);
+ logicalrep_worker_stop(w->subid, w->relid);
+ }
+ list_free(subworkers);
+
+ /* Clean up dependencies. */
+ deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
+
+ /* Remove any associated relation synchronization states. */
+ RemoveSubscriptionRel(subid, InvalidOid);
+
+ /* Remove the origin tracking if exists. */
+ snprintf(originname, sizeof(originname), "pg_%u", subid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ replorigin_drop(originid, false);
+
+ /*
+ * If there is a slot associated with the subscription, then drop the
+ * replication slot at the publisher node using the replication
+ * connection.
+ */
+ if (slotname)
+ {
+ load_file("libpqwalreceiver", false);
- /*
- * If there is no slot associated with the subscription, we can finish
- * here.
- */
- if (!slotname)
+ wrconn = walrcv_connect(conninfo, true, subname, &err);
+ if (wrconn == NULL)
+ ereport(ERROR,
+ (errmsg("could not connect to publisher when attempting to "
+ "drop the replication slot \"%s\"", slotname),
+ errdetail("The error was: %s", err),
+ /* translator: %s is an SQL ALTER command */
+ errhint("Use %s to disassociate the subscription from the slot.",
+ "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+
+ ReplicationSlotDropAtPubNode(wrconn, slotname, false);
+ }
+ }
+ PG_FINALLY();
{
+ if (wrconn)
+ walrcv_disconnect(wrconn);
+
table_close(rel, NoLock);
- return;
}
+ PG_END_TRY();
+}
+
+/*
+ * Drop the replication slot at the publisher node using the replication connection.
+ *
+ * missing_ok - if true then only issue a WARNING message if the slot cannot be deleted.
+ */
+void
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok)
+{
+ StringInfoData cmd;
+
+ Assert(wrconn);
- /*
- * Otherwise drop the replication slot at the publisher node using the
- * replication connection.
- */
load_file("libpqwalreceiver", false);
initStringInfo(&cmd);
appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
- wrconn = walrcv_connect(conninfo, true, subname, &err);
- if (wrconn == NULL)
- ereport(ERROR,
- (errmsg("could not connect to publisher when attempting to "
- "drop the replication slot \"%s\"", slotname),
- errdetail("The error was: %s", err),
- /* translator: %s is an SQL ALTER command */
- errhint("Use %s to disassociate the subscription from the slot.",
- "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
-
PG_TRY();
{
WalRcvExecResult *res;
res = walrcv_exec(wrconn, cmd.data, 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
+ if (res->status == WALRCV_OK_COMMAND)
+ {
+ /* NOTICE. Success. */
+ ereport(NOTICE,
+ (errmsg("dropped replication slot \"%s\" on publisher",
+ slotname)));
+ }
+ else if (res->status == WALRCV_ERROR && missing_ok)
+ {
+ /* WARNING. Error, but missing_ok = true. */
+ ereport(WARNING,
(errmsg("could not drop the replication slot \"%s\" on publisher",
slotname),
errdetail("The error was: %s", res->err)));
+ }
else
- ereport(NOTICE,
- (errmsg("dropped replication slot \"%s\" on publisher",
- slotname)));
+ {
+ /* ERROR. */
+ ereport(ERROR,
+ (errmsg("could not drop the replication slot \"%s\" on publisher",
+ slotname),
+ errdetail("The error was: %s", res->err)));
+ }
walrcv_clear_result(res);
}
PG_FINALLY();
{
- walrcv_disconnect(wrconn);
+ pfree(cmd.data);
}
PG_END_TRY();
-
- pfree(cmd.data);
-
- table_close(rel, NoLock);
}
/*
diff --git a/src/backend/replication/logical/origin.c b/src/backend/replication/logical/origin.c
index 77781d0..304c879 100644
--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -357,7 +357,7 @@ restart:
if (state->roident == roident)
{
/* found our slot, is it busy? */
- if (state->acquired_by != 0)
+ if (state->acquired_by != 0 && state->acquired_by != MyProcPid)
{
ConditionVariable *cv;
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 863d196..ec85c08 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -31,8 +31,10 @@
* table state to INIT.
* - Tablesync worker starts; changes table state from INIT to DATASYNC while
* copying.
- * - Tablesync worker finishes the copy and sets table state to SYNCWAIT;
- * waits for state change.
+ * - Tablesync worker does initial table copy; there is a FINISHEDCOPY state to
+ * indicate when the copy phase has completed, so if the worker crashes
+ * before reaching SYNCDONE the copy will not be re-attempted.
+ * - Tablesync worker then sets table state to SYNCWAIT; waits for state change.
* - Apply worker periodically checks for tables in SYNCWAIT state. When
* any appear, it sets the table state to CATCHUP and starts loop-waiting
* until either the table state is set to SYNCDONE or the sync worker
@@ -48,8 +50,8 @@
* point it sets state to READY and stops tracking. Again, there might
* be zero changes in between.
*
- * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * So the state progression is always: INIT -> DATASYNC ->
+ * (sync worker FINISHEDCOPY) -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY.
*
* The catalog pg_subscription_rel is used to keep information about
* subscribed tables and their state. Some transient state during data
@@ -59,6 +61,7 @@
* Example flows look like this:
* - Apply is in front:
* sync:8
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:10
* -> set in memory CATCHUP
@@ -74,6 +77,7 @@
*
* - Sync is in front:
* sync:10
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:8
* -> set in memory CATCHUP
@@ -98,11 +102,16 @@
#include "miscadmin.h"
#include "parser/parse_relation.h"
#include "pgstat.h"
+#include "postmaster/interrupt.h"
#include "replication/logicallauncher.h"
#include "replication/logicalrelation.h"
+#include "replication/logicalworker.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
+#include "storage/lmgr.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -260,6 +269,92 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
}
/*
+ * The sync worker cleans up any slot / origin resources it may have created.
+ * This function is called from ProcessInterrupts() as a result of the tablesync
+ * worker being signalled.
+ */
+void
+tablesync_cleanup_at_interrupt(void)
+{
+ bool drop_slot_needed;
+ char originname[NAMEDATALEN] = {0};
+ RepOriginId originid;
+ TimeLineID tli;
+ Oid subid = MySubscription->oid;
+ Oid relid = MyLogicalRepWorker->relid;
+
+ elog(DEBUG1,
+ "tablesync_cleanup_at_interrupt for relid = %d",
+ MyLogicalRepWorker->relid);
+
+ /*
+ * Cleanup the tablesync slot, if needed.
+ *
+ * If state is SYNCDONE or READY then the slot has already been dropped.
+ */
+ drop_slot_needed =
+ wrconn != NULL &&
+ MyLogicalRepWorker->relstate != SUBREL_STATE_SYNCDONE &&
+ MyLogicalRepWorker->relstate != SUBREL_STATE_READY;
+
+ if (drop_slot_needed)
+ {
+ char syncslotname[NAMEDATALEN] = {0};
+ bool missing_ok = true; /* no ERROR if slot is missing. */
+
+ /*
+ * End wal streaming so the wrconn can be re-used to drop the slot.
+ */
+ PG_TRY();
+ {
+ walrcv_endstreaming(wrconn, &tli);
+ }
+ PG_CATCH();
+ {
+ /*
+ * It is possible that the walrcv_startstreaming was not yet
+ * called (e.g. the interrupt initiating this cleanup may have
+ * happened during the table COPY phase) so suppress any error
+ * here to cope with that scenario.
+ */
+ }
+ PG_END_TRY();
+
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, missing_ok);
+ }
+
+ /*
+ * Remove the tablesync's origin tracking if it exists.
+ *
+ * The origin APIs must be called within a transaction, and this
+ * transaction will be ended within finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ {
+ StartTransactionCommand();
+ }
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ {
+ replorigin_drop(originid, false);
+
+ /*
+ * CommitTransactionCommand would normally attempt to advance the
+ * origin, but now that the origin has been dropped that would fail,
+ * so we need to reset the replorigin_session here to prevent this
+ * error happening.
+ */
+ replorigin_session_reset();
+ replorigin_session_origin = InvalidRepOriginId;
+ }
+
+ finish_sync_worker(); /* doesn't return. */
+}
+
+/*
* Handle table synchronization cooperation from the synchronization
* worker.
*
@@ -270,30 +365,58 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
+ bool sync_done = false;
+ Oid subid = MySubscription->oid;
+ Oid relid = MyLogicalRepWorker->relid;
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
+ sync_done = MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
+ current_lsn >= MyLogicalRepWorker->relstate_lsn;
+ SpinLockRelease(&MyLogicalRepWorker->relmutex);
- if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
- current_lsn >= MyLogicalRepWorker->relstate_lsn)
+ if (sync_done)
{
TimeLineID tli;
+ char syncslotname[NAMEDATALEN] = {0};
+ /* End wal streaming so wrconn can be re-used to drop the slot. */
+ walrcv_endstreaming(wrconn, &tli);
+
+ /*
+ * Cleanup the tablesync slot.
+ */
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+
+ elog(DEBUG1,
+ "process_syncing_tables_for_sync: dropping the tablesync slot \"%s\".",
+ syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, false);
+
+ /*
+ * Change state to SYNCDONE.
+ */
+ SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCDONE;
MyLogicalRepWorker->relstate_lsn = current_lsn;
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ {
+ StartTransactionCommand();
+ }
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
MyLogicalRepWorker->relstate_lsn);
- walrcv_endstreaming(wrconn, &tli);
finish_sync_worker();
}
- else
- SpinLockRelease(&MyLogicalRepWorker->relmutex);
}
/*
@@ -412,6 +535,33 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
started_tx = true;
}
+ /*
+ * Remove the tablesync origin tracking if it exists.
+ *
+ * The normal case origin drop is done here instead of in the
+ * process_syncing_tables_for_sync function because if the
+ * tablesync worker process attempted to drop its own
+ * origin, that would prevent the origin from advancing properly
+ * on commit of the transaction.
+ */
+ {
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MyLogicalRepWorker->subid, rstate->relid);
+ originid = replorigin_by_name(originname, true);
+ if (OidIsValid(originid))
+ {
+ elog(DEBUG1,
+ "process_syncing_tables_for_apply: dropping tablesync origin tracking for \"%s\".",
+ originname);
+ replorigin_drop(originid, false);
+ }
+ }
+
+ /*
+ * Update the state only after the origin cleanup.
+ */
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
@@ -808,6 +958,30 @@ copy_table(Relation rel)
}
/*
+ * Determine the tablesync slot name.
+ *
+ * The name must not exceed NAMEDATALEN - 1 because of remote node constraints
+ * on slot name length.
+ *
+ * The slot name is returned in the supplied buffer, or palloc'ed in the
+ * current memory context if the buffer is NULL.
+ */
+char *
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname)
+{
+ if (syncslotname)
+ {
+ sprintf(syncslotname, "pg_%u_sync_%u", suboid, relid);
+ }
+ else
+ {
+ syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
+ }
+
+ return syncslotname;
+}
+
+/*
* Start syncing the table in the sync worker.
*
* If nothing needs to be done to sync the table, we exit the worker without
@@ -824,6 +998,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -849,19 +1025,11 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
- MySubscription->slotname,
- MySubscription->oid,
- MyLogicalRepWorker->relid);
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(
+ MySubscription->oid,
+ MyLogicalRepWorker->relid,
+ NULL); /* use palloc */
/*
* Here we use the slot name instead of the subscription name as the
@@ -874,7 +1042,30 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY);
+
+ /* Assign the origin tracking record name. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MySubscription->oid, MyLogicalRepWorker->relid);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed/etc
+ * before it was able to finish normally.
+ */
+ StartTransactionCommand();
+
+ /*
+ * The origin tracking name must already exist (missing_ok=false).
+ */
+ originid = replorigin_by_name(originname, false);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ *origin_startpos = replorigin_session_get_progress(false);
+
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -890,9 +1081,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -918,29 +1106,99 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
walrcv_clear_result(res);
/*
- * Create a new temporary logical decoding slot. This slot will be used
+ * Create a new permanent logical decoding slot. This slot will be used
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
- walrcv_create_slot(wrconn, slotname, true,
+ walrcv_create_slot(wrconn, slotname, false,
CRS_USE_SNAPSHOT, origin_startpos);
- /* Now do the initial data copy */
- PushActiveSnapshot(GetTransactionSnapshot());
- copy_table(rel);
- PopActiveSnapshot();
+ /*
+ * Be sure to remove the newly created tablesync slot if the COPY fails.
+ */
+ PG_TRY();
+ {
+ /* Now do the initial data copy */
+ PushActiveSnapshot(GetTransactionSnapshot());
+ copy_table(rel);
+ PopActiveSnapshot();
+
+ res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
+ if (res->status != WALRCV_OK_COMMAND)
+ ereport(ERROR,
+ (errmsg("table copy could not finish transaction on publisher"),
+ errdetail("The error was: %s", res->err)));
+ walrcv_clear_result(res);
+
+ table_close(rel, NoLock);
+
+ /* Make the copy visible. */
+ CommandCounterIncrement();
+ }
+ PG_CATCH();
+ {
+ /*
+ * If something failed during copy table then cleanup the created
+ * slot.
+ */
+ elog(DEBUG1,
+ "LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".",
+ slotname);
+ ReplicationSlotDropAtPubNode(wrconn, slotname, false);
- res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
+ pfree(slotname);
+ slotname = NULL;
+
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+
+ /* Setup replication origin tracking. */
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist, so create it now.
+ *
+ * Then advance to the LSN obtained from walrcv_create_slot. This is WAL
+ * logged for the purpose of recovery. Locks are to prevent the
+ * replication origin from vanishing while advancing.
+ */
+ LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ originid = replorigin_create(originname);
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */ , true /* WAL log */ );
+ UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ }
+ else
+ {
ereport(ERROR,
- (errmsg("table copy could not finish transaction on publisher"),
- errdetail("The error was: %s", res->err)));
- walrcv_clear_result(res);
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("replication origin \"%s\" already exists",
+ originname)));
+ }
- table_close(rel, NoLock);
+ /*
+ * Update the persisted state to indicate the COPY phase is done; make it
+ * visible to others.
+ */
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_FINISHEDCOPY,
+ MyLogicalRepWorker->relstate_lsn);
- /* Make the copy visible. */
- CommandCounterIncrement();
+copy_table_done:
+
+ elog(DEBUG1,
+ "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
+
+ CommitTransactionCommand();
/*
* We are done with the initial data synchronization, update the state.
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index f2b2549..6482dd6 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -807,12 +807,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -889,9 +885,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -918,8 +912,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1062,8 +1055,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData* commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
@@ -3112,3 +3104,12 @@ IsLogicalWorker(void)
{
return MyLogicalRepWorker != NULL;
}
+
+/*
+ * Is current process a logical replication tablesync worker?
+ */
+bool
+IsLogicalWorkerTablesync(void)
+{
+ return am_tablesync_worker();
+}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 8dab9fd..2a0565e 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3086,9 +3086,15 @@ ProcessInterrupts(void)
(errcode(ERRCODE_ADMIN_SHUTDOWN),
errmsg("terminating autovacuum process due to administrator command")));
else if (IsLogicalWorker())
+ {
+ /* Tablesync workers do their own cleanups. */
+ if (IsLogicalWorkerTablesync())
+ tablesync_cleanup_at_interrupt(); /* does not return. */
+
ereport(FATAL,
(errcode(ERRCODE_ADMIN_SHUTDOWN),
errmsg("terminating logical replication worker due to administrator command")));
+ }
else if (IsLogicalLauncher())
{
ereport(DEBUG1,
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index 06663b9..9027c42 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,8 @@ DECLARE_UNIQUE_INDEX(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg_subs
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_FINISHEDCOPY 'f' /* tablesync copy phase is completed
+ * (sublsn NULL) */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/replication/logicalworker.h b/src/include/replication/logicalworker.h
index 2ad61a0..085916c 100644
--- a/src/include/replication/logicalworker.h
+++ b/src/include/replication/logicalworker.h
@@ -15,5 +15,7 @@
extern void ApplyWorkerMain(Datum main_arg);
extern bool IsLogicalWorker(void);
+extern bool IsLogicalWorkerTablesync(void);
+extern void tablesync_cleanup_at_interrupt(void);
#endif /* LOGICALWORKER_H */
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 53f636c..5f52335 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -15,6 +15,7 @@
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "storage/spin.h"
+#include "replication/walreceiver.h"
/*
* Behaviour of replication slots, upon release or crash.
@@ -211,6 +212,8 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname);
+extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
diff --git a/src/test/subscription/t/004_sync.pl b/src/test/subscription/t/004_sync.pl
index e111ab9..ba96d1d 100644
--- a/src/test/subscription/t/004_sync.pl
+++ b/src/test/subscription/t/004_sync.pl
@@ -3,7 +3,9 @@ use strict;
use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 7;
+use Test::More tests => 10;
+use Time::HiRes qw(usleep);
+use Scalar::Util qw(looks_like_number);
# Initialize publisher node
my $node_publisher = get_new_node('publisher');
@@ -149,7 +151,99 @@ $result = $node_subscriber->safe_psql('postgres',
is($result, qq(20),
'changes for table added after subscription initialized replicated');
+##
+## slot integrity
+##
+## Manually create a slot with the same name that tablesync will want.
+## Expect tablesync ERROR when clash is detected.
+## Then remove the slot so tablesync can proceed.
+## Expect tablesync can now finish normally.
+##
+
+# drop the subscription
+$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+
+# empty the table tab_rep
+$node_subscriber->safe_psql('postgres', "DELETE FROM tab_rep;");
+
+# empty the table tab_rep_next
+$node_subscriber->safe_psql('postgres', "DELETE FROM tab_rep_next;");
+
# recreate the subscription, but leave it disabled so that we can get the OID
+$node_subscriber->safe_psql('postgres',
+ "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr' PUBLICATION tap_pub
+ with (enabled = false)"
+);
+
+# need to create the name of the tablesync slot, for this we need the subscription OID
+# and the table OID.
+my $subid = $node_subscriber->safe_psql('postgres',
+ "SELECT oid FROM pg_subscription WHERE subname = 'tap_sub';");
+is(looks_like_number($subid), qq(1), 'get the subscription OID');
+
+my $relid = $node_subscriber->safe_psql('postgres',
+ "SELECT 'tab_rep_next'::regclass::oid");
+is(looks_like_number($relid), qq(1), 'get the table OID');
+
+# name of the tablesync slot is pg_'suboid'_sync_'tableoid'.
+my $slotname = 'pg_' . $subid . '_' . 'sync_' . $relid;
+
# temporarily, create a slot having the same name as the tablesync slot.
+$node_publisher->safe_psql('postgres',
+ "SELECT 'init' FROM pg_create_logical_replication_slot('$slotname', 'pgoutput', false);");
+
+# enable the subscription
+$node_subscriber->safe_psql('postgres',
+ "ALTER SUBSCRIPTION tap_sub ENABLE"
+);
+
+# check for occurrence of the expected error
+poll_output_until("replication slot \"$slotname\" already exists")
+ or die "no error stop for the pre-existing origin";
+
+# now drop the offending slot, the tablesync should recover.
+$node_publisher->safe_psql('postgres',
+ "SELECT pg_drop_replication_slot('$slotname');");
+
+# wait for sync to finish
+$node_subscriber->poll_query_until('postgres', $synced_query)
+ or die "Timed out while waiting for subscriber to synchronize data";
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM tab_rep_next");
+is($result, qq(20),
+ 'data for table added after subscription initialized are now synced');
+
+# Cleanup
$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
$node_subscriber->stop('fast');
$node_publisher->stop('fast');
+
+sub poll_output_until
+{
+ my ($expected) = @_;
+
+ $expected = 'xxxxxx' unless defined($expected); # default junk value
+
+ my $max_attempts = 10 * 10;
+ my $attempts = 0;
+
+ my $output_file = '';
+ while ($attempts < $max_attempts)
+ {
+ $output_file = slurp_file($node_subscriber->logfile());
+
+ if ($output_file =~ $expected)
+ {
+ return 1;
+ }
+
+ # Wait 0.1 second before retrying.
+ usleep(100_000);
+ $attempts++;
+ }
+
+ # The expected output did not appear within ~10 seconds. Give up
+ return 0;
+}
--
1.8.3.1
On Tue, Jan 19, 2021 at 2:32 PM Peter Smith <smithpb2250@gmail.com> wrote:
Hi Amit.
PSA the v17 patch for the Tablesync Solution1.
Thanks for the updated patch. Below are a few comments:
1. Why are we changing the scope of PG_TRY in DropSubscription()?
Also, it might be better to keep the replication slot drop part as it
is.
2.
- * - Tablesync worker finishes the copy and sets table state to SYNCWAIT;
- * waits for state change.
+ * - Tablesync worker does initial table copy; there is a FINISHEDCOPY state to
+ * indicate when the copy phase has completed, so if the worker crashes
+ * before reaching SYNCDONE the copy will not be re-attempted.
In the last line, shouldn't the state be FINISHEDCOPY instead of SYNCDONE?
3.
+void
+tablesync_cleanup_at_interrupt(void)
+{
+ bool drop_slot_needed;
+ char originname[NAMEDATALEN] = {0};
+ RepOriginId originid;
+ TimeLineID tli;
+ Oid subid = MySubscription->oid;
+ Oid relid = MyLogicalRepWorker->relid;
+
+ elog(DEBUG1,
+ "tablesync_cleanup_at_interrupt for relid = %d",
+ MyLogicalRepWorker->relid);
The function name and message make it sound like we drop the slot
and origin at any interrupt. Isn't it better to name it as
tablesync_cleanup_at_shutdown()?
4.
+ drop_slot_needed =
+ wrconn != NULL &&
+ MyLogicalRepWorker->relstate != SUBREL_STATE_SYNCDONE &&
+ MyLogicalRepWorker->relstate != SUBREL_STATE_READY;
+
+ if (drop_slot_needed)
+ {
+ char syncslotname[NAMEDATALEN] = {0};
+ bool missing_ok = true; /* no ERROR if slot is missing. */
I think we can avoid using the missing_ok and drop_slot_needed variables.
5. Can we drop the origin along with the slot in
process_syncing_tables_for_sync() instead of
process_syncing_tables_for_apply()? I think this is possible because
of the other changes you made in origin.c. Also, if possible, we can
try to use the same code to drop the slot and origin in
tablesync_cleanup_at_interrupt and process_syncing_tables_for_sync.
6.
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed/etc
+ * before it was able to finish normally.
+ */
There seems to be a typo (crashed/etc) in the above comment.
7.
+# check for occurrence of the expected error
+poll_output_until("replication slot \"$slotname\" already exists")
+ or die "no error stop for the pre-existing origin";
In this test, isn't it better to check for the datasync state like below?
004_sync.pl has other similar tests.
my $started_query = "SELECT srsubstate = 'd' FROM pg_subscription_rel;";
$node_subscriber->poll_query_until('postgres', $started_query)
or die "Timed out while waiting for subscriber to start sync";
Is there a reason why we can't use the existing way to check for
failure in this case?
--
With Regards,
Amit Kapila.
On Thu, Jan 21, 2021 at 3:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Jan 19, 2021 at 2:32 PM Peter Smith <smithpb2250@gmail.com> wrote:
Hi Amit.
PSA the v17 patch for the Tablesync Solution1.
Thanks for the updated patch. Below are a few comments:
One more comment:
In LogicalRepSyncTableStart(), you are trying to remove the slot on
failure of the copy, which won't work if the publisher is down. If
that happens, then on restart of the tablesync worker we will retry
creating the slot with the same name, and it will fail because the
previous slot has still not been removed from the publisher. I think
the same problem can happen if, after an error in the tablesync
worker, we drop the subscription before the tablesync worker gets a
chance to restart. So, to avoid these problems, can we use a
TEMPORARY slot for tablesync workers as previously? If I remember
correctly, the main problem was that we don't know where to start
decoding if we fail in the catchup phase. But for that, origins
should be sufficient: if we fail before the copy then we have to
create a new slot and origin anyway, but if we fail after the copy
then we can use the start_decoding_position from the origin. So
before the copy we still need to use CRS_USE_SNAPSHOT while creating
a temporary slot, but if we are already in FINISHEDCOPY state at the
start of the tablesync worker then we can create a slot with the
CRS_NOEXPORT_SNAPSHOT option, use the origin's start_pos, and proceed
decoding changes from that point onwards, similar to how the apply
worker currently works.
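To make that flow concrete, here is a minimal sketch of how the
tablesync worker could choose the slot-creation option and start
position. This is a sketch only, not the final patch; it reuses the
walrcv/replorigin calls that appear in the patches on this thread:

    if (MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY)
    {
        /* Copy already done; no snapshot needed, restart from the origin. */
        walrcv_create_slot(wrconn, slotname, true /* temporary */ ,
                           CRS_NOEXPORT_SNAPSHOT, NULL /* lsn */ );
        originid = replorigin_by_name(originname, false /* missing_ok */ );
        replorigin_session_setup(originid);
        *origin_startpos = replorigin_session_get_progress(false);
    }
    else
    {
        /* Copy not yet done; use the snapshot to make the final data consistent. */
        walrcv_create_slot(wrconn, slotname, true /* temporary */ ,
                           CRS_USE_SNAPSHOT, origin_startpos);
    }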
--
With Regards,
Amit Kapila.
Hi Amit.
PSA the v18 patch for the Tablesync Solution1.
Main differences from v17:
+ Design change to use TEMPORARY tablesync slots [ak0122] means much
of the v17 slot cleanup code became unnecessary.
+ Small refactor in LogicalRepSyncTableStart to fix a deadlock scenario.
+ Addressing some review comments [ak0121].
[ak0121] /messages/by-id/CAA4eK1LGxuB_RTfZ2HLJT76wv=FLV6UPqT+FWkiDg61rvQkkmQ@mail.gmail.com
[ak0122] /messages/by-id/CAA4eK1LS0_mdVx2zG3cS+H88FJiwyS3kZi7zxijJ_gEuw2uQ2g@mail.gmail.com
====
Features:
* The tablesync slot name is no longer tied to the Subscription slot name
(see the naming sketch below).
* The tablesync worker now allows multiple transactions instead of a single
transaction.
* A new state (SUBREL_STATE_FINISHEDCOPY) is persisted after a
successful copy_table in tablesync's LogicalRepSyncTableStart.
* If a re-launched tablesync finds state SUBREL_STATE_FINISHEDCOPY
then it will bypass the initial copy_table phase.
* Now tablesync sets up replication origin tracking in
LogicalRepSyncTableStart (similar to what is done for the apply worker). The
origin is advanced when first created.
* The tablesync replication origin tracking record is cleaned up by:
- process_syncing_tables_for_apply
- DropSubscription
- AlterSubscription_refresh
* Updates to PG docs.
* New TAP test case
Known Issues:
* None.
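For reference, these are the slot and origin names the patch derives
from the subscription and table OIDs (a small sketch using the same
formats as the patch code):

    /* tablesync slot name: pg_<suboid>_sync_<relid> */
    char       *slotname = psprintf("pg_%u_sync_%u", suboid, relid);

    /* tablesync origin name: pg_<suboid>_<relid> */
    char        originname[NAMEDATALEN];

    snprintf(originname, sizeof(originname), "pg_%u_%u", suboid, relid);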
---
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
v18-0001-Tablesync-Solution1.patch (application/octet-stream)
From b5b2b5335bacb8c924320813a78aeb94cea66e0a Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Sat, 23 Jan 2021 09:53:21 +1100
Subject: [PATCH v18] Tablesync Solution1.
====
Features:
* The tablesync slot name is no longer tied to the Subscription slot name.
* The tablesync worker now allows multiple transactions instead of a single transaction.
* A new state (SUBREL_STATE_FINISHEDCOPY) is persisted after a successful copy_table in tablesync's LogicalRepSyncTableStart.
* If a re-launched tablesync finds state SUBREL_STATE_FINISHEDCOPY then it will bypass the initial copy_table phase.
* Now tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar to what is done for the apply worker). The origin is advanced when first created.
* The tablesync replication origin tracking record is cleaned up by:
- process_syncing_tables_for_apply
- DropSubscription
- AlterSubscription_refresh
* Updates to PG docs.
* New TAP test case
Known Issues:
* None.
---
doc/src/sgml/catalogs.sgml | 1 +
src/backend/commands/subscriptioncmds.c | 39 ++++++
src/backend/replication/logical/tablesync.c | 183 ++++++++++++++++++++++++----
src/backend/replication/logical/worker.c | 18 +--
src/include/catalog/pg_subscription_rel.h | 2 +
src/include/replication/slot.h | 1 +
src/test/subscription/t/004_sync.pl | 96 ++++++++++++++-
7 files changed, 300 insertions(+), 40 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 43d7a1a..82e74e1 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7662,6 +7662,7 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
State code:
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
+ <literal>f</literal> = finished table copy,
<literal>s</literal> = synchronized,
<literal>r</literal> = ready (normal replication)
</para></entry>
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 082f785..58f8a86 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -649,10 +649,22 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
if (!bsearch(&relid, pubrel_local_oids,
list_length(pubrel_names), sizeof(Oid), oid_cmp))
{
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
RemoveSubscriptionRel(sub->oid, relid);
logicalrep_worker_stop_at_commit(sub->oid, relid);
+ /* Remove the tablesync's origin tracking if it exists. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", sub->oid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ {
+ elog(DEBUG1, "AlterSubscription_refresh: dropping origin tracking for \"%s\"", originname);
+ replorigin_drop(originid, false /* nowait */ );
+ }
+
ereport(DEBUG1,
(errmsg("table \"%s.%s\" removed from subscription \"%s\"",
get_namespace_name(get_rel_namespace(relid)),
@@ -930,6 +942,7 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
WalReceiverConn *wrconn = NULL;
StringInfoData cmd;
Form_pg_subscription form;
+ List *rstates;
/*
* Lock pg_subscription with AccessExclusiveLock to ensure that the
@@ -1042,6 +1055,32 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
}
list_free(subworkers);
+ /*
+ * Tablesync resource cleanup (origins).
+ *
+ * Any READY-state relations have already done this.
+ */
+ rstates = GetSubscriptionNotReadyRelations(subid);
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup the tablesync worker resources */
+ if (!OidIsValid(relid))
+ continue;
+
+ /* Remove the tablesync's origin tracking if it exists. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ {
+ elog(DEBUG1, "DropSubscription: dropping origin tracking for \"%s\"", originname);
+ replorigin_drop(originid, false /* nowait */ );
+ }
+ }
+ list_free(rstates);
+
/* Clean up dependencies */
deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 863d196..1f828fc 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -31,8 +31,10 @@
* table state to INIT.
* - Tablesync worker starts; changes table state from INIT to DATASYNC while
* copying.
- * - Tablesync worker finishes the copy and sets table state to SYNCWAIT;
- * waits for state change.
+ * - Tablesync worker does initial table copy; there is a FINISHEDCOPY state to
+ * indicate when the copy phase has completed, so if the worker crashes
+ * with this (non-memory) state then the copy will not be re-attempted.
+ * - Tablesync worker then sets table state to SYNCWAIT; waits for state change.
* - Apply worker periodically checks for tables in SYNCWAIT state. When
* any appear, it sets the table state to CATCHUP and starts loop-waiting
* until either the table state is set to SYNCDONE or the sync worker
@@ -48,8 +50,8 @@
* point it sets state to READY and stops tracking. Again, there might
* be zero changes in between.
*
- * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * So the state progression is always: INIT -> DATASYNC ->
+ * (sync worker FINISHEDCOPY) -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY.
*
* The catalog pg_subscription_rel is used to keep information about
* subscribed tables and their state. Some transient state during data
@@ -59,6 +61,7 @@
* Example flows look like this:
* - Apply is in front:
* sync:8
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:10
* -> set in memory CATCHUP
@@ -74,6 +77,7 @@
*
* - Sync is in front:
* sync:10
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:8
* -> set in memory CATCHUP
@@ -98,11 +102,16 @@
#include "miscadmin.h"
#include "parser/parse_relation.h"
#include "pgstat.h"
+#include "postmaster/interrupt.h"
#include "replication/logicallauncher.h"
#include "replication/logicalrelation.h"
+#include "replication/logicalworker.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
+#include "storage/lmgr.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -270,20 +279,35 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
+ bool sync_done = false;
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
+ sync_done = MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
+ current_lsn >= MyLogicalRepWorker->relstate_lsn;
+ SpinLockRelease(&MyLogicalRepWorker->relmutex);
- if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
- current_lsn >= MyLogicalRepWorker->relstate_lsn)
+ if (sync_done)
{
TimeLineID tli;
+ /*
+ * Change state to SYNCDONE.
+ */
+ SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCDONE;
MyLogicalRepWorker->relstate_lsn = current_lsn;
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ {
+ StartTransactionCommand();
+ }
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
@@ -292,8 +316,6 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn)
walrcv_endstreaming(wrconn, &tli);
finish_sync_worker();
}
- else
- SpinLockRelease(&MyLogicalRepWorker->relmutex);
}
/*
@@ -404,6 +426,9 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
*/
if (current_lsn >= rstate->lsn)
{
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
rstate->state = SUBREL_STATE_READY;
rstate->lsn = current_lsn;
if (!started_tx)
@@ -412,6 +437,27 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
started_tx = true;
}
+ /*
+ * Remove the tablesync origin tracking if it exists.
+ *
+ * The normal case origin drop must be done here, not in the
+ * process_syncing_tables_for_sync function, because if the
+ * tablesync worker process attempted to drop its own origin
+ * then it would fail (origin is "busy").
+ */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MyLogicalRepWorker->subid, rstate->relid);
+ originid = replorigin_by_name(originname, true);
+ if (OidIsValid(originid))
+ {
+ elog(DEBUG1,
+ "process_syncing_tables_for_apply: dropping tablesync origin tracking for \"%s\".",
+ originname);
+ replorigin_drop(originid, false /* nowait */ );
+ }
+
+ /*
+ * Update the state to READY only after the origin cleanup.
+ */
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
@@ -808,6 +854,22 @@ copy_table(Relation rel)
}
/*
+ * Determine the tablesync slot name.
+ *
+ * The name must not exceed NAMEDATALEN - 1 because of remote node constraints
+ * on slot name length.
+ *
+ * The returned slot name is palloc'ed in the current memory context.
+ */
+char *
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid)
+{
+ char *syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
+
+ return syncslotname;
+}
+
+/*
* Start syncing the table in the sync worker.
*
* If nothing needs to be done to sync the table, we exit the worker without
@@ -824,6 +886,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -849,19 +913,10 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
- MySubscription->slotname,
- MySubscription->oid,
- MyLogicalRepWorker->relid);
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(
+ MySubscription->oid,
+ MyLogicalRepWorker->relid);
/*
* Here we use the slot name instead of the subscription name as the
@@ -874,7 +929,38 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY);
+
+ /* Assign the origin tracking record name. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MySubscription->oid, MyLogicalRepWorker->relid);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed
+ * before it was able to finish normally.
+ */
+ StartTransactionCommand();
+
+ /*
+ * Slot creation passes a NULL lsn because the origin startpos is
+ * obtained from origin tracking this time, not from the slot.
+ */
+ walrcv_create_slot(wrconn, slotname, true /* temporary */ ,
+ CRS_NOEXPORT_SNAPSHOT, NULL /* lsn */ );
+
+ /*
+ * The origin tracking name must already exist. It was created the
+ * first time this tablesync was launched.
+ */
+ originid = replorigin_by_name(originname, false /* missing_ok */ );
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ *origin_startpos = replorigin_session_get_progress(false);
+
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -890,9 +976,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -942,6 +1025,54 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
/* Make the copy visible. */
CommandCounterIncrement();
+ /* Setup replication origin tracking. */
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist, so create it now.
+ *
+ * Then advance to the LSN obtained from walrcv_create_slot. This is WAL
+ * logged for the purpose of recovery. Locks are to prevent the
+ * replication origin from vanishing while advancing.
+ */
+ originid = replorigin_create(originname);
+
+ LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */ , true /* WAL log */ );
+ UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ }
+ else
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("replication origin \"%s\" already exists",
+ originname)));
+ }
+
+ /*
+ * Update the persisted state to indicate the COPY phase is done; make it
+ * visible to others.
+ */
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_FINISHEDCOPY,
+ MyLogicalRepWorker->relstate_lsn);
+
+copy_table_done:
+
+ elog(DEBUG1,
+ "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
+
+ CommitTransactionCommand();
+
/*
* We are done with the initial data synchronization, update the state.
*/
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index f2b2549..0701a9b 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -807,12 +807,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -889,9 +885,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -918,8 +912,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1062,8 +1055,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData* commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index 06663b9..9027c42 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,8 @@ DECLARE_UNIQUE_INDEX(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg_subs
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_FINISHEDCOPY 'f' /* tablesync copy phase is completed
+ * (sublsn NULL) */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 53f636c..c784282 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -211,6 +211,7 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(Oid suboid, Oid relid);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
diff --git a/src/test/subscription/t/004_sync.pl b/src/test/subscription/t/004_sync.pl
index e111ab9..ba96d1d 100644
--- a/src/test/subscription/t/004_sync.pl
+++ b/src/test/subscription/t/004_sync.pl
@@ -3,7 +3,9 @@ use strict;
use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 7;
+use Test::More tests => 10;
+use Time::HiRes qw(usleep);
+use Scalar::Util qw(looks_like_number);
# Initialize publisher node
my $node_publisher = get_new_node('publisher');
@@ -149,7 +151,99 @@ $result = $node_subscriber->safe_psql('postgres',
is($result, qq(20),
'changes for table added after subscription initialized replicated');
+##
+## slot integrity
+##
+## Manually create a slot with the same name that tablesync will want.
+## Expect tablesync ERROR when clash is detected.
+## Then remove the slot so tablesync can proceed.
+## Expect tablesync can now finish normally.
+##
+
+# drop the subscription
+$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+
+# empty the table tab_rep
+$node_subscriber->safe_psql('postgres', "DELETE FROM tab_rep;");
+
+# empty the table tab_rep_next
+$node_subscriber->safe_psql('postgres', "DELETE FROM tab_rep_next;");
+
# recreate the subscription, but leave it disabled so that we can get the OID
+$node_subscriber->safe_psql('postgres',
+ "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr' PUBLICATION tap_pub
+ with (enabled = false)"
+);
+
+# need to create the name of the tablesync slot, for this we need the subscription OID
+# and the table OID.
+my $subid = $node_subscriber->safe_psql('postgres',
+ "SELECT oid FROM pg_subscription WHERE subname = 'tap_sub';");
+is(looks_like_number($subid), qq(1), 'get the subscription OID');
+
+my $relid = $node_subscriber->safe_psql('postgres',
+ "SELECT 'tab_rep_next'::regclass::oid");
+is(looks_like_number($relid), qq(1), 'get the table OID');
+
+# name of the tablesync slot is pg_'suboid'_sync_'tableoid'.
+my $slotname = 'pg_' . $subid . '_' . 'sync_' . $relid;
+
# temporarily, create a slot having the same name as the tablesync slot.
+$node_publisher->safe_psql('postgres',
+ "SELECT 'init' FROM pg_create_logical_replication_slot('$slotname', 'pgoutput', false);");
+
+# enable the subscription
+$node_subscriber->safe_psql('postgres',
+ "ALTER SUBSCRIPTION tap_sub ENABLE"
+);
+
+# check for occurrence of the expected error
+poll_output_until("replication slot \"$slotname\" already exists")
+ or die "no error stop for the pre-existing origin";
+
+# now drop the offending slot, the tablesync should recover.
+$node_publisher->safe_psql('postgres',
+ "SELECT pg_drop_replication_slot('$slotname');");
+
+# wait for sync to finish
+$node_subscriber->poll_query_until('postgres', $synced_query)
+ or die "Timed out while waiting for subscriber to synchronize data";
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM tab_rep_next");
+is($result, qq(20),
+ 'data for table added after subscription initialized are now synced');
+
+# Cleanup
$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
$node_subscriber->stop('fast');
$node_publisher->stop('fast');
+
+sub poll_output_until
+{
+ my ($expected) = @_;
+
+ $expected = 'xxxxxx' unless defined($expected); # default junk value
+
+ my $max_attempts = 10 * 10;
+ my $attempts = 0;
+
+ my $output_file = '';
+ while ($attempts < $max_attempts)
+ {
+ $output_file = slurp_file($node_subscriber->logfile());
+
+ if ($output_file =~ $expected)
+ {
+ return 1;
+ }
+
+ # Wait 0.1 second before retrying.
+ usleep(100_000);
+ $attempts++;
+ }
+
+ # The expected output did not appear within ~10 seconds. Give up
+ return 0;
+}
--
1.8.3.1
v18-0002-Tablesync-extra-logging.patch (application/octet-stream)
From 22f471614e150c0a07833592ae39f3ed73429b03 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Sat, 23 Jan 2021 10:13:54 +1100
Subject: [PATCH v18] Tablesync extra logging.
This patch only adds some extra logging which may be helpful for testing, but is not for committing.
---
src/backend/commands/subscriptioncmds.c | 10 ++++++----
src/backend/replication/logical/tablesync.c | 31 +++++++++++++++++++++++++++--
2 files changed, 35 insertions(+), 6 deletions(-)
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 58f8a86..352c3bf 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -661,12 +661,13 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
originid = replorigin_by_name(originname, true);
if (originid != InvalidRepOriginId)
{
- elog(DEBUG1, "AlterSubscription_refresh: dropping origin tracking for \"%s\"", originname);
+ elog(LOG, "!!>> AlterSubscription_refresh: dropping origin tracking for \"%s\"", originname);
replorigin_drop(originid, false /* nowait */ );
+ elog(LOG, "!!>> AlterSubscription_refresh: dropped origin tracking for \"%s\"", originname);
}
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
+ ereport(LOG,
+ (errmsg("!!>> table \"%s.%s\" removed from subscription \"%s\"",
get_namespace_name(get_rel_namespace(relid)),
get_rel_name(relid),
sub->name)));
@@ -1075,8 +1076,9 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
originid = replorigin_by_name(originname, true);
if (originid != InvalidRepOriginId)
{
- elog(DEBUG1, "DropSubscription: dropping origin tracking for \"%s\"", originname);
+ elog(LOG, "!!>> DropSubscription: dropping origin tracking for \"%s\"", originname);
replorigin_drop(originid, false /* nowait */ );
+ elog(LOG, "!!>> DropSubscription: droppped origin tracking for \"%s\"", originname);
}
}
list_free(rstates);
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 1f828fc..b531516 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -449,10 +449,13 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
originid = replorigin_by_name(originname, true);
if (OidIsValid(originid))
{
- elog(DEBUG1,
- "process_syncing_tables_for_apply: dropping tablesync origin tracking for \"%s\".",
+ elog(LOG,
+ "!!>> process_syncing_tables_for_apply: dropping tablesync origin tracking for \"%s\".",
originname);
replorigin_drop(originid, false /* nowait */ );
+ elog(LOG,
+ "!!>> process_syncing_tables_for_apply: dropped tablesync origin tracking for \"%s\".",
+ originname);
}
/*
@@ -941,12 +944,17 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* The COPY phase was previously done, but tablesync then crashed
* before it was able to finish normally.
*/
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_FINISHEDCOPY.");
StartTransactionCommand();
/*
* Slot creation passes NULL lsn because the origin startpos is got
* from origin tracking this time, not from the slot.
*/
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 2 walrcv_create_slot for \"%s\".",
+ slotname);
walrcv_create_slot(wrconn, slotname, true /* temporary */ ,
CRS_NOEXPORT_SNAPSHOT, NULL /* lsn */ );
@@ -955,8 +963,14 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* time this tablesync was launched.
*/
originid = replorigin_by_name(originname, false /* missing_ok */ );
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 2 replorigin_session_setup \"%s\".",
+ originname);
replorigin_session_setup(originid);
replorigin_session_origin = originid;
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 2 replorigin_session_get_progress \"%s\".",
+ originname);
*origin_startpos = replorigin_session_get_progress(false);
goto copy_table_done;
@@ -1005,6 +1019,9 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 walrcv_create_slot for \"%s\".",
+ slotname);
walrcv_create_slot(wrconn, slotname, true,
CRS_USE_SNAPSHOT, origin_startpos);
@@ -1036,13 +1053,23 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* logged for the purpose of recovery. Locks are to prevent the
* replication origin from vanishing while advancing.
*/
+ LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 replorigin_create \"%s\".",
+ originname);
originid = replorigin_create(originname);
LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 replorigin_advance \"%s\".",
+ originname);
replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
true /* go backward */ , true /* WAL log */ );
UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 replorigin_session_setup \"%s\".",
+ originname);
replorigin_session_setup(originid);
replorigin_session_origin = originid;
}
--
1.8.3.1
On Fri, Jan 22, 2021 at 1:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Jan 21, 2021 at 3:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Jan 19, 2021 at 2:32 PM Peter Smith <smithpb2250@gmail.com> wrote:
Hi Amit.
PSA the v17 patch for the Tablesync Solution1.
Thanks for the updated patch. Below are a few comments:
One more comment:
In LogicalRepSyncTableStart(), you are trying to remove the slot on
failure of the copy, which won't work if the publisher is down. If
that happens, then on restart of the tablesync worker we will retry
creating the slot with the same name, and it will fail because the
previous slot has still not been removed from the publisher. I think
the same problem can happen if, after an error in the tablesync
worker, we drop the subscription before the tablesync worker gets a
chance to restart. So, to avoid these problems, can we use a
TEMPORARY slot for tablesync workers as previously? If I remember
correctly, the main problem was that we don't know where to start
decoding if we fail in the catchup phase. But for that, origins
should be sufficient: if we fail before the copy then we have to
create a new slot and origin anyway, but if we fail after the copy
then we can use the start_decoding_position from the origin. So
before the copy we still need to use CRS_USE_SNAPSHOT while creating
a temporary slot, but if we are already in FINISHEDCOPY state at the
start of the tablesync worker then we can create a slot with the
CRS_NOEXPORT_SNAPSHOT option, use the origin's start_pos, and proceed
decoding changes from that point onwards, similar to how the apply
worker currently works.
OK. Code is modified as suggested in the latest patch [v18].
Now that tablesync slots are temporary, quite a lot of cleanup code
from the previous patch (v17) is no longer required, so it has been
removed.
----
[v18] = /messages/by-id/CAHut+Pvm0R=Mn_uVN_JhK0scE54V6+EDGHJg1WYJx0Q8HX_mkQ@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia
On Thu, Jan 21, 2021 at 9:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Jan 19, 2021 at 2:32 PM Peter Smith <smithpb2250@gmail.com> wrote:
Hi Amit.
PSA the v17 patch for the Tablesync Solution1.
Thanks for the updated patch. Below are a few comments:
1. Why are we changing the scope of PG_TRY in DropSubscription()?
Also, it might be better to keep the replication slot drop part as it
is.
The latest patch [v18] was re-designed to make tablesync slots as
TEMPORARY [ak0122], so this code in DropSubscription is modified a
lot. This review comment is not applicable anymore.
2.
- * - Tablesync worker finishes the copy and sets table state to SYNCWAIT;
- * waits for state change.
+ * - Tablesync worker does initial table copy; there is a FINISHEDCOPY state to
+ * indicate when the copy phase has completed, so if the worker crashes
+ * before reaching SYNCDONE the copy will not be re-attempted.
In the last line, shouldn't the state be FINISHEDCOPY instead of SYNCDONE?
OK. The code comment was correct, but maybe confusing. I have reworded
it in the latest patch [v18].
3.
+void
+tablesync_cleanup_at_interrupt(void)
+{
+ bool drop_slot_needed;
+ char originname[NAMEDATALEN] = {0};
+ RepOriginId originid;
+ TimeLineID tli;
+ Oid subid = MySubscription->oid;
+ Oid relid = MyLogicalRepWorker->relid;
+
+ elog(DEBUG1,
+ "tablesync_cleanup_at_interrupt for relid = %d",
+ MyLogicalRepWorker->relid);
The function name and message make it sound like we drop the slot
and origin at any interrupt. Isn't it better to name it as
tablesync_cleanup_at_shutdown()?
The latest patch [v18] was re-designed to make tablesync slots as
TEMPORARY [ak0122], so this cleanup function is removed. This review
comment is not applicable anymore.
4.
+ drop_slot_needed =
+ wrconn != NULL &&
+ MyLogicalRepWorker->relstate != SUBREL_STATE_SYNCDONE &&
+ MyLogicalRepWorker->relstate != SUBREL_STATE_READY;
+
+ if (drop_slot_needed)
+ {
+ char syncslotname[NAMEDATALEN] = {0};
+ bool missing_ok = true; /* no ERROR if slot is missing. */
I think we can avoid using the missing_ok and drop_slot_needed variables.
The latest patch [v18] was re-designed to make the tablesync slots
TEMPORARY [ak0122], so this code no longer exists. This review comment
is not applicable anymore.
5. Can we drop the origin along with the slot in
process_syncing_tables_for_sync() instead of
process_syncing_tables_for_apply()? I think this is possible because
of the other changes you made in origin.c. Also, if possible, we can
try to use the same code to drop the slot and origin in
tablesync_cleanup_at_interrupt and process_syncing_tables_for_sync.
No, the origin tracking cannot be dropped by the tablesync worker for
the normal use-case, even with my modified origin.c; it would fail
during the commit of the transaction because, while trying to do
replorigin_session_advance, it would find that the asserted origin id
was no longer there.
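To spell out that failure mode, here is a hedged sketch (the variable
names are from origin.c/worker.c, but the call chain is simplified):

/*
 * Simplified view of the sync worker's commit path: the commit
 * machinery advances the session origin, so the origin record must
 * still exist at this point.
 */
replorigin_session_origin_lsn = commit_data->end_lsn;
replorigin_session_origin_timestamp = commit_data->committime;
CommitTransactionCommand();     /* advances the session origin */

/*
 * If the sync worker had dropped its own origin before this commit,
 * the advance would not find the origin record and would error out.
 */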
Also, the latest patch [v18] was re-designed to make the tablesync
slots TEMPORARY [ak0122], so the tablesync_cleanup_at_interrupt
function no longer exists (and the origin.c change of v17 has also
been removed).
6.
+    if (MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY)
+    {
+        /*
+         * The COPY phase was previously done, but tablesync then crashed/etc
+         * before it was able to finish normally.
+         */

There seems to be a typo (crashed/etc) in the above comment.
OK. Fixed in latest patch [v18].
----
[ak0122] = /messages/by-id/CAA4eK1LS0_mdVx2zG3cS+H88FJiwyS3kZi7zxijJ_gEuw2uQ2g@mail.gmail.com
[v18] = /messages/by-id/CAHut+Pvm0R=Mn_uVN_JhK0scE54V6+EDGHJg1WYJx0Q8HX_mkQ@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia
On Thu, Jan 21, 2021 at 9:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
7.
+# check for occurrence of the expected error
+poll_output_until("replication slot \"$slotname\" already exists")
+  or die "no error stop for the pre-existing origin";

In this test, isn't it better to check for the datasync state like
below? 004_sync.pl has some other similar tests.

my $started_query = "SELECT srsubstate = 'd' FROM pg_subscription_rel;";
$node_subscriber->poll_query_until('postgres', $started_query)
  or die "Timed out while waiting for subscriber to start sync";

Is there a reason why we can't use the existing way to check for
failure in this case?
Since the new design now uses temporary slots, is this test case still
required? If required, I can change it accordingly.
regards,
Ajin Cherian
Fujitsu Australia
On Sat, Jan 23, 2021 at 8:37 AM Ajin Cherian <itsajin@gmail.com> wrote:
On Thu, Jan 21, 2021 at 9:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
7.
+# check for occurrence of the expected error
+poll_output_until("replication slot \"$slotname\" already exists")
+  or die "no error stop for the pre-existing origin";

In this test, isn't it better to check for the datasync state like
below? 004_sync.pl has some other similar tests.

my $started_query = "SELECT srsubstate = 'd' FROM pg_subscription_rel;";
$node_subscriber->poll_query_until('postgres', $started_query)
  or die "Timed out while waiting for subscriber to start sync";

Is there a reason why we can't use the existing way to check for
failure in this case?

Since the new design now uses temporary slots, is this test case still
required?
I think so. But do you have any reason to believe that it won't be
required anymore?
--
With Regards,
Amit Kapila.
On Sat, Jan 23, 2021 at 3:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
I think so. But do you have any reason to believe that it won't be
required anymore?
A temporary slot will not clash with a permanent slot of the same name.
regards,
Ajin Cherian
Fujitsu
On Sat, Jan 23, 2021 at 4:55 AM Peter Smith <smithpb2250@gmail.com> wrote:
PSA the v18 patch for the Tablesync Solution1.
Few comments:
=============
1.
- * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * So the state progression is always: INIT -> DATASYNC ->
+ * (sync worker FINISHEDCOPY) -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY.
I don't think we need to be specific here that the sync worker sets the
FINISHEDCOPY state.
2.
@@ -98,11 +102,16 @@
#include "miscadmin.h"
#include "parser/parse_relation.h"
#include "pgstat.h"
+#include "postmaster/interrupt.h"
#include "replication/logicallauncher.h"
#include "replication/logicalrelation.h"
+#include "replication/logicalworker.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
I don't think the above includes are required. They seem to be a
remnant of the previous approach.
3.
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
+ bool sync_done = false;
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
+ sync_done = MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
+ current_lsn >= MyLogicalRepWorker->relstate_lsn;
+ SpinLockRelease(&MyLogicalRepWorker->relmutex);
- if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
- current_lsn >= MyLogicalRepWorker->relstate_lsn)
+ if (sync_done)
{
TimeLineID tli;
+ /*
+ * Change state to SYNCDONE.
+ */
+ SpinLockAcquire(&MyLogicalRepWorker->relmutex);
Why do we need these changes? If you have done it for the
code-readability purpose then we can consider this as a separate patch
because I don't see why these are required w.r.t. this patch.
4.
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
- MySubscription->slotname,
- MySubscription->oid,
- MyLogicalRepWorker->relid);
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(
+ MySubscription->oid,
+ MyLogicalRepWorker->relid);
What is the reason for changing the slot name calculation? If there
are any particular reasons, then we can add a comment to indicate why
we can't include the subscription's slotname in this calculation.
5.
This is WAL
+ * logged for for the purpose of recovery. Locks are to prevent the
+ * replication origin from vanishing while advancing.
/for for/for
6.
+ /* Remove the tablesync's origin tracking if exists. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ {
+ elog(DEBUG1, "DropSubscription: dropping origin tracking for
\"%s\"", originname);
I don't think we need this and the DEBUG1 message in
AlterSubscription_refresh. It is fine to print this information for
background workers like the apply worker, but I'm not sure we need it
here. DropSubscription drops the origin of the apply worker but doesn't
use such a DEBUG message, so I guess we don't need it for tablesync
origins either.
7. Have you tested with the new patch the scenario where we crash
after FINISHEDCOPY and before SYNCDONE; is it able to pick up the
replication using the new temporary slot? Here, we need to test the
case where, during the catchup phase, we have received a few commits
and then the tablesync worker crashes/errors out. Basically, check
whether the replication is continued from the same point. I understand
that this can only be tested by adding some logs and we might not be
able to write a test for it.
--
With Regards,
Amit Kapila.
FYI - I have done some long-running testing using the current patch [v18].
1. The src/test/subscription TAP tests:
- Subscription TAP tests were executed in a loop X 150 iterations.
- Duration 5 hrs.
- All iterations report "Result: PASS"
2. The postgres "make check" tests:
- make check was executed in a loop X 150 iterations.
- Duration 2 hrs.
- All iterations report "All 202 tests passed"
---
[v18] /messages/by-id/CAHut+Pvm0R=Mn_uVN_JhK0scE54V6+EDGHJg1WYJx0Q8HX_mkQ@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia
On Sat, Jan 23, 2021 at 11:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, Jan 23, 2021 at 4:55 AM Peter Smith <smithpb2250@gmail.com> wrote:
PSA the v18 patch for the Tablesync Solution1.
Few comments:
=============
1.
- * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * So the state progression is always: INIT -> DATASYNC ->
+ * (sync worker FINISHEDCOPY) -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY.

I don't think we need to be specific here that the sync worker sets the
FINISHEDCOPY state.
This was meant to indicate that *only* the sync worker knows about the
FINISHEDCOPY state, whereas all the other states are known (assigned
and/or used) by *both* kinds of workers. But I can remove it if you
feel that distinction is not useful.
4.
-    /*
-     * To build a slot name for the sync work, we are limited to NAMEDATALEN -
-     * 1 characters.  We cut the original slot name to NAMEDATALEN - 28 chars
-     * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0').  (It's actually the
-     * NAMEDATALEN on the remote that matters, but this scheme will also work
-     * reasonably if that is different.)
-     */
-    StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
-    slotname = psprintf("%.*s_%u_sync_%u",
-                        NAMEDATALEN - 28,
-                        MySubscription->slotname,
-                        MySubscription->oid,
-                        MyLogicalRepWorker->relid);
+    /* Calculate the name of the tablesync slot. */
+    slotname = ReplicationSlotNameForTablesync(
+                                               MySubscription->oid,
+                                               MyLogicalRepWorker->relid);

What is the reason for changing the slot name calculation? If there
are any particular reasons, then we can add a comment to indicate why
we can't include the subscription's slotname in this calculation.
The subscription slot name may be changed (e.g. ALTER SUBSCRIPTION)
and so including the subscription slot name as part of the tablesync
slot name was considered to be:
a) possibly risky/undefined, if the subscription slot_name = NONE
b) confusing, if we end up using 2 different slot names for the same
tablesync (e.g. if the subscription slot name is changed before a sync
worker is re-launched).
And since this subscription slot name part is not necessary for
uniqueness anyway, it was removed from the tablesync slot name to
eliminate those concerns.
Also, the tablesync slot name calculation was encapsulated as a
separate function because previously (i.e. before v18) it was used by
various other cleanup code. I still like it better as a function, but
now it is only called from one place, so we could put that code back
inline if you prefer it how it was.
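For reference, such a helper can be quite small. The sketch below is
illustrative only: the exact format string is an assumption, the point
being that the subscription OID plus table OID already guarantee
uniqueness without the subscription's slotname:

/*
 * Sketch of a tablesync slot name builder that does not depend on the
 * subscription's slotname. The "pg_%u_sync_%u" layout is an
 * illustrative assumption, not necessarily what the patch uses.
 */
static char *
ReplicationSlotNameForTablesync(Oid suboid, Oid relid)
{
    return psprintf("pg_%u_sync_%u", suboid, relid);
}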
----
Kind Regards,
Peter Smith.
Fujitsu Australia
On Sun, Jan 24, 2021 at 5:54 PM Peter Smith <smithpb2250@gmail.com> wrote:
4.
-    /*
-     * To build a slot name for the sync work, we are limited to NAMEDATALEN -
-     * 1 characters.  We cut the original slot name to NAMEDATALEN - 28 chars
-     * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0').  (It's actually the
-     * NAMEDATALEN on the remote that matters, but this scheme will also work
-     * reasonably if that is different.)
-     */
-    StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
-    slotname = psprintf("%.*s_%u_sync_%u",
-                        NAMEDATALEN - 28,
-                        MySubscription->slotname,
-                        MySubscription->oid,
-                        MyLogicalRepWorker->relid);
+    /* Calculate the name of the tablesync slot. */
+    slotname = ReplicationSlotNameForTablesync(
+                                               MySubscription->oid,
+                                               MyLogicalRepWorker->relid);

What is the reason for changing the slot name calculation? If there
are any particular reasons, then we can add a comment to indicate why
we can't include the subscription's slotname in this calculation.

The subscription slot name may be changed (e.g. ALTER SUBSCRIPTION)
and so including the subscription slot name as part of the tablesync
slot name was considered to be:
a) possibly risky/undefined, if the subscription slot_name = NONE
b) confusing, if we end up using 2 different slot names for the same
tablesync (e.g. if the subscription slot name is changed before a sync
worker is re-launched).
And since this subscription slot name part is not necessary for
uniqueness anyway, it was removed from the tablesync slot name to
eliminate those concerns.

Also, the tablesync slot name calculation was encapsulated as a
separate function because previously (i.e. before v18) it was used by
various other cleanup codes. I still like it better as a function, but
now it is only called from one place so we could put that code back
inline if you prefer it how it was.

It turns out those (a/b) concerns I wrote above are maybe unfounded,
because it seems it is not possible to alter slot_name = NONE unless
the subscription is first DISABLED.
So probably I can revert all this tablesync slot name calculation back
to how it originally was in the OSS HEAD if you want.
----
Kind Regards,
Peter Smith.
Fujitsu Australia
Hi Amit.
PSA the v19 patch for the Tablesync Solution1.
Main differences from v18:
+ Patch has been rebased off HEAD @ 24/Jan
+ Addressing some review comments [ak0123]
[ak0123] /messages/by-id/CAA4eK1JhpuwujrV6ABMmZ3jXfW37ssZnJ3fikrY7rRdvoEmu_g@mail.gmail.com
====
Features:
* The tablesync worker is now allowing multiple tx instead of single tx.
* A new state (SUBREL_STATE_FINISHEDCOPY) is persisted after a
successful copy_table in tablesync's LogicalRepSyncTableStart.
* If a re-launched tablesync finds state SUBREL_STATE_FINISHEDCOPY
then it will bypass the initial copy_table phase.
* Now tablesync sets up replication origin tracking in
LogicalRepSyncTableStart (similar as done for the apply worker). The
origin is advanced when first created.
* The tablesync replication origin tracking record is cleaned up by:
- process_syncing_tables_for_apply
- DropSubscription
- AlterSubscription_refresh
* Updates to PG docs.
* New TAP test case.
Known Issues:
* None.
---
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
v19-0001-Tablesync-Solution1.patchapplication/octet-stream; name=v19-0001-Tablesync-Solution1.patchDownload
From d3153f6469800f0e1eaa115e674eff1efeba3fc0 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Mon, 25 Jan 2021 13:07:40 +1100
Subject: [PATCH v19] Tablesync Solution1.
====
Features:
* The tablesync worker is now allowing multiple tx instead of single tx.
* A new state (SUBREL_STATE_FINISHEDCOPY) is persisted after a successful copy_table in tablesync's LogicalRepSyncTableStart.
* If a re-launched tablesync finds state SUBREL_STATE_FINISHEDCOPY then it will bypass the initial copy_table phase.
* Now tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar as done for the apply worker). The origin is advanced when first created.
* The tablesync replication origin tracking record is cleaned up by:
- process_syncing_tables_for_apply
- DropSubscription
- AlterSubscription_refresh
* Updates to PG docs.
* New TAP test case.
Known Issues:
* None.
---
doc/src/sgml/catalogs.sgml | 1 +
src/backend/commands/subscriptioncmds.c | 33 +++++++
src/backend/replication/logical/tablesync.c | 139 ++++++++++++++++++++++++++--
src/backend/replication/logical/worker.c | 18 +---
src/include/catalog/pg_subscription_rel.h | 2 +
src/test/subscription/t/004_sync.pl | 96 ++++++++++++++++++-
6 files changed, 265 insertions(+), 24 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 43d7a1a..82e74e1 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7662,6 +7662,7 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
State code:
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
+ <literal>f</literal> = finished table copy,
<literal>s</literal> = synchronized,
<literal>r</literal> = ready (normal replication)
</para></entry>
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 082f785..af13448 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -649,10 +649,19 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
if (!bsearch(&relid, pubrel_local_oids,
list_length(pubrel_names), sizeof(Oid), oid_cmp))
{
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
RemoveSubscriptionRel(sub->oid, relid);
logicalrep_worker_stop_at_commit(sub->oid, relid);
+ /* Remove the tablesync's origin tracking if exists. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", sub->oid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ replorigin_drop(originid, false /* nowait */ );
+
ereport(DEBUG1,
(errmsg("table \"%s.%s\" removed from subscription \"%s\"",
get_namespace_name(get_rel_namespace(relid)),
@@ -930,6 +939,7 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
WalReceiverConn *wrconn = NULL;
StringInfoData cmd;
Form_pg_subscription form;
+ List *rstates;
/*
* Lock pg_subscription with AccessExclusiveLock to ensure that the
@@ -1042,6 +1052,29 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
}
list_free(subworkers);
+ /*
+ * Tablesync resource cleanup (origins).
+ *
+ * Any READY-state relations have already done this.
+ */
+ rstates = GetSubscriptionNotReadyRelations(subid);
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup the tablesync worker resources */
+ if (!OidIsValid(relid))
+ continue;
+
+ /* Remove the tablesync's origin tracking if exists. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ replorigin_drop(originid, false /* nowait */ );
+ }
+ list_free(rstates);
+
/* Clean up dependencies */
deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 863d196..ae446f5 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -31,8 +31,10 @@
* table state to INIT.
* - Tablesync worker starts; changes table state from INIT to DATASYNC while
* copying.
- * - Tablesync worker finishes the copy and sets table state to SYNCWAIT;
- * waits for state change.
+ * - Tablesync worker does initial table copy; there is a FINISHEDCOPY state to
+ * indicate when the copy phase has completed, so if the worker crashes
+ * with this (non-memory) state then the copy will not be re-attempted.
+ * - Tablesync worker then sets table state to SYNCWAIT; waits for state change.
* - Apply worker periodically checks for tables in SYNCWAIT state. When
* any appear, it sets the table state to CATCHUP and starts loop-waiting
* until either the table state is set to SYNCDONE or the sync worker
@@ -48,8 +50,8 @@
* point it sets state to READY and stops tracking. Again, there might
* be zero changes in between.
*
- * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * So the state progression is always: INIT -> DATASYNC ->
+ * (sync worker FINISHEDCOPY) -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY.
*
* The catalog pg_subscription_rel is used to keep information about
* subscribed tables and their state. Some transient state during data
@@ -59,6 +61,7 @@
* Example flows look like this:
* - Apply is in front:
* sync:8
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:10
* -> set in memory CATCHUP
@@ -74,6 +77,7 @@
*
* - Sync is in front:
* sync:10
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:8
* -> set in memory CATCHUP
@@ -102,7 +106,10 @@
#include "replication/logicalrelation.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
+#include "storage/lmgr.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -270,8 +277,6 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
-
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
@@ -279,11 +284,23 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
TimeLineID tli;
+ /*
+ * Change state to SYNCDONE.
+ */
MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCDONE;
MyLogicalRepWorker->relstate_lsn = current_lsn;
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ {
+ StartTransactionCommand();
+ }
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
@@ -404,6 +421,9 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
*/
if (current_lsn >= rstate->lsn)
{
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
rstate->state = SUBREL_STATE_READY;
rstate->lsn = current_lsn;
if (!started_tx)
@@ -412,6 +432,27 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
started_tx = true;
}
+ /*
+ * Remove the tablesync origin tracking if exists.
+ *
+ * The normal case origin drop must be done here, not in the
+ * process_syncing_tables_for_sync function, because if the
+ * tablesync worker process attempted to drop its own origin
+ * then it would fail (origin is "busy").
+ */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MyLogicalRepWorker->subid, rstate->relid);
+ originid = replorigin_by_name(originname, true);
+ if (OidIsValid(originid))
+ {
+ elog(DEBUG1,
+ "process_syncing_tables_for_apply: dropping tablesync origin tracking for \"%s\".",
+ originname);
+ replorigin_drop(originid, false /* nowait */ );
+ }
+
+ /*
+ * Update the state to READY only after the origin cleanup.
+ */
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
@@ -824,6 +865,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -874,7 +917,38 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY);
+
+ /* Assign the origin tracking record name. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MySubscription->oid, MyLogicalRepWorker->relid);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed
+ * before it was able to finish normally.
+ */
+ StartTransactionCommand();
+
+ /*
+ * Slot creation passes NULL lsn because the origin startpos is got
+ * from origin tracking this time, not from the slot.
+ */
+ walrcv_create_slot(wrconn, slotname, true /* temporary */ ,
+ CRS_NOEXPORT_SNAPSHOT, NULL /* lsn */ );
+
+ /*
+ * The origin tracking name must already exist. It was created first
+ * time this tablesync was launched.
+ */
+ originid = replorigin_by_name(originname, false /* missing_ok */ );
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ *origin_startpos = replorigin_session_get_progress(false);
+
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -890,9 +964,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -942,6 +1013,54 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
/* Make the copy visible. */
CommandCounterIncrement();
+ /* Setup replication origin tracking. */
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist, so create it now.
+ *
+ * Then advance to the LSN got from walrcv_create_slot. This is WAL
+ * logged for the purpose of recovery. Locks are to prevent the
+ * replication origin from vanishing while advancing.
+ */
+ originid = replorigin_create(originname);
+
+ LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */ , true /* WAL log */ );
+ UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ }
+ else
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("replication origin \"%s\" already exists",
+ originname)));
+ }
+
+ /*
+ * Update the persisted state to indicate the COPY phase is done; make it
+ * visible to others.
+ */
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_FINISHEDCOPY,
+ MyLogicalRepWorker->relstate_lsn);
+
+copy_table_done:
+
+ elog(DEBUG1,
+ "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
+
+ CommitTransactionCommand();
+
/*
* We are done with the initial data synchronization, update the state.
*/
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index eb7db89..cfc924c 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -807,12 +807,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -889,9 +885,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -918,8 +912,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1062,8 +1055,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData *commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index 06663b9..9027c42 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,8 @@ DECLARE_UNIQUE_INDEX(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg_subs
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_FINISHEDCOPY 'f' /* tablesync copy phase is completed
+ * (sublsn NULL) */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/test/subscription/t/004_sync.pl b/src/test/subscription/t/004_sync.pl
index e111ab9..52915d9 100644
--- a/src/test/subscription/t/004_sync.pl
+++ b/src/test/subscription/t/004_sync.pl
@@ -3,7 +3,9 @@ use strict;
use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 7;
+use Test::More tests => 10;
+use Time::HiRes qw(usleep);
+use Scalar::Util qw(looks_like_number);
# Initialize publisher node
my $node_publisher = get_new_node('publisher');
@@ -149,7 +151,99 @@ $result = $node_subscriber->safe_psql('postgres',
is($result, qq(20),
'changes for table added after subscription initialized replicated');
+##
+## slot integrity
+##
+## Manually create a slot with the same name that tablesync will want.
+## Expect tablesync ERROR when clash is detected.
+## Then remove the slot so tablesync can proceed.
+## Expect tablesync can now finish normally.
+##
+
+# drop the subscription
+$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+
+# empty the table tab_rep
+$node_subscriber->safe_psql('postgres', "DELETE FROM tab_rep;");
+
+# empty the table tab_rep_next
+$node_subscriber->safe_psql('postgres', "DELETE FROM tab_rep_next;");
+
+# recreate the subscription again, but leave it disabled so that we can get the OID
+$node_subscriber->safe_psql('postgres',
+ "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr' PUBLICATION tap_pub
+ with (enabled = false)"
+);
+
+# need to create the name of the tablesync slot, for this we need the subscription OID
+# and the table OID.
+my $subid = $node_subscriber->safe_psql('postgres',
+ "SELECT oid FROM pg_subscription WHERE subname = 'tap_sub';");
+is(looks_like_number($subid), qq(1), 'get the subscription OID');
+
+my $relid = $node_subscriber->safe_psql('postgres',
+ "SELECT 'tab_rep_next'::regclass::oid");
+is(looks_like_number($relid), qq(1), 'get the table OID');
+
+# name of the tablesync slot is 'slotname'_'suboid'_sync_'tableoid'.
+my $slotname = 'tap_sub_' . $subid . '_' . 'sync_' . $relid;
+
+# temporarily, create a slot having the same name of the tablesync slot.
+$node_publisher->safe_psql('postgres',
+ "SELECT 'init' FROM pg_create_logical_replication_slot('$slotname', 'pgoutput', false);");
+
+# enable the subscription
+$node_subscriber->safe_psql('postgres',
+ "ALTER SUBSCRIPTION tap_sub ENABLE"
+);
+
+# check for occurrence of the expected error
+poll_output_until("replication slot \"$slotname\" already exists")
+ or die "no error stop for the pre-existing origin";
+
+# now drop the offending slot, the tablesync should recover.
+$node_publisher->safe_psql('postgres',
+ "SELECT pg_drop_replication_slot('$slotname');");
+
+# wait for sync to finish
+$node_subscriber->poll_query_until('postgres', $synced_query)
+ or die "Timed out while waiting for subscriber to synchronize data";
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM tab_rep_next");
+is($result, qq(20),
+ 'data for table added after subscription initialized are now synced');
+
+# Cleanup
$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
$node_subscriber->stop('fast');
$node_publisher->stop('fast');
+
+sub poll_output_until
+{
+ my ($expected) = @_;
+
+ $expected = 'xxxxxx' unless defined($expected); # default junk value
+
+ my $max_attempts = 10 * 10;
+ my $attempts = 0;
+
+ my $output_file = '';
+ while ($attempts < $max_attempts)
+ {
+ $output_file = slurp_file($node_subscriber->logfile());
+
+ if ($output_file =~ $expected)
+ {
+ return 1;
+ }
+
+ # Wait 0.1 second before retrying.
+ usleep(100_000);
+ $attempts++;
+ }
+
+ # The output result didn't change in 180 seconds. Give up
+ return 0;
+}
--
1.8.3.1
v19-0002-Tablesync-extra-logging.patchapplication/octet-stream; name=v19-0002-Tablesync-extra-logging.patchDownload
From e1cc6cad91752bb7e71bb640f123684a5891ae55 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Mon, 25 Jan 2021 13:25:07 +1100
Subject: [PATCH v19] Tablesync extra logging.
This patch only adds some extra logging which may be helpful for testing, but is not for committing.
---
src/backend/commands/subscriptioncmds.c | 20 +++++++++++++++++--
src/backend/replication/logical/tablesync.c | 30 +++++++++++++++++++++++++++--
2 files changed, 46 insertions(+), 4 deletions(-)
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index af13448..ef0817c 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -660,10 +660,18 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
snprintf(originname, sizeof(originname), "pg_%u_%u", sub->oid, relid);
originid = replorigin_by_name(originname, true);
if (originid != InvalidRepOriginId)
+ {
+ elog(LOG,
+ "!!>> AlterSubscription_refresh: dropping origin tracking for \"%s\"",
+ originname);
replorigin_drop(originid, false /* nowait */ );
+ elog(LOG,
+ "!!>> AlterSubscription_refresh: dropped origin tracking for \"%s\"",
+ originname);
+ }
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
+ ereport(LOG,
+ (errmsg("!!>> table \"%s.%s\" removed from subscription \"%s\"",
get_namespace_name(get_rel_namespace(relid)),
get_rel_name(relid),
sub->name)));
@@ -1071,7 +1079,15 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
originid = replorigin_by_name(originname, true);
if (originid != InvalidRepOriginId)
+ {
+ elog(LOG,
+ "!!>> DropSubscription: dropping origin tracking for \"%s\"",
+ originname);
replorigin_drop(originid, false /* nowait */ );
+ elog(LOG,
+ "!!>> DropSubscription: droppped origin tracking for \"%s\"",
+ originname);
+ }
}
list_free(rstates);
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index ae446f5..3693f4c 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -444,10 +444,13 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
originid = replorigin_by_name(originname, true);
if (OidIsValid(originid))
{
- elog(DEBUG1,
- "process_syncing_tables_for_apply: dropping tablesync origin tracking for \"%s\".",
+ elog(LOG,
+ "!!>> process_syncing_tables_for_apply: dropping tablesync origin tracking for \"%s\".",
originname);
replorigin_drop(originid, false /* nowait */ );
+ elog(LOG,
+ "!!>> process_syncing_tables_for_apply: dropped tablesync origin tracking for \"%s\".",
+ originname);
}
/*
@@ -929,12 +932,17 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* The COPY phase was previously done, but tablesync then crashed
* before it was able to finish normally.
*/
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_FINISHEDCOPY.");
StartTransactionCommand();
/*
* Slot creation passes NULL lsn because the origin startpos is got
* from origin tracking this time, not from the slot.
*/
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 2 walrcv_create_slot for \"%s\".",
+ slotname);
walrcv_create_slot(wrconn, slotname, true /* temporary */ ,
CRS_NOEXPORT_SNAPSHOT, NULL /* lsn */ );
@@ -943,8 +951,14 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* time this tablesync was launched.
*/
originid = replorigin_by_name(originname, false /* missing_ok */ );
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 2 replorigin_session_setup \"%s\".",
+ originname);
replorigin_session_setup(originid);
replorigin_session_origin = originid;
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 2 replorigin_session_get_progress \"%s\".",
+ originname);
*origin_startpos = replorigin_session_get_progress(false);
goto copy_table_done;
@@ -993,6 +1007,9 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 walrcv_create_slot for \"%s\".",
+ slotname);
walrcv_create_slot(wrconn, slotname, true,
CRS_USE_SNAPSHOT, origin_startpos);
@@ -1024,13 +1041,22 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* logged for the purpose of recovery. Locks are to prevent the
* replication origin from vanishing while advancing.
*/
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 replorigin_create \"%s\".",
+ originname);
originid = replorigin_create(originname);
LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 replorigin_advance \"%s\".",
+ originname);
replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
true /* go backward */ , true /* WAL log */ );
UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 replorigin_session_setup \"%s\".",
+ originname);
replorigin_session_setup(originid);
replorigin_session_origin = originid;
}
--
1.8.3.1
On Mon, Jan 25, 2021 at 6:15 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Sun, Jan 24, 2021 at 5:54 PM Peter Smith <smithpb2250@gmail.com> wrote:
4.
-    /*
-     * To build a slot name for the sync work, we are limited to NAMEDATALEN -
-     * 1 characters.  We cut the original slot name to NAMEDATALEN - 28 chars
-     * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0').  (It's actually the
-     * NAMEDATALEN on the remote that matters, but this scheme will also work
-     * reasonably if that is different.)
-     */
-    StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
-    slotname = psprintf("%.*s_%u_sync_%u",
-                        NAMEDATALEN - 28,
-                        MySubscription->slotname,
-                        MySubscription->oid,
-                        MyLogicalRepWorker->relid);
+    /* Calculate the name of the tablesync slot. */
+    slotname = ReplicationSlotNameForTablesync(
+                                               MySubscription->oid,
+                                               MyLogicalRepWorker->relid);

What is the reason for changing the slot name calculation? If there
are any particular reasons, then we can add a comment to indicate why
we can't include the subscription's slotname in this calculation.

The subscription slot name may be changed (e.g. ALTER SUBSCRIPTION)
and so including the subscription slot name as part of the tablesync
slot name was considered to be:
a) possibly risky/undefined, if the subscription slot_name = NONE
b) confusing, if we end up using 2 different slot names for the same
tablesync (e.g. if the subscription slot name is changed before a sync
worker is re-launched).
And since this subscription slot name part is not necessary for
uniqueness anyway, it was removed from the tablesync slot name to
eliminate those concerns.

Also, the tablesync slot name calculation was encapsulated as a
separate function because previously (i.e. before v18) it was used by
various other cleanup codes. I still like it better as a function, but
now it is only called from one place so we could put that code back
inline if you prefer it how it was.

It turns out those (a/b) concerns I wrote above are maybe unfounded,
because it seems it is not possible to alter slot_name = NONE unless
the subscription is first DISABLED.
Yeah, but I think the user can still change to some other predefined
slot_name. However, I guess it doesn't matter unless it can lead to
what you have mentioned in (a). As that can't happen, it is probably
better to take out that change from the patch. I see your point of
moving this calculation to a separate function, but I'm not sure it is
worth it unless we have to call it from multiple places or it simplifies the
existing code.
--
With Regards,
Amit Kapila.
On Sat, Jan 23, 2021 at 11:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
2.
@@ -98,11 +102,16 @@
#include "miscadmin.h"
#include "parser/parse_relation.h"
#include "pgstat.h"
+#include "postmaster/interrupt.h"
#include "replication/logicallauncher.h"
#include "replication/logicalrelation.h"
+#include "replication/logicalworker.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"

I don't think the above includes are required. They seem to be a
remnant of the previous approach.
OK. Fixed in the latest patch [v19].
3.
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
+    bool        sync_done = false;

     SpinLockAcquire(&MyLogicalRepWorker->relmutex);
+    sync_done = MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
+        current_lsn >= MyLogicalRepWorker->relstate_lsn;
+    SpinLockRelease(&MyLogicalRepWorker->relmutex);

-    if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
-        current_lsn >= MyLogicalRepWorker->relstate_lsn)
+    if (sync_done)
     {
         TimeLineID tli;

+        /*
+         * Change state to SYNCDONE.
+         */
+        SpinLockAcquire(&MyLogicalRepWorker->relmutex);

Why do we need these changes? If you have done it for the
code-readability purpose then we can consider this as a separate patch
because I don't see why these are required w.r.t. this patch.
Yes, it was for code readability in v17 when this function used to be
much larger. But it is not very necessary anymore and has been
reverted in the latest patch [v19].
4.
-    /*
-     * To build a slot name for the sync work, we are limited to NAMEDATALEN -
-     * 1 characters.  We cut the original slot name to NAMEDATALEN - 28 chars
-     * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0').  (It's actually the
-     * NAMEDATALEN on the remote that matters, but this scheme will also work
-     * reasonably if that is different.)
-     */
-    StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
-    slotname = psprintf("%.*s_%u_sync_%u",
-                        NAMEDATALEN - 28,
-                        MySubscription->slotname,
-                        MySubscription->oid,
-                        MyLogicalRepWorker->relid);
+    /* Calculate the name of the tablesync slot. */
+    slotname = ReplicationSlotNameForTablesync(
+                                               MySubscription->oid,
+                                               MyLogicalRepWorker->relid);

What is the reason for changing the slot name calculation? If there
are any particular reasons, then we can add a comment to indicate why
we can't include the subscription's slotname in this calculation.
The tablesync slot name changes were not strictly necessary, so the
code is all reverted to be the same as OSS HEAD now in the latest
patch [v19].
5.
This is WAL
+     * logged for for the purpose of recovery. Locks are to prevent the
+     * replication origin from vanishing while advancing.

/for for/for
OK. Fixed in the latest patch [v19].
6.
+    /* Remove the tablesync's origin tracking if exists. */
+    snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+    originid = replorigin_by_name(originname, true);
+    if (originid != InvalidRepOriginId)
+    {
+        elog(DEBUG1, "DropSubscription: dropping origin tracking for \"%s\"", originname);

I don't think we need this and the DEBUG1 message in
AlterSubscription_refresh. It is fine to print this information for
background workers like the apply worker, but I'm not sure we need it
here. DropSubscription drops the origin of the apply worker but doesn't
use such a DEBUG message, so I guess we don't need it for tablesync
origins either.
OK. These DEBUG1 logs are removed in the latest patch [v19].
----
[v19] /messages/by-id/CAHut+Psj7Xm8C1LbqeAbk-3duyS8xXJtL9TiGaeu3P8g272mAA@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia
On Sun, Jan 24, 2021 at 12:24 PM Peter Smith <smithpb2250@gmail.com> wrote:
On Sat, Jan 23, 2021 at 11:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Few comments:
=============
1.
- * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * So the state progression is always: INIT -> DATASYNC ->
+ * (sync worker FINISHEDCOPY) -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY.

I don't think we need to be specific here that the sync worker sets the
FINISHEDCOPY state.

This was meant to indicate that *only* the sync worker knows about the
FINISHEDCOPY state, whereas all the other states are known (assigned
and/or used) by *both* kinds of workers. But I can remove it if you
feel that distinction is not useful.
Okay, but I feel you can mention that in the description you have
added for the FINISHEDCOPY state. It looks a bit odd here, and the message
you want to convey is also not that clear.
--
With Regards,
Amit Kapila.
On Sat, Jan 23, 2021 at 11:08 AM Ajin Cherian <itsajin@gmail.com> wrote:
On Sat, Jan 23, 2021 at 3:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
I think so. But do you have any reason to believe that it won't be
required anymore?

A temporary slot will not clash with a permanent slot of the same name.
I have tried below and it seems to be clashing:
postgres=# SELECT 'init' FROM
pg_create_logical_replication_slot('test_slot2', 'test_decoding');
?column?
----------
init
(1 row)
postgres=# SELECT 'init' FROM
pg_create_logical_replication_slot('test_slot2', 'test_decoding',
true);
ERROR: replication slot "test_slot2" already exists
Note that the third parameter in the second statement above indicates
whether it is a temporary slot or not. What am I missing?
--
With Regards,
Amit Kapila.
On Mon, Jan 25, 2021 at 8:23 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Sat, Jan 23, 2021 at 11:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
2.
@@ -98,11 +102,16 @@
#include "miscadmin.h"
#include "parser/parse_relation.h"
#include "pgstat.h"
+#include "postmaster/interrupt.h"
#include "replication/logicallauncher.h"
#include "replication/logicalrelation.h"
+#include "replication/logicalworker.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"

I don't think the above includes are required. They seem to be a
remnant of the previous approach.

OK. Fixed in the latest patch [v19].

You seem to have forgotten to remove #include "replication/slot.h".
Check; if it is not required then remove that as well.
--
With Regards,
Amit Kapila.
On Mon, Jan 25, 2021 at 8:03 AM Peter Smith <smithpb2250@gmail.com> wrote:
Hi Amit.
PSA the v19 patch for the Tablesync Solution1.
I see one race condition in this patch where we try to drop the origin
via both the apply process and DropSubscription. I think it can lead to
the error "cache lookup failed for replication origin with oid %u". The
same problem can happen via the exposed API pg_replication_origin_drop,
but probably because that is not used concurrently, nobody has faced
this issue. I think for the matter of this patch we can try to suppress
such an error either via try..catch, or by adding a missing_ok argument
to the replorigin_drop API, or we can just note in the comments that
such a race exists. Additionally, we should start a new thread about
the existence of this problem in pg_replication_origin_drop. What do
you think?
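To illustrate the missing_ok idea, here is a hedged sketch (the helper
name drop_origin_if_exists is hypothetical, and a real fix would also
need to guard against the origin disappearing between the lookup and
the drop):

/*
 * Sketch: tolerate a concurrently-dropped origin instead of erroring.
 */
static void
drop_origin_if_exists(const char *originname, bool nowait)
{
    RepOriginId originid;

    originid = replorigin_by_name(originname, true /* missing_ok */ );
    if (originid == InvalidRepOriginId)
        return;                 /* someone else already dropped it */

    /*
     * Note: a concurrent drop can still slip in here; the real fix must
     * recheck (or catch the error) under an appropriate lock.
     */
    replorigin_drop(originid, nowait);
}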
--
With Regards,
Amit Kapila.
Hi Amit.
PSA the v20 patch for the Tablesync Solution1.
Main differences from v19:
+ Updated TAP test [ak0123-7]
+ Fixed comment [ak0125-1]
+ Removed redundant header [ak0125-2]
+ Protection against race condition [ak0125-race]
[ak0123-7] /messages/by-id/CAA4eK1JhpuwujrV6ABMmZ3jXfW37ssZnJ3fikrY7rRdvoEmu_g@mail.gmail.com
[ak0125-1] /messages/by-id/CAA4eK1JmP2VVpH2=O=5BBbuH7gyQtWn40aXp_Jyjn1+Kggfq8A@mail.gmail.com
[ak0125-2] /messages/by-id/CAA4eK1L1j5sfBgHb0-H-+2quBstsA3hMcDfP-4vLuU-UF43nXQ@mail.gmail.com
[ak0125-race] /messages/by-id/CAA4eK1+yeLwBCkTvTdPM-hSk1fr6jT8KJc362CN8zrGztq_JqQ@mail.gmail.com
====
Features:
* The tablesync worker is now allowing multiple tx instead of single tx.
* A new state (SUBREL_STATE_FINISHEDCOPY) is persisted after a
successful copy_table in tablesync's LogicalRepSyncTableStart.
* If a re-launched tablesync finds state SUBREL_STATE_FINISHEDCOPY
then it will bypass the initial copy_table phase.
* Now tablesync sets up replication origin tracking in
LogicalRepSyncTableStart (similar as done for the apply worker). The
origin is advanced when first created.
* The tablesync replication origin tracking record is cleaned up by:
- process_syncing_tables_for_apply
- DropSubscription
- AlterSubscription_refresh
* Updates to PG docs.
* New TAP test case.
Known Issues:
* Some records arriving between FINISHEDCOPY and SYNCDONE state may be
lost (currently under investigation).
---
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
v20-0002-Tablesync-extra-logging.patchapplication/octet-stream; name=v20-0002-Tablesync-extra-logging.patchDownload
From 878e4ed4fdb8e4b67c1f98bc4dd22c4f95654a5e Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Mon, 25 Jan 2021 21:36:46 +1100
Subject: [PATCH v20] Tablesync extra logging.
This patch only adds some extra logging which may be helpful for testing, but is not for committing.
---
src/backend/commands/subscriptioncmds.c | 6 ++++--
src/backend/replication/logical/tablesync.c | 31 +++++++++++++++++++++++++++++
2 files changed, 35 insertions(+), 2 deletions(-)
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 676da19..154a2da 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -654,10 +654,11 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
logicalrep_worker_stop_at_commit(sub->oid, relid);
/* Remove the tablesync's origin tracking if exists. */
+ elog(LOG, "!!>> AlterSubscription_refresh: tablesync_replorigin_drop");
tablesync_replorigin_drop(sub->oid, relid, false /* nowait */ );
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
+ ereport(LOG,
+ (errmsg("!!>> table \"%s.%s\" removed from subscription \"%s\"",
get_namespace_name(get_rel_namespace(relid)),
get_rel_name(relid),
sub->name)));
@@ -1062,6 +1063,7 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
continue;
/* Remove the tablesync's origin tracking if exists. */
+ elog(LOG, "!!>> DropSubscription: tablesync_replorigin_drop");
tablesync_replorigin_drop(subid, relid, false /* nowait */ );
}
list_free(rstates);
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 5870334..a3db3e5 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -143,7 +143,13 @@ tablesync_replorigin_drop(Oid subid, Oid relid, bool nowait)
{
PG_TRY();
{
+ elog(LOG,
+ "!!>> tablesync_replorigin_drop: dropping origin \"%s\"",
+ originname);
replorigin_drop(originid, nowait);
+ elog(LOG,
+ "!!>> tablesync_replorigin_drop: dropped origin \"%s\"",
+ originname);
}
PG_CATCH();
{
@@ -474,6 +480,8 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
* tablesync worker process attempted to drop its own origin
* then it would fail (origin is "busy").
*/
+ elog(LOG,
+ "!!>> process_syncing_tables_for_apply: tablesync_replorigin_drop");
tablesync_replorigin_drop(MyLogicalRepWorker->subid,
rstate->relid, false /* nowait */ );
@@ -956,12 +964,17 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* The COPY phase was previously done, but tablesync then crashed
* before it was able to finish normally.
*/
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_FINISHEDCOPY.");
StartTransactionCommand();
/*
* Slot creation passes NULL lsn because the origin startpos is got
* from origin tracking this time, not from the slot.
*/
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 2 walrcv_create_slot for \"%s\".",
+ slotname);
walrcv_create_slot(wrconn, slotname, true /* temporary */ ,
CRS_NOEXPORT_SNAPSHOT, NULL /* lsn */ );
@@ -970,8 +983,14 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* time this tablesync was launched.
*/
originid = replorigin_by_name(originname, false /* missing_ok */ );
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 2 replorigin_session_setup \"%s\".",
+ originname);
replorigin_session_setup(originid);
replorigin_session_origin = originid;
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 2 replorigin_session_get_progress \"%s\".",
+ originname);
*origin_startpos = replorigin_session_get_progress(false);
goto copy_table_done;
@@ -1020,6 +1039,9 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 walrcv_create_slot for \"%s\".",
+ slotname);
walrcv_create_slot(wrconn, slotname, true,
CRS_USE_SNAPSHOT, origin_startpos);
@@ -1051,13 +1073,22 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* logged for the purpose of recovery. Locks are to prevent the
* replication origin from vanishing while advancing.
*/
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 replorigin_create \"%s\".",
+ originname);
originid = replorigin_create(originname);
LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 replorigin_advance \"%s\".",
+ originname);
replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
true /* go backward */ , true /* WAL log */ );
UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 replorigin_session_setup \"%s\".",
+ originname);
replorigin_session_setup(originid);
replorigin_session_origin = originid;
}
--
1.8.3.1
v20-0001-Tablesync-Solution1.patchapplication/octet-stream; name=v20-0001-Tablesync-Solution1.patchDownload
From 99bad2dfb4b1a8271c5beedbff0df3550a74775b Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Mon, 25 Jan 2021 20:26:11 +1100
Subject: [PATCH v20] Tablesync Solution1.
Features:
* The tablesync worker is now allowing multiple tx instead of single tx.
* A new state (SUBREL_STATE_FINISHEDCOPY) is persisted after a successful copy_table in tablesync's LogicalRepSyncTableStart.
* If a re-launched tablesync finds state SUBREL_STATE_FINISHEDCOPY then it will bypass the initial copy_table phase.
* Now tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar as done for the apply worker). The origin is advanced when first created.
* The tablesync replication origin tracking record is cleaned up by:
- process_syncing_tables_for_apply
- DropSubscription
- AlterSubscription_refresh
* Updates to PG docs.
* New TAP test case.
Known Issues:
* None.
---
doc/src/sgml/catalogs.sgml | 1 +
src/backend/commands/subscriptioncmds.c | 24 ++++
src/backend/replication/logical/tablesync.c | 166 ++++++++++++++++++++++++++--
src/backend/replication/logical/worker.c | 18 +--
src/include/catalog/pg_subscription_rel.h | 2 +
src/include/replication/worker_internal.h | 2 +
src/test/subscription/t/004_sync.pl | 69 +++++++++++-
7 files changed, 258 insertions(+), 24 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 43d7a1a..82e74e1 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7662,6 +7662,7 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
State code:
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
+ <literal>f</literal> = finished table copy,
<literal>s</literal> = synchronized,
<literal>r</literal> = ready (normal replication)
</para></entry>
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 082f785..676da19 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -653,6 +653,9 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
logicalrep_worker_stop_at_commit(sub->oid, relid);
+ /* Remove the tablesync's origin tracking if exists. */
+ tablesync_replorigin_drop(sub->oid, relid, false /* nowait */ );
+
ereport(DEBUG1,
(errmsg("table \"%s.%s\" removed from subscription \"%s\"",
get_namespace_name(get_rel_namespace(relid)),
@@ -930,6 +933,7 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
WalReceiverConn *wrconn = NULL;
StringInfoData cmd;
Form_pg_subscription form;
+ List *rstates;
/*
* Lock pg_subscription with AccessExclusiveLock to ensure that the
@@ -1042,6 +1046,26 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
}
list_free(subworkers);
+ /*
+ * Tablesync resource cleanup (origins).
+ *
+ * Any READY-state relations have already done this.
+ */
+ rstates = GetSubscriptionNotReadyRelations(subid);
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup the tablesync worker resources */
+ if (!OidIsValid(relid))
+ continue;
+
+ /* Remove the tablesync's origin tracking if exists. */
+ tablesync_replorigin_drop(subid, relid, false /* nowait */ );
+ }
+ list_free(rstates);
+
/* Clean up dependencies */
deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 863d196..5870334 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -31,8 +31,11 @@
* table state to INIT.
* - Tablesync worker starts; changes table state from INIT to DATASYNC while
* copying.
- * - Tablesync worker finishes the copy and sets table state to SYNCWAIT;
- * waits for state change.
+ * - Tablesync worker does initial table copy; there is a FINISHEDCOPY (sync
+ * worker specific) state to indicate when the copy phase has completed, so
+ * if the worker crashes with this (non-memory) state then the copy will not
+ * be re-attempted.
+ * - Tablesync worker then sets table state to SYNCWAIT; waits for state change.
* - Apply worker periodically checks for tables in SYNCWAIT state. When
* any appear, it sets the table state to CATCHUP and starts loop-waiting
* until either the table state is set to SYNCDONE or the sync worker
@@ -48,8 +51,8 @@
* point it sets state to READY and stops tracking. Again, there might
* be zero changes in between.
*
- * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * So the state progression is always: INIT -> DATASYNC -> FINISHEDCOPY
+ * -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY.
*
* The catalog pg_subscription_rel is used to keep information about
* subscribed tables and their state. Some transient state during data
@@ -59,6 +62,7 @@
* Example flows look like this:
* - Apply is in front:
* sync:8
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:10
* -> set in memory CATCHUP
@@ -74,6 +78,7 @@
*
* - Sync is in front:
* sync:10
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:8
* -> set in memory CATCHUP
@@ -102,7 +107,9 @@
#include "replication/logicalrelation.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
+#include "storage/lmgr.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -113,6 +120,43 @@ static bool table_states_valid = false;
StringInfo copybuf = NULL;
/*
+ * Common code to drop the origin of a tablesync worker.
+ *
+ * There is a potential race condition if two processes attempt to call
+ * replorigin_drop for the same originid at the same time (e.g. during
+ * DropSubscription and process_syncing_tables_for_apply). The loser of that
+ * race would give an ERROR saying that it failed to find the expected
+ * originid.
+ *
+ * The TRY/CATCH below suppresses such errors, allowing the tablesync cleanup
+ * code to proceed.
+ */
+void
+tablesync_replorigin_drop(Oid subid, Oid relid, bool nowait)
+{
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (OidIsValid(originid))
+ {
+ PG_TRY();
+ {
+ replorigin_drop(originid, nowait);
+ }
+ PG_CATCH();
+ {
+ ereport(WARNING,
+ errmsg("could not drop replication origin with OID %d, named \"%s\"",
+ originid,
+ originname));
+ }
+ PG_END_TRY();
+ }
+}
+
+/*
* Exit routine for synchronization worker.
*/
static void
@@ -270,8 +314,6 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
-
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
@@ -279,11 +321,23 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
TimeLineID tli;
+ /*
+ * Change state to SYNCDONE.
+ */
MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCDONE;
MyLogicalRepWorker->relstate_lsn = current_lsn;
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ {
+ StartTransactionCommand();
+ }
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
@@ -412,6 +466,20 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
started_tx = true;
}
+ /*
+ * Remove the tablesync origin tracking if it exists.
+ *
+ * The normal case origin drop must be done here, not in the
+ * process_syncing_tables_for_sync function, because if the
+ * tablesync worker process attempted to drop its own origin
+ * then it would fail (origin is "busy").
+ */
+ tablesync_replorigin_drop(MyLogicalRepWorker->subid,
+ rstate->relid, false /* nowait */ );
+
+ /*
+ * Update the state to READY only after the origin cleanup.
+ */
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
@@ -824,6 +892,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -874,7 +944,38 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY);
+
+ /* Assign the origin tracking record name. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MySubscription->oid, MyLogicalRepWorker->relid);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed
+ * before it was able to finish normally.
+ */
+ StartTransactionCommand();
+
+ /*
+ * Slot creation passes a NULL lsn because this time the origin
+ * startpos is obtained from origin tracking, not from the slot.
+ */
+ walrcv_create_slot(wrconn, slotname, true /* temporary */ ,
+ CRS_NOEXPORT_SNAPSHOT, NULL /* lsn */ );
+
+ /*
+ * The origin tracking name must already exist. It was created the
+ * first time this tablesync was launched.
+ */
+ originid = replorigin_by_name(originname, false /* missing_ok */ );
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ *origin_startpos = replorigin_session_get_progress(false);
+
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -890,9 +991,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -942,6 +1040,54 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
/* Make the copy visible. */
CommandCounterIncrement();
+ /* Setup replication origin tracking. */
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist, so create it now.
+ *
+ * Then advance to the LSN got from walrcv_create_slot. This is WAL
+ * logged for the purpose of recovery. Locks are to prevent the
+ * replication origin from vanishing while advancing.
+ */
+ originid = replorigin_create(originname);
+
+ LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */ , true /* WAL log */ );
+ UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ }
+ else
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("replication origin \"%s\" already exists",
+ originname)));
+ }
+
+ /*
+ * Update the persisted state to indicate the COPY phase is done; make it
+ * visible to others.
+ */
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_FINISHEDCOPY,
+ MyLogicalRepWorker->relstate_lsn);
+
+copy_table_done:
+
+ elog(DEBUG1,
+ "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
+
+ CommitTransactionCommand();
+
/*
* We are done with the initial data synchronization, update the state.
*/
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index eb7db89..cfc924c 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -807,12 +807,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -889,9 +885,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -918,8 +912,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1062,8 +1055,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData *commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index 06663b9..9027c42 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,8 @@ DECLARE_UNIQUE_INDEX(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg_subs
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_FINISHEDCOPY 'f' /* tablesync copy phase is completed
+ * (sublsn NULL) */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/replication/worker_internal.h b/src/include/replication/worker_internal.h
index d046022..67bc911 100644
--- a/src/include/replication/worker_internal.h
+++ b/src/include/replication/worker_internal.h
@@ -84,6 +84,8 @@ extern void logicalrep_worker_wakeup_ptr(LogicalRepWorker *worker);
extern int logicalrep_sync_worker_count(Oid subid);
extern char *LogicalRepSyncTableStart(XLogRecPtr *origin_startpos);
+extern void tablesync_replorigin_drop(Oid subid, Oid relid, bool nowait);
+
void process_syncing_tables(XLogRecPtr current_lsn);
void invalidate_syncing_table_states(Datum arg, int cacheid,
uint32 hashvalue);
diff --git a/src/test/subscription/t/004_sync.pl b/src/test/subscription/t/004_sync.pl
index e111ab9..ec17c38 100644
--- a/src/test/subscription/t/004_sync.pl
+++ b/src/test/subscription/t/004_sync.pl
@@ -3,7 +3,9 @@ use strict;
use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 7;
+use Test::More tests => 10;
+use Time::HiRes qw(usleep);
+use Scalar::Util qw(looks_like_number);
# Initialize publisher node
my $node_publisher = get_new_node('publisher');
@@ -149,6 +151,71 @@ $result = $node_subscriber->safe_psql('postgres',
is($result, qq(20),
'changes for table added after subscription initialized replicated');
+##
+## slot integrity
+##
+## Manually create a slot with the same name that tablesync will want.
+## Expect tablesync ERROR when clash is detected.
+## Then remove the slot so tablesync can proceed.
+## Expect tablesync can now finish normally.
+##
+
+# drop the subscription
+$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+
+# empty the table tab_rep_next
+$node_subscriber->safe_psql('postgres', "DELETE FROM tab_rep_next;");
+
+# drop the table tab_rep from publisher and subscriber
+$node_subscriber->safe_psql('postgres', "DROP TABLE tab_rep;");
+$node_publisher->safe_psql('postgres', "DROP TABLE tab_rep;");
+
+# recreate the subscription, but leave it disabled so that we can get the OID
+$node_subscriber->safe_psql('postgres',
+ "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr' PUBLICATION tap_pub
+ with (enabled = false)"
+);
+
+# To build the name of the tablesync slot, we need the subscription OID
+# and the table OID.
+my $subid = $node_subscriber->safe_psql('postgres',
+ "SELECT oid FROM pg_subscription WHERE subname = 'tap_sub';");
+is(looks_like_number($subid), qq(1), 'get the subscription OID');
+
+my $relid = $node_subscriber->safe_psql('postgres',
+ "SELECT 'tab_rep_next'::regclass::oid");
+is(looks_like_number($relid), qq(1), 'get the table OID');
+
+# name of the tablesync slot is 'slotname'_'suboid'_sync_'tableoid'.
+my $slotname = 'tap_sub_' . $subid . '_' . 'sync_' . $relid;
+
+# Temporarily create a slot with the same name as the tablesync slot.
+$node_publisher->safe_psql('postgres',
+ "SELECT 'init' FROM pg_create_logical_replication_slot('$slotname', 'pgoutput', false);");
+
+# enable the subscription
+$node_subscriber->safe_psql('postgres',
+ "ALTER SUBSCRIPTION tap_sub ENABLE"
+);
+
+# It will be stuck on data sync, as slot creation will fail because the slot already exists.
+$node_subscriber->poll_query_until('postgres', $started_query)
+ or die "Timed out while waiting for subscriber to start sync";
+
+# Now drop the offending slot; the tablesync should recover.
+$node_publisher->safe_psql('postgres',
+ "SELECT pg_drop_replication_slot('$slotname');");
+
+# wait for sync to finish
+$node_subscriber->poll_query_until('postgres', $synced_query)
+ or die "Timed out while waiting for subscriber to synchronize data";
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM tab_rep_next");
+is($result, qq(20),
+ 'data for table added after subscription initialized are now synced');
+
+# Cleanup
$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
$node_subscriber->stop('fast');
--
1.8.3.1
On Thu, Jan 21, 2021 at 9:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
7.
+# check for occurrence of the expected error
+poll_output_until("replication slot \"$slotname\" already exists")
+  or die "no error stop for the pre-existing origin";

In this test, isn't it better to check for datasync state like below?
004_sync.pl has some other similar test.

my $started_query = "SELECT srsubstate = 'd' FROM pg_subscription_rel;";
$node_subscriber->poll_query_until('postgres', $started_query)
  or die "Timed out while waiting for subscriber to start sync";

Is there a reason why we can't use the existing way to check for
failure in this case?
The TAP test is updated in the latest patch [v20].
----
[v20] /messages/by-id/CAHut+PuNwSujoL_dwa=TtozJ_vF=CnJxjgQTCmNBkazd8J1m-A@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia
On Mon, Jan 25, 2021 at 1:58 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sun, Jan 24, 2021 at 12:24 PM Peter Smith <smithpb2250@gmail.com> wrote:
On Sat, Jan 23, 2021 at 11:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Few comments:
=============
1.
- * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * So the state progression is always: INIT -> DATASYNC ->
+ * (sync worker FINISHEDCOPY) -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY.

I don't think we need to be specific here that sync worker sets
FINISHEDCOPY state.

This was meant to indicate that *only* the sync worker knows about the
FINISHEDCOPY state, whereas all the other states are either known
(assigned and/or used) by *both* kinds of workers. But, I can remove
it if you feel that distinction is not useful.

Okay, but I feel you can mention that in the description you have
added for FINISHEDCOPY state. It looks a bit odd here and the message
you want to convey is also not that clear.
The comment is updated in the latest patch [v20].
----
[v20] /messages/by-id/CAHut+PuNwSujoL_dwa=TtozJ_vF=CnJxjgQTCmNBkazd8J1m-A@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia
On Mon, Jan 25, 2021 at 2:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Jan 25, 2021 at 8:23 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Sat, Jan 23, 2021 at 11:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
2.
@@ -98,11 +102,16 @@
#include "miscadmin.h"
#include "parser/parse_relation.h"
#include "pgstat.h"
+#include "postmaster/interrupt.h"
#include "replication/logicallauncher.h"
#include "replication/logicalrelation.h"
+#include "replication/logicalworker.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"I don't think the above includes are required. They seem to the
remnant of the previous approach.OK. Fixed in the latest patch [v19].
You seem to forgot removing #include "replication/slot.h". Check, if
it is not required then remove that as well.
Fixed in the latest patch [v20].
----
[v20] /messages/by-id/CAHut+PuNwSujoL_dwa=TtozJ_vF=CnJxjgQTCmNBkazd8J1m-A@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia
On Mon, Jan 25, 2021 at 4:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Jan 25, 2021 at 8:03 AM Peter Smith <smithpb2250@gmail.com> wrote:
Hi Amit.
PSA the v19 patch for the Tablesync Solution1.
I see one race condition in this patch where we try to drop the origin
via apply process and DropSubscription. I think it can lead to the
error "cache lookup failed for replication origin with oid %u". The
same problem can happen via exposed API pg_replication_origin_drop but
probably because this is not used concurrently so nobody faced this
issue. I think for the matter of this patch we can try to suppress
such an error either via try..catch, or by adding missing_ok argument
to replorigin_drop API, or we can just add to comments that such a
race exists.
OK. This has been isolated to a common function called from 3 places.
The potential race ERROR is suppressed by TRY/CATCH.
Please see the code of the latest patch [v20].
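For reference, the same race can in principle be reproduced through the
SQL-level API by two sessions dropping the same origin concurrently.
A hypothetical sketch (the origin name below is made up, following the
pg_%u_%u naming used for tablesync origins):

-- Run concurrently in two subscriber sessions:
SELECT pg_replication_origin_drop('pg_16389_16384');
-- One session succeeds; the loser of the race reports something like:
-- ERROR:  cache lookup failed for replication origin with oid 1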
Additionally, we should try to start a new thread for the
existence of this problem in pg_replication_origin_drop. What do you
think?
OK. It is on my TODO list.
----
[v20] /messages/by-id/CAHut+PuNwSujoL_dwa=TtozJ_vF=CnJxjgQTCmNBkazd8J1m-A@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia
On Mon, Jan 25, 2021 at 4:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Jan 25, 2021 at 8:03 AM Peter Smith <smithpb2250@gmail.com> wrote:
Hi Amit.
PSA the v19 patch for the Tablesync Solution1.
I see one race condition in this patch where we try to drop the origin
via apply process and DropSubscription. I think it can lead to the
error "cache lookup failed for replication origin with oid %u". The
same problem can happen via exposed API pg_replication_origin_drop but
probably because this is not used concurrently so nobody faced this
issue. I think for the matter of this patch we can try to suppress
such an error either via try..catch, or by adding missing_ok argument
to replorigin_drop API, or we can just add to comments that such a
race exists. Additionally, we should try to start a new thread for the
existence of this problem in pg_replication_origin_drop. What do you
think?
OK. A new thread [ps0127] for this problem was started.
---
[ps0127] = /messages/by-id/CAHut+PuW8DWV5fskkMWWMqzt-x7RPcNQOtJQBp6SdwyRghCk7A@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia
On Sat, Jan 23, 2021 at 5:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, Jan 23, 2021 at 4:55 AM Peter Smith <smithpb2250@gmail.com> wrote:
PSA the v18 patch for the Tablesync Solution1.
7. Have you tested with the new patch the scenario where we crash
after FINISHEDCOPY and before SYNCDONE, is it able to pick up the
replication using the new temporary slot? Here, we need to test the
case where during the catchup phase we have received few commits and
then the tablesync worker is crashed/errored out? Basically, check if
the replication is continued from the same point?
I have tested this and it didn't work; see the example below.
Publisher-side
================
CREATE TABLE mytbl1(id SERIAL PRIMARY KEY, somedata int, text varchar(120));
BEGIN;
INSERT INTO mytbl1(somedata, text) VALUES (1, 1);
INSERT INTO mytbl1(somedata, text) VALUES (1, 2);
COMMIT;
CREATE PUBLICATION mypublication FOR TABLE mytbl1;
Subscriber-side
================
- Have a while(1) loop in LogicalRepSyncTableStart so that the
tablesync worker stops there.
CREATE TABLE mytbl1(id SERIAL PRIMARY KEY, somedata int, text varchar(120));
CREATE SUBSCRIPTION mysub
CONNECTION 'host=localhost port=5432 dbname=postgres'
PUBLICATION mypublication;
During debugging, stop after we mark the FINISHEDCOPY state.
Publisher-side
================
INSERT INTO mytbl1(somedata, text) VALUES (1, 3);
INSERT INTO mytbl1(somedata, text) VALUES (1, 4);
Subscriber-side
================
- Have a breakpoint in apply_dispatch
- continue in debugger;
- After we replay the first commit (which will be for values (1,3)),
note down the origin position in apply_handle_commit_internal and
somehow error out. I have forced the debugger to jump to the last line
in apply_dispatch where the error is raised.
- After the error, the tablesync worker is restarted again and it
starts from the position noted in the previous step
- It exits without replaying the WAL for (1,4)
So, on the subscriber side, you will see 3 records; the fourth is
missing. Now, if you insert more records on the publisher, it will
replay those anyway, but the fourth one stays missing.
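To make the symptom concrete, this is roughly what a check on the
subscriber shows at this point (a sketch, assuming the table and data
from the steps above):

-- Subscriber-side, after the tablesync worker restarted and exited:
SELECT count(*) FROM mytbl1;    -- returns 3, not 4
SELECT somedata, text FROM mytbl1 ORDER BY id;
-- rows (1,1), (1,2) and (1,3) are present; (1,4) was never replayed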
The temporary slots didn't seem to work because we created a new
temporary slot after the crash and asked it to start decoding from
the point we noted in origin_lsn. The publisher didn't hold the
required WAL, as our slot was temporary, so it started sending from
some later point. We retain WAL based on a slot's restart_lsn position
and wal_keep_size. For our case, the positions of the slots matter,
and as we have created temporary slots, there is no way for the
publisher to save that WAL.
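The WAL-retention point can be seen from the publisher's slot catalog;
a sketch of what one could inspect (not output from the actual run):

-- Publisher-side: a temporary slot vanishes on crash/disconnect, so
-- nothing pins restart_lsn and the older WAL is free to be recycled.
SELECT slot_name, temporary, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots;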
In this particular case, even if the WAL had been there, we only pass
the start_decoding_at position and don't pass restart_lsn, so it
picked an arbitrary location (the current insert position in WAL)
which is ahead of the start_decoding_at point, and therefore it never
sent the required fourth record. Now, I don't think it will work even
if we somehow sent the correct restart_lsn, because, as I wrote
earlier, there is no guarantee that the earlier WAL would have been
saved.
At this point, I can't think of any way to fix this problem except
going back to the previous approach of permanent slots, but let me
know if you have any ideas to salvage this approach.
--
With Regards,
Amit Kapila.
Hi Amit.
PSA the v21 patch for the Tablesync Solution1.
Main differences from v20:
+ Rebased to latest OSS HEAD @ 27/Jan
+ v21 is a merge of patches [v17] and [v20], which became necessary
when it was found [ak0127] that the v20 usage of TEMPORARY tablesync
slots did not work correctly. v21 reverts to using PERMANENT tablesync
slots, the same as implemented in v17, while retaining the other
review comment fixes made for v18, v19, and v20.
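As a quick way to observe the new state from psql while this is going
on (a sketch against a patched server; 'f' is the FINISHEDCOPY code
this patch adds to pg_subscription_rel):

-- Subscriber-side:
SELECT srrelid::regclass AS tab, srsubstate
FROM pg_subscription_rel;
-- srsubstate = 'f' means the copy finished but the sync is not yet
-- done, so a re-launched tablesync worker will skip the copy phase.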
----
[v17] /messages/by-id/CAHut+Pt9+g8qQR0kMC85nY-O4uDQxXboamZAYhHbvkebzC9fAQ@mail.gmail.com
[v20] /messages/by-id/CAHut+PuNwSujoL_dwa=TtozJ_vF=CnJxjgQTCmNBkazd8J1m-A@mail.gmail.com
[ak0127] /messages/by-id/CAA4eK1LDsj9kw4FbWAw3CMHyVsjafgDum03cYy-wpGmor=8-1w@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
v21-0002-Tablesync-extra-logging.patchapplication/octet-stream; name=v21-0002-Tablesync-extra-logging.patchDownload
From 7ce24a04e4b3df8ac3921bf41a45e97edeb3039b Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Thu, 28 Jan 2021 17:08:04 +1100
Subject: [PATCH v21] Tablesync extra logging.
This patch only adds some extra logging which may be helpful for testing; it is not intended for committing.
---
src/backend/replication/logical/tablesync.c | 56 +++++++++++++++++++++++++++--
1 file changed, 54 insertions(+), 2 deletions(-)
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index f8cf93e..26eae37 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -145,7 +145,13 @@ tablesync_replorigin_drop(Oid subid, Oid relid, bool nowait)
{
PG_TRY();
{
+ elog(LOG,
+ "!!>> tablesync_replorign_drop: droppping origin OID %d, named \"%s\"",
+ originid, originname);
replorigin_drop(originid, nowait);
+ elog(LOG,
+ "!!>> tablesync_replorign_drop: dropped origin OID %d, named \"%s\"",
+ originid, originname);
}
PG_CATCH();
{
@@ -317,6 +323,10 @@ tablesync_cleanup_at_shutdown(void)
Oid subid = MySubscription->oid;
Oid relid = MyLogicalRepWorker->relid;
+ elog(LOG,
+ "!!>> tablesync_cleanup_at_shutdown for relid = %d",
+ MyLogicalRepWorker->relid);
+
/*
* Cleanup the tablesync slot, if needed.
*
@@ -349,7 +359,13 @@ tablesync_cleanup_at_shutdown(void)
ReplicationSlotNameForTablesync(MySubscription->slotname,
subid, relid, syncslotname);
+ elog(LOG,
+ "!!>> tablesync_cleanup_at_shutdown: dropping the tablesync slot \"%s\".",
+ syncslotname);
ReplicationSlotDropAtPubNode(wrconn, syncslotname, true /* missing_ok */ );
+ elog(LOG,
+ "!!>> tablesync_cleanup_at_shutdown: dropped the tablesync slot \"%s\".",
+ syncslotname);
}
/*
@@ -363,6 +379,8 @@ tablesync_cleanup_at_shutdown(void)
StartTransactionCommand();
}
+ elog(LOG,
+ "!!>> tablesync_cleanup_at_shutdown: call tablesync_replorigin_drop");
tablesync_replorigin_drop(subid, relid, false /* nowait */ );
/*
@@ -409,7 +427,13 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn)
*/
ReplicationSlotNameForTablesync(MySubscription->slotname, subid, relid, syncslotname);
+ elog(LOG,
+ "!!>> process_syncing_tables_for_sync: dropping the tablesync slot \"%s\".",
+ syncslotname);
ReplicationSlotDropAtPubNode(wrconn, syncslotname, false /* missing_ok */ );
+ elog(LOG,
+ "!!>> process_syncing_tables_for_sync: dropped the tablesync slot \"%s\".",
+ syncslotname);
/*
* Change state to SYNCDONE.
@@ -563,6 +587,8 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
* orign then would prevent the origin from advancing properly
* on commit TX.
*/
+ elog(LOG,
+ "!!>> process_syncing_tables_for_apply: call tablesync_replorigin_drop");
tablesync_replorigin_drop(MyLogicalRepWorker->subid,
rstate->relid, false /* nowait */ );
@@ -1073,6 +1099,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* The COPY phase was previously done, but tablesync then crashed
* before it was able to finish normally.
*/
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_FINISHEDCOPY.");
StartTransactionCommand();
/*
@@ -1080,8 +1108,14 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* time this tablesync was launched.
*/
originid = replorigin_by_name(originname, false /* missing_ok */ );
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 2 replorigin_session_setup \"%s\".",
+ originname);
replorigin_session_setup(originid);
replorigin_session_origin = originid;
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 2 replorigin_session_get_progress \"%s\".",
+ originname);
*origin_startpos = replorigin_session_get_progress(false);
CommitTransactionCommand();
@@ -1137,6 +1171,9 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* used for the catchup phase after COPY is done, so tell it to use
* the snapshot to make the final data consistent.
*/
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: walrcv_create_slot for \"%s\".",
+ slotname);
walrcv_create_slot(wrconn, slotname, false /* permanent */ ,
CRS_USE_SNAPSHOT, origin_startpos);
@@ -1168,13 +1205,22 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* WAL logged for the purpose of recovery. Locks are to prevent
* the replication origin from vanishing while advancing.
*/
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 replorigin_create \"%s\".",
+ originname);
originid = replorigin_create(originname);
LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 replorigin_advance \"%s\".",
+ originname);
replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
true /* go backward */ , true /* WAL log */ );
UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 replorigin_session_setup \"%s\".",
+ originname);
replorigin_session_setup(originid);
replorigin_session_origin = originid;
}
@@ -1203,7 +1249,13 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* If something failed during copy table then cleanup the created
* slot.
*/
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".",
+ slotname);
ReplicationSlotDropAtPubNode(wrconn, slotname, false /* missing_ok */ );
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: tablesync copy failed. Dropped the tablesync slot \"%s\".",
+ slotname);
pfree(slotname);
slotname = NULL;
@@ -1214,8 +1266,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
copy_table_done:
- elog(DEBUG1,
- "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
originname,
(uint32) (*origin_startpos >> 32),
(uint32) *origin_startpos);
--
1.8.3.1
v21-0001-Tablesync-Solution1.patchapplication/octet-stream; name=v21-0001-Tablesync-Solution1.patchDownload
From d595ff741ba6880b1916e18e699b81d88d21ee40 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Thu, 28 Jan 2021 16:36:09 +1100
Subject: [PATCH v21] Tablesync Solution1.
====
Features:
* The tablesync slot is now permanent instead of temporary.
* The tablesync worker is now allowing multiple tx instead of single tx.
* A new state (SUBREL_STATE_FINISHEDCOPY) is persisted after a successful copy_table in tablesync's LogicalRepSyncTableStart.
* If a re-launched tablesync finds state SUBREL_STATE_FINISHEDCOPY then it will bypass the initial copy_table phase.
* Now tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar to what is done for the apply worker). The origin is advanced when first created.
* Cleanup of tablesync resources:
- The tablesync slot cleanup (drop) code is added to the process_syncing_tables_for_sync function.
- The tablesync replication origin tracking is cleaned up by process_syncing_tables_for_apply.
- A tablesync function to clean up its own slot/origin is called from ProcessInterrupts. It is invoked indirectly by DropSubscription/AlterSubscription when they signal the tablesync worker to stop.
* Updates to PG docs.
* New TAP test case.
Known Issues:
* Dangling tablesync slots may be possible if some race scenario occurs during Drop/AlterSubscription.
---
doc/src/sgml/catalogs.sgml | 1 +
doc/src/sgml/logical-replication.sgml | 18 +-
doc/src/sgml/ref/drop_subscription.sgml | 6 +-
src/backend/commands/subscriptioncmds.c | 93 ++++---
src/backend/replication/logical/origin.c | 2 +-
src/backend/replication/logical/tablesync.c | 368 ++++++++++++++++++++++++----
src/backend/replication/logical/worker.c | 27 +-
src/backend/tcop/postgres.c | 6 +
src/include/catalog/pg_subscription_rel.h | 2 +
src/include/replication/logicalworker.h | 2 +
src/include/replication/slot.h | 3 +
src/include/replication/worker_internal.h | 2 +
src/test/subscription/t/004_sync.pl | 69 +++++-
13 files changed, 503 insertions(+), 96 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 865e826..920a39d 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7665,6 +7665,7 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
State code:
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
+ <literal>f</literal> = finished table copy,
<literal>s</literal> = synchronized,
<literal>r</literal> = ready (normal replication)
</para></entry>
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index a560ad6..e1b20ea 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -248,7 +248,18 @@
<para>
As mentioned earlier, each (active) subscription receives changes from a
- replication slot on the remote (publishing) side. Normally, the remote
+ replication slot on the remote (publishing) side.
+ </para>
+ <para>
+ Additional table synchronization slots are normally transient, created
+ internally and dropped automatically when they are no longer needed.
+ These table synchronization slots have generated names:
+ <quote><literal>%s_%u_sync_%u</literal></quote> (parameters: Subscription
+ <parameter>slot_name</parameter>, Subscription <parameter>oid</parameter>,
+ Table <parameter>relid</parameter>)
+ </para>
+ <para>
+ Normally, the remote
replication slot is created automatically when the subscription is created
using <command>CREATE SUBSCRIPTION</command> and it is dropped
automatically when the subscription is dropped using <command>DROP
@@ -294,8 +305,9 @@
using <command>ALTER SUBSCRIPTION</command> before attempting to drop
the subscription. If the remote database instance no longer exists, no
further action is then necessary. If, however, the remote database
- instance is just unreachable, the replication slot should then be
- dropped manually; otherwise it would continue to reserve WAL and might
+ instance is just unreachable, the replication slot (and any still
+ remaining table synchronization slots) should then be
+ dropped manually; otherwise it/they would continue to reserve WAL and might
eventually cause the disk to fill up. Such cases should be carefully
investigated.
</para>
diff --git a/doc/src/sgml/ref/drop_subscription.sgml b/doc/src/sgml/ref/drop_subscription.sgml
index adbdeaf..aee9615 100644
--- a/doc/src/sgml/ref/drop_subscription.sgml
+++ b/doc/src/sgml/ref/drop_subscription.sgml
@@ -79,7 +79,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
<para>
When dropping a subscription that is associated with a replication slot on
the remote host (the normal state), <command>DROP SUBSCRIPTION</command>
- will connect to the remote host and try to drop the replication slot as
+ will connect to the remote host and try to drop the replication slot (and
+ any remaining table synchronization slots) as
part of its operation. This is necessary so that the resources allocated
for the subscription on the remote host are released. If this fails,
either because the remote host is not reachable or because the remote
@@ -89,7 +90,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
executing <literal>ALTER SUBSCRIPTION ... SET (slot_name = NONE)</literal>.
After that, <command>DROP SUBSCRIPTION</command> will no longer attempt any
actions on a remote host. Note that if the remote replication slot still
- exists, it should then be dropped manually; otherwise it will continue to
+ exists, it (and any related table synchronization slots) should then be
+ dropped manually; otherwise it/they will continue to
reserve WAL and might eventually cause the disk to fill up. See
also <xref linkend="logical-replication-subscription-slot"/>.
</para>
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 082f785..e31ba6e 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -34,6 +34,7 @@
#include "nodes/makefuncs.h"
#include "replication/logicallauncher.h"
#include "replication/origin.h"
+#include "replication/slot.h"
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "replication/worker_internal.h"
@@ -928,7 +929,6 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
char *err = NULL;
RepOriginId originid;
WalReceiverConn *wrconn = NULL;
- StringInfoData cmd;
Form_pg_subscription form;
/*
@@ -1042,7 +1042,7 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
}
list_free(subworkers);
- /* Clean up dependencies */
+ /* Clean up dependencies. */
deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
/* Remove any associated relation synchronization states. */
@@ -1055,61 +1055,92 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
replorigin_drop(originid, false);
/*
- * If there is no slot associated with the subscription, we can finish
- * here.
+ * If there is a slot associated with the subscription, then drop the
+ * replication slot at the publisher node using the replication
+ * connection.
*/
- if (!slotname)
+ if (slotname)
{
- table_close(rel, NoLock);
- return;
+ load_file("libpqwalreceiver", false);
+
+ wrconn = walrcv_connect(conninfo, true, subname, &err);
+ if (wrconn == NULL)
+ ereport(ERROR,
+ (errmsg("could not connect to publisher when attempting to "
+ "drop the replication slot \"%s\"", slotname),
+ errdetail("The error was: %s", err),
+ /* translator: %s is an SQL ALTER command */
+ errhint("Use %s to disassociate the subscription from the slot.",
+ "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+
+ PG_TRY();
+ {
+ ReplicationSlotDropAtPubNode(wrconn, slotname, false /* missing_ok */ );
+ }
+ PG_FINALLY();
+ {
+ walrcv_disconnect(wrconn);
+ }
+ PG_END_TRY();
}
- /*
- * Otherwise drop the replication slot at the publisher node using the
- * replication connection.
- */
+ table_close(rel, NoLock);
+}
+
+/*
+ * Drop the replication slot at the publisher node using the replication connection.
+ *
+ * missing_ok - if true then only issue WARNING message if the slot cannot be deleted.
+ */
+void
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok)
+{
+ StringInfoData cmd;
+
+ Assert(wrconn);
+
load_file("libpqwalreceiver", false);
initStringInfo(&cmd);
appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
- wrconn = walrcv_connect(conninfo, true, subname, &err);
- if (wrconn == NULL)
- ereport(ERROR,
- (errmsg("could not connect to publisher when attempting to "
- "drop the replication slot \"%s\"", slotname),
- errdetail("The error was: %s", err),
- /* translator: %s is an SQL ALTER command */
- errhint("Use %s to disassociate the subscription from the slot.",
- "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
-
PG_TRY();
{
WalRcvExecResult *res;
res = walrcv_exec(wrconn, cmd.data, 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
+ if (res->status == WALRCV_OK_COMMAND)
+ {
+ /* NOTICE. Success. */
+ ereport(NOTICE,
+ (errmsg("dropped replication slot \"%s\" on publisher",
+ slotname)));
+ }
+ else if (res->status == WALRCV_ERROR && missing_ok)
+ {
+ /* WARNING. Error, but missing_ok = true. */
+ ereport(WARNING,
(errmsg("could not drop the replication slot \"%s\" on publisher",
slotname),
errdetail("The error was: %s", res->err)));
+ }
else
- ereport(NOTICE,
- (errmsg("dropped replication slot \"%s\" on publisher",
- slotname)));
+ {
+ /* ERROR. */
+ ereport(ERROR,
+ (errmsg("could not drop the replication slot \"%s\" on publisher",
+ slotname),
+ errdetail("The error was: %s", res->err)));
+ }
walrcv_clear_result(res);
}
PG_FINALLY();
{
- walrcv_disconnect(wrconn);
+ pfree(cmd.data);
}
PG_END_TRY();
-
- pfree(cmd.data);
-
- table_close(rel, NoLock);
}
/*
diff --git a/src/backend/replication/logical/origin.c b/src/backend/replication/logical/origin.c
index 9bd761a..77aae35 100644
--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -357,7 +357,7 @@ restart:
if (state->roident == roident)
{
/* found our slot, is it busy? */
- if (state->acquired_by != 0)
+ if (state->acquired_by != 0 && state->acquired_by != MyProcPid)
{
ConditionVariable *cv;
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 863d196..f8cf93e 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -31,8 +31,11 @@
* table state to INIT.
* - Tablesync worker starts; changes table state from INIT to DATASYNC while
* copying.
- * - Tablesync worker finishes the copy and sets table state to SYNCWAIT;
- * waits for state change.
+ * - Tablesync worker does initial table copy; there is a FINISHEDCOPY (sync
+ * worker specific) state to indicate when the copy phase has completed, so
+ * if the worker crashes with this (non-memory) state then the copy will not
+ * be re-attempted.
+ * - Tablesync worker then sets table state to SYNCWAIT; waits for state change.
* - Apply worker periodically checks for tables in SYNCWAIT state. When
* any appear, it sets the table state to CATCHUP and starts loop-waiting
* until either the table state is set to SYNCDONE or the sync worker
@@ -48,8 +51,8 @@
* point it sets state to READY and stops tracking. Again, there might
* be zero changes in between.
*
- * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * So the state progression is always: INIT -> DATASYNC -> FINISHEDCOPY
+ * -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY.
*
* The catalog pg_subscription_rel is used to keep information about
* subscribed tables and their state. Some transient state during data
@@ -59,6 +62,7 @@
* Example flows look like this:
* - Apply is in front:
* sync:8
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:10
* -> set in memory CATCHUP
@@ -74,6 +78,7 @@
*
* - Sync is in front:
* sync:10
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:8
* -> set in memory CATCHUP
@@ -98,11 +103,16 @@
#include "miscadmin.h"
#include "parser/parse_relation.h"
#include "pgstat.h"
+#include "postmaster/interrupt.h"
#include "replication/logicallauncher.h"
#include "replication/logicalrelation.h"
+#include "replication/logicalworker.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
+#include "storage/lmgr.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -113,6 +123,42 @@ static bool table_states_valid = false;
StringInfo copybuf = NULL;
/*
+ * Common code to drop the origin of a tablesync worker.
+ *
+ * There is a potential race condition if two processes attempt to call
+ * replorigin_drop for the same originid at the same time. The loser of
+ * that race would give an ERROR saying that it failed to find the
+ * expected originid.
+ *
+ * The TRY/CATCH below suppresses such errors, allowing the tablesync cleanup
+ * code to proceed.
+ */
+void
+tablesync_replorigin_drop(Oid subid, Oid relid, bool nowait)
+{
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (OidIsValid(originid))
+ {
+ PG_TRY();
+ {
+ replorigin_drop(originid, nowait);
+ }
+ PG_CATCH();
+ {
+ ereport(WARNING,
+ errmsg("could not drop replication origin with OID %d, named \"%s\"",
+ originid,
+ originname));
+ }
+ PG_END_TRY();
+ }
+}
+
+/*
* Exit routine for synchronization worker.
*/
static void
@@ -260,6 +306,77 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
}
/*
+ * The sync worker cleans up any slot / origin resources it may have created.
+ * This function is called from ProcessInterrupts() as a result of the
+ * tablesync worker being signalled.
+ */
+void
+tablesync_cleanup_at_shutdown(void)
+{
+ TimeLineID tli;
+ Oid subid = MySubscription->oid;
+ Oid relid = MyLogicalRepWorker->relid;
+
+ /*
+ * Cleanup the tablesync slot, if needed.
+ *
+ * If state is SYNCDONE or READY then the slot has already been dropped.
+ */
+ if (wrconn != NULL &&
+ MyLogicalRepWorker->relstate != SUBREL_STATE_SYNCDONE &&
+ MyLogicalRepWorker->relstate != SUBREL_STATE_READY)
+ {
+ char syncslotname[NAMEDATALEN] = {0};
+
+ /*
+ * End wal streaming so the wrconn can be re-used to drop the slot.
+ */
+ PG_TRY();
+ {
+ walrcv_endstreaming(wrconn, &tli);
+ }
+ PG_CATCH();
+ {
+ /*
+ * It is possible that the walrcv_startstreaming was not yet
+ * called (e.g. the interrupt initiating this cleanup may have
+ * happened during the table COPY phase) so suppress any error
+ * here to cope with that scenario.
+ */
+ }
+ PG_END_TRY();
+
+ ReplicationSlotNameForTablesync(MySubscription->slotname,
+ subid, relid, syncslotname);
+
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, true /* missing_ok */ );
+ }
+
+ /*
+ * Remove the tablesync's origin tracking if it exists.
+ *
+ * The origin APIs must be called within a transaction, and this
+ * transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ {
+ StartTransactionCommand();
+ }
+
+ tablesync_replorigin_drop(subid, relid, false /* nowait */ );
+
+ /*
+ * CommitTransactionCommand would normally attempt to advance the origin,
+ * but now that the origin has been dropped that would fail, so we need to
+ * reset the replorigin_session here to prevent this error happening.
+ */
+ replorigin_session_reset();
+ replorigin_session_origin = InvalidRepOriginId;
+
+ finish_sync_worker(); /* doesn't return. */
+}
+
+/*
* Handle table synchronization cooperation from the synchronization
* worker.
*
@@ -270,30 +387,55 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
+ bool sync_done = false;
+ Oid subid = MySubscription->oid;
+ Oid relid = MyLogicalRepWorker->relid;
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
+ sync_done = MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
+ current_lsn >= MyLogicalRepWorker->relstate_lsn;
+ SpinLockRelease(&MyLogicalRepWorker->relmutex);
- if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
- current_lsn >= MyLogicalRepWorker->relstate_lsn)
+ if (sync_done)
{
TimeLineID tli;
+ char syncslotname[NAMEDATALEN] = {0};
+
+ /* End wal streaming so wrconn can be re-used to drop the slot. */
+ walrcv_endstreaming(wrconn, &tli);
+ /*
+ * Cleanup the tablesync slot.
+ */
+ ReplicationSlotNameForTablesync(MySubscription->slotname, subid, relid, syncslotname);
+
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, false /* missing_ok */ );
+
+ /*
+ * Change state to SYNCDONE.
+ */
+ SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCDONE;
MyLogicalRepWorker->relstate_lsn = current_lsn;
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ {
+ StartTransactionCommand();
+ }
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
MyLogicalRepWorker->relstate_lsn);
- walrcv_endstreaming(wrconn, &tli);
finish_sync_worker();
}
- else
- SpinLockRelease(&MyLogicalRepWorker->relmutex);
}
/*
@@ -412,6 +554,21 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
started_tx = true;
}
+ /*
+ * Remove the tablesync origin tracking if it exists.
+ *
+ * The normal case origin drop is done here instead of in the
+ * process_syncing_tables_for_sync function because if the
+ * tablesync worker process attempted to drop its own
+ * origin then that would prevent the origin from advancing properly
+ * on the commit of the transaction.
+ */
+ tablesync_replorigin_drop(MyLogicalRepWorker->subid,
+ rstate->relid, false /* nowait */ );
+
+ /*
+ * Update the state to READY only after the origin cleanup.
+ */
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
@@ -808,6 +965,42 @@ copy_table(Relation rel)
}
/*
+ * Determine the tablesync slot name.
+ *
+ * The returned slot name is either:
+ * - stored in the supplied buffer (syncslotname), or
+ * - palloc'ed in current memory context (if syncslotname = NULL).
+ */
+char *
+ReplicationSlotNameForTablesync(const char *subslotname, Oid suboid, Oid relid, char *syncslotname)
+{
+ /*
+ * To build a slot name for the sync work, we are limited to NAMEDATALEN -
+ * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
+ * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
+ * NAMEDATALEN on the remote that matters, but this scheme will also work
+ * reasonably if that is different.)
+ */
+ StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
+
+ if (syncslotname)
+ {
+ sprintf(syncslotname,
+ "%.*s_%u_sync_%u",
+ NAMEDATALEN - 28,
+ subslotname, suboid, relid);
+ }
+ else
+ {
+ syncslotname = psprintf("%.*s_%u_sync_%u",
+ NAMEDATALEN - 28,
+ subslotname, suboid, relid);
+ }
+
+ return syncslotname;
+}
+
+/*
* Start syncing the table in the sync worker.
*
* If nothing needs to be done to sync the table, we exit the worker without
@@ -824,6 +1017,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -849,19 +1044,11 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
- MySubscription->slotname,
- MySubscription->oid,
- MyLogicalRepWorker->relid);
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(MySubscription->slotname,
+ MySubscription->oid,
+ MyLogicalRepWorker->relid,
+ NULL /* use palloc */ );
/*
* Here we use the slot name instead of the subscription name as the
@@ -874,7 +1061,33 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY);
+
+ /* Assign the origin tracking record name. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MySubscription->oid, MyLogicalRepWorker->relid);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed
+ * before it was able to finish normally.
+ */
+ StartTransactionCommand();
+
+ /*
+ * The origin tracking name must already exist. It was created the
+ * first time this tablesync was launched.
+ */
+ originid = replorigin_by_name(originname, false /* missing_ok */ );
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ *origin_startpos = replorigin_session_get_progress(false);
+
+ CommitTransactionCommand();
+
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -890,9 +1103,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -918,29 +1128,97 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
walrcv_clear_result(res);
/*
- * Create a new temporary logical decoding slot. This slot will be used
- * for the catchup phase after COPY is done, so tell it to use the
- * snapshot to make the final data consistent.
+ * Be sure to remove the newly created tablesync slot if the COPY fails.
*/
- walrcv_create_slot(wrconn, slotname, true,
- CRS_USE_SNAPSHOT, origin_startpos);
+ PG_TRY();
+ {
+ /*
+ * Create a new permanent logical decoding slot. This slot will be
+ * used for the catchup phase after COPY is done, so tell it to use
+ * the snapshot to make the final data consistent.
+ */
+ walrcv_create_slot(wrconn, slotname, false /* permanent */ ,
+ CRS_USE_SNAPSHOT, origin_startpos);
- /* Now do the initial data copy */
- PushActiveSnapshot(GetTransactionSnapshot());
- copy_table(rel);
- PopActiveSnapshot();
+ /* Now do the initial data copy */
+ PushActiveSnapshot(GetTransactionSnapshot());
+ copy_table(rel);
+ PopActiveSnapshot();
- res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
- (errmsg("table copy could not finish transaction on publisher"),
- errdetail("The error was: %s", res->err)));
- walrcv_clear_result(res);
+ res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
+ if (res->status != WALRCV_OK_COMMAND)
+ ereport(ERROR,
+ (errmsg("table copy could not finish transaction on publisher"),
+ errdetail("The error was: %s", res->err)));
+ walrcv_clear_result(res);
+
+ table_close(rel, NoLock);
+
+ /* Make the copy visible. */
+ CommandCounterIncrement();
+
+ /* Setup replication origin tracking. */
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist, so create it now.
+ *
+ * Then advance to the LSN got from walrcv_create_slot. This is
+ * WAL logged for the purpose of recovery. Locks are to prevent
+ * the replication origin from vanishing while advancing.
+ */
+ originid = replorigin_create(originname);
+
+ LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */ , true /* WAL log */ );
+ UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ }
+ else
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("replication origin \"%s\" already exists",
+ originname)));
+ }
+
+ /*
+ * Update the persisted state to indicate the COPY phase is done; make
+ * it visible to others.
+ */
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_FINISHEDCOPY,
+ MyLogicalRepWorker->relstate_lsn);
+
+ CommitTransactionCommand();
+ }
+ PG_CATCH();
+ {
+ /*
+ * If something failed during copy table then cleanup the created
+ * slot.
+ */
+ ReplicationSlotDropAtPubNode(wrconn, slotname, false /* missing_ok */ );
+
+ pfree(slotname);
+ slotname = NULL;
+
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
- table_close(rel, NoLock);
+copy_table_done:
- /* Make the copy visible. */
- CommandCounterIncrement();
+ elog(DEBUG1,
+ "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
/*
* We are done with the initial data synchronization, update the state.
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index eb7db89..0ea1646 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -807,12 +807,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -889,9 +885,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -918,8 +912,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1062,8 +1055,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData *commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
@@ -3112,3 +3104,12 @@ IsLogicalWorker(void)
{
return MyLogicalRepWorker != NULL;
}
+
+/*
+ * Is current process a logical replication tablesync worker?
+ */
+bool
+IsLogicalWorkerTablesync(void)
+{
+ return am_tablesync_worker();
+}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index cb5a961..8b49dd1 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3086,9 +3086,15 @@ ProcessInterrupts(void)
(errcode(ERRCODE_ADMIN_SHUTDOWN),
errmsg("terminating autovacuum process due to administrator command")));
else if (IsLogicalWorker())
+ {
+ /* Tablesync workers do their own cleanups. */
+ if (IsLogicalWorkerTablesync())
+ tablesync_cleanup_at_shutdown(); /* does not return. */
+
ereport(FATAL,
(errcode(ERRCODE_ADMIN_SHUTDOWN),
errmsg("terminating logical replication worker due to administrator command")));
+ }
else if (IsLogicalLauncher())
{
ereport(DEBUG1,
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index 06663b9..9027c42 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,8 @@ DECLARE_UNIQUE_INDEX(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg_subs
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_FINISHEDCOPY 'f' /* tablesync copy phase is completed
+ * (sublsn NULL) */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/replication/logicalworker.h b/src/include/replication/logicalworker.h
index 2ad61a0..585df5f 100644
--- a/src/include/replication/logicalworker.h
+++ b/src/include/replication/logicalworker.h
@@ -15,5 +15,7 @@
extern void ApplyWorkerMain(Datum main_arg);
extern bool IsLogicalWorker(void);
+extern bool IsLogicalWorkerTablesync(void);
+extern void tablesync_cleanup_at_shutdown(void);
#endif /* LOGICALWORKER_H */
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 53f636c..db51cf2 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -15,6 +15,7 @@
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "storage/spin.h"
+#include "replication/walreceiver.h"
/*
* Behaviour of replication slots, upon release or crash.
@@ -211,6 +212,8 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(const char *subslotname, Oid suboid, Oid relid, char *syncslotname);
+extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
diff --git a/src/include/replication/worker_internal.h b/src/include/replication/worker_internal.h
index d046022..67bc911 100644
--- a/src/include/replication/worker_internal.h
+++ b/src/include/replication/worker_internal.h
@@ -84,6 +84,8 @@ extern void logicalrep_worker_wakeup_ptr(LogicalRepWorker *worker);
extern int logicalrep_sync_worker_count(Oid subid);
extern char *LogicalRepSyncTableStart(XLogRecPtr *origin_startpos);
+extern void tablesync_replorigin_drop(Oid subid, Oid relid, bool nowait);
+
void process_syncing_tables(XLogRecPtr current_lsn);
void invalidate_syncing_table_states(Datum arg, int cacheid,
uint32 hashvalue);
diff --git a/src/test/subscription/t/004_sync.pl b/src/test/subscription/t/004_sync.pl
index e111ab9..ec17c38 100644
--- a/src/test/subscription/t/004_sync.pl
+++ b/src/test/subscription/t/004_sync.pl
@@ -3,7 +3,9 @@ use strict;
use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 7;
+use Test::More tests => 10;
+use Time::HiRes qw(usleep);
+use Scalar::Util qw(looks_like_number);
# Initialize publisher node
my $node_publisher = get_new_node('publisher');
@@ -149,6 +151,71 @@ $result = $node_subscriber->safe_psql('postgres',
is($result, qq(20),
'changes for table added after subscription initialized replicated');
+##
+## slot integrity
+##
+## Manually create a slot with the same name that tablesync will want.
+## Expect tablesync ERROR when clash is detected.
+## Then remove the slot so tablesync can proceed.
+## Expect tablesync can now finish normally.
+##
+
+# drop the subscription
+$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+
+# empty the table tab_rep_next
+$node_subscriber->safe_psql('postgres', "DELETE FROM tab_rep_next;");
+
+# drop the table tab_rep from publisher and subscriber
+$node_subscriber->safe_psql('postgres', "DROP TABLE tab_rep;");
+$node_publisher->safe_psql('postgres', "DROP TABLE tab_rep;");
+
+# recreate the subscription again, but leave it disabled so that we can get the OID
+$node_subscriber->safe_psql('postgres',
+ "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr' PUBLICATION tap_pub
+ with (enabled = false)"
+);
+
+# need to create the name of the tablesync slot, for this we need the subscription OID
+# and the table OID.
+my $subid = $node_subscriber->safe_psql('postgres',
+ "SELECT oid FROM pg_subscription WHERE subname = 'tap_sub';");
+is(looks_like_number($subid), qq(1), 'get the subscription OID');
+
+my $relid = $node_subscriber->safe_psql('postgres',
+ "SELECT 'tab_rep_next'::regclass::oid");
+is(looks_like_number($relid), qq(1), 'get the table OID');
+
+# name of the tablesync slot is 'slotname'_'suboid'_sync_'tableoid'.
+my $slotname = 'tap_sub_' . $subid . '_' . 'sync_' . $relid;
+
+# temporarily, create a slot having the same name of the tablesync slot.
+$node_publisher->safe_psql('postgres',
+ "SELECT 'init' FROM pg_create_logical_replication_slot('$slotname', 'pgoutput', false);");
+
+# enable the subscription
+$node_subscriber->safe_psql('postgres',
+ "ALTER SUBSCRIPTION tap_sub ENABLE"
+);
+
+# it will be stuck on data sync as slot creation will fail because the slot already exists.
+$node_subscriber->poll_query_until('postgres', $started_query)
+ or die "Timed out while waiting for subscriber to start sync";
+
+# now drop the offending slot, the tablesync should recover.
+$node_publisher->safe_psql('postgres',
+ "SELECT pg_drop_replication_slot('$slotname');");
+
+# wait for sync to finish
+$node_subscriber->poll_query_until('postgres', $synced_query)
+ or die "Timed out while waiting for subscriber to synchronize data";
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM tab_rep_next");
+is($result, qq(20),
+ 'data for table added after subscription initialized are now synced');
+
+# Cleanup
$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
$node_subscriber->stop('fast');
--
1.8.3.1
On Wed, Jan 27, 2021 at 2:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, Jan 23, 2021 at 5:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, Jan 23, 2021 at 4:55 AM Peter Smith <smithpb2250@gmail.com> wrote:
PSA the v18 patch for the Tablesync Solution1.
7. Have you tested with the new patch the scenario where we crash
after FINISHEDCOPY and before SYNCDONE, is it able to pick up the
replication using the new temporary slot? Here, we need to test the
case where during the catchup phase we have received few commits and
then the tablesync worker is crashed/errored out? Basically, check if
the replication is continued from the same point?
I have tested this and it didn't work, see the below example.
Publisher-side
================
CREATE TABLE mytbl1(id SERIAL PRIMARY KEY, somedata int, text varchar(120));
BEGIN;
INSERT INTO mytbl1(somedata, text) VALUES (1, 1);
INSERT INTO mytbl1(somedata, text) VALUES (1, 2);
COMMIT;
CREATE PUBLICATION mypublication FOR TABLE mytbl1;
Subscriber-side
================
- Have a while(1) loop in LogicalRepSyncTableStart so that tablesync
worker stops.
CREATE TABLE mytbl1(id SERIAL PRIMARY KEY, somedata int, text varchar(120));
CREATE SUBSCRIPTION mysub
CONNECTION 'host=localhost port=5432 dbname=postgres'
PUBLICATION mypublication;
During debug, stop after we mark FINISHEDCOPY state.
Publisher-side
================
INSERT INTO mytbl1(somedata, text) VALUES (1, 3);
INSERT INTO mytbl1(somedata, text) VALUES (1, 4);
Subscriber-side
================
- Have a breakpoint in apply_dispatch
- continue in debugger;
- After we replay the first commit (which will be for values (1,3)), note
down the origin position in apply_handle_commit_internal and somehow
error out. I have forced the debugger to jump to the last line in
apply_dispatch where the error is raised.
- After the error, again the tablesync worker is restarted and it
starts from the position noted in the previous step
- It exits without replaying the WAL for (1,4)
So, on the subscriber-side, you will see 3 records; the fourth is missing.
Now, if you insert more records on the publisher, it will anyway
replay those, but the fourth one is missing.
The temporary slots didn't seem to work because we created the new
temporary slot again after the crash and asked it to start decoding from
the point we noted in origin_lsn. The publisher didn’t hold the
required WAL as our slot was temporary so it started sending from some
later point. We retain WAL based on the slot's restart_lsn position and
wal_keep_size. For our case, the positions of the slots will matter
and as we have created temporary slots, there is no way for a
publisher to save that WAL.
In this particular case, even if the WAL had been there, we only
pass the start_decoding_at position but didn’t pass restart_lsn, so it
picked a random location (current insert position in WAL) which is
ahead of start_decoding_at point so it never sent the required fourth
record. Now, I don’t think it will work even if we somehow sent the
correct restart_lsn because of what I wrote earlier that there is no
guarantee that the earlier WAL would have been saved.
At this point, I can't think of any way to fix this problem except for
going back to the previous approach of permanent slots but let me know
if you have any ideas to salvage this approach?
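For reference, the retention behaviour is easy to observe on the publisher
while the tablesync slot still exists; the LIKE pattern below is only a
sketch matching the generated tablesync slot names:

SELECT slot_name, slot_type, temporary, active, restart_lsn
FROM pg_replication_slots
WHERE slot_name LIKE '%\_sync\_%';

A row with temporary = true is exactly the problem case described above:
the publisher retains WAL from that slot's restart_lsn only for as long as
the slot (and the session that created it) survives.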
OK. The latest patch [v21] now restores the permanent slot (and slot
cleanup) approach as it was implemented in an earlier version [v17].
Please note that this change also re-introduces some potential slot
cleanup problems for some race scenarios. These will be addressed by
future patches.
----
[v17] /messages/by-id/CAHut+Pt9+g8qQR0kMC85nY-O4uDQxXboamZAYhHbvkebzC9fAQ@mail.gmail.com
[v21] /messages/by-id/CAHut+PvzHRRA_5O0R8KZCb1tVe1mBVPxFtmttXJnmuOmAegoWA@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia
On Thu, Jan 28, 2021 at 12:32 PM Peter Smith <smithpb2250@gmail.com> wrote:
On Wed, Jan 27, 2021 at 2:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, Jan 23, 2021 at 5:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, Jan 23, 2021 at 4:55 AM Peter Smith <smithpb2250@gmail.com> wrote:
PSA the v18 patch for the Tablesync Solution1.
7. Have you tested with the new patch the scenario where we crash
after FINISHEDCOPY and before SYNCDONE, is it able to pick up the
replication using the new temporary slot? Here, we need to test the
case where during the catchup phase we have received few commits and
then the tablesync worker is crashed/errored out? Basically, check if
the replication is continued from the same point?
I have tested this and it didn't work, see the below example.
Publisher-side
================
CREATE TABLE mytbl1(id SERIAL PRIMARY KEY, somedata int, text varchar(120));
BEGIN;
INSERT INTO mytbl1(somedata, text) VALUES (1, 1);
INSERT INTO mytbl1(somedata, text) VALUES (1, 2);
COMMIT;
CREATE PUBLICATION mypublication FOR TABLE mytbl1;
Subscriber-side
================
- Have a while(1) loop in LogicalRepSyncTableStart so that tablesync
worker stops.
CREATE TABLE mytbl1(id SERIAL PRIMARY KEY, somedata int, text varchar(120));
CREATE SUBSCRIPTION mysub
CONNECTION 'host=localhost port=5432 dbname=postgres'
PUBLICATION mypublication;
During debug, stop after we mark FINISHEDCOPY state.
Publisher-side
================
INSERT INTO mytbl1(somedata, text) VALUES (1, 3);
INSERT INTO mytbl1(somedata, text) VALUES (1, 4);
Subscriber-side
================
- Have a breakpoint in apply_dispatch
- continue in debugger;
- After we replay the first commit (which will be for values (1,3)), note
down the origin position in apply_handle_commit_internal and somehow
error out. I have forced the debugger to jump to the last line in
apply_dispatch where the error is raised.
- After the error, again the tablesync worker is restarted and it
starts from the position noted in the previous step
- It exits without replaying the WAL for (1,4)
So, on the subscriber-side, you will see 3 records; the fourth is missing.
Now, if you insert more records on the publisher, it will anyway
replay those, but the fourth one is missing.
...
At this point, I can't think of any way to fix this problem except for
going back to the previous approach of permanent slots but let me know
if you have any ideas to salvage this approach?OK. The latest patch [v21] now restores the permanent slot (and slot
cleanup) approach as it was implemented in an earlier version [v17].
Please note that this change also re-introduces some potential slot
cleanup problems for some race scenarios.
I am able to reproduce the race condition where the slot/origin will
remain on the publisher node even when the corresponding subscription
is dropped. Basically, if we error out in the 'catchup' phase in the
tablesync worker, then either it will restart and clean up the
slot/origin, or, if in the meantime we have dropped the subscription
and stopped the apply worker, the slot and origin will probably be left
dangling on the publisher.
I have used exactly the same test procedure as was used to expose the
problem with the temporary slots, with some minor changes as mentioned
below:
Subscriber-side
================
- Have a while(1) loop in LogicalRepSyncTableStart so that tablesync
worker stops.
- Have a while(1) loop in wait_for_relation_state_change so that we
can control apply worker via debugger at the right time.
Subscriber-side
================
- Have a breakpoint in apply_dispatch
- continue in debugger;
- After we replay the first commit, somehow error out. I have forced the
debugger to jump to the last line in apply_dispatch where the error is
raised.
- Now, the table sync worker won't restart because the apply worker is
looping in wait_for_relation_state_change.
- Execute DropSubscription;
- We can allow apply worker to continue by skipping the while(1) and
it will exit because DropSubscription would have sent a terminate
signal.
After the above steps, check the publisher (select * from
pg_replication_slots) and you will find the dangling tablesync slot.
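For example, a check like the following on the publisher (the LIKE pattern
is a sketch based on the tablesync slot naming used by the patch) should
return no rows once the subscription is dropped; any row it does return is
such a dangling tablesync slot:

SELECT slot_name, slot_type, active, restart_lsn
FROM pg_replication_slots
WHERE slot_name LIKE '%\_sync\_%';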
I think to solve the above problem we should drop the tablesync
slot/origin at Drop/Alter Subscription time, and additionally we need
to ensure that the apply worker doesn't let tablesync workers restart
(or that they do no work that accesses the slot, because the slots are
dropped) once we have stopped them. To ensure that, I think we need to
make the following changes:
1. Take AccessExclusiveLock on pg_subscription_rel during Alter (before
calling RemoveSubscriptionRel) and don't release it till transaction
end (do table_close with NoLock) similar to DropSubscription.
2. Take share lock (AccessShareLock) in GetSubscriptionRelState (it
gets called from LogicalRepSyncTableStart); we can release this lock
at the end of that function. This will ensure that even if the
tablesync worker is restarted, it will be blocked till the transaction
performing Alter will commit.
3. Make the Alter command not run in a transaction block so that we
don't keep locks for a longer time, and also to handle the slot-related
stuff, similar to DropSubscription.
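With locks (1) and (2) in place, the expectation is that a re-launched
tablesync worker blocks in GetSubscriptionRelState until the transaction
performing the Alter/Drop commits. As a sketch, the wait should be visible
from a third session with a query like:

SELECT pid, mode, granted
FROM pg_locks
WHERE relation = 'pg_subscription_rel'::regclass;

where the tablesync worker's AccessShareLock request shows granted = false
while the AccessExclusiveLock is still held.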
Few comments on v21:
===================
1.
DropSubscription()
{
..
- /* Clean up dependencies */
+ /* Clean up dependencies. */
deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
..
}
The above change seems unnecessary w.r.t. the current patch.
2.
DropSubscription()
{
..
/*
- * If there is no slot associated with the subscription, we can finish
- * here.
+ * If there is a slot associated with the subscription, then drop the
+ * replication slot at the publisher node using the replication
+ * connection.
*/
- if (!slotname)
+ if (slotname)
{
- table_close(rel, NoLock);
- return;
..
}
What is the reason for this change? Can't we keep the check in its
existing form?
--
With Regards,
Amit Kapila.
Hi Amit.
PSA the v22 patch for the Tablesync Solution1.
Differences from v21:
+ Patch is rebased to latest OSS HEAD @ 29/Jan.
+ Includes new code as suggested [ak0128] to ensure no dangling slots
at Drop/AlterSubscription.
+ Removes the slot/origin cleanup done by the process interrupt logic
(cleanup_at_shutdown function).
+ Addresses some minor review comments.
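Note that with this patch the tablesync slot name is simply
pg_<suboid>_sync_<relid> (the subscription's slot name is no longer used
as a prefix), so the expected slot name can be computed directly from the
catalogs; e.g. for the subscription and table used in the TAP test:

SELECT 'pg_' || s.oid || '_sync_' || 'tab_rep_next'::regclass::oid
       AS tablesync_slot_name
FROM pg_subscription s
WHERE s.subname = 'tap_sub';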
----
[ak0128] /messages/by-id/CAA4eK1LMYXZY1SpzgW-WyFdy+FTMZ4BMz1dj0rT2rxGv-zLwFA@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
v22-0001-Tablesync-Solution1.patch
From 5d55ec2cc4b93e33f50d2a73023b2e4557edee79 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Fri, 29 Jan 2021 20:38:29 +1100
Subject: [PATCH v22] Tablesync Solution1.
====
Features:
* The tablesync slot is now permanent instead of temporary.
* The tablesync worker is now allowing multiple tx instead of single tx.
* A new state (SUBREL_STATE_FINISHEDCOPY) is persisted after a successful copy_table in tablesync's LogicalRepSyncTableStart.
* If a re-launched tablesync finds state SUBREL_STATE_FINISHEDCOPY then it will bypass the initial copy_table phase.
* Now tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar to what is done for the apply worker). The origin is advanced when first created.
* Cleanup of tablesync resources:
- The tablesync slot is dropped by process_syncing_tables_for_sync functions.
- The tablesync replication origin tracking is dropped by process_syncing_tables_for_apply.
- DropSubscription/AlterSubscription_refresh also drop tablesync slots/origins
* Updates to PG docs.
* New TAP test case.
Known Issues:
* None.
---
doc/src/sgml/catalogs.sgml | 1 +
doc/src/sgml/logical-replication.sgml | 17 +-
doc/src/sgml/ref/drop_subscription.sgml | 6 +-
src/backend/catalog/pg_subscription.c | 5 +
src/backend/commands/subscriptioncmds.c | 384 ++++++++++++++++++++--------
src/backend/replication/logical/tablesync.c | 294 +++++++++++++++++----
src/backend/replication/logical/worker.c | 18 +-
src/backend/tcop/utility.c | 3 +-
src/include/catalog/pg_subscription_rel.h | 2 +
src/include/commands/subscriptioncmds.h | 2 +-
src/include/replication/slot.h | 3 +
src/include/replication/worker_internal.h | 2 +
src/test/subscription/t/004_sync.pl | 69 ++++-
13 files changed, 628 insertions(+), 178 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 865e826..920a39d 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7665,6 +7665,7 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
State code:
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
+ <literal>f</literal> = finished table copy,
<literal>s</literal> = synchronized,
<literal>r</literal> = ready (normal replication)
</para></entry>
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index a560ad6..20cdd57 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -248,7 +248,17 @@
<para>
As mentioned earlier, each (active) subscription receives changes from a
- replication slot on the remote (publishing) side. Normally, the remote
+ replication slot on the remote (publishing) side.
+ </para>
+ <para>
+ Additional table synchronization slots are normally transient, created
+ internally and dropped automatically when they are no longer needed.
+ These table synchronization slots have generated names:
+ <quote><literal>pg_%u_sync_%u</literal></quote> (parameters: Subscription
+ <parameter>oid</parameter>, Table <parameter>relid</parameter>)
+ </para>
+ <para>
+ Normally, the remote
replication slot is created automatically when the subscription is created
using <command>CREATE SUBSCRIPTION</command> and it is dropped
automatically when the subscription is dropped using <command>DROP
@@ -294,8 +304,9 @@
using <command>ALTER SUBSCRIPTION</command> before attempting to drop
the subscription. If the remote database instance no longer exists, no
further action is then necessary. If, however, the remote database
- instance is just unreachable, the replication slot should then be
- dropped manually; otherwise it would continue to reserve WAL and might
+ instance is just unreachable, the replication slot (and any still
+ remaining table synchronization slots) should then be
+ dropped manually; otherwise it/they would continue to reserve WAL and might
eventually cause the disk to fill up. Such cases should be carefully
investigated.
</para>
diff --git a/doc/src/sgml/ref/drop_subscription.sgml b/doc/src/sgml/ref/drop_subscription.sgml
index adbdeaf..aee9615 100644
--- a/doc/src/sgml/ref/drop_subscription.sgml
+++ b/doc/src/sgml/ref/drop_subscription.sgml
@@ -79,7 +79,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
<para>
When dropping a subscription that is associated with a replication slot on
the remote host (the normal state), <command>DROP SUBSCRIPTION</command>
- will connect to the remote host and try to drop the replication slot as
+ will connect to the remote host and try to drop the replication slot (and
+ any remaining table synchronization slots) as
part of its operation. This is necessary so that the resources allocated
for the subscription on the remote host are released. If this fails,
either because the remote host is not reachable or because the remote
@@ -89,7 +90,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
executing <literal>ALTER SUBSCRIPTION ... SET (slot_name = NONE)</literal>.
After that, <command>DROP SUBSCRIPTION</command> will no longer attempt any
actions on a remote host. Note that if the remote replication slot still
- exists, it should then be dropped manually; otherwise it will continue to
+ exists, it (and any related table synchronization slots) should then be
+ dropped manually; otherwise it/they will continue to
reserve WAL and might eventually cause the disk to fill up. See
also <xref linkend="logical-replication-subscription-slot"/>.
</para>
diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c
index 44cb285..303791d 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -337,6 +337,9 @@ GetSubscriptionRelState(Oid subid, Oid relid, XLogRecPtr *sublsn)
char substate;
bool isnull;
Datum d;
+ Relation rel;
+
+ rel = table_open(SubscriptionRelRelationId, AccessShareLock);
/* Try finding the mapping. */
tup = SearchSysCache2(SUBSCRIPTIONRELMAP,
@@ -363,6 +366,8 @@ GetSubscriptionRelState(Oid subid, Oid relid, XLogRecPtr *sublsn)
/* Cleanup */
ReleaseSysCache(tup);
+ table_close(rel, AccessShareLock);
+
return substate;
}
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 082f785..b9ecf04 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -34,6 +34,7 @@
#include "nodes/makefuncs.h"
#include "replication/logicallauncher.h"
#include "replication/origin.h"
+#include "replication/slot.h"
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "replication/worker_internal.h"
@@ -566,107 +567,154 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
Oid *pubrel_local_oids;
ListCell *lc;
int off;
+ Relation rel;
/* Load the library providing us libpq calls. */
load_file("libpqwalreceiver", false);
- /* Try to connect to the publisher. */
- wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
- if (!wrconn)
- ereport(ERROR,
- (errmsg("could not connect to the publisher: %s", err)));
-
- /* Get the table list from publisher. */
- pubrel_names = fetch_table_list(wrconn, sub->publications);
-
- /* We are done with the remote side, close connection. */
- walrcv_disconnect(wrconn);
-
- /* Get local table list. */
- subrel_states = GetSubscriptionRelations(sub->oid);
-
/*
- * Build qsorted array of local table oids for faster lookup. This can
- * potentially contain all tables in the database so speed of lookup is
- * important.
+ * Lock pg_subscription_rel with AccessExclusiveLock to prevent any race
+ * conditions with the apply worker re-launching workers at the same time
+ * this code is trying to remove those tables.
*/
- subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
- off = 0;
- foreach(lc, subrel_states)
+ rel = table_open(SubscriptionRelRelationId, AccessExclusiveLock);
+
+ PG_TRY();
{
- SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
+ /* Try to connect to the publisher. */
+ wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
+ if (!wrconn)
+ ereport(ERROR,
+ (errmsg("could not connect to the publisher: %s", err)));
- subrel_local_oids[off++] = relstate->relid;
- }
- qsort(subrel_local_oids, list_length(subrel_states),
- sizeof(Oid), oid_cmp);
+ /* Get the table list from publisher. */
+ pubrel_names = fetch_table_list(wrconn, sub->publications);
- /*
- * Walk over the remote tables and try to match them to locally known
- * tables. If the table is not known locally create a new state for it.
- *
- * Also builds array of local oids of remote tables for the next step.
- */
- off = 0;
- pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
+ /* Get local table list. */
+ subrel_states = GetSubscriptionRelations(sub->oid);
- foreach(lc, pubrel_names)
- {
- RangeVar *rv = (RangeVar *) lfirst(lc);
- Oid relid;
+ /*
+ * Build qsorted array of local table oids for faster lookup. This can
+ * potentially contain all tables in the database so speed of lookup
+ * is important.
+ */
+ subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
+ off = 0;
+ foreach(lc, subrel_states)
+ {
+ SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
- relid = RangeVarGetRelid(rv, AccessShareLock, false);
+ subrel_local_oids[off++] = relstate->relid;
+ }
+ qsort(subrel_local_oids, list_length(subrel_states),
+ sizeof(Oid), oid_cmp);
+
+ /*
+ * Walk over the remote tables and try to match them to locally known
+ * tables. If the table is not known locally create a new state for
+ * it.
+ *
+ * Also builds array of local oids of remote tables for the next step.
+ */
+ off = 0;
+ pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
+
+ foreach(lc, pubrel_names)
+ {
+ RangeVar *rv = (RangeVar *) lfirst(lc);
+ Oid relid;
- /* Check for supported relkind. */
- CheckSubscriptionRelkind(get_rel_relkind(relid),
- rv->schemaname, rv->relname);
+ relid = RangeVarGetRelid(rv, AccessShareLock, false);
- pubrel_local_oids[off++] = relid;
+ /* Check for supported relkind. */
+ CheckSubscriptionRelkind(get_rel_relkind(relid),
+ rv->schemaname, rv->relname);
- if (!bsearch(&relid, subrel_local_oids,
- list_length(subrel_states), sizeof(Oid), oid_cmp))
- {
- AddSubscriptionRelState(sub->oid, relid,
- copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
- InvalidXLogRecPtr);
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" added to subscription \"%s\"",
- rv->schemaname, rv->relname, sub->name)));
- }
- }
+ pubrel_local_oids[off++] = relid;
- /*
- * Next remove state for tables we should not care about anymore using the
- * data we collected above
- */
- qsort(pubrel_local_oids, list_length(pubrel_names),
- sizeof(Oid), oid_cmp);
+ if (!bsearch(&relid, subrel_local_oids,
+ list_length(subrel_states), sizeof(Oid), oid_cmp))
+ {
+ AddSubscriptionRelState(sub->oid, relid,
+ copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
+ InvalidXLogRecPtr);
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" added to subscription \"%s\"",
+ rv->schemaname, rv->relname, sub->name)));
+ }
+ }
- for (off = 0; off < list_length(subrel_states); off++)
- {
- Oid relid = subrel_local_oids[off];
+ /*
+ * Next remove state for tables we should not care about anymore using
+ * the data we collected above
+ */
+ qsort(pubrel_local_oids, list_length(pubrel_names),
+ sizeof(Oid), oid_cmp);
- if (!bsearch(&relid, pubrel_local_oids,
- list_length(pubrel_names), sizeof(Oid), oid_cmp))
+ for (off = 0; off < list_length(subrel_states); off++)
{
- RemoveSubscriptionRel(sub->oid, relid);
+ Oid relid = subrel_local_oids[off];
+
+ if (!bsearch(&relid, pubrel_local_oids,
+ list_length(pubrel_names), sizeof(Oid), oid_cmp))
+ {
+ char state;
+ XLogRecPtr statelsn;
+
+ /* Last known rel state. */
+ state = GetSubscriptionRelState(sub->oid, relid, &statelsn);
+
+ RemoveSubscriptionRel(sub->oid, relid);
+
+ logicalrep_worker_stop_at_commit(sub->oid, relid);
+
+ /*
+ * Drop the tablesync slot.
+ *
+ * For SYNCDONE/READY states we know the tablesync slot has
+ * already been dropped by the tablesync worker.
+ *
+ * For other states, there is no certainty. Maybe the slot
+ * does not exist yet; Maybe the slot is already deleted but
+ * SYNCDONE is not yet set. For this reason we allow
+ * missing_ok = true for the drop.
+ */
+ if (state != SUBREL_STATE_SYNCDONE && state != SUBREL_STATE_READY)
+ {
+ char syncslotname[NAMEDATALEN] = {0};
+
+ ReplicationSlotNameForTablesync(sub->oid, relid, syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, true /* missing_ok */ );
+ }
- logicalrep_worker_stop_at_commit(sub->oid, relid);
+ /*
+ * Drop the tablesync's origin tracking if exists.
+ */
+ tablesync_replorigin_drop(sub->oid, relid, false /* nowait */ );
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
- get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid),
- sub->name)));
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
+ get_namespace_name(get_rel_namespace(relid)),
+ get_rel_name(relid),
+ sub->name)));
+ }
}
}
+ PG_FINALLY();
+ {
+ if (wrconn)
+ walrcv_disconnect(wrconn);
+ }
+ PG_END_TRY();
+
+ table_close(rel, NoLock);
}
/*
* Alter the existing subscription.
*/
ObjectAddress
-AlterSubscription(AlterSubscriptionStmt *stmt)
+AlterSubscription(AlterSubscriptionStmt *stmt, bool isTopLevel)
{
Relation rel;
ObjectAddress myself;
@@ -848,6 +896,8 @@ AlterSubscription(AlterSubscriptionStmt *stmt)
errmsg("ALTER SUBSCRIPTION with refresh is not allowed for disabled subscriptions"),
errhint("Use ALTER SUBSCRIPTION ... SET PUBLICATION ... WITH (refresh = false).")));
+ PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION with refresh");
+
/* Make sure refresh sees the new list of publications. */
sub->publications = stmt->publication;
@@ -877,6 +927,8 @@ AlterSubscription(AlterSubscriptionStmt *stmt)
NULL, NULL, /* no "binary" */
NULL, NULL); /* no "streaming" */
+ PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION ... REFRESH");
+
AlterSubscription_refresh(sub, copy_data);
break;
@@ -928,8 +980,8 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
char *err = NULL;
RepOriginId originid;
WalReceiverConn *wrconn = NULL;
- StringInfoData cmd;
Form_pg_subscription form;
+ List *rstates;
/*
* Lock pg_subscription with AccessExclusiveLock to ensure that the
@@ -1042,39 +1094,19 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
}
list_free(subworkers);
- /* Clean up dependencies */
- deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
-
- /* Remove any associated relation synchronization states. */
- RemoveSubscriptionRel(subid, InvalidOid);
-
- /* Remove the origin tracking if exists. */
- snprintf(originname, sizeof(originname), "pg_%u", subid);
- originid = replorigin_by_name(originname, true);
- if (originid != InvalidRepOriginId)
- replorigin_drop(originid, false);
-
- /*
- * If there is no slot associated with the subscription, we can finish
- * here.
- */
- if (!slotname)
- {
- table_close(rel, NoLock);
- return;
- }
-
/*
- * Otherwise drop the replication slot at the publisher node using the
- * replication connection.
+ * Try to acquire the connection necessary for dropping slots. We do this
+ * here so that the same connection may be shared for dropping the
+ * Subscription slot, as well as dropping any tablesync slots.
+ *
+ * Note: If the slotname is NONE/NULL then connection errors are
+ * suppressed. This is as per PG docs so the DROP SUBSCRIPTION can still
+ * complete even when the connection to publisher is broken.
*/
load_file("libpqwalreceiver", false);
- initStringInfo(&cmd);
- appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
-
wrconn = walrcv_connect(conninfo, true, subname, &err);
- if (wrconn == NULL)
+ if (wrconn == NULL && slotname != NULL)
ereport(ERROR,
(errmsg("could not connect to publisher when attempting to "
"drop the replication slot \"%s\"", slotname),
@@ -1085,31 +1117,159 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
PG_TRY();
{
+ /*
+ * Tablesync resource cleanup (slots and origins).
+ *
+ * Any READY-state relations would already have dealt with clean-ups.
+ */
+ rstates = GetSubscriptionNotReadyRelations(subid);
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Drop the tablesync slot.
+ *
+ * For SYNCDONE/READY states the tablesync slot is known to have
+ * already been dropped by the tablesync worker.
+ *
+ * For other states, there is no certainty. Maybe the slot does
+ * not exist yet; Maybe the slot is already deleted but SYNCDONE
+ * is not yet set. For this reason we allow missing_ok = true for
+ * the drop.
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = {0};
+
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ if (wrconn)
+ {
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, true /* missing_ok */ );
+ }
+ else
+ {
+ /*
+ * It is possible to reach here without ERROR for a broken
+ * publisher connection only if the subscription slotname
+ * is set NONE/NULL.
+ *
+ * This means the user has disassociated the subscription
+ * from the replication slot deliberately so that the DROP
+ * SUBSCRIPTION can proceed to completion. See PG docs
+ * https://www.postgresql.org/docs/current/sql-dropsubscription.html
+ *
+ * For this reason we only give a WARNING message that
+ * the tablesync slots cannot be dropped, rather than
+ * throw ERROR (which would prevent the DROP SUBSCRIPTION
+ * from proceeding).
+ *
+ * In such a case the user must take steps to manually
+ * cleanup these remaining tablesync slots.
+ */
+ elog(WARNING,
+ "no connection; cannot drop tablesync slot \"%s\".",
+ syncslotname);
+ }
+ }
+
+ /*
+ * Drop the tablesync's origin tracking if exists.
+ */
+ tablesync_replorigin_drop(subid, relid, false /* nowait */ );
+ }
+ list_free(rstates);
+
+ /* Clean up dependencies */
+ deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
+
+ /* Remove any associated relation synchronization states. */
+ RemoveSubscriptionRel(subid, InvalidOid);
+
+ /* Remove the origin tracking if exists. */
+ snprintf(originname, sizeof(originname), "pg_%u", subid);
+ originid = replorigin_by_name(originname, true);
+ if (originid != InvalidRepOriginId)
+ replorigin_drop(originid, false);
+
+ /*
+ * If there is a slot associated with the subscription, then drop the
+ * replication slot at the publisher node using the replication
+ * connection.
+ */
+ if (slotname)
+ ReplicationSlotDropAtPubNode(wrconn, slotname, false /* missing_ok */ );
+
+ }
+ PG_FINALLY();
+ {
+ if (wrconn)
+ walrcv_disconnect(wrconn);
+ }
+ PG_END_TRY();
+
+ table_close(rel, NoLock);
+}
+
+/*
+ * Drop the replication slot at the publisher node using the replication connection.
+ *
+ * missing_ok - if true then only issue WARNING message if the slot cannot be deleted.
+ */
+void
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok)
+{
+ StringInfoData cmd;
+
+ Assert(wrconn);
+
+ load_file("libpqwalreceiver", false);
+
+ initStringInfo(&cmd);
+ appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
+
+ PG_TRY();
+ {
WalRcvExecResult *res;
res = walrcv_exec(wrconn, cmd.data, 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
+ if (res->status == WALRCV_OK_COMMAND)
+ {
+ /* NOTICE. Success. */
+ ereport(NOTICE,
+ (errmsg("dropped replication slot \"%s\" on publisher",
+ slotname)));
+ }
+ else if (res->status == WALRCV_ERROR && missing_ok)
+ {
+ /* WARNING. Error, but missing_ok = true. */
+ ereport(WARNING,
(errmsg("could not drop the replication slot \"%s\" on publisher",
slotname),
errdetail("The error was: %s", res->err)));
+ }
else
- ereport(NOTICE,
- (errmsg("dropped replication slot \"%s\" on publisher",
- slotname)));
+ {
+ /* ERROR. */
+ ereport(ERROR,
+ (errmsg("could not drop the replication slot \"%s\" on publisher",
+ slotname),
+ errdetail("The error was: %s", res->err)));
+ }
walrcv_clear_result(res);
}
PG_FINALLY();
{
- walrcv_disconnect(wrconn);
+ pfree(cmd.data);
}
PG_END_TRY();
-
- pfree(cmd.data);
-
- table_close(rel, NoLock);
}
/*
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 863d196..165086a 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -31,8 +31,11 @@
* table state to INIT.
* - Tablesync worker starts; changes table state from INIT to DATASYNC while
* copying.
- * - Tablesync worker finishes the copy and sets table state to SYNCWAIT;
- * waits for state change.
+ * - Tablesync worker does initial table copy; there is a FINISHEDCOPY (sync
+ * worker specific) state to indicate when the copy phase has completed, so
+ * if the worker crashes with this (non-memory) state then the copy will not
+ * be re-attempted.
+ * - Tablesync worker then sets table state to SYNCWAIT; waits for state change.
* - Apply worker periodically checks for tables in SYNCWAIT state. When
* any appear, it sets the table state to CATCHUP and starts loop-waiting
* until either the table state is set to SYNCDONE or the sync worker
@@ -48,8 +51,8 @@
* point it sets state to READY and stops tracking. Again, there might
* be zero changes in between.
*
- * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * So the state progression is always: INIT -> DATASYNC -> FINISHEDCOPY
+ * -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY.
*
* The catalog pg_subscription_rel is used to keep information about
* subscribed tables and their state. Some transient state during data
@@ -59,6 +62,7 @@
* Example flows look like this:
* - Apply is in front:
* sync:8
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:10
* -> set in memory CATCHUP
@@ -74,6 +78,7 @@
*
* - Sync is in front:
* sync:10
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:8
* -> set in memory CATCHUP
@@ -98,11 +103,16 @@
#include "miscadmin.h"
#include "parser/parse_relation.h"
#include "pgstat.h"
+#include "postmaster/interrupt.h"
#include "replication/logicallauncher.h"
#include "replication/logicalrelation.h"
+#include "replication/logicalworker.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
+#include "storage/lmgr.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -113,6 +123,42 @@ static bool table_states_valid = false;
StringInfo copybuf = NULL;
/*
+ * Common code to drop the origin of a tablesync worker.
+ *
+ * There is a potential race condition if two processes attempt to call
+ * replorigin_drop for the same originid at the same time. The loser of
+ * that race would give an ERROR saying that it failed to find the
+ * expected originid.
+ *
+ * The TRY/CATCH below suppresses such errors, allowing the tablesync cleanup
+ * code to proceed.
+ */
+void
+tablesync_replorigin_drop(Oid subid, Oid relid, bool nowait)
+{
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (OidIsValid(originid))
+ {
+ PG_TRY();
+ {
+ replorigin_drop(originid, nowait);
+ }
+ PG_CATCH();
+ {
+ ereport(WARNING,
+ errmsg("could not drop replication origin with OID %d, named \"%s\"",
+ originid,
+ originname));
+ }
+ PG_END_TRY();
+ }
+}
+
+/*
* Exit routine for synchronization worker.
*/
static void
@@ -270,30 +316,55 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
+ bool sync_done = false;
+ Oid subid = MySubscription->oid;
+ Oid relid = MyLogicalRepWorker->relid;
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
+ sync_done = MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
+ current_lsn >= MyLogicalRepWorker->relstate_lsn;
+ SpinLockRelease(&MyLogicalRepWorker->relmutex);
- if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
- current_lsn >= MyLogicalRepWorker->relstate_lsn)
+ if (sync_done)
{
TimeLineID tli;
+ char syncslotname[NAMEDATALEN] = {0};
+
+ /* End wal streaming so wrconn can be re-used to drop the slot. */
+ walrcv_endstreaming(wrconn, &tli);
+
+ /*
+ * Cleanup the tablesync slot.
+ */
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, false /* missing_ok */ );
+
+ /*
+ * Change state to SYNCDONE.
+ */
+ SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCDONE;
MyLogicalRepWorker->relstate_lsn = current_lsn;
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ {
+ StartTransactionCommand();
+ }
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
MyLogicalRepWorker->relstate_lsn);
- walrcv_endstreaming(wrconn, &tli);
finish_sync_worker();
}
- else
- SpinLockRelease(&MyLogicalRepWorker->relmutex);
}
/*
@@ -412,6 +483,21 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
started_tx = true;
}
+ /*
+ * Remove the tablesync origin tracking if exists.
+ *
+ * The normal case origin drop is done here instead of in the
+ * process_syncing_tables_for_sync function because if the
+ * tablesync worker process attempted to drop its own origin
+ * then that would prevent the origin from advancing properly
+ * on the commit of the transaction.
+ */
+ tablesync_replorigin_drop(MyLogicalRepWorker->subid,
+ rstate->relid, false /* nowait */ );
+
+ /*
+ * Update the state to READY only after the origin cleanup.
+ */
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
@@ -808,6 +894,40 @@ copy_table(Relation rel)
}
/*
+ * Determine the tablesync slot name.
+ *
+ * The name must not exceed NAMEDATALEN - 1 because of remote node constraints
+ * on slot name length.
+ *
+ * The returned slot name is either:
+ * - stored in the supplied buffer (syncslotname), or
+ * - palloc'ed in current memory context (if syncslotname = NULL).
+ */
+char *
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char syncslotname[NAMEDATALEN])
+{
+ /*
+ * Note: Since now we are using PERMANENT tablesync slots this code is not
+ * using the Subscription slot name as the first part of the tablesync
+ * slot name anymore. This part is omitted because we are now responsible
+ * for cleaning up the permanent tablesync slots, so it could become
+ * impossible to recalculate what name to clean up if the Subscription slot
+ * name had changed.
+ */
+
+ if (syncslotname)
+ {
+ sprintf(syncslotname, "pg_%u_sync_%u", suboid, relid);
+ }
+ else
+ {
+ syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
+ }
+
+ return syncslotname;
+}
+
+/*
* Start syncing the table in the sync worker.
*
* If nothing needs to be done to sync the table, we exit the worker without
@@ -824,6 +944,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -849,19 +971,10 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
- MySubscription->slotname,
- MySubscription->oid,
- MyLogicalRepWorker->relid);
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(MySubscription->oid,
+ MyLogicalRepWorker->relid,
+ NULL /* use palloc */ );
/*
* Here we use the slot name instead of the subscription name as the
@@ -874,7 +987,33 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY);
+
+ /* Assign the origin tracking record name. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MySubscription->oid, MyLogicalRepWorker->relid);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed
+ * before it was able to finish normally.
+ */
+ StartTransactionCommand();
+
+ /*
+ * The origin tracking name must already exist. It was created the
+ * first time this tablesync was launched.
+ */
+ originid = replorigin_by_name(originname, false /* missing_ok */ );
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ *origin_startpos = replorigin_session_get_progress(false);
+
+ CommitTransactionCommand();
+
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -890,9 +1029,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -918,29 +1054,97 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
walrcv_clear_result(res);
/*
- * Create a new temporary logical decoding slot. This slot will be used
- * for the catchup phase after COPY is done, so tell it to use the
- * snapshot to make the final data consistent.
+ * Be sure to remove the newly created tablesync slot if the COPY fails.
*/
- walrcv_create_slot(wrconn, slotname, true,
- CRS_USE_SNAPSHOT, origin_startpos);
+ PG_TRY();
+ {
+ /*
+ * Create a new permanent logical decoding slot. This slot will be
+ * used for the catchup phase after COPY is done, so tell it to use
+ * the snapshot to make the final data consistent.
+ */
+ walrcv_create_slot(wrconn, slotname, false /* permanent */ ,
+ CRS_USE_SNAPSHOT, origin_startpos);
- /* Now do the initial data copy */
- PushActiveSnapshot(GetTransactionSnapshot());
- copy_table(rel);
- PopActiveSnapshot();
+ /* Now do the initial data copy */
+ PushActiveSnapshot(GetTransactionSnapshot());
+ copy_table(rel);
+ PopActiveSnapshot();
- res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
- (errmsg("table copy could not finish transaction on publisher"),
- errdetail("The error was: %s", res->err)));
- walrcv_clear_result(res);
+ res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
+ if (res->status != WALRCV_OK_COMMAND)
+ ereport(ERROR,
+ (errmsg("table copy could not finish transaction on publisher"),
+ errdetail("The error was: %s", res->err)));
+ walrcv_clear_result(res);
+
+ table_close(rel, NoLock);
+
+ /* Make the copy visible. */
+ CommandCounterIncrement();
+
+ /* Setup replication origin tracking. */
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist, so create it now.
+ *
+ * Then advance to the LSN obtained from walrcv_create_slot. This is
+ * WAL logged for the purpose of recovery. Locks are to prevent
+ * the replication origin from vanishing while advancing.
+ */
+ originid = replorigin_create(originname);
+
+ LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */ , true /* WAL log */ );
+ UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ }
+ else
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("replication origin \"%s\" already exists",
+ originname)));
+ }
+
+ /*
+ * Update the persisted state to indicate the COPY phase is done; make
+ * it visible to others.
+ */
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_FINISHEDCOPY,
+ MyLogicalRepWorker->relstate_lsn);
+
+ CommitTransactionCommand();
+ }
+ PG_CATCH();
+ {
+ /*
+ * If something failed during the table copy then clean up the
+ * created slot.
+ */
+ ReplicationSlotDropAtPubNode(wrconn, slotname, false /* missing_ok */ );
+
+ pfree(slotname);
+ slotname = NULL;
+
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
- table_close(rel, NoLock);
+copy_table_done:
- /* Make the copy visible. */
- CommandCounterIncrement();
+ elog(DEBUG1,
+ "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
/*
* We are done with the initial data synchronization, update the state.
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index eb7db89..cfc924c 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -807,12 +807,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -889,9 +885,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -918,8 +912,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1062,8 +1055,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData *commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 1d81071..05bb698 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1786,7 +1786,8 @@ ProcessUtilitySlow(ParseState *pstate,
break;
case T_AlterSubscriptionStmt:
- address = AlterSubscription((AlterSubscriptionStmt *) parsetree);
+ address = AlterSubscription((AlterSubscriptionStmt *) parsetree,
+ isTopLevel);
break;
case T_DropSubscriptionStmt:
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index 06663b9..9027c42 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,8 @@ DECLARE_UNIQUE_INDEX(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg_subs
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_FINISHEDCOPY 'f' /* tablesync copy phase is completed
+ * (sublsn NULL) */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/commands/subscriptioncmds.h b/src/include/commands/subscriptioncmds.h
index a818650..3b926f3 100644
--- a/src/include/commands/subscriptioncmds.h
+++ b/src/include/commands/subscriptioncmds.h
@@ -20,7 +20,7 @@
extern ObjectAddress CreateSubscription(CreateSubscriptionStmt *stmt,
bool isTopLevel);
-extern ObjectAddress AlterSubscription(AlterSubscriptionStmt *stmt);
+extern ObjectAddress AlterSubscription(AlterSubscriptionStmt *stmt, bool isTopLevel);
extern void DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel);
extern ObjectAddress AlterSubscriptionOwner(const char *name, Oid newOwnerId);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 53f636c..5f52335 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -15,6 +15,7 @@
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "storage/spin.h"
+#include "replication/walreceiver.h"
/*
* Behaviour of replication slots, upon release or crash.
@@ -211,6 +212,8 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname);
+extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
diff --git a/src/include/replication/worker_internal.h b/src/include/replication/worker_internal.h
index d046022..67bc911 100644
--- a/src/include/replication/worker_internal.h
+++ b/src/include/replication/worker_internal.h
@@ -84,6 +84,8 @@ extern void logicalrep_worker_wakeup_ptr(LogicalRepWorker *worker);
extern int logicalrep_sync_worker_count(Oid subid);
extern char *LogicalRepSyncTableStart(XLogRecPtr *origin_startpos);
+extern void tablesync_replorigin_drop(Oid subid, Oid relid, bool nowait);
+
void process_syncing_tables(XLogRecPtr current_lsn);
void invalidate_syncing_table_states(Datum arg, int cacheid,
uint32 hashvalue);
diff --git a/src/test/subscription/t/004_sync.pl b/src/test/subscription/t/004_sync.pl
index e111ab9..963a7ee 100644
--- a/src/test/subscription/t/004_sync.pl
+++ b/src/test/subscription/t/004_sync.pl
@@ -3,7 +3,9 @@ use strict;
use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 7;
+use Test::More tests => 10;
+use Time::HiRes qw(usleep);
+use Scalar::Util qw(looks_like_number);
# Initialize publisher node
my $node_publisher = get_new_node('publisher');
@@ -149,6 +151,71 @@ $result = $node_subscriber->safe_psql('postgres',
is($result, qq(20),
'changes for table added after subscription initialized replicated');
+##
+## slot integrity
+##
+## Manually create a slot with the same name that tablesync will want.
+## Expect tablesync ERROR when clash is detected.
+## Then remove the slot so tablesync can proceed.
+## Expect tablesync can now finish normally.
+##
+
+# drop the subscription
+$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+
+# empty the table tab_rep_next
+$node_subscriber->safe_psql('postgres', "DELETE FROM tab_rep_next;");
+
+# drop the table tab_rep from publisher and subscriber
+$node_subscriber->safe_psql('postgres', "DROP TABLE tab_rep;");
+$node_publisher->safe_psql('postgres', "DROP TABLE tab_rep;");
+
+# recreate the subscription again, but leave it disabled so that we can get the OID
+$node_subscriber->safe_psql('postgres',
+ "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr' PUBLICATION tap_pub
+ with (enabled = false)"
+);
+
# we need to construct the name of the tablesync slot; for this we need the subscription OID
+# and the table OID.
+my $subid = $node_subscriber->safe_psql('postgres',
+ "SELECT oid FROM pg_subscription WHERE subname = 'tap_sub';");
+is(looks_like_number($subid), qq(1), 'get the subscription OID');
+
+my $relid = $node_subscriber->safe_psql('postgres',
+ "SELECT 'tab_rep_next'::regclass::oid");
+is(looks_like_number($relid), qq(1), 'get the table OID');
+
# the name of the tablesync slot has the form pg_<suboid>_sync_<tableoid>.
+my $slotname = 'pg_' . $subid . '_' . 'sync_' . $relid;
+
# temporarily create a slot having the same name as the tablesync slot.
+$node_publisher->safe_psql('postgres',
+ "SELECT 'init' FROM pg_create_logical_replication_slot('$slotname', 'pgoutput', false);");
+
+# enable the subscription
+$node_subscriber->safe_psql('postgres',
+ "ALTER SUBSCRIPTION tap_sub ENABLE"
+);
+
# it will be stuck on data sync, as slot creation will fail because the slot already exists.
+$node_subscriber->poll_query_until('postgres', $started_query)
+ or die "Timed out while waiting for subscriber to start sync";
+
# now drop the offending slot; the tablesync should recover.
+$node_publisher->safe_psql('postgres',
+ "SELECT pg_drop_replication_slot('$slotname');");
+
+# wait for sync to finish
+$node_subscriber->poll_query_until('postgres', $synced_query)
+ or die "Timed out while waiting for subscriber to synchronize data";
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM tab_rep_next");
+is($result, qq(20),
+ 'data for table added after subscription initialized are now synced');
+
+# Cleanup
$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
$node_subscriber->stop('fast');
--
1.8.3.1
v22-0002-Tablesync-extra-logging.patchapplication/octet-stream; name=v22-0002-Tablesync-extra-logging.patchDownload
From 1a1d1a18b999c45fd546332acec83690f1d39b6e Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Fri, 29 Jan 2021 21:29:31 +1100
Subject: [PATCH v22] Tablesync extra logging.
This patch only adds some extra logging which may be helpful for testing, but is not for committing.
---
src/backend/commands/subscriptioncmds.c | 16 +++++++++++
src/backend/replication/logical/tablesync.c | 44 +++++++++++++++++++++++++++--
2 files changed, 58 insertions(+), 2 deletions(-)
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index b9ecf04..ed55934 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -684,12 +684,20 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
char syncslotname[NAMEDATALEN] = {0};
ReplicationSlotNameForTablesync(sub->oid, relid, syncslotname);
+ elog(LOG,
+ "!!>> AlterSubscription_refresh: dropping sync slot \"%s\"",
+ syncslotname);
ReplicationSlotDropAtPubNode(wrconn, syncslotname, true /* missing_ok */ );
+ elog(LOG,
+ "!!>> AlterSubscription_refresh: dropped sync slot \"%s\"",
+ syncslotname);
}
/*
* Drop the tablesync's origin tracking if exists.
*/
+ elog(LOG,
+ "!!>> AlterSubscription_refresh: call tablesync_replorigin_drop");
tablesync_replorigin_drop(sub->oid, relid, false /* nowait */ );
ereport(DEBUG1,
@@ -1150,7 +1158,13 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
ReplicationSlotNameForTablesync(subid, relid, syncslotname);
if (wrconn)
{
+ elog(LOG,
+ "!!>> DropSubscription: dropping sync slot \"%s\"",
+ syncslotname);
ReplicationSlotDropAtPubNode(wrconn, syncslotname, true /* missing_ok */ );
+ elog(LOG,
+ "!!>> DropSubscription: dropped sync slot \"%s\"",
+ syncslotname);
}
else
{
@@ -1181,6 +1195,8 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
/*
* Drop the tablesync's origin tracking if exists.
*/
+ elog(LOG,
+ "!!>> DropSubscription: call tablesync_replorigin_drop");
tablesync_replorigin_drop(subid, relid, false /* nowait */ );
}
list_free(rstates);
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 165086a..9cc3cdf 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -145,7 +145,13 @@ tablesync_replorigin_drop(Oid subid, Oid relid, bool nowait)
{
PG_TRY();
{
+ elog(LOG,
+ "!!>> tablesync_replorigin_drop: droppping origin OID %d, named \"%s\"",
+ originid, originname);
replorigin_drop(originid, nowait);
+ elog(LOG,
+ "!!>> tablesync_replorigin_drop: dropped origin OID %d, named \"%s\"",
+ originid, originname);
}
PG_CATCH();
{
@@ -338,7 +344,13 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn)
*/
ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ elog(LOG,
+ "!!>> process_syncing_tables_for_sync: dropping the tablesync slot \"%s\".",
+ syncslotname);
ReplicationSlotDropAtPubNode(wrconn, syncslotname, false /* missing_ok */ );
+ elog(LOG,
+ "!!>> process_syncing_tables_for_sync: dropped the tablesync slot \"%s\".",
+ syncslotname);
/*
* Change state to SYNCDONE.
@@ -492,6 +504,8 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
* origin then it would prevent the origin from advancing properly
* on commit TX.
*/
+ elog(LOG,
+ "!!>> process_syncing_tables_for_apply: call tablesync_replorigin_drop");
tablesync_replorigin_drop(MyLogicalRepWorker->subid,
rstate->relid, false /* nowait */ );
@@ -999,6 +1013,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* The COPY phase was previously done, but tablesync then crashed
* before it was able to finish normally.
*/
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: tablesync relstate was SUBREL_STATE_FINISHEDCOPY.");
StartTransactionCommand();
/*
@@ -1006,8 +1022,14 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* time this tablesync was launched.
*/
originid = replorigin_by_name(originname, false /* missing_ok */ );
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 2 replorigin_session_setup \"%s\".",
+ originname);
replorigin_session_setup(originid);
replorigin_session_origin = originid;
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 2 replorigin_session_get_progress \"%s\".",
+ originname);
*origin_startpos = replorigin_session_get_progress(false);
CommitTransactionCommand();
@@ -1063,6 +1085,9 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* used for the catchup phase after COPY is done, so tell it to use
* the snapshot to make the final data consistent.
*/
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: walrcv_create_slot for \"%s\".",
+ slotname);
walrcv_create_slot(wrconn, slotname, false /* permanent */ ,
CRS_USE_SNAPSHOT, origin_startpos);
@@ -1094,13 +1119,22 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* WAL logged for the purpose of recovery. Locks are to prevent
* the replication origin from vanishing while advancing.
*/
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 replorigin_create \"%s\".",
+ originname);
originid = replorigin_create(originname);
LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 replorigin_advance \"%s\".",
+ originname);
replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
true /* go backward */ , true /* WAL log */ );
UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: 1 replorigin_session_setup \"%s\".",
+ originname);
replorigin_session_setup(originid);
replorigin_session_origin = originid;
}
@@ -1129,7 +1163,13 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* If something failed during copy table then cleanup the created
* slot.
*/
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: tablesync copy failed. Dropping the tablesync slot \"%s\".",
+ slotname);
ReplicationSlotDropAtPubNode(wrconn, slotname, false /* missing_ok */ );
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: tablesync copy failed. Dropped the tablesync slot \"%s\".",
+ slotname);
pfree(slotname);
slotname = NULL;
@@ -1140,8 +1180,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
copy_table_done:
- elog(DEBUG1,
- "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ elog(LOG,
+ "!!>> LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
originname,
(uint32) (*origin_startpos >> 32),
(uint32) *origin_startpos);
--
1.8.3.1
On Thu, Jan 28, 2021 at 9:37 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Jan 28, 2021 at 12:32 PM Peter Smith <smithpb2250@gmail.com> wrote:
On Wed, Jan 27, 2021 at 2:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, Jan 23, 2021 at 5:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, Jan 23, 2021 at 4:55 AM Peter Smith <smithpb2250@gmail.com> wrote:
PSA the v18 patch for the Tablesync Solution1.
7. Have you tested with the new patch the scenario where we crash
after FINISHEDCOPY and before SYNCDONE? Is it able to pick up the
replication using the new temporary slot? Here, we need to test the
case where during the catchup phase we have received a few commits and
then the tablesync worker crashes/errors out? Basically, check if
the replication is continued from the same point?
I have tested this and it didn't work, see the below example.
Publisher-side
================
CREATE TABLE mytbl1(id SERIAL PRIMARY KEY, somedata int, text varchar(120));
BEGIN;
INSERT INTO mytbl1(somedata, text) VALUES (1, 1);
INSERT INTO mytbl1(somedata, text) VALUES (1, 2);
COMMIT;
CREATE PUBLICATION mypublication FOR TABLE mytbl1;
Subscriber-side
================
- Have a while(1) loop in LogicalRepSyncTableStart so that tablesync
worker stops.
CREATE TABLE mytbl1(id SERIAL PRIMARY KEY, somedata int, text varchar(120));
CREATE SUBSCRIPTION mysub
CONNECTION 'host=localhost port=5432 dbname=postgres'
PUBLICATION mypublication;
During debug, stop after we mark FINISHEDCOPY state.
Publisher-side
================
INSERT INTO mytbl1(somedata, text) VALUES (1, 3);
INSERT INTO mytbl1(somedata, text) VALUES (1, 4);
Subscriber-side
================
- Have a breakpoint in apply_dispatch
- continue in debugger;
- After we replay the first commit (which will be for values (1,3)), note
down the origin position in apply_handle_commit_internal and somehow
error out. I have forced the debugger to set to the last line in
apply_dispatch where the error is raised.
- After the error, again the tablesync worker is restarted and it
starts from the position noted in the previous step
- It exits without replaying the WAL for (1,4).
So, on the subscriber side, you will see 3 records. The fourth is missing.
Now, if you insert more records on the publisher, it will replay those
anyway, but the fourth one remains missing.
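For reference, a minimal way to confirm the symptom (a sketch; it assumes
the mytbl1 example above) is to compare the two nodes directly:

Publisher-side
================
SELECT count(*) FROM mytbl1;  -- returns 4

Subscriber-side
================
SELECT count(*) FROM mytbl1;  -- returns 3 with the bug; row (1,4) never arrives
SELECT somedata, text FROM mytbl1 ORDER BY id;  -- (1,3) is the last row seen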
At this point, I can't think of any way to fix this problem except for
going back to the previous approach of permanent slots but let me know
if you have any ideas to salvage this approach?
OK. The latest patch [v21] now restores the permanent slot (and slot
cleanup) approach as it was implemented in an earlier version [v17].
Please note that this change also re-introduces some potential slot
cleanup problems for some race scenarios.
I am able to reproduce the race condition where slot/origin will
remain on the publisher node even when the corresponding subscription
is dropped. Basically, if we error out in the 'catchup' phase in
tablesync worker then either it will restart and cleanup slot/origin
or if in the meantime we have dropped the subscription and stopped
apply worker then probably the slot and origin will be dangling on the
publisher.
I have used exactly the same test procedure as was used to expose the
problem in the temporary slots with some minor changes as mentioned
below:
Subscriber-side
================
- Have a while(1) loop in LogicalRepSyncTableStart so that tablesync
worker stops.
- Have a while(1) loop in wait_for_relation_state_change so that we
can control apply worker via debugger at the right time.
Subscriber-side
================
- Have a breakpoint in apply_dispatch
- continue in debugger;
- After we replay the first commit, somehow error out. I have forced the
debugger to set to the last line in apply_dispatch where the error is
raised.
- Now, the table sync worker won't restart because the apply worker is
looping in wait_for_relation_state_change.
- Execute DropSubscription;
- We can allow apply worker to continue by skipping the while(1) and
it will exit because DropSubscription would have sent a terminate
signal.
After the above steps, check the publisher (select * from
pg_replication_slots) and you will find the dangling tablesync slot.
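A query like the following can spot such leftovers (a sketch; the pattern
matches the pg_%u_sync_%u naming used by the patch, and the example slot
name is hypothetical):

Publisher-side
================
SELECT slot_name, slot_type, active
FROM pg_replication_slots
WHERE slot_name LIKE 'pg\_%\_sync\_%' ESCAPE '\';
-- a leftover row such as pg_16394_sync_16385 is a dangling tablesync slot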
I think to solve the above problem we should drop tablesync
slot/origin at the Drop/Alter Subscription time and additionally we
need to ensure that apply worker doesn't let tablesync workers restart
(or it must not do any work to access the slot because the slots are
dropped) once we stopped them. To ensure that, I think we need to make
the following changes:
1. Take AccessExclusiveLock on subscription_rel during Alter (before
calling RemoveSubscriptionRel) and don't release it till transaction
end (do table_close with NoLock) similar to DropSubscription.
2. Take share lock (AccessShareLock) in GetSubscriptionRelState (it
gets called from logicalrepsyncstartworker), we can release this lock
at the end of that function. This will ensure that even if the
tablesync worker is restarted, it will be blocked till the transaction
performing Alter will commit.
3. Make the Alter command not run in a transaction block so that we
don't keep locks for a longer time and, second, because of the
slot-related stuff, similar to DropSubscription.
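To illustrate the locking idea in points 1 and 2, here is a minimal SQL
sketch (not the patch itself) that can be run in two subscriber sessions;
it uses the tab_rep_next table from the test above:

Session 1 (stands in for the Alter transaction holding the lock):
BEGIN;
LOCK TABLE pg_subscription_rel IN ACCESS EXCLUSIVE MODE;

Session 2 (stands in for a restarted tablesync worker reading its state):
SELECT srsubstate FROM pg_subscription_rel
WHERE srrelid = 'tab_rep_next'::regclass;
-- blocks here until session 1 ends its transaction

Session 1:
COMMIT;
-- session 2 now proceeds and sees the final state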
OK. The latest patch [v22] changes the code as suggested above.
Few comments on v21:
===================
1.
DropSubscription()
{
..
- /* Clean up dependencies */
+ /* Clean up dependencies. */
deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
..
}
The above change seems unnecessary w.r.t. the current patch.
OK. Modified in patch [v22].
2.
DropSubscription()
{
..
 /*
- * If there is no slot associated with the subscription, we can finish
- * here.
+ * If there is a slot associated with the subscription, then drop the
+ * replication slot at the publisher node using the replication
+ * connection.
 */
- if (!slotname)
+ if (slotname)
 {
- table_close(rel, NoLock);
- return;
..
}
What is the reason for this change? Can't we keep the check in its
existing form?
I think the above comment is no longer applicable in the latest patch [v22].
Early exit for null slotname is not desirable anymore; we still need
to process all the tablesync slots/origins regardless.
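For example (a sketch using the tap_sub subscription from the test above),
even after the main slot is disassociated, the drop path must still visit
the not-ready relations to clean up their tablesync slots and origins:

ALTER SUBSCRIPTION tap_sub DISABLE;
ALTER SUBSCRIPTION tap_sub SET (slot_name = NONE);
DROP SUBSCRIPTION tap_sub;  -- tablesync slot/origin cleanup still attempted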
----
[v22] /messages/by-id/CAHut+PtrAVrtjc8srASTeUhbJtviw0Up-bzFSc14Ss=mAMxz9g@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia
On Fri, Jan 29, 2021 at 4:07 PM Peter Smith <smithpb2250@gmail.com> wrote:
Differences from v21:
+ Patch is rebased to latest OSS HEAD @ 29/Jan.
+ Includes new code as suggested [ak0128] to ensure no dangling slots at Drop/AlterSubscription.
+ Removes the slot/origin cleanup done by the process interrupt logic (cleanup_at_shutdown function).
+ Addresses some minor review comments.
I have made the below changes in the patch. Let me know what you think
about these?
1. It was a bit difficult to understand the code in DropSubscription
so I have rearranged the code to match the way we are doing in HEAD
where we drop the slots at the end after finishing all the other
cleanup.
2. In AlterSubscription_refresh(), we can't allow workers to be
stopped at commit time as we have already dropped the slots because
the worker can access the dropped slot. We need to stop the workers
before dropping slots. This makes all the code related to
logicalrep_worker_stop_at_commit redundant.
3. In AlterSubscription_refresh(), we need to acquire the lock on
pg_subscription_rel only when we try to remove any subscription rel.
4. Added/Changed quite a few comments.
--
With Regards,
Amit Kapila.
Attachments:
v23-0001-Tablesync-Solution1.patchapplication/octet-stream; name=v23-0001-Tablesync-Solution1.patchDownload
From 6f6cc3efafb8e959b27083429e32ec5165230527 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Sat, 30 Jan 2021 10:21:28 +0530
Subject: [PATCH v23] Tablesync Solution1.
====
Features:
* The tablesync slot is now permanent instead of temporary.
* The tablesync worker is now allowing multiple tx instead of single tx.
* A new state (SUBREL_STATE_FINISHEDCOPY) is persisted after a successful copy_table in tablesync's LogicalRepSyncTableStart.
* If a re-launched tablesync finds state SUBREL_STATE_FINISHEDCOPY then it will bypass the initial copy_table phase.
* Now tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar as done for the apply worker). The origin is advanced when first created.
* Cleanup of tablesync resources:
- The tablesync slot is dropped by process_syncing_tables_for_sync functions.
- The tablesync replication origin tracking is dropped by process_syncing_tables_for_apply.
- DropSubscription/AlterSubscription_refresh also drop tablesyc slots/origins
* Updates to PG docs.
* New TAP test case.
Known Issues:
* None.
---
doc/src/sgml/catalogs.sgml | 1 +
doc/src/sgml/logical-replication.sgml | 17 +-
doc/src/sgml/ref/drop_subscription.sgml | 6 +-
src/backend/access/transam/xact.c | 11 -
src/backend/catalog/pg_subscription.c | 5 +
src/backend/commands/subscriptioncmds.c | 382 +++++++++++++++-----
src/backend/replication/logical/launcher.c | 147 --------
src/backend/replication/logical/tablesync.c | 294 ++++++++++++---
src/backend/replication/logical/worker.c | 18 +-
src/backend/tcop/utility.c | 3 +-
src/include/catalog/pg_subscription_rel.h | 2 +
src/include/commands/subscriptioncmds.h | 2 +-
src/include/replication/logicallauncher.h | 2 -
src/include/replication/slot.h | 3 +
src/include/replication/worker_internal.h | 3 +-
src/test/subscription/t/004_sync.pl | 69 +++-
16 files changed, 641 insertions(+), 324 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 865e826fb0..920a39dfa9 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7665,6 +7665,7 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
State code:
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
+ <literal>f</literal> = finished table copy,
<literal>s</literal> = synchronized,
<literal>r</literal> = ready (normal replication)
</para></entry>
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index a560ad69b4..20cdd5715d 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -248,7 +248,17 @@
<para>
As mentioned earlier, each (active) subscription receives changes from a
- replication slot on the remote (publishing) side. Normally, the remote
+ replication slot on the remote (publishing) side.
+ </para>
+ <para>
+ Additional table synchronization slots are normally transient, created
+ internally and dropped automatically when they are no longer needed.
+ These table synchronization slots have generated names:
+ <quote><literal>pg_%u_sync_%u</literal></quote> (parameters: Subscription
+ <parameter>oid</parameter>, Table <parameter>relid</parameter>)
+ </para>
+ <para>
+ Normally, the remote
replication slot is created automatically when the subscription is created
using <command>CREATE SUBSCRIPTION</command> and it is dropped
automatically when the subscription is dropped using <command>DROP
@@ -294,8 +304,9 @@
using <command>ALTER SUBSCRIPTION</command> before attempting to drop
the subscription. If the remote database instance no longer exists, no
further action is then necessary. If, however, the remote database
- instance is just unreachable, the replication slot should then be
- dropped manually; otherwise it would continue to reserve WAL and might
+ instance is just unreachable, the replication slot (and any still
+ remaining table synchronization slots) should then be
+ dropped manually; otherwise it/they would continue to reserve WAL and might
eventually cause the disk to fill up. Such cases should be carefully
investigated.
</para>
diff --git a/doc/src/sgml/ref/drop_subscription.sgml b/doc/src/sgml/ref/drop_subscription.sgml
index adbdeafb4e..aee9615546 100644
--- a/doc/src/sgml/ref/drop_subscription.sgml
+++ b/doc/src/sgml/ref/drop_subscription.sgml
@@ -79,7 +79,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
<para>
When dropping a subscription that is associated with a replication slot on
the remote host (the normal state), <command>DROP SUBSCRIPTION</command>
- will connect to the remote host and try to drop the replication slot as
+ will connect to the remote host and try to drop the replication slot (and
+ any remaining table synchronization slots) as
part of its operation. This is necessary so that the resources allocated
for the subscription on the remote host are released. If this fails,
either because the remote host is not reachable or because the remote
@@ -89,7 +90,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
executing <literal>ALTER SUBSCRIPTION ... SET (slot_name = NONE)</literal>.
After that, <command>DROP SUBSCRIPTION</command> will no longer attempt any
actions on a remote host. Note that if the remote replication slot still
- exists, it should then be dropped manually; otherwise it will continue to
+ exists, it (and any related table synchronization slots) should then be
+ dropped manually; otherwise it/they will continue to
reserve WAL and might eventually cause the disk to fill up. See
also <xref linkend="logical-replication-subscription-slot"/>.
</para>
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index a2068e3fd4..3c8b4eb362 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2432,15 +2432,6 @@ PrepareTransaction(void)
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot PREPARE a transaction that has exported snapshots")));
- /*
- * Don't allow PREPARE but for transaction that has/might kill logical
- * replication workers.
- */
- if (XactManipulatesLogicalReplicationWorkers())
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("cannot PREPARE a transaction that has manipulated logical replication workers")));
-
/* Prevent cancel/die interrupt while cleaning up */
HOLD_INTERRUPTS();
@@ -4899,7 +4890,6 @@ CommitSubTransaction(void)
AtEOSubXact_HashTables(true, s->nestingLevel);
AtEOSubXact_PgStat(true, s->nestingLevel);
AtSubCommit_Snapshot(s->nestingLevel);
- AtEOSubXact_ApplyLauncher(true, s->nestingLevel);
/*
* We need to restore the upper transaction's read-only state, in case the
@@ -5059,7 +5049,6 @@ AbortSubTransaction(void)
AtEOSubXact_HashTables(false, s->nestingLevel);
AtEOSubXact_PgStat(false, s->nestingLevel);
AtSubAbort_Snapshot(s->nestingLevel);
- AtEOSubXact_ApplyLauncher(false, s->nestingLevel);
}
/*
diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c
index 44cb285b68..303791d580 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -337,6 +337,9 @@ GetSubscriptionRelState(Oid subid, Oid relid, XLogRecPtr *sublsn)
char substate;
bool isnull;
Datum d;
+ Relation rel;
+
+ rel = table_open(SubscriptionRelRelationId, AccessShareLock);
/* Try finding the mapping. */
tup = SearchSysCache2(SUBSCRIPTIONRELMAP,
@@ -363,6 +366,8 @@ GetSubscriptionRelState(Oid subid, Oid relid, XLogRecPtr *sublsn)
/* Cleanup */
ReleaseSysCache(tup);
+ table_close(rel, AccessShareLock);
+
return substate;
}
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 082f7855b8..b15964e462 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -34,6 +34,7 @@
#include "nodes/makefuncs.h"
#include "replication/logicallauncher.h"
#include "replication/origin.h"
+#include "replication/slot.h"
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "replication/worker_internal.h"
@@ -566,107 +567,175 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
Oid *pubrel_local_oids;
ListCell *lc;
int off;
+ Relation rel;
+ bool sub_rel_locked = false;
/* Load the library providing us libpq calls. */
load_file("libpqwalreceiver", false);
- /* Try to connect to the publisher. */
- wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
- if (!wrconn)
- ereport(ERROR,
- (errmsg("could not connect to the publisher: %s", err)));
+ PG_TRY();
+ {
+ /* Try to connect to the publisher. */
+ wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
+ if (!wrconn)
+ ereport(ERROR,
+ (errmsg("could not connect to the publisher: %s", err)));
- /* Get the table list from publisher. */
- pubrel_names = fetch_table_list(wrconn, sub->publications);
+ /* Get the table list from publisher. */
+ pubrel_names = fetch_table_list(wrconn, sub->publications);
- /* We are done with the remote side, close connection. */
- walrcv_disconnect(wrconn);
+ /* Get local table list. */
+ subrel_states = GetSubscriptionRelations(sub->oid);
- /* Get local table list. */
- subrel_states = GetSubscriptionRelations(sub->oid);
+ /*
+ * Build qsorted array of local table oids for faster lookup. This can
+ * potentially contain all tables in the database so speed of lookup
+ * is important.
+ */
+ subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
+ off = 0;
+ foreach(lc, subrel_states)
+ {
+ SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
- /*
- * Build qsorted array of local table oids for faster lookup. This can
- * potentially contain all tables in the database so speed of lookup is
- * important.
- */
- subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
- off = 0;
- foreach(lc, subrel_states)
- {
- SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
+ subrel_local_oids[off++] = relstate->relid;
+ }
+ qsort(subrel_local_oids, list_length(subrel_states),
+ sizeof(Oid), oid_cmp);
+
+ /*
+ * Walk over the remote tables and try to match them to locally known
+ * tables. If the table is not known locally create a new state for
+ * it.
+ *
+ * Also builds array of local oids of remote tables for the next step.
+ */
+ off = 0;
+ pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
+
+ foreach(lc, pubrel_names)
+ {
+ RangeVar *rv = (RangeVar *) lfirst(lc);
+ Oid relid;
- subrel_local_oids[off++] = relstate->relid;
- }
- qsort(subrel_local_oids, list_length(subrel_states),
- sizeof(Oid), oid_cmp);
+ relid = RangeVarGetRelid(rv, AccessShareLock, false);
- /*
- * Walk over the remote tables and try to match them to locally known
- * tables. If the table is not known locally create a new state for it.
- *
- * Also builds array of local oids of remote tables for the next step.
- */
- off = 0;
- pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
-
- foreach(lc, pubrel_names)
- {
- RangeVar *rv = (RangeVar *) lfirst(lc);
- Oid relid;
+ /* Check for supported relkind. */
+ CheckSubscriptionRelkind(get_rel_relkind(relid),
+ rv->schemaname, rv->relname);
- relid = RangeVarGetRelid(rv, AccessShareLock, false);
+ pubrel_local_oids[off++] = relid;
- /* Check for supported relkind. */
- CheckSubscriptionRelkind(get_rel_relkind(relid),
- rv->schemaname, rv->relname);
+ if (!bsearch(&relid, subrel_local_oids,
+ list_length(subrel_states), sizeof(Oid), oid_cmp))
+ {
+ AddSubscriptionRelState(sub->oid, relid,
+ copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
+ InvalidXLogRecPtr);
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" added to subscription \"%s\"",
+ rv->schemaname, rv->relname, sub->name)));
+ }
+ }
- pubrel_local_oids[off++] = relid;
+ /*
+ * Next remove state for tables we should not care about anymore using
+ * the data we collected above
+ */
+ qsort(pubrel_local_oids, list_length(pubrel_names),
+ sizeof(Oid), oid_cmp);
- if (!bsearch(&relid, subrel_local_oids,
- list_length(subrel_states), sizeof(Oid), oid_cmp))
+ for (off = 0; off < list_length(subrel_states); off++)
{
- AddSubscriptionRelState(sub->oid, relid,
- copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
- InvalidXLogRecPtr);
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" added to subscription \"%s\"",
- rv->schemaname, rv->relname, sub->name)));
- }
- }
+ Oid relid = subrel_local_oids[off];
- /*
- * Next remove state for tables we should not care about anymore using the
- * data we collected above
- */
- qsort(pubrel_local_oids, list_length(pubrel_names),
- sizeof(Oid), oid_cmp);
+ if (!bsearch(&relid, pubrel_local_oids,
+ list_length(pubrel_names), sizeof(Oid), oid_cmp))
+ {
+ char state;
+ XLogRecPtr statelsn;
+
+ /*
+ * Lock pg_subscription_rel with AccessExclusiveLock to prevent any race
+ * conditions with the apply worker re-launching workers at the same time
+ * this code is trying to remove those tables.
+ *
+ * Even if new worker for this particular rel is restarted it won't be able
+ * to make any progress as we hold exclusive lock on subscription_rel till
+ * the transaction end. It will simply exit as there is no corresponding
+ * rel entry.
+ *
+ * This locking also ensures that the state of rels won't change till we
+ * are done with this refresh operation.
+ */
+ if (!sub_rel_locked)
+ {
+ rel = table_open(SubscriptionRelRelationId, AccessExclusiveLock);
+ sub_rel_locked = true;
+ }
- for (off = 0; off < list_length(subrel_states); off++)
- {
- Oid relid = subrel_local_oids[off];
+ /* Last known rel state. */
+ state = GetSubscriptionRelState(sub->oid, relid, &statelsn);
- if (!bsearch(&relid, pubrel_local_oids,
- list_length(pubrel_names), sizeof(Oid), oid_cmp))
- {
- RemoveSubscriptionRel(sub->oid, relid);
+ RemoveSubscriptionRel(sub->oid, relid);
- logicalrep_worker_stop_at_commit(sub->oid, relid);
+ logicalrep_worker_stop(sub->oid, relid);
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
- get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid),
- sub->name)));
+ /*
+ * For READY state, we would have already dropped the tablesync
+ * slot and origin.
+ */
+ if (state != SUBREL_STATE_READY)
+ {
+ /*
+ * Drop the tablesync slot.
+ *
+ * For SYNCDONE state we know the tablesync slot has already
+ * been dropped by the tablesync worker.
+ *
+ * For other states, there is no certainty. Maybe the slot
+ * does not exist yet; Maybe the slot is already deleted but
+ * SYNCDONE is not yet set. For this reason we allow
+ * missing_ok = true for the drop.
+ */
+ if (state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = { 0 };
+
+ ReplicationSlotNameForTablesync(sub->oid, relid, syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, true /* missing_ok */);
+ }
+
+ /*
+ * Drop the tablesync's origin tracking if exists.
+ */
+ tablesync_replorigin_drop(sub->oid, relid, false /* nowait */);
+ }
+
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
+ get_namespace_name(get_rel_namespace(relid)),
+ get_rel_name(relid),
+ sub->name)));
+ }
}
}
+ PG_FINALLY();
+ {
+ if (wrconn)
+ walrcv_disconnect(wrconn);
+ }
+ PG_END_TRY();
+
+ if (sub_rel_locked)
+ table_close(rel, NoLock);
}
/*
* Alter the existing subscription.
*/
ObjectAddress
-AlterSubscription(AlterSubscriptionStmt *stmt)
+AlterSubscription(AlterSubscriptionStmt *stmt, bool isTopLevel)
{
Relation rel;
ObjectAddress myself;
@@ -848,6 +917,8 @@ AlterSubscription(AlterSubscriptionStmt *stmt)
errmsg("ALTER SUBSCRIPTION with refresh is not allowed for disabled subscriptions"),
errhint("Use ALTER SUBSCRIPTION ... SET PUBLICATION ... WITH (refresh = false).")));
+ PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION with refresh");
+
/* Make sure refresh sees the new list of publications. */
sub->publications = stmt->publication;
@@ -877,6 +948,8 @@ AlterSubscription(AlterSubscriptionStmt *stmt)
NULL, NULL, /* no "binary" */
NULL, NULL); /* no "streaming" */
+ PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION ... REFRESH");
+
AlterSubscription_refresh(sub, copy_data);
break;
@@ -928,8 +1001,8 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
char *err = NULL;
RepOriginId originid;
WalReceiverConn *wrconn = NULL;
- StringInfoData cmd;
Form_pg_subscription form;
+ List *rstates;
/*
* Lock pg_subscription with AccessExclusiveLock to ensure that the
@@ -1042,6 +1115,31 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
}
list_free(subworkers);
+ /*
+ * Tablesync resource cleanup (slots and origins).
+ *
+ * Any READY-state relations would already have dealt with clean-ups.
+ *
+ * Note that the state can't change because we have already stopped both
+ * the apply and tablesync workers and they can't restart because of
+ * exclusive lock on the subscription.
+ */
+ rstates = GetSubscriptionNotReadyRelations(subid);
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState* rstate = (SubscriptionRelState*) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Drop the tablesync's origin tracking if exists.
+ */
+ tablesync_replorigin_drop(subid, relid, false /* nowait */);
+ }
+
/* Clean up dependencies */
deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
@@ -1054,34 +1152,114 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
if (originid != InvalidRepOriginId)
replorigin_drop(originid, false);
+
/*
* If there is no slot associated with the subscription, we can finish
* here.
*/
- if (!slotname)
+ if (!slotname && rstates == NIL)
{
table_close(rel, NoLock);
return;
}
/*
- * Otherwise drop the replication slot at the publisher node using the
- * replication connection.
+ * Try to acquire the connection necessary for dropping slots.
+ *
+ * Note: If the slotname is NONE/NULL then we allow the command to finish
+ * and users need to manually cleanup the apply and tablesync worker slots
+ * later.
*/
load_file("libpqwalreceiver", false);
- initStringInfo(&cmd);
- appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
-
wrconn = walrcv_connect(conninfo, true, subname, &err);
if (wrconn == NULL)
- ereport(ERROR,
- (errmsg("could not connect to publisher when attempting to "
- "drop the replication slot \"%s\"", slotname),
- errdetail("The error was: %s", err),
- /* translator: %s is an SQL ALTER command */
- errhint("Use %s to disassociate the subscription from the slot.",
- "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+ {
+ if (!slotname)
+ {
+ /* be tidy */
+ list_free(rstates);
+ return;
+ }
+ else
+ {
+ ereport(ERROR,
+ (errmsg("could not connect to publisher when attempting to "
+ "drop the replication slot \"%s\"", slotname),
+ errdetail("The error was: %s", err),
+ /* translator: %s is an SQL ALTER command */
+ errhint("Use %s to disassociate the subscription from the slot.",
+ "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+ }
+
+ }
+
+ PG_TRY();
+ {
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Drop the tablesync slot.
+ *
+ * For SYNCDONE/READY states, the tablesync slot is known to have
+ * already been dropped by the tablesync worker.
+ *
+ * For other states, there is no certainty. Maybe the slot does
+ * not exist yet; Maybe the slot is already deleted but SYNCDONE
+ * is not yet set. For this reason, we allow missing_ok = true for
+ * the drop.
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = {0};
+
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, true /* missing_ok */ );
+ }
+ }
+
+ list_free(rstates);
+
+ /*
+ * If there is a slot associated with the subscription, then drop the
+ * replication slot at the publisher.
+ */
+ if (slotname)
+ ReplicationSlotDropAtPubNode(wrconn, slotname, false /* missing_ok */ );
+
+ }
+ PG_FINALLY();
+ {
+ walrcv_disconnect(wrconn);
+ }
+ PG_END_TRY();
+
+ table_close(rel, NoLock);
+}
+
+/*
+ * Drop the replication slot at the publisher node using the replication connection.
+ *
+ * missing_ok - if true then only issue WARNING message if the slot cannot be deleted.
+ */
+void
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok)
+{
+ StringInfoData cmd;
+
+ Assert(wrconn);
+
+ load_file("libpqwalreceiver", false);
+
+ initStringInfo(&cmd);
+ appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
PG_TRY();
{
@@ -1089,27 +1267,37 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
res = walrcv_exec(wrconn, cmd.data, 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
+ if (res->status == WALRCV_OK_COMMAND)
+ {
+ /* NOTICE. Success. */
+ ereport(NOTICE,
+ (errmsg("dropped replication slot \"%s\" on publisher",
+ slotname)));
+ }
+ else if (res->status == WALRCV_ERROR && missing_ok)
+ {
+ /* WARNING. Error, but missing_ok = true. */
+ ereport(WARNING,
(errmsg("could not drop the replication slot \"%s\" on publisher",
slotname),
errdetail("The error was: %s", res->err)));
+ }
else
- ereport(NOTICE,
- (errmsg("dropped replication slot \"%s\" on publisher",
- slotname)));
+ {
+ /* ERROR. */
+ ereport(ERROR,
+ (errmsg("could not drop the replication slot \"%s\" on publisher",
+ slotname),
+ errdetail("The error was: %s", res->err)));
+ }
walrcv_clear_result(res);
}
PG_FINALLY();
{
- walrcv_disconnect(wrconn);
+ pfree(cmd.data);
}
PG_END_TRY();
-
- pfree(cmd.data);
-
- table_close(rel, NoLock);
}
/*
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 186514cd9e..58082dde18 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -73,20 +73,6 @@ typedef struct LogicalRepWorkerId
Oid relid;
} LogicalRepWorkerId;
-typedef struct StopWorkersData
-{
- int nestDepth; /* Sub-transaction nest level */
- List *workers; /* List of LogicalRepWorkerId */
- struct StopWorkersData *parent; /* This need not be an immediate
- * subtransaction parent */
-} StopWorkersData;
-
-/*
- * Stack of StopWorkersData elements. Each stack element contains the workers
- * to be stopped for that subtransaction.
- */
-static StopWorkersData *on_commit_stop_workers = NULL;
-
static void ApplyLauncherWakeup(void);
static void logicalrep_launcher_onexit(int code, Datum arg);
static void logicalrep_worker_onexit(int code, Datum arg);
@@ -546,51 +532,6 @@ logicalrep_worker_stop(Oid subid, Oid relid)
LWLockRelease(LogicalRepWorkerLock);
}
-/*
- * Request worker for specified sub/rel to be stopped on commit.
- */
-void
-logicalrep_worker_stop_at_commit(Oid subid, Oid relid)
-{
- int nestDepth = GetCurrentTransactionNestLevel();
- LogicalRepWorkerId *wid;
- MemoryContext oldctx;
-
- /* Make sure we store the info in context that survives until commit. */
- oldctx = MemoryContextSwitchTo(TopTransactionContext);
-
- /* Check that previous transactions were properly cleaned up. */
- Assert(on_commit_stop_workers == NULL ||
- nestDepth >= on_commit_stop_workers->nestDepth);
-
- /*
- * Push a new stack element if we don't already have one for the current
- * nestDepth.
- */
- if (on_commit_stop_workers == NULL ||
- nestDepth > on_commit_stop_workers->nestDepth)
- {
- StopWorkersData *newdata = palloc(sizeof(StopWorkersData));
-
- newdata->nestDepth = nestDepth;
- newdata->workers = NIL;
- newdata->parent = on_commit_stop_workers;
- on_commit_stop_workers = newdata;
- }
-
- /*
- * Finally add a new worker into the worker list of the current
- * subtransaction.
- */
- wid = palloc(sizeof(LogicalRepWorkerId));
- wid->subid = subid;
- wid->relid = relid;
- on_commit_stop_workers->workers =
- lappend(on_commit_stop_workers->workers, wid);
-
- MemoryContextSwitchTo(oldctx);
-}
-
/*
* Wake up (using latch) any logical replication worker for specified sub/rel.
*/
@@ -819,109 +760,21 @@ ApplyLauncherShmemInit(void)
}
}
-/*
- * Check whether current transaction has manipulated logical replication
- * workers.
- */
-bool
-XactManipulatesLogicalReplicationWorkers(void)
-{
- return (on_commit_stop_workers != NULL);
-}
-
/*
* Wakeup the launcher on commit if requested.
*/
void
AtEOXact_ApplyLauncher(bool isCommit)
{
-
- Assert(on_commit_stop_workers == NULL ||
- (on_commit_stop_workers->nestDepth == 1 &&
- on_commit_stop_workers->parent == NULL));
-
if (isCommit)
{
- ListCell *lc;
-
- if (on_commit_stop_workers != NULL)
- {
- List *workers = on_commit_stop_workers->workers;
-
- foreach(lc, workers)
- {
- LogicalRepWorkerId *wid = lfirst(lc);
-
- logicalrep_worker_stop(wid->subid, wid->relid);
- }
- }
-
if (on_commit_launcher_wakeup)
ApplyLauncherWakeup();
}
- /*
- * No need to pfree on_commit_stop_workers. It was allocated in
- * transaction memory context, which is going to be cleaned soon.
- */
- on_commit_stop_workers = NULL;
on_commit_launcher_wakeup = false;
}
-/*
- * On commit, merge the current on_commit_stop_workers list into the
- * immediate parent, if present.
- * On rollback, discard the current on_commit_stop_workers list.
- * Pop out the stack.
- */
-void
-AtEOSubXact_ApplyLauncher(bool isCommit, int nestDepth)
-{
- StopWorkersData *parent;
-
- /* Exit immediately if there's no work to do at this level. */
- if (on_commit_stop_workers == NULL ||
- on_commit_stop_workers->nestDepth < nestDepth)
- return;
-
- Assert(on_commit_stop_workers->nestDepth == nestDepth);
-
- parent = on_commit_stop_workers->parent;
-
- if (isCommit)
- {
- /*
- * If the upper stack element is not an immediate parent
- * subtransaction, just decrement the notional nesting depth without
- * doing any real work. Else, we need to merge the current workers
- * list into the parent.
- */
- if (!parent || parent->nestDepth < nestDepth - 1)
- {
- on_commit_stop_workers->nestDepth--;
- return;
- }
-
- parent->workers =
- list_concat(parent->workers, on_commit_stop_workers->workers);
- }
- else
- {
- /*
- * Abandon everything that was done at this nesting level. Explicitly
- * free memory to avoid a transaction-lifespan leak.
- */
- list_free_deep(on_commit_stop_workers->workers);
- }
-
- /*
- * We have taken care of the current subtransaction workers list for both
- * abort or commit. So we are ready to pop the stack.
- */
- pfree(on_commit_stop_workers);
- on_commit_stop_workers = parent;
-}
-
/*
* Request wakeup of the launcher on commit of the transaction.
*
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 863d196fd7..165086ad66 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -31,8 +31,11 @@
* table state to INIT.
* - Tablesync worker starts; changes table state from INIT to DATASYNC while
* copying.
- * - Tablesync worker finishes the copy and sets table state to SYNCWAIT;
- * waits for state change.
+ * - Tablesync worker does initial table copy; there is a FINISHEDCOPY (sync
+ * worker specific) state to indicate when the copy phase has completed, so
+ * if the worker crashes with this (non-memory) state then the copy will not
+ * be re-attempted.
+ * - Tablesync worker then sets table state to SYNCWAIT; waits for state change.
* - Apply worker periodically checks for tables in SYNCWAIT state. When
* any appear, it sets the table state to CATCHUP and starts loop-waiting
* until either the table state is set to SYNCDONE or the sync worker
@@ -48,8 +51,8 @@
* point it sets state to READY and stops tracking. Again, there might
* be zero changes in between.
*
- * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * So the state progression is always: INIT -> DATASYNC -> FINISHEDCOPY
+ * -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY.
*
* The catalog pg_subscription_rel is used to keep information about
* subscribed tables and their state. Some transient state during data
@@ -59,6 +62,7 @@
* Example flows look like this:
* - Apply is in front:
* sync:8
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:10
* -> set in memory CATCHUP
@@ -74,6 +78,7 @@
*
* - Sync is in front:
* sync:10
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:8
* -> set in memory CATCHUP
@@ -98,11 +103,16 @@
#include "miscadmin.h"
#include "parser/parse_relation.h"
#include "pgstat.h"
+#include "postmaster/interrupt.h"
#include "replication/logicallauncher.h"
#include "replication/logicalrelation.h"
+#include "replication/logicalworker.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
+#include "storage/lmgr.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -112,6 +122,42 @@ static bool table_states_valid = false;
StringInfo copybuf = NULL;
+/*
+ * Common code to drop the origin of a tablesync worker.
+ *
+ * There is a potential race condition if two processes attempt to call
+ * replorigin_drop for the same originid at the same time. The loser of
+ * that race would give an ERROR saying that it failed to find the
+ * expected originid.
+ *
+ * The TRY/CATCH below suppresses such errors, allowing the tablesync cleanup
+ * code to proceed.
+ */
+void
+tablesync_replorigin_drop(Oid subid, Oid relid, bool nowait)
+{
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (OidIsValid(originid))
+ {
+ PG_TRY();
+ {
+ replorigin_drop(originid, nowait);
+ }
+ PG_CATCH();
+ {
+ ereport(WARNING,
+ errmsg("could not drop replication origin with OID %d, named \"%s\"",
+ originid,
+ originname));
+ }
+ PG_END_TRY();
+ }
+}
+
/*
* Exit routine for synchronization worker.
*/
@@ -270,30 +316,55 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
+ bool sync_done = false;
+ Oid subid = MySubscription->oid;
+ Oid relid = MyLogicalRepWorker->relid;
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
+ sync_done = MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
+ current_lsn >= MyLogicalRepWorker->relstate_lsn;
+ SpinLockRelease(&MyLogicalRepWorker->relmutex);
- if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
- current_lsn >= MyLogicalRepWorker->relstate_lsn)
+ if (sync_done)
{
TimeLineID tli;
+ char syncslotname[NAMEDATALEN] = {0};
+
+ /* End wal streaming so wrconn can be re-used to drop the slot. */
+ walrcv_endstreaming(wrconn, &tli);
+
+ /*
+ * Cleanup the tablesync slot.
+ */
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, false /* missing_ok */ );
+
+ /*
+ * Change state to SYNCDONE.
+ */
+ SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCDONE;
MyLogicalRepWorker->relstate_lsn = current_lsn;
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ {
+ StartTransactionCommand();
+ }
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
MyLogicalRepWorker->relstate_lsn);
- walrcv_endstreaming(wrconn, &tli);
finish_sync_worker();
}
- else
- SpinLockRelease(&MyLogicalRepWorker->relmutex);
}
/*
@@ -412,6 +483,21 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
started_tx = true;
}
+ /*
+ * Remove the tablesync origin tracking if exists.
+ *
+ * The normal case origin drop is done here instead of in the
+ * process_syncing_tables_for_sync function because if the
+ * tablesync worker process attempted to drop its own
+ * origin then it would prevent the origin from advancing properly
+ * on commit TX.
+ */
+ tablesync_replorigin_drop(MyLogicalRepWorker->subid,
+ rstate->relid, false /* nowait */ );
+
+ /*
+ * Update the state to READY only after the origin cleanup.
+ */
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
@@ -807,6 +893,40 @@ copy_table(Relation rel)
logicalrep_rel_close(relmapentry, NoLock);
}
+/*
+ * Determine the tablesync slot name.
+ *
+ * The name must not exceed NAMEDATALEN -1 because of remote node constraints
+ * on slot name length.
+ *
+ * The returned slot name is either:
+ * - stored in the supplied buffer (syncslotname), or
+ * - palloc'ed in current memory context (if syncslotname = NULL).
+ */
+char *
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char syncslotname[NAMEDATALEN])
+{
+ /*
+ * Note: Since now we are using PERMANENT tablesync slots this code is not
+ * using the Subscription slot name as the first part of the tablesync
+ * slot name anymore. This part is omitted because we are now responsible
+ * for cleaning up the permanent tablesync slots, so it could become
+ * impossible to recalculate what name to cleanup if the Subscription slot
+ * name had changed.
+ */
+
+ if (syncslotname)
+ {
+ sprintf(syncslotname, "pg_%u_sync_%u", suboid, relid);
+ }
+ else
+ {
+ syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
+ }
+
+ return syncslotname;
+}
+
/*
* Start syncing the table in the sync worker.
*
@@ -824,6 +944,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -849,19 +971,10 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
- MySubscription->slotname,
- MySubscription->oid,
- MyLogicalRepWorker->relid);
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(MySubscription->oid,
+ MyLogicalRepWorker->relid,
+ NULL /* use palloc */ );
/*
* Here we use the slot name instead of the subscription name as the
@@ -874,7 +987,33 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY);
+
+ /* Assign the origin tracking record name. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MySubscription->oid, MyLogicalRepWorker->relid);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed
+ * before it was able to finish normally.
+ */
+ StartTransactionCommand();
+
+ /*
+ * The origin tracking name must already exist. It was created first
+ * time this tablesync was launched.
+ */
+ originid = replorigin_by_name(originname, false /* missing_ok */ );
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ *origin_startpos = replorigin_session_get_progress(false);
+
+ CommitTransactionCommand();
+
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -890,9 +1029,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -918,29 +1054,97 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
walrcv_clear_result(res);
/*
- * Create a new temporary logical decoding slot. This slot will be used
- * for the catchup phase after COPY is done, so tell it to use the
- * snapshot to make the final data consistent.
+ * Be sure to remove the newly created tablesync slot if the COPY fails.
*/
- walrcv_create_slot(wrconn, slotname, true,
- CRS_USE_SNAPSHOT, origin_startpos);
+ PG_TRY();
+ {
+ /*
+ * Create a new permanent logical decoding slot. This slot will be
+ * used for the catchup phase after COPY is done, so tell it to use
+ * the snapshot to make the final data consistent.
+ */
+ walrcv_create_slot(wrconn, slotname, false /* permanent */ ,
+ CRS_USE_SNAPSHOT, origin_startpos);
- /* Now do the initial data copy */
- PushActiveSnapshot(GetTransactionSnapshot());
- copy_table(rel);
- PopActiveSnapshot();
+ /* Now do the initial data copy */
+ PushActiveSnapshot(GetTransactionSnapshot());
+ copy_table(rel);
+ PopActiveSnapshot();
- res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
- (errmsg("table copy could not finish transaction on publisher"),
- errdetail("The error was: %s", res->err)));
- walrcv_clear_result(res);
+ res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
+ if (res->status != WALRCV_OK_COMMAND)
+ ereport(ERROR,
+ (errmsg("table copy could not finish transaction on publisher"),
+ errdetail("The error was: %s", res->err)));
+ walrcv_clear_result(res);
+
+ table_close(rel, NoLock);
+
+ /* Make the copy visible. */
+ CommandCounterIncrement();
+
+ /* Setup replication origin tracking. */
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist, so create it now.
+ *
+ * Then advance to the LSN got from walrcv_create_slot. This is
+ * WAL logged for the purpose of recovery. Locks are to prevent
+ * the replication origin from vanishing while advancing.
+ */
+ originid = replorigin_create(originname);
+
+ LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */ , true /* WAL log */ );
+ UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ }
+ else
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("replication origin \"%s\" already exists",
+ originname)));
+ }
+
+ /*
+ * Update the persisted state to indicate the COPY phase is done; make
+ * it visible to others.
+ */
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_FINISHEDCOPY,
+ MyLogicalRepWorker->relstate_lsn);
+
+ CommitTransactionCommand();
+ }
+ PG_CATCH();
+ {
+ /*
+ * If something failed during copy table then cleanup the created
+ * slot.
+ */
+ ReplicationSlotDropAtPubNode(wrconn, slotname, false /* missing_ok */ );
+
+ pfree(slotname);
+ slotname = NULL;
+
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
- table_close(rel, NoLock);
+copy_table_done:
- /* Make the copy visible. */
- CommandCounterIncrement();
+ elog(DEBUG1,
+ "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
/*
* We are done with the initial data synchronization, update the state.
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index eb7db89cef..cfc924cd89 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -807,12 +807,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -889,9 +885,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -918,8 +912,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1062,8 +1055,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData *commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 1d81071c35..05bb698cf4 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1786,7 +1786,8 @@ ProcessUtilitySlow(ParseState *pstate,
break;
case T_AlterSubscriptionStmt:
- address = AlterSubscription((AlterSubscriptionStmt *) parsetree);
+ address = AlterSubscription((AlterSubscriptionStmt *) parsetree,
+ isTopLevel);
break;
case T_DropSubscriptionStmt:
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index 06663b9f16..9027c42976 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,8 @@ DECLARE_UNIQUE_INDEX(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg_subs
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_FINISHEDCOPY 'f' /* tablesync copy phase is completed
+ * (sublsn NULL) */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/commands/subscriptioncmds.h b/src/include/commands/subscriptioncmds.h
index a81865079d..3b926f35d7 100644
--- a/src/include/commands/subscriptioncmds.h
+++ b/src/include/commands/subscriptioncmds.h
@@ -20,7 +20,7 @@
extern ObjectAddress CreateSubscription(CreateSubscriptionStmt *stmt,
bool isTopLevel);
-extern ObjectAddress AlterSubscription(AlterSubscriptionStmt *stmt);
+extern ObjectAddress AlterSubscription(AlterSubscriptionStmt *stmt, bool isTopLevel);
extern void DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel);
extern ObjectAddress AlterSubscriptionOwner(const char *name, Oid newOwnerId);
diff --git a/src/include/replication/logicallauncher.h b/src/include/replication/logicallauncher.h
index 421ec1580d..301e494f7b 100644
--- a/src/include/replication/logicallauncher.h
+++ b/src/include/replication/logicallauncher.h
@@ -22,9 +22,7 @@ extern Size ApplyLauncherShmemSize(void);
extern void ApplyLauncherShmemInit(void);
extern void ApplyLauncherWakeupAtCommit(void);
-extern bool XactManipulatesLogicalReplicationWorkers(void);
extern void AtEOXact_ApplyLauncher(bool isCommit);
-extern void AtEOSubXact_ApplyLauncher(bool isCommit, int nestDepth);
extern bool IsLogicalLauncher(void);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 53f636c56f..5f52335f15 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -15,6 +15,7 @@
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "storage/spin.h"
+#include "replication/walreceiver.h"
/*
* Behaviour of replication slots, upon release or crash.
@@ -211,6 +212,8 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname);
+extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
diff --git a/src/include/replication/worker_internal.h b/src/include/replication/worker_internal.h
index d046022e49..4a5c49da7d 100644
--- a/src/include/replication/worker_internal.h
+++ b/src/include/replication/worker_internal.h
@@ -77,13 +77,14 @@ extern List *logicalrep_workers_find(Oid subid, bool only_running);
extern void logicalrep_worker_launch(Oid dbid, Oid subid, const char *subname,
Oid userid, Oid relid);
extern void logicalrep_worker_stop(Oid subid, Oid relid);
-extern void logicalrep_worker_stop_at_commit(Oid subid, Oid relid);
extern void logicalrep_worker_wakeup(Oid subid, Oid relid);
extern void logicalrep_worker_wakeup_ptr(LogicalRepWorker *worker);
extern int logicalrep_sync_worker_count(Oid subid);
extern char *LogicalRepSyncTableStart(XLogRecPtr *origin_startpos);
+extern void tablesync_replorigin_drop(Oid subid, Oid relid, bool nowait);
+
void process_syncing_tables(XLogRecPtr current_lsn);
void invalidate_syncing_table_states(Datum arg, int cacheid,
uint32 hashvalue);
diff --git a/src/test/subscription/t/004_sync.pl b/src/test/subscription/t/004_sync.pl
index e111ab9181..963a7ee4dc 100644
--- a/src/test/subscription/t/004_sync.pl
+++ b/src/test/subscription/t/004_sync.pl
@@ -3,7 +3,9 @@ use strict;
use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 7;
+use Test::More tests => 10;
+use Time::HiRes qw(usleep);
+use Scalar::Util qw(looks_like_number);
# Initialize publisher node
my $node_publisher = get_new_node('publisher');
@@ -149,6 +151,71 @@ $result = $node_subscriber->safe_psql('postgres',
is($result, qq(20),
'changes for table added after subscription initialized replicated');
+##
+## slot integrity
+##
+## Manually create a slot with the same name that tablesync will want.
+## Expect tablesync ERROR when clash is detected.
+## Then remove the slot so tablesync can proceed.
+## Expect tablesync can now finish normally.
+##
+
+# drop the subscription
+$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+
+# empty the table tab_rep_next
+$node_subscriber->safe_psql('postgres', "DELETE FROM tab_rep_next;");
+
+# drop the table tab_rep from publisher and subscriber
+$node_subscriber->safe_psql('postgres', "DROP TABLE tab_rep;");
+$node_publisher->safe_psql('postgres', "DROP TABLE tab_rep;");
+
+# recreate the subscription again, but leave it disabled so that we can get the OID
+$node_subscriber->safe_psql('postgres',
+ "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr' PUBLICATION tap_pub
+ with (enabled = false)"
+);
+
+# need to create the name of the tablesync slot, for this we need the subscription OID
+# and the table OID.
+my $subid = $node_subscriber->safe_psql('postgres',
+ "SELECT oid FROM pg_subscription WHERE subname = 'tap_sub';");
+is(looks_like_number($subid), qq(1), 'get the subscription OID');
+
+my $relid = $node_subscriber->safe_psql('postgres',
+ "SELECT 'tab_rep_next'::regclass::oid");
+is(looks_like_number($relid), qq(1), 'get the table OID');
+
+# name of the tablesync slot is 'pg_'suboid'_sync_'tableoid'.
+my $slotname = 'pg_' . $subid . '_' . 'sync_' . $relid;
+
+# temporarily, create a slot having the same name of the tablesync slot.
+$node_publisher->safe_psql('postgres',
+ "SELECT 'init' FROM pg_create_logical_replication_slot('$slotname', 'pgoutput', false);");
+
+# enable the subscription
+$node_subscriber->safe_psql('postgres',
+ "ALTER SUBSCRIPTION tap_sub ENABLE"
+);
+
+# it will be stuck on data sync as slot create will fail because slot already exists.
+$node_subscriber->poll_query_until('postgres', $started_query)
+ or die "Timed out while waiting for subscriber to start sync";
+
+# now drop the offending slot, the tablesync should recover.
+$node_publisher->safe_psql('postgres',
+ "SELECT pg_drop_replication_slot('$slotname');");
+
+# wait for sync to finish
+$node_subscriber->poll_query_until('postgres', $synced_query)
+ or die "Timed out while waiting for subscriber to synchronize data";
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT count(*) FROM tab_rep_next");
+is($result, qq(20),
+ 'data for table added after subscription initialized are now synced');
+
+# Cleanup
$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
$node_subscriber->stop('fast');
--
2.28.0.windows.1
On Sun, Jan 31, 2021 at 12:19 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
I have made the below changes in the patch. Let me know what you think
about these?
1. It was a bit difficult to understand the code in DropSubscription
so I have rearranged the code to match the way we are doing in HEAD
where we drop the slots at the end after finishing all the other
cleanup.
There was a reason why the v22 logic was different from HEAD.
The broken connection leaves dangling slots which is unavoidable. But,
whereas the user knows the name of the Subscription slot (they named
it), there is no easy way for them to know the names of the remaining
tablesync slots unless we log them.
That is why the v22 code was written to process the tablesync slots
even for wrconn == NULL so their name could be logged:
elog(WARNING, "no connection; cannot drop tablesync slot \"%s\".",
syncslotname);
The v23 patch removed this dangling slot name information, so it makes
it difficult for the user to know what tablesync slots to cleanup.
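To make the suggestion concrete, here is a minimal standalone sketch (a toy model with made-up OIDs, not the patch's code) of the kind of per-slot logging loop v22 performed even when wrconn == NULL, using the generated tablesync slot name format:

#include <stdio.h>

/* Toy model: when the publisher connection is gone, still walk the
 * not-ready relations and log each generated tablesync slot name
 * ("pg_%u_sync_%u") so the user knows what to drop manually. The OIDs
 * below are invented for illustration. */
typedef struct
{
    unsigned int suboid;
    unsigned int relid;
} SyncRel;

int
main(void)
{
    SyncRel rstates[] = {{16394, 16385}, {16394, 16390}};
    int     i;

    for (i = 0; i < 2; i++)
        fprintf(stderr,
                "WARNING: no connection; cannot drop tablesync slot \"pg_%u_sync_%u\"\n",
                rstates[i].suboid, rstates[i].relid);
    return 0;
}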
2. In AlterSubscription_refresh(), we can't allow workers to be
stopped at commit time as we have already dropped the slots because
the worker can access the dropped slot. We need to stop the workers
before dropping slots. This makes all the code related to
logicalrep_worker_stop_at_commit redundant.
OK.
3. In AlterSubscription_refresh(), we need to acquire the lock on
pg_subscription_rel only when we try to remove any subscription rel.
+ if (!sub_rel_locked)
+ {
+ rel = table_open(SubscriptionRelRelationId, AccessExclusiveLock);
+ sub_rel_locked = true;
+ }
OK. But the sub_rel_locked bool is not really necessary. Why not just
check for rel == NULL? e.g.
if (!rel)
rel = table_open(SubscriptionRelRelationId, AccessExclusiveLock);
4. Added/Changed quite a few comments.
@@ -1042,6 +1115,31 @@ DropSubscription(DropSubscriptionStmt *stmt,
bool isTopLevel)
}
list_free(subworkers);
+ /*
+ * Tablesync resource cleanup (slots and origins).
The comment is misleading; this code is only dropping origins.
----
Kind Regards,
Peter Smith.
Fujitsu Australia
On Sun, Jan 31, 2021 at 12:19 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
2. In AlterSubscription_refresh(), we can't allow workers to be
stopped at commit time as we have already dropped the slots because
the worker can access the dropped slot. We need to stop the workers
before dropping slots. This makes all the code related to
logicalrep_worker_stop_at_commit redundant.
@@ -73,20 +73,6 @@ typedef struct LogicalRepWorkerId
Oid relid;
} LogicalRepWorkerId;
-typedef struct StopWorkersData
-{
- int nestDepth; /* Sub-transaction nest level */
- List *workers; /* List of LogicalRepWorkerId */
- struct StopWorkersData *parent; /* This need not be an immediate
- * subtransaction parent */
-} StopWorkersData;
Since v23 removes that typedef from the code, don't you also have to
remove it from src/tools/pgindent/typedefs.list?
----
Kind Regards,
Peter Smith.
Fujitsu Australia
On Mon, Feb 1, 2021 at 6:48 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Sun, Jan 31, 2021 at 12:19 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
I have made the below changes in the patch. Let me know what you think
about these?
1. It was a bit difficult to understand the code in DropSubscription
so I have rearranged the code to match the way we are doing in HEAD
where we drop the slots at the end after finishing all the other
cleanup.
There was a reason why the v22 logic was different from HEAD.
The broken connection leaves dangling slots which is unavoidable.
I think this is true only when the user specifically requested it by
the use of "ALTER SUBSCRIPTION ... SET (slot_name = NONE)", right?
Otherwise, we give an error on a broken connection. Also, if that is
true then is there a reason to pass missing_ok as true while dropping
tablesync slots?
But,
whereas the user knows the name of the Subscription slot (they named
it), there is no easy way for them to know the names of the remaining
tablesync slots unless we log them.
That is why the v22 code was written to process the tablesync slots
even for wrconn == NULL so their name could be logged:
elog(WARNING, "no connection; cannot drop tablesync slot \"%s\".",
syncslotname);
The v23 patch removed this dangling slot name information, so it makes
it difficult for the user to know what tablesync slots to cleanup.
Okay, then can we think of combining it with the existing replication
slot error? I think that might produce a very long message, so
another idea could be to LOG a separate WARNING for each such slot
just before giving the error.
2. In AlterSubscription_refresh(), we can't allow workers to be
stopped at commit time as we have already dropped the slots because
the worker can access the dropped slot. We need to stop the workers
before dropping slots. This makes all the code related to
logicalrep_worker_stop_at_commit redundant.
OK.
3. In AlterSubscription_refresh(), we need to acquire the lock on
pg_subscription_rel only when we try to remove any subscription rel.
+ if (!sub_rel_locked)
+ {
+ rel = table_open(SubscriptionRelRelationId, AccessExclusiveLock);
+ sub_rel_locked = true;
+ }
OK. But the sub_rel_locked bool is not really necessary. Why not just
check for rel == NULL? e.g.
if (!rel)
rel = table_open(SubscriptionRelRelationId, AccessExclusiveLock);
Okay, that seems to be better, will change accordingly.
4. Added/Changed quite a few comments.
@@ -1042,6 +1115,31 @@ DropSubscription(DropSubscriptionStmt *stmt,
bool isTopLevel)
}
list_free(subworkers);
+ /*
+ * Tablesync resource cleanup (slots and origins).
The comment is misleading; this code is only dropping origins.
Okay, will change to something like: "Cleanup of tablesync replication origins."
@@ -73,20 +73,6 @@ typedef struct LogicalRepWorkerId
Oid relid;
} LogicalRepWorkerId;
-typedef struct StopWorkersData
-{
- int nestDepth; /* Sub-transaction nest level */
- List *workers; /* List of LogicalRepWorkerId */
- struct StopWorkersData *parent; /* This need not be an immediate
- * subtransaction parent */
-} StopWorkersData;
Since v23 removes that typedef from the code, don't you also have to
remove it from src/tools/pgindent/typedefs.list?
Yeah.
--
With Regards,
Amit Kapila.
On Mon, Feb 1, 2021 at 1:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Feb 1, 2021 at 6:48 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Sun, Jan 31, 2021 at 12:19 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
I have made the below changes in the patch. Let me know what you think
about these?
1. It was a bit difficult to understand the code in DropSubscription
so I have rearranged the code to match the way we are doing in HEAD
where we drop the slots at the end after finishing all the other
cleanup.
There was a reason why the v22 logic was different from HEAD.
The broken connection leaves dangling slots which is unavoidable.
I think this is true only when the user specifically requested it by
the use of "ALTER SUBSCRIPTION ... SET (slot_name = NONE)", right?
Otherwise, we give an error on a broken connection. Also, if that is
true then is there a reason to pass missing_ok as true while dropping
tablesync slots?
AFAIK there is always a potential race with DropSubscription dropping
slots. The DropSubscription might be running at exactly the same time
the apply worker has just dropped the very same tablesync slot. By
saying missing_ok = true it means DropSubscription would not give
ERROR in such a case, so at least the DROP SUBSCRIPTION would not fail
with an unexpected error.
But,
whereas the user knows the name of the Subscription slot (they named
it), there is no easy way for them to know the names of the remaining
tablesync slots unless we log them.
That is why the v22 code was written to process the tablesync slots
even for wrconn == NULL so their name could be logged:
elog(WARNING, "no connection; cannot drop tablesync slot \"%s\".",
syncslotname);
The v23 patch removed this dangling slot name information, so it makes
it difficult for the user to know what tablesync slots to cleanup.
Okay, then can we think of combining with the existing error of the
replication slot? I think that might produce a very long message, so
another idea could be to LOG a separate WARNING for each such slot
just before giving the error.
There may be many subscribed tables so I agree combining into one
message might be too long. Yes, we can add another loop to output the
necessary information. But, isn’t logging each tablesync slot WARNING
before the subscription slot ERROR exactly the behaviour which already
existed in v22? IIUC the DropSubscription refactoring in v23 was not
done for any functional change, but was intended only to make the code
simpler, but how is that goal achieved if v23 ends up needing 3 loops
where v22 only needed 1 loop to do the same thing?
----
Kind Regards,
Peter Smith.
Fujitsu Australia.
On Mon, Feb 1, 2021 at 9:38 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Mon, Feb 1, 2021 at 1:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Feb 1, 2021 at 6:48 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Sun, Jan 31, 2021 at 12:19 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
I have made the below changes in the patch. Let me know what you think
about these?
1. It was a bit difficult to understand the code in DropSubscription
so I have rearranged the code to match the way we are doing in HEAD
where we drop the slots at the end after finishing all the other
cleanup.
There was a reason why the v22 logic was different from HEAD.
The broken connection leaves dangling slots which is unavoidable.
I think this is true only when the user specifically requested it by
the use of "ALTER SUBSCRIPTION ... SET (slot_name = NONE)", right?
Otherwise, we give an error on a broken connection. Also, if that is
true then is there a reason to pass missing_ok as true while dropping
tablesync slots?
AFAIK there is always a potential race with DropSubscription dropping
slots. The DropSubscription might be running at exactly the same time
the apply worker has just dropped the very same tablesync slot.
We stopped the workers before getting a list of NotReady relations and
then try to drop the corresponding slots. So, how can such a race
condition happen? Note, because we have a lock on pg_subscription,
there is no chance that the workers can restart till the transaction
end.
By
saying missing_ok = true it means DropSubscription would not give
ERROR in such a case, so at least the DROP SUBSCRIPTION would not fail
with an unexpected error.
But,
whereas the user knows the name of the Subscription slot (they named
it), there is no easy way for them to know the names of the remaining
tablesync slots unless we log them.
That is why the v22 code was written to process the tablesync slots
even for wrconn == NULL so their name could be logged:
elog(WARNING, "no connection; cannot drop tablesync slot \"%s\".",
syncslotname);
The v23 patch removed this dangling slot name information, so it makes
it difficult for the user to know what tablesync slots to cleanup.
Okay, then can we think of combining with the existing error of the
replication slot? I think that might produce a very long message, so
another idea could be to LOG a separate WARNING for each such slot
just before giving the error.
There may be many subscribed tables so I agree combining to one
message might be too long. Yes, we can add another loop to output the
necessary information. But, isn’t logging each tablesync slot WARNING
before the subscription slot ERROR exactly the behaviour which already
existed in v22. IIUC the DropSubscription refactoring in V23 was not
done for any functional change, but was intended only to make the code
simpler, but how is that goal achieved if v23 ends up needing 3 loops
where v22 only needed 1 loop to do the same thing?
No, there is a functionality change as well. The way we have code in
v22 can easily lead to a problem when we have dropped the slots but
get an error while removing origins or an entry from subscription rel.
In such cases, we won't be able to roll back the drop of slots, but the
other database operations will be rolled back. This is the reason we
have to drop the slots at the end. We need to ensure the same thing
for AlterSubscription_refresh. Does this make sense now?
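To make that ordering concrete, here is a minimal standalone sketch (a toy model of my own, not the patch's code) of why the irreversible remote slot drop has to come after the rollback-able catalog work:

#include <stdbool.h>
#include <stdio.h>

/* Toy model: catalog changes participate in the local transaction and
 * are undone on error; a remote DROP_REPLICATION_SLOT is not, so it is
 * deferred until all catalog work has succeeded. */
static bool
remove_catalog_entries(void)
{
    /* e.g. RemoveSubscriptionRel() or an origin drop; pretend it fails */
    return false;
}

int
main(void)
{
    if (!remove_catalog_entries())
    {
        /* transaction aborts: catalog changes roll back, and the slots
         * were never touched, so nothing is left half-done */
        fprintf(stderr, "ERROR: catalog cleanup failed; rolled back\n");
        return 1;
    }

    /* only now perform the irreversible remote operation */
    printf("DROP_REPLICATION_SLOT pg_16394_sync_16385 WAIT\n");
    return 0;
}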
--
With Regards,
Amit Kapila.
On Mon, Feb 1, 2021 at 10:14 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Feb 1, 2021 at 9:38 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Mon, Feb 1, 2021 at 1:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Feb 1, 2021 at 6:48 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Sun, Jan 31, 2021 at 12:19 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
I have made the below changes in the patch. Let me know what you think
about these?
1. It was a bit difficult to understand the code in DropSubscription
so I have rearranged the code to match the way we are doing in HEAD
where we drop the slots at the end after finishing all the other
cleanup.
There was a reason why the v22 logic was different from HEAD.
The broken connection leaves dangling slots which is unavoidable.
I think this is true only when the user specifically requested it by
the use of "ALTER SUBSCRIPTION ... SET (slot_name = NONE)", right?
Otherwise, we give an error on a broken connection. Also, if that is
true then is there a reason to pass missing_ok as true while dropping
tablesync slots?
AFAIK there is always a potential race with DropSubscription dropping
slots. The DropSubscription might be running at exactly the same time
the apply worker has just dropped the very same tablesync slot.
We stopped the workers before getting a list of NotReady relations and
then try to drop the corresponding slots. So, how can such a race
condition happen?
I think it is possible that the state is still not SYNCDONE but the
slot is already dropped, so here we should be prepared for the slot
to be missing.
--
With Regards,
Amit Kapila.
On Mon, Feb 1, 2021 at 3:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Feb 1, 2021 at 9:38 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Mon, Feb 1, 2021 at 1:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Feb 1, 2021 at 6:48 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Sun, Jan 31, 2021 at 12:19 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
I have made the below changes in the patch. Let me know what you think
about these?
1. It was a bit difficult to understand the code in DropSubscription
so I have rearranged the code to match the way we are doing in HEAD
where we drop the slots at the end after finishing all the other
cleanup.
There was a reason why the v22 logic was different from HEAD.
The broken connection leaves dangling slots which is unavoidable.
I think this is true only when the user specifically requested it by
the use of "ALTER SUBSCRIPTION ... SET (slot_name = NONE)", right?
Otherwise, we give an error on a broken connection. Also, if that is
true then is there a reason to pass missing_ok as true while dropping
tablesync slots?
AFAIK there is always a potential race with DropSubscription dropping
slots. The DropSubscription might be running at exactly the same time
the apply worker has just dropped the very same tablesync slot.
We stopped the workers before getting a list of NotReady relations and
then try to drop the corresponding slots. So, how can such a race
condition happen? Note, because we have a lock on pg_subscription,
there is no chance that the workers can restart till the transaction
end.
OK. I think I was forgetting the logicalrep_worker_stop would also go
into a loop waiting for the worker process to die. So even if the
tablesync worker does simultaneously drop its own slot, I think it
will certainly at least be in SYNCDONE state before DropSubscription
does anything else with that worker.
By
saying missing_ok = true it means DropSubscription would not give
ERROR in such a case, so at least the DROP SUBSCRIPTION would not fail
with an unexpected error.
But,
whereas the user knows the name of the Subscription slot (they named
it), there is no easy way for them to know the names of the remaining
tablesync slots unless we log them.
That is why the v22 code was written to process the tablesync slots
even for wrconn == NULL so their name could be logged:
elog(WARNING, "no connection; cannot drop tablesync slot \"%s\".",
syncslotname);
The v23 patch removed this dangling slot name information, so it makes
it difficult for the user to know what tablesync slots to cleanup.
Okay, then can we think of combining with the existing error of the
replication slot? I think that might produce a very long message, so
another idea could be to LOG a separate WARNING for each such slot
just before giving the error.
There may be many subscribed tables so I agree combining to one
message might be too long. Yes, we can add another loop to output the
necessary information. But, isn’t logging each tablesync slot WARNING
before the subscription slot ERROR exactly the behaviour which already
existed in v22. IIUC the DropSubscription refactoring in V23 was not
done for any functional change, but was intended only to make the code
simpler, but how is that goal achieved if v23 ends up needing 3 loops
where v22 only needed 1 loop to do the same thing?
No, there is a functionality change as well. The way we have code in
v22 can easily lead to a problem when we have dropped the slots but
get an error while removing origins or an entry from subscription rel.
In such cases, we won't be able to rollback the drop of slots but the
other database operations will be rolled back. This is the reason we
have to drop the slots at the end. We need to ensure the same thing
for AlterSubscription_refresh. Does this make sense now?
OK.
----
Kind Regards,
Peter Smith.
Fujitsu Australia.
On Mon, Feb 1, 2021 at 11:23 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Mon, Feb 1, 2021 at 3:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Feb 1, 2021 at 9:38 AM Peter Smith <smithpb2250@gmail.com> wrote:
I think this is true only when the user specifically requested it by
the use of "ALTER SUBSCRIPTION ... SET (slot_name = NONE)", right?
Otherwise, we give an error on a broken connection. Also, if that is
true then is there a reason to pass missing_ok as true while dropping
tablesync slots?
AFAIK there is always a potential race with DropSubscription dropping
slots. The DropSubscription might be running at exactly the same time
the apply worker has just dropped the very same tablesync slot.
We stopped the workers before getting a list of NotReady relations and
then try to drop the corresponding slots. So, how can such a race
condition happen? Note, because we have a lock on pg_subscription,
there is no chance that the workers can restart till the transaction
end.
OK. I think I was forgetting the logicalrep_worker_stop would also go
into a loop waiting for the worker process to die. So even if the
tablesync worker does simultaneously drop its own slot, I think it
will certainly at least be in SYNCDONE state before DropSubscription
does anything else with that worker.
How is that ensured? We don't have anything like HOLD_INTERRUPTS
between the time we dropped the slot and updated the rel state to
SYNCDONE. So, isn't it possible that after we dropped the slot and
before we update the state, the SIGTERM signal arrives and leads to
worker exit?
--
With Regards,
Amit Kapila.
On Mon, Feb 1, 2021 at 5:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
AFAIK there is always a potential race with DropSubscription dropping
slots. The DropSubscription might be running at exactly the same time
the apply worker has just dropped the very same tablesync slot.
We stopped the workers before getting a list of NotReady relations and
then try to drop the corresponding slots. So, how can such a race
condition happen? Note, because we have a lock on pg_subscription,
there is no chance that the workers can restart till the transaction
end.
OK. I think I was forgetting the logicalrep_worker_stop would also go
into a loop waiting for the worker process to die. So even if the
tablesync worker does simultaneously drop its own slot, I think it
will certainly at least be in SYNCDONE state before DropSubscription
does anything else with that worker.
How is that ensured? We don't have anything like HOLD_INTERRUPTS
between the time we dropped the slot and updated the rel state to
SYNCDONE. So, isn't it possible that after we dropped the slot and
before we update the state, the SIGTERM signal arrives and leads to
worker exit?
The worker has the SIGTERM handler of "die". IIUC the "die" function
doesn't normally do anything except set some flags to say please die
at the next convenient opportunity. My understanding is that the
worker process will not actually exit until it next executes
CHECK_FOR_INTERRUPTS(), whereupon it will see the ProcDiePending flag
and *really* die. So even if the SIGTERM signal arrives immediately
after the slot is dropped, the tablesync will still become SYNCDONE.
Is this understanding wrong?
But your scenario could still be possible if "die" exited immediately
(e.g. only in single user mode?).
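For reference, a minimal standalone sketch of the flag-and-check pattern described above (the names are simplified stand-ins; the real handler is die() in tcop/postgres.c and the check point is the CHECK_FOR_INTERRUPTS() macro):

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

static volatile sig_atomic_t die_pending = 0;

/* like die(): only record the request; never exit inside the handler */
static void
die_handler(int signo)
{
    (void) signo;
    die_pending = 1;
}

/* like CHECK_FOR_INTERRUPTS(): the actual exit happens here */
static void
check_for_interrupts(void)
{
    if (die_pending)
    {
        fprintf(stderr, "FATAL: terminating at the next safe point\n");
        exit(1);
    }
}

int
main(void)
{
    signal(SIGTERM, die_handler);

    /* drop_tablesync_slot();      <- a SIGTERM arriving here ...      */
    /* update_state_to_syncdone(); <- ... does not by itself skip this */

    check_for_interrupts();        /* ... unless a CFI runs in between */
    return 0;
}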
----
Kind Regards,
Peter Smith.
Fujitsu Australia
On Mon, Feb 1, 2021 at 1:08 PM Peter Smith <smithpb2250@gmail.com> wrote:
On Mon, Feb 1, 2021 at 5:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
AFAIK there is always a potential race with DropSubscription dropping
slots. The DropSubscription might be running at exactly the same time
the apply worker has just dropped the very same tablesync slot.
We stopped the workers before getting a list of NotReady relations and
then try to drop the corresponding slots. So, how can such a race
condition happen? Note, because we have a lock on pg_subscription,
there is no chance that the workers can restart till the transaction
end.
OK. I think I was forgetting the logicalrep_worker_stop would also go
into a loop waiting for the worker process to die. So even if the
tablesync worker does simultaneously drop its own slot, I think it
will certainly at least be in SYNCDONE state before DropSubscription
does anything else with that worker.
How is that ensured? We don't have anything like HOLD_INTERRUPTS
between the time we dropped the slot and updated the rel state to
SYNCDONE. So, isn't it possible that after we dropped the slot and
before we update the state, the SIGTERM signal arrives and leads to
worker exit?
The worker has the SIGTERM handler of "die". IIUC the "die" function
doesn't normally do anything except set some flags to say please die
at the next convenient opportunity. My understanding is that the
worker process will not actually exit until it next executes
CHECK_FOR_INTERRUPTS(), whereupon it will see the ProcDiePending flag
and *really* die. So even if the SIGTERM signal arrives immediately
after the slot is dropped, the tablesync will still become SYNCDONE.
Is this understanding wrong?
But your scenario could still be possible if "die" exited immediately
(e.g. only in single user mode?).
I think it is possible without that as well. There are many calls
in-between those two operations which can internally call
CHECK_FOR_INTERRUPTS. One of the flows where such a possibility exists
is UpdateSubscriptionRelState->SearchSysCacheCopy2->SearchSysCacheCopy->SearchSysCache->SearchCatCache->SearchCatCacheInternal->SearchCatCacheMiss->systable_getnext.
This can internally call heapgetpage where we have
CHECK_FOR_INTERRUPTS. I think even if there were no CFI call today, we
can't guarantee that for the future, as the calls used are quite
common. So, we probably need the missing_ok flag in DropSubscription.
One more point: in the tablesync code you are calling
ReplicationSlotDropAtPubNode with missing_ok as false. What if we get
an error after that and before we have marked the state as SYNCDONE? I
guess it will then always error out in ReplicationSlotDropAtPubNode
because we had already dropped the slot.
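A small standalone toy (assumptions mine, not the patch's code) showing why missing_ok matters once the drop/state-update window can be interrupted:

#include <stdbool.h>
#include <stdio.h>

/* Toy model: the slot is dropped first and the state updated second, so
 * an exit in between leaves a dropped slot with a not-yet-SYNCDONE
 * state; the relaunched worker then retries the drop. */
typedef enum { FINISHEDCOPY, SYNCDONE } RelState;

static bool slot_exists = true;
static RelState state = FINISHEDCOPY;

static void
drop_slot(bool missing_ok)
{
    if (!slot_exists)
    {
        if (missing_ok)
            printf("WARNING: slot already gone, continuing\n");
        else
            printf("ERROR: replication slot does not exist\n");
        return;
    }
    slot_exists = false;
}

int
main(void)
{
    drop_slot(false);           /* first attempt actually drops the slot */
    /* ... worker exits here, before state = SYNCDONE ... */

    if (state == FINISHEDCOPY)  /* relaunched worker retries the drop */
        drop_slot(true);        /* missing_ok = true avoids the error */
    return 0;
}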
--
With Regards,
Amit Kapila.
On Mon, Feb 1, 2021 at 11:23 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Mon, Feb 1, 2021 at 3:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Feb 1, 2021 at 9:38 AM Peter Smith <smithpb2250@gmail.com> wrote:
No, there is a functionality change as well. The way we have code in
v22 can easily lead to a problem when we have dropped the slots but
get an error while removing origins or an entry from subscription rel.
In such cases, we won't be able to rollback the drop of slots but the
other database operations will be rolled back. This is the reason we
have to drop the slots at the end. We need to ensure the same thing
for AlterSubscription_refresh. Does this make sense now?
OK.
I have updated the patch to display a WARNING for each of the
tablesync slots during DropSubscription. As discussed, I have moved
the slot-drop code towards the end in AlterSubscription_refresh. Apart
from this, I have fixed one more issue in the tablesync code where,
after catching the exception, we were not clearing the transaction
state on the publisher; see the changes in LogicalRepSyncTableStart. I
have also addressed the other comments raised by you. Additionally, I
have removed the test because it was creating a slot with the same
name as the tablesync slot, and the tablesync worker removed it due to
the new logic in LogicalRepSyncTableStart. Earlier, it was not failing
because of the bug in that code, which I have fixed in the attached.
I wonder whether we should restrict creating slots with the prefix
"pg_" because we are internally creating slots with those names? I
think this was a problem previously also. We already prohibit it for a
few other objects like origins, schemas, etc.; see the usage of
IsReservedName.
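A hedged sketch of the kind of check being suggested, modelled loosely on IsReservedName() (the helper name below is invented, not an existing API):

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reject user-created slot names that collide with
 * the internally generated "pg_%u_sync_%u" tablesync namespace. */
static bool
ReplicationSlotNameIsReserved(const char *name)
{
    return strncmp(name, "pg_", 3) == 0;
}

int
main(void)
{
    const char *name = "pg_16394_sync_16385";

    if (ReplicationSlotNameIsReserved(name))
        fprintf(stderr,
                "ERROR: replication slot name \"%s\" is reserved\n", name);
    return 0;
}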
--
With Regards,
Amit Kapila.
Attachments:
v24-0001-Tablesync-Solution1.patchapplication/octet-stream; name=v24-0001-Tablesync-Solution1.patchDownload
From 83795d3fefa7a313d2c2e6ecdb46d861106fce40 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Sat, 30 Jan 2021 10:21:28 +0530
Subject: [PATCH v24] Tablesync Solution1.
==== Features:
* The tablesync slot is now permanent instead of temporary.
* The tablesync worker is now allowing multiple tx instead of single tx.
* A new state (SUBREL_STATE_FINISHEDCOPY) is persisted after a successful copy_table in tablesync's LogicalRepSyncTableStart.
* If a re-launched tablesync finds state SUBREL_STATE_FINISHEDCOPY then it will bypass the initial copy_table phase.
* Now tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar as done for the apply worker). The origin is advanced when first created.
* Cleanup of tablesync resources:
- The tablesync slot is dropped by process_syncing_tables_for_sync functions.
- The tablesync replication origin tracking is dropped by process_syncing_tables_for_apply.
- DropSubscription/AlterSubscription_refresh also drop tablesync slots/origins
* Updates to PG docs.
Known Issues:
* None.
---
doc/src/sgml/catalogs.sgml | 1 +
doc/src/sgml/logical-replication.sgml | 17 +-
doc/src/sgml/ref/drop_subscription.sgml | 6 +-
src/backend/access/transam/xact.c | 11 -
src/backend/catalog/pg_subscription.c | 5 +
src/backend/commands/subscriptioncmds.c | 442 ++++++++++++++++++++++------
src/backend/replication/logical/launcher.c | 147 ---------
src/backend/replication/logical/tablesync.c | 305 ++++++++++++++++---
src/backend/replication/logical/worker.c | 18 +-
src/backend/tcop/utility.c | 3 +-
src/include/catalog/pg_subscription_rel.h | 2 +
src/include/commands/subscriptioncmds.h | 2 +-
src/include/replication/logicallauncher.h | 2 -
src/include/replication/slot.h | 3 +
src/include/replication/worker_internal.h | 3 +-
src/tools/pgindent/typedefs.list | 1 -
16 files changed, 643 insertions(+), 325 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 865e826..920a39d 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7665,6 +7665,7 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
State code:
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
+ <literal>f</literal> = finished table copy,
<literal>s</literal> = synchronized,
<literal>r</literal> = ready (normal replication)
</para></entry>
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index a560ad6..20cdd57 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -248,7 +248,17 @@
<para>
As mentioned earlier, each (active) subscription receives changes from a
- replication slot on the remote (publishing) side. Normally, the remote
+ replication slot on the remote (publishing) side.
+ </para>
+ <para>
+ Additional table synchronization slots are normally transient, created
+ internally and dropped automatically when they are no longer needed.
+ These table synchronization slots have generated names:
+ <quote><literal>pg_%u_sync_%u</literal></quote> (parameters: Subscription
+ <parameter>oid</parameter>, Table <parameter>relid</parameter>)
+ </para>
+ <para>
+ Normally, the remote
replication slot is created automatically when the subscription is created
using <command>CREATE SUBSCRIPTION</command> and it is dropped
automatically when the subscription is dropped using <command>DROP
@@ -294,8 +304,9 @@
using <command>ALTER SUBSCRIPTION</command> before attempting to drop
the subscription. If the remote database instance no longer exists, no
further action is then necessary. If, however, the remote database
- instance is just unreachable, the replication slot should then be
- dropped manually; otherwise it would continue to reserve WAL and might
+ instance is just unreachable, the replication slot (and any still
+ remaining table synchronization slots) should then be
+ dropped manually; otherwise it/they would continue to reserve WAL and might
eventually cause the disk to fill up. Such cases should be carefully
investigated.
</para>
diff --git a/doc/src/sgml/ref/drop_subscription.sgml b/doc/src/sgml/ref/drop_subscription.sgml
index adbdeaf..aee9615 100644
--- a/doc/src/sgml/ref/drop_subscription.sgml
+++ b/doc/src/sgml/ref/drop_subscription.sgml
@@ -79,7 +79,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
<para>
When dropping a subscription that is associated with a replication slot on
the remote host (the normal state), <command>DROP SUBSCRIPTION</command>
- will connect to the remote host and try to drop the replication slot as
+ will connect to the remote host and try to drop the replication slot (and
+ any remaining table synchronization slots) as
part of its operation. This is necessary so that the resources allocated
for the subscription on the remote host are released. If this fails,
either because the remote host is not reachable or because the remote
@@ -89,7 +90,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
executing <literal>ALTER SUBSCRIPTION ... SET (slot_name = NONE)</literal>.
After that, <command>DROP SUBSCRIPTION</command> will no longer attempt any
actions on a remote host. Note that if the remote replication slot still
- exists, it should then be dropped manually; otherwise it will continue to
+ exists, it (and any related table synchronization slots) should then be
+ dropped manually; otherwise it/they will continue to
reserve WAL and might eventually cause the disk to fill up. See
also <xref linkend="logical-replication-subscription-slot"/>.
</para>
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index a2068e3..3c8b4eb 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2432,15 +2432,6 @@ PrepareTransaction(void)
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot PREPARE a transaction that has exported snapshots")));
- /*
- * Don't allow PREPARE but for transaction that has/might kill logical
- * replication workers.
- */
- if (XactManipulatesLogicalReplicationWorkers())
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("cannot PREPARE a transaction that has manipulated logical replication workers")));
-
/* Prevent cancel/die interrupt while cleaning up */
HOLD_INTERRUPTS();
@@ -4899,7 +4890,6 @@ CommitSubTransaction(void)
AtEOSubXact_HashTables(true, s->nestingLevel);
AtEOSubXact_PgStat(true, s->nestingLevel);
AtSubCommit_Snapshot(s->nestingLevel);
- AtEOSubXact_ApplyLauncher(true, s->nestingLevel);
/*
* We need to restore the upper transaction's read-only state, in case the
@@ -5059,7 +5049,6 @@ AbortSubTransaction(void)
AtEOSubXact_HashTables(false, s->nestingLevel);
AtEOSubXact_PgStat(false, s->nestingLevel);
AtSubAbort_Snapshot(s->nestingLevel);
- AtEOSubXact_ApplyLauncher(false, s->nestingLevel);
}
/*
diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c
index 44cb285..303791d 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -337,6 +337,9 @@ GetSubscriptionRelState(Oid subid, Oid relid, XLogRecPtr *sublsn)
char substate;
bool isnull;
Datum d;
+ Relation rel;
+
+ rel = table_open(SubscriptionRelRelationId, AccessShareLock);
/* Try finding the mapping. */
tup = SearchSysCache2(SUBSCRIPTIONRELMAP,
@@ -363,6 +366,8 @@ GetSubscriptionRelState(Oid subid, Oid relid, XLogRecPtr *sublsn)
/* Cleanup */
ReleaseSysCache(tup);
+ table_close(rel, AccessShareLock);
+
return substate;
}
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 082f785..46f8d70 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -34,6 +34,7 @@
#include "nodes/makefuncs.h"
#include "replication/logicallauncher.h"
#include "replication/origin.h"
+#include "replication/slot.h"
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "replication/worker_internal.h"
@@ -46,6 +47,8 @@
#include "utils/syscache.h"
static List *fetch_table_list(WalReceiverConn *wrconn, List *publications);
+static void ReportSlotConnectionError(List* rstates, Oid subid, char *slotname, char *err);
+
/*
* Common option parsing function for CREATE and ALTER SUBSCRIPTION commands.
@@ -566,107 +569,191 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
Oid *pubrel_local_oids;
ListCell *lc;
int off;
+ int remove_rel_len;
+ Relation rel = NULL;
+ typedef struct SubRemoveRels
+ {
+ Oid relid;
+ char state;
+ } SubRemoveRels;
+ SubRemoveRels *sub_remove_rels;
/* Load the library providing us libpq calls. */
load_file("libpqwalreceiver", false);
- /* Try to connect to the publisher. */
- wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
- if (!wrconn)
- ereport(ERROR,
- (errmsg("could not connect to the publisher: %s", err)));
+ PG_TRY();
+ {
+ /* Try to connect to the publisher. */
+ wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
+ if (!wrconn)
+ ereport(ERROR,
+ (errmsg("could not connect to the publisher: %s", err)));
- /* Get the table list from publisher. */
- pubrel_names = fetch_table_list(wrconn, sub->publications);
+ /* Get the table list from publisher. */
+ pubrel_names = fetch_table_list(wrconn, sub->publications);
- /* We are done with the remote side, close connection. */
- walrcv_disconnect(wrconn);
+ /* Get local table list. */
+ subrel_states = GetSubscriptionRelations(sub->oid);
- /* Get local table list. */
- subrel_states = GetSubscriptionRelations(sub->oid);
+ /*
+ * Build qsorted array of local table oids for faster lookup. This can
+ * potentially contain all tables in the database so speed of lookup
+ * is important.
+ */
+ subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
+ off = 0;
+ foreach(lc, subrel_states)
+ {
+ SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
- /*
- * Build qsorted array of local table oids for faster lookup. This can
- * potentially contain all tables in the database so speed of lookup is
- * important.
- */
- subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
- off = 0;
- foreach(lc, subrel_states)
- {
- SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
+ subrel_local_oids[off++] = relstate->relid;
+ }
+ qsort(subrel_local_oids, list_length(subrel_states),
+ sizeof(Oid), oid_cmp);
+
+ sub_remove_rels = palloc(list_length(subrel_states) * sizeof(SubRemoveRels));
+
+ /*
+ * Walk over the remote tables and try to match them to locally known
+ * tables. If the table is not known locally create a new state for
+ * it.
+ *
+ * Also builds array of local oids of remote tables for the next step.
+ */
+ off = 0;
+ pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
+
+ foreach(lc, pubrel_names)
+ {
+ RangeVar *rv = (RangeVar *) lfirst(lc);
+ Oid relid;
- subrel_local_oids[off++] = relstate->relid;
- }
- qsort(subrel_local_oids, list_length(subrel_states),
- sizeof(Oid), oid_cmp);
+ relid = RangeVarGetRelid(rv, AccessShareLock, false);
- /*
- * Walk over the remote tables and try to match them to locally known
- * tables. If the table is not known locally create a new state for it.
- *
- * Also builds array of local oids of remote tables for the next step.
- */
- off = 0;
- pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
+ /* Check for supported relkind. */
+ CheckSubscriptionRelkind(get_rel_relkind(relid),
+ rv->schemaname, rv->relname);
- foreach(lc, pubrel_names)
- {
- RangeVar *rv = (RangeVar *) lfirst(lc);
- Oid relid;
-
- relid = RangeVarGetRelid(rv, AccessShareLock, false);
+ pubrel_local_oids[off++] = relid;
- /* Check for supported relkind. */
- CheckSubscriptionRelkind(get_rel_relkind(relid),
- rv->schemaname, rv->relname);
+ if (!bsearch(&relid, subrel_local_oids,
+ list_length(subrel_states), sizeof(Oid), oid_cmp))
+ {
+ AddSubscriptionRelState(sub->oid, relid,
+ copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
+ InvalidXLogRecPtr);
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" added to subscription \"%s\"",
+ rv->schemaname, rv->relname, sub->name)));
+ }
+ }
- pubrel_local_oids[off++] = relid;
+ /*
+ * Next remove state for tables we should not care about anymore using
+ * the data we collected above
+ */
+ qsort(pubrel_local_oids, list_length(pubrel_names),
+ sizeof(Oid), oid_cmp);
- if (!bsearch(&relid, subrel_local_oids,
- list_length(subrel_states), sizeof(Oid), oid_cmp))
+ remove_rel_len = 0;
+ for (off = 0; off < list_length(subrel_states); off++)
{
- AddSubscriptionRelState(sub->oid, relid,
- copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
- InvalidXLogRecPtr);
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" added to subscription \"%s\"",
- rv->schemaname, rv->relname, sub->name)));
- }
- }
+ Oid relid = subrel_local_oids[off];
- /*
- * Next remove state for tables we should not care about anymore using the
- * data we collected above
- */
- qsort(pubrel_local_oids, list_length(pubrel_names),
- sizeof(Oid), oid_cmp);
+ if (!bsearch(&relid, pubrel_local_oids,
+ list_length(pubrel_names), sizeof(Oid), oid_cmp))
+ {
+ char state;
+ XLogRecPtr statelsn;
+
+ /*
+ * Lock pg_subscription_rel with AccessExclusiveLock to prevent any race
+ * conditions with the apply worker re-launching workers at the same time
+ * this code is trying to remove those tables.
+ *
+ * Even if new worker for this particular rel is restarted it won't be able
+ * to make any progress as we hold exclusive lock on subscription_rel till
+ * the transaction end. It will simply exit as there is no corresponding
+ * rel entry.
+ *
+ * This locking also ensures that the state of rels won't change till we
+ * are done with this refresh operation.
+ */
+ if (!rel)
+ rel = table_open(SubscriptionRelRelationId, AccessExclusiveLock);
+
+ /* Last known rel state. */
+ state = GetSubscriptionRelState(sub->oid, relid, &statelsn);
+
+ sub_remove_rels[remove_rel_len].relid = relid;
+ sub_remove_rels[remove_rel_len++].state = state;
+
+ RemoveSubscriptionRel(sub->oid, relid);
+
+ logicalrep_worker_stop(sub->oid, relid);
+
+ /*
+ * For READY state, we would have already dropped the tablesync
+ * origin.
+ */
+ if (state != SUBREL_STATE_READY)
+ {
+ /*
+ * Drop the tablesync's origin tracking if exists.
+ */
+ tablesync_replorigin_drop(sub->oid, relid, false /* nowait */);
+ }
- for (off = 0; off < list_length(subrel_states); off++)
- {
- Oid relid = subrel_local_oids[off];
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
+ get_namespace_name(get_rel_namespace(relid)),
+ get_rel_name(relid),
+ sub->name)));
+ }
+ }
- if (!bsearch(&relid, pubrel_local_oids,
- list_length(pubrel_names), sizeof(Oid), oid_cmp))
+ /*
+ * Drop the tablesync slot. This has to be at the end because otherwise if there
+ * is an error while doing the database operations we won't be able to rollback
+ * dropped slot.
+ */
+ for (off = 0; off < remove_rel_len; off++)
{
- RemoveSubscriptionRel(sub->oid, relid);
-
- logicalrep_worker_stop_at_commit(sub->oid, relid);
-
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
- get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid),
- sub->name)));
+ if (sub_remove_rels[off].state != SUBREL_STATE_READY &&
+ sub_remove_rels[off].state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = { 0 };
+
+ /*
+ * For READY/SYNCDONE states we know the tablesync slot has
+ * already been dropped by the tablesync worker.
+ *
+ * For other states, there is no certainty. Maybe the slot
+ * does not exist yet; Maybe the slot is already deleted but
+ * SYNCDONE is not yet set. For this reason we allow
+ * missing_ok = true for the drop.
+ */
+ ReplicationSlotNameForTablesync(sub->oid, sub_remove_rels[off].relid, syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, true /* missing_ok */);
+ }
}
}
+ PG_FINALLY();
+ {
+ if (wrconn)
+ walrcv_disconnect(wrconn);
+ }
+ PG_END_TRY();
+
+ if (rel)
+ table_close(rel, NoLock);
}
/*
* Alter the existing subscription.
*/
ObjectAddress
-AlterSubscription(AlterSubscriptionStmt *stmt)
+AlterSubscription(AlterSubscriptionStmt *stmt, bool isTopLevel)
{
Relation rel;
ObjectAddress myself;
@@ -848,6 +935,8 @@ AlterSubscription(AlterSubscriptionStmt *stmt)
errmsg("ALTER SUBSCRIPTION with refresh is not allowed for disabled subscriptions"),
errhint("Use ALTER SUBSCRIPTION ... SET PUBLICATION ... WITH (refresh = false).")));
+ PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION with refresh");
+
/* Make sure refresh sees the new list of publications. */
sub->publications = stmt->publication;
@@ -877,6 +966,8 @@ AlterSubscription(AlterSubscriptionStmt *stmt)
NULL, NULL, /* no "binary" */
NULL, NULL); /* no "streaming" */
+ PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION ... REFRESH");
+
AlterSubscription_refresh(sub, copy_data);
break;
@@ -928,8 +1019,8 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
char *err = NULL;
RepOriginId originid;
WalReceiverConn *wrconn = NULL;
- StringInfoData cmd;
Form_pg_subscription form;
+ List *rstates;
/*
* Lock pg_subscription with AccessExclusiveLock to ensure that the
@@ -1042,6 +1133,31 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
}
list_free(subworkers);
+ /*
+ * Cleanup of tablesync replication origins.
+ *
+ * Any READY-state relations would already have dealt with clean-ups.
+ *
+ * Note that the state can't change because we have already stopped both
+ * the apply and tablesync workers and they can't restart because of
+ * exclusive lock on the subscription.
+ */
+ rstates = GetSubscriptionNotReadyRelations(subid);
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState* rstate = (SubscriptionRelState*) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Drop the tablesync's origin tracking if exists.
+ */
+ tablesync_replorigin_drop(subid, relid, false /* nowait */);
+ }
+
/* Clean up dependencies */
deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
@@ -1054,34 +1170,111 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
if (originid != InvalidRepOriginId)
replorigin_drop(originid, false);
+
/*
* If there is no slot associated with the subscription, we can finish
* here.
*/
- if (!slotname)
+ if (!slotname && rstates == NIL)
{
table_close(rel, NoLock);
return;
}
/*
- * Otherwise drop the replication slot at the publisher node using the
- * replication connection.
+ * Try to acquire the connection necessary for dropping slots.
+ *
+ * Note: If the slotname is NONE/NULL then we allow the command to finish
+ * and users need to manually cleanup the apply and tablesync worker slots
+ * later.
+ *
+ * This has to be at the end because otherwise if there is an error while
+ * doing the database operations we won't be able to rollback dropped slot.
*/
load_file("libpqwalreceiver", false);
- initStringInfo(&cmd);
- appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
-
wrconn = walrcv_connect(conninfo, true, subname, &err);
if (wrconn == NULL)
- ereport(ERROR,
- (errmsg("could not connect to publisher when attempting to "
- "drop the replication slot \"%s\"", slotname),
- errdetail("The error was: %s", err),
- /* translator: %s is an SQL ALTER command */
- errhint("Use %s to disassociate the subscription from the slot.",
- "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+ {
+ if (!slotname)
+ {
+ /* be tidy */
+ list_free(rstates);
+ return;
+ }
+ else
+ {
+ ReportSlotConnectionError(rstates, subid, slotname, err);
+ }
+
+ }
+
+ PG_TRY();
+ {
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Drop the tablesync slot.
+ *
+ * For SYNCDONE/READY states, the tablesync slot is known to have
+ * already been dropped by the tablesync worker.
+ *
+ * For other states, there is no certainty. Maybe the slot does
+ * not exist yet; Maybe the slot is already deleted but SYNCDONE
+ * is not yet set. For this reason, we allow missing_ok = true for
+ * the drop.
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = {0};
+
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, true /* missing_ok */ );
+ }
+ }
+
+ list_free(rstates);
+
+ /*
+ * If there is a slot associated with the subscription, then drop the
+ * replication slot at the publisher.
+ */
+ if (slotname)
+ ReplicationSlotDropAtPubNode(wrconn, slotname, false /* missing_ok */ );
+
+ }
+ PG_FINALLY();
+ {
+ walrcv_disconnect(wrconn);
+ }
+ PG_END_TRY();
+
+ table_close(rel, NoLock);
+}
+
+/*
+ * Drop the replication slot at the publisher node using the replication connection.
+ *
+ * missing_ok - if true then only issue WARNING message if the slot cannot be deleted.
+ */
+void
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok)
+{
+ StringInfoData cmd;
+
+ Assert(wrconn);
+
+ load_file("libpqwalreceiver", false);
+
+ initStringInfo(&cmd);
+ appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
PG_TRY();
{
@@ -1089,27 +1282,37 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
res = walrcv_exec(wrconn, cmd.data, 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
+ if (res->status == WALRCV_OK_COMMAND)
+ {
+ /* NOTICE. Success. */
+ ereport(NOTICE,
+ (errmsg("dropped replication slot \"%s\" on publisher",
+ slotname)));
+ }
+ else if (res->status == WALRCV_ERROR && missing_ok)
+ {
+ /* WARNING. Error, but missing_ok = true. */
+ ereport(WARNING,
(errmsg("could not drop the replication slot \"%s\" on publisher",
slotname),
errdetail("The error was: %s", res->err)));
+ }
else
- ereport(NOTICE,
- (errmsg("dropped replication slot \"%s\" on publisher",
- slotname)));
+ {
+ /* ERROR. */
+ ereport(ERROR,
+ (errmsg("could not drop the replication slot \"%s\" on publisher",
+ slotname),
+ errdetail("The error was: %s", res->err)));
+ }
walrcv_clear_result(res);
}
PG_FINALLY();
{
- walrcv_disconnect(wrconn);
+ pfree(cmd.data);
}
PG_END_TRY();
-
- pfree(cmd.data);
-
- table_close(rel, NoLock);
}
/*
@@ -1278,3 +1481,46 @@ fetch_table_list(WalReceiverConn *wrconn, List *publications)
return tablelist;
}
+
+/*
+ * This is to report the connection failure while dropping replication slots.
+ * Here, we report the WARNING for all tablesync slots so that user can drop
+ * them manually, if required.
+ */
+static void
+ReportSlotConnectionError(List *rstates, Oid subid, char *slotname, char *err)
+{
+ ListCell *lc;
+
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Caller needs to ensure that we have appropriate locks so that
+ * relstate doesn't change underneath us.
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = { 0 };
+
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ elog(WARNING, "could not drop tablesync replication slot \"%s\"",
+ syncslotname);
+
+ }
+ }
+
+ ereport(ERROR,
+ (errmsg("could not connect to publisher when attempting to "
+ "drop the replication slot \"%s\"", slotname),
+ errdetail("The error was: %s", err),
+ /* translator: %s is an SQL ALTER command */
+ errhint("Use %s to disassociate the subscription from the slot.",
+ "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+}
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 186514c..58082dd 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -73,20 +73,6 @@ typedef struct LogicalRepWorkerId
Oid relid;
} LogicalRepWorkerId;
-typedef struct StopWorkersData
-{
- int nestDepth; /* Sub-transaction nest level */
- List *workers; /* List of LogicalRepWorkerId */
- struct StopWorkersData *parent; /* This need not be an immediate
- * subtransaction parent */
-} StopWorkersData;
-
-/*
- * Stack of StopWorkersData elements. Each stack element contains the workers
- * to be stopped for that subtransaction.
- */
-static StopWorkersData *on_commit_stop_workers = NULL;
-
static void ApplyLauncherWakeup(void);
static void logicalrep_launcher_onexit(int code, Datum arg);
static void logicalrep_worker_onexit(int code, Datum arg);
@@ -547,51 +533,6 @@ logicalrep_worker_stop(Oid subid, Oid relid)
}
/*
- * Request worker for specified sub/rel to be stopped on commit.
- */
-void
-logicalrep_worker_stop_at_commit(Oid subid, Oid relid)
-{
- int nestDepth = GetCurrentTransactionNestLevel();
- LogicalRepWorkerId *wid;
- MemoryContext oldctx;
-
- /* Make sure we store the info in context that survives until commit. */
- oldctx = MemoryContextSwitchTo(TopTransactionContext);
-
- /* Check that previous transactions were properly cleaned up. */
- Assert(on_commit_stop_workers == NULL ||
- nestDepth >= on_commit_stop_workers->nestDepth);
-
- /*
- * Push a new stack element if we don't already have one for the current
- * nestDepth.
- */
- if (on_commit_stop_workers == NULL ||
- nestDepth > on_commit_stop_workers->nestDepth)
- {
- StopWorkersData *newdata = palloc(sizeof(StopWorkersData));
-
- newdata->nestDepth = nestDepth;
- newdata->workers = NIL;
- newdata->parent = on_commit_stop_workers;
- on_commit_stop_workers = newdata;
- }
-
- /*
- * Finally add a new worker into the worker list of the current
- * subtransaction.
- */
- wid = palloc(sizeof(LogicalRepWorkerId));
- wid->subid = subid;
- wid->relid = relid;
- on_commit_stop_workers->workers =
- lappend(on_commit_stop_workers->workers, wid);
-
- MemoryContextSwitchTo(oldctx);
-}
-
-/*
* Wake up (using latch) any logical replication worker for specified sub/rel.
*/
void
@@ -820,109 +761,21 @@ ApplyLauncherShmemInit(void)
}
/*
- * Check whether current transaction has manipulated logical replication
- * workers.
- */
-bool
-XactManipulatesLogicalReplicationWorkers(void)
-{
- return (on_commit_stop_workers != NULL);
-}
-
-/*
* Wakeup the launcher on commit if requested.
*/
void
AtEOXact_ApplyLauncher(bool isCommit)
{
-
- Assert(on_commit_stop_workers == NULL ||
- (on_commit_stop_workers->nestDepth == 1 &&
- on_commit_stop_workers->parent == NULL));
-
if (isCommit)
{
- ListCell *lc;
-
- if (on_commit_stop_workers != NULL)
- {
- List *workers = on_commit_stop_workers->workers;
-
- foreach(lc, workers)
- {
- LogicalRepWorkerId *wid = lfirst(lc);
-
- logicalrep_worker_stop(wid->subid, wid->relid);
- }
- }
-
if (on_commit_launcher_wakeup)
ApplyLauncherWakeup();
}
- /*
- * No need to pfree on_commit_stop_workers. It was allocated in
- * transaction memory context, which is going to be cleaned soon.
- */
- on_commit_stop_workers = NULL;
on_commit_launcher_wakeup = false;
}
/*
- * On commit, merge the current on_commit_stop_workers list into the
- * immediate parent, if present.
- * On rollback, discard the current on_commit_stop_workers list.
- * Pop out the stack.
- */
-void
-AtEOSubXact_ApplyLauncher(bool isCommit, int nestDepth)
-{
- StopWorkersData *parent;
-
- /* Exit immediately if there's no work to do at this level. */
- if (on_commit_stop_workers == NULL ||
- on_commit_stop_workers->nestDepth < nestDepth)
- return;
-
- Assert(on_commit_stop_workers->nestDepth == nestDepth);
-
- parent = on_commit_stop_workers->parent;
-
- if (isCommit)
- {
- /*
- * If the upper stack element is not an immediate parent
- * subtransaction, just decrement the notional nesting depth without
- * doing any real work. Else, we need to merge the current workers
- * list into the parent.
- */
- if (!parent || parent->nestDepth < nestDepth - 1)
- {
- on_commit_stop_workers->nestDepth--;
- return;
- }
-
- parent->workers =
- list_concat(parent->workers, on_commit_stop_workers->workers);
- }
- else
- {
- /*
- * Abandon everything that was done at this nesting level. Explicitly
- * free memory to avoid a transaction-lifespan leak.
- */
- list_free_deep(on_commit_stop_workers->workers);
- }
-
- /*
- * We have taken care of the current subtransaction workers list for both
- * abort or commit. So we are ready to pop the stack.
- */
- pfree(on_commit_stop_workers);
- on_commit_stop_workers = parent;
-}
-
-/*
* Request wakeup of the launcher on commit of the transaction.
*
* This is used to send launcher signal to stop sleeping and process the
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 863d196..cc49dc4 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -31,8 +31,11 @@
* table state to INIT.
* - Tablesync worker starts; changes table state from INIT to DATASYNC while
* copying.
- * - Tablesync worker finishes the copy and sets table state to SYNCWAIT;
- * waits for state change.
+ * - Tablesync worker does initial table copy; there is a FINISHEDCOPY (sync
+ * worker specific) state to indicate when the copy phase has completed, so
+ * if the worker crashes with this (non-memory) state then the copy will not
+ * be re-attempted.
+ * - Tablesync worker then sets table state to SYNCWAIT; waits for state change.
* - Apply worker periodically checks for tables in SYNCWAIT state. When
* any appear, it sets the table state to CATCHUP and starts loop-waiting
* until either the table state is set to SYNCDONE or the sync worker
@@ -48,8 +51,8 @@
* point it sets state to READY and stops tracking. Again, there might
* be zero changes in between.
*
- * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * So the state progression is always: INIT -> DATASYNC -> FINISHEDCOPY
+ * -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY.
*
* The catalog pg_subscription_rel is used to keep information about
* subscribed tables and their state. Some transient state during data
@@ -59,6 +62,7 @@
* Example flows look like this:
* - Apply is in front:
* sync:8
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:10
* -> set in memory CATCHUP
@@ -74,6 +78,7 @@
*
* - Sync is in front:
* sync:10
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:8
* -> set in memory CATCHUP
@@ -98,11 +103,16 @@
#include "miscadmin.h"
#include "parser/parse_relation.h"
#include "pgstat.h"
+#include "postmaster/interrupt.h"
#include "replication/logicallauncher.h"
#include "replication/logicalrelation.h"
+#include "replication/logicalworker.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
+#include "storage/lmgr.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -113,6 +123,42 @@ static bool table_states_valid = false;
StringInfo copybuf = NULL;
/*
+ * Common code to drop the origin of a tablesync worker.
+ *
+ * There is a potential race condition if two processes attempt to call
+ * replorigin_drop for the same originid at the same time. The loser of
+ * that race would give an ERROR saying that it failed to find the
+ * expected originid.
+ *
+ * The TRY/CATCH below suppresses such errors, allowing the tablesync cleanup
+ * code to proceed.
+ */
+void
+tablesync_replorigin_drop(Oid subid, Oid relid, bool nowait)
+{
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (OidIsValid(originid))
+ {
+ PG_TRY();
+ {
+ replorigin_drop(originid, nowait);
+ }
+ PG_CATCH();
+ {
+ ereport(WARNING,
+ errmsg("could not drop replication origin with OID %d, named \"%s\"",
+ originid,
+ originname));
+ }
+ PG_END_TRY();
+ }
+}
+
+/*
* Exit routine for synchronization worker.
*/
static void
@@ -270,30 +316,55 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
+ bool sync_done = false;
+ Oid subid = MySubscription->oid;
+ Oid relid = MyLogicalRepWorker->relid;
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
+ sync_done = MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
+ current_lsn >= MyLogicalRepWorker->relstate_lsn;
+ SpinLockRelease(&MyLogicalRepWorker->relmutex);
- if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
- current_lsn >= MyLogicalRepWorker->relstate_lsn)
+ if (sync_done)
{
TimeLineID tli;
+ char syncslotname[NAMEDATALEN] = {0};
+
+ /* End wal streaming so wrconn can be re-used to drop the slot. */
+ walrcv_endstreaming(wrconn, &tli);
+
+ /*
+ * Cleanup the tablesync slot.
+ */
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, false /* missing_ok */ );
+
+ /*
+ * Change state to SYNCDONE.
+ */
+ SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCDONE;
MyLogicalRepWorker->relstate_lsn = current_lsn;
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ {
+ StartTransactionCommand();
+ }
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
MyLogicalRepWorker->relstate_lsn);
- walrcv_endstreaming(wrconn, &tli);
finish_sync_worker();
}
- else
- SpinLockRelease(&MyLogicalRepWorker->relmutex);
}
/*
@@ -412,6 +483,21 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
started_tx = true;
}
+ /*
+ * Remove the tablesync origin tracking if it exists.
+ *
+ * In the normal case the origin drop is done here instead of in the
+ * process_syncing_tables_for_sync function, because if the
+ * tablesync worker process attempted to drop its own
+ * origin, that would prevent the origin from advancing properly
+ * on the commit of the transaction.
+ */
+ tablesync_replorigin_drop(MyLogicalRepWorker->subid,
+ rstate->relid, false /* nowait */ );
+
+ /*
+ * Update the state to READY only after the origin cleanup.
+ */
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
@@ -808,6 +894,40 @@ copy_table(Relation rel)
}
/*
+ * Determine the tablesync slot name.
+ *
+ * The name must not exceed NAMEDATALEN -1 because of remote node constraints
+ * on slot name length.
+ *
+ * The returned slot name is either:
+ * - stored in the supplied buffer (syncslotname), or
+ * - palloc'ed in current memory context (if syncslotname = NULL).
+ */
+char *
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char syncslotname[NAMEDATALEN])
+{
+ /*
+ * Note: Now that we are using PERMANENT tablesync slots, this code no
+ * longer uses the Subscription slot name as the first part of the tablesync
+ * slot name. That part is omitted because we are now responsible
+ * for cleaning up the permanent tablesync slots, so it could become
+ * impossible to recalculate what name to clean up if the Subscription slot
+ * name had changed.
+ */
+
+ if (syncslotname)
+ {
+ sprintf(syncslotname, "pg_%u_sync_%u", suboid, relid);
+ }
+ else
+ {
+ syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
+ }
+
+ return syncslotname;
+}
+
+/*
* Start syncing the table in the sync worker.
*
* If nothing needs to be done to sync the table, we exit the worker without
@@ -824,6 +944,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -849,19 +971,10 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
- MySubscription->slotname,
- MySubscription->oid,
- MyLogicalRepWorker->relid);
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(MySubscription->oid,
+ MyLogicalRepWorker->relid,
+ NULL /* use palloc */ );
/*
* Here we use the slot name instead of the subscription name as the
@@ -874,7 +987,33 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY);
+
+ /* Assign the origin tracking record name. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MySubscription->oid, MyLogicalRepWorker->relid);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed
+ * before it was able to finish normally.
+ */
+ StartTransactionCommand();
+
+ /*
+ * The origin tracking name must already exist. It was created first
+ * time this tablesync was launched.
+ */
+ originid = replorigin_by_name(originname, false /* missing_ok */ );
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ *origin_startpos = replorigin_session_get_progress(false);
+
+ CommitTransactionCommand();
+
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -890,9 +1029,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -918,29 +1054,108 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
walrcv_clear_result(res);
/*
- * Create a new temporary logical decoding slot. This slot will be used
- * for the catchup phase after COPY is done, so tell it to use the
- * snapshot to make the final data consistent.
+ * Be sure to remove the newly created tablesync slot if the COPY fails.
*/
- walrcv_create_slot(wrconn, slotname, true,
- CRS_USE_SNAPSHOT, origin_startpos);
+ PG_TRY();
+ {
+ /*
+ * Create a new permanent logical decoding slot. This slot will be
+ * used for the catchup phase after COPY is done, so tell it to use
+ * the snapshot to make the final data consistent.
+ */
+ walrcv_create_slot(wrconn, slotname, false /* permanent */ ,
+ CRS_USE_SNAPSHOT, origin_startpos);
- /* Now do the initial data copy */
- PushActiveSnapshot(GetTransactionSnapshot());
- copy_table(rel);
- PopActiveSnapshot();
+ /* Now do the initial data copy */
+ PushActiveSnapshot(GetTransactionSnapshot());
+ copy_table(rel);
+ PopActiveSnapshot();
- res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
- (errmsg("table copy could not finish transaction on publisher"),
- errdetail("The error was: %s", res->err)));
- walrcv_clear_result(res);
+ res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
+ if (res->status != WALRCV_OK_COMMAND)
+ ereport(ERROR,
+ (errmsg("table copy could not finish transaction on publisher"),
+ errdetail("The error was: %s", res->err)));
+ walrcv_clear_result(res);
+
+ table_close(rel, NoLock);
+
+ /* Make the copy visible. */
+ CommandCounterIncrement();
+
+ /* Setup replication origin tracking. */
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist, so create it now.
+ *
+ * Then advance to the LSN got from walrcv_create_slot. This is
+ * WAL logged for the purpose of recovery. Locks are to prevent
+ * the replication origin from vanishing while advancing.
+ */
+ originid = replorigin_create(originname);
+
+ LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */ , true /* WAL log */ );
+ UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ }
+ else
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("replication origin \"%s\" already exists",
+ originname)));
+ }
+
+ /*
+ * Update the persisted state to indicate the COPY phase is done; make
+ * it visible to others.
+ */
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_FINISHEDCOPY,
+ MyLogicalRepWorker->relstate_lsn);
+
+ CommitTransactionCommand();
+ }
+ PG_CATCH();
+ {
+ /*
+ * Cleanup the transaction state on publisher before performing any
+ * other operation.
+ */
+ res = walrcv_exec(wrconn, "ROLLBACK", 0, NULL);
+ if (res->status != WALRCV_OK_COMMAND)
+ ereport(ERROR,
+ (errmsg("table copy could not rollback transaction on publisher"),
+ errdetail("The error was: %s", res->err)));
+ walrcv_clear_result(res);
+
+ /*
+ * If something failed during copy table then cleanup the created
+ * slot.
+ */
+ ReplicationSlotDropAtPubNode(wrconn, slotname, false /* missing_ok */ );
+
+ pfree(slotname);
+ slotname = NULL;
+
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
- table_close(rel, NoLock);
+copy_table_done:
- /* Make the copy visible. */
- CommandCounterIncrement();
+ elog(DEBUG1,
+ "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
/*
* We are done with the initial data synchronization, update the state.
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index eb7db89..cfc924c 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -807,12 +807,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -889,9 +885,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -918,8 +912,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1062,8 +1055,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData *commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 1d81071..05bb698 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1786,7 +1786,8 @@ ProcessUtilitySlow(ParseState *pstate,
break;
case T_AlterSubscriptionStmt:
- address = AlterSubscription((AlterSubscriptionStmt *) parsetree);
+ address = AlterSubscription((AlterSubscriptionStmt *) parsetree,
+ isTopLevel);
break;
case T_DropSubscriptionStmt:
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index ab1202c..e04ba83 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,8 @@ DECLARE_UNIQUE_INDEX_PKEY(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_FINISHEDCOPY 'f' /* tablesync copy phase is completed
+ * (sublsn NULL) */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/commands/subscriptioncmds.h b/src/include/commands/subscriptioncmds.h
index a818650..3b926f3 100644
--- a/src/include/commands/subscriptioncmds.h
+++ b/src/include/commands/subscriptioncmds.h
@@ -20,7 +20,7 @@
extern ObjectAddress CreateSubscription(CreateSubscriptionStmt *stmt,
bool isTopLevel);
-extern ObjectAddress AlterSubscription(AlterSubscriptionStmt *stmt);
+extern ObjectAddress AlterSubscription(AlterSubscriptionStmt *stmt, bool isTopLevel);
extern void DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel);
extern ObjectAddress AlterSubscriptionOwner(const char *name, Oid newOwnerId);
diff --git a/src/include/replication/logicallauncher.h b/src/include/replication/logicallauncher.h
index 421ec15..301e494 100644
--- a/src/include/replication/logicallauncher.h
+++ b/src/include/replication/logicallauncher.h
@@ -22,9 +22,7 @@ extern Size ApplyLauncherShmemSize(void);
extern void ApplyLauncherShmemInit(void);
extern void ApplyLauncherWakeupAtCommit(void);
-extern bool XactManipulatesLogicalReplicationWorkers(void);
extern void AtEOXact_ApplyLauncher(bool isCommit);
-extern void AtEOSubXact_ApplyLauncher(bool isCommit, int nestDepth);
extern bool IsLogicalLauncher(void);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 53f636c..5f52335 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -15,6 +15,7 @@
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "storage/spin.h"
+#include "replication/walreceiver.h"
/*
* Behaviour of replication slots, upon release or crash.
@@ -211,6 +212,8 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname);
+extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
diff --git a/src/include/replication/worker_internal.h b/src/include/replication/worker_internal.h
index d046022..4a5c49d 100644
--- a/src/include/replication/worker_internal.h
+++ b/src/include/replication/worker_internal.h
@@ -77,13 +77,14 @@ extern List *logicalrep_workers_find(Oid subid, bool only_running);
extern void logicalrep_worker_launch(Oid dbid, Oid subid, const char *subname,
Oid userid, Oid relid);
extern void logicalrep_worker_stop(Oid subid, Oid relid);
-extern void logicalrep_worker_stop_at_commit(Oid subid, Oid relid);
extern void logicalrep_worker_wakeup(Oid subid, Oid relid);
extern void logicalrep_worker_wakeup_ptr(LogicalRepWorker *worker);
extern int logicalrep_sync_worker_count(Oid subid);
extern char *LogicalRepSyncTableStart(XLogRecPtr *origin_startpos);
+extern void tablesync_replorigin_drop(Oid subid, Oid relid, bool nowait);
+
void process_syncing_tables(XLogRecPtr current_lsn);
void invalidate_syncing_table_states(Datum arg, int cacheid,
uint32 hashvalue);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 1d540fe..5f5c36d 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2397,7 +2397,6 @@ StdAnalyzeData
StdRdOptions
Step
StopList
-StopWorkersData
StrategyNumber
StreamCtl
StreamXidHash
--
1.8.3.1
On Mon, Feb 1, 2021 at 11:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
I have updated the patch to display WARNING for each of the tablesync
slots during DropSubscription. As discussed, I have moved the drop
slot related code towards the end in AlterSubscription_refresh. Apart
from this, I have fixed one more issue in tablesync code where in
after catching the exception we were not clearing the transaction
state on the publisher, see changes in LogicalRepSyncTableStart. I
have also fixed other comments raised by you.
Here are some additional feedback comments about the v24 patch:
~~
ReportSlotConnectionError:
1,2,3,4.
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Caller needs to ensure that we have appropriate locks so that
+ * relstate doesn't change underneath us.
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = { 0 };
+
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ elog(WARNING, "could not drop tablesync replication slot \"%s\"",
+ syncslotname);
+
+ }
+ }
1. I wonder if "rstates" would be better named something like
"not_ready_rstates", otherwise it is not apparent what states are in
this list
2. The comment "/* Only cleanup resources of tablesync workers */" is
not quite correct because there is no cleanup happening here. Maybe
change to:
if (!OidIsValid(relid))
continue; /* not a tablesync worker */
3. Maybe the "appropriate locks" comment can say what locks are the
"appropriate" ones?
4. Spurious blank line after the elog?
~~
AlterSubscription_refresh:
5.
+ /*
+ * Drop the tablesync slot. This has to be at the end because otherwise if there
+ * is an error while doing the database operations we won't be able to rollback
+ * dropped slot.
+ */
Maybe "Drop the tablesync slot." should say "Drop the tablesync slots
associated with removed tables."
~~
DropSubscription:
6.
+ /*
+ * Cleanup of tablesync replication origins.
+ *
+ * Any READY-state relations would already have dealt with clean-ups.
+ *
+ * Note that the state can't change because we have already stopped both
+ * the apply and tablesync workers and they can't restart because of
+ * exclusive lock on the subscription.
+ */
+ rstates = GetSubscriptionNotReadyRelations(subid);
+ foreach(lc, rstates)
I wonder if "rstates" would be better named as "not_ready_rstates",
because it is used in several places where not READY is assumed.
7.
+ {
+ if (!slotname)
+ {
+ /* be tidy */
+ list_free(rstates);
+ return;
+ }
+ else
+ {
+ ReportSlotConnectionError(rstates, subid, slotname, err);
+ }
+
+ }
Spurious blank line above?
8.
The new logic of calling the ReportSlotConnectionError seems to be
expecting that the user has encountered some connection error, and
*after* that they have assigned slot_name = NONE as a workaround. In
this scenario the code looks ok since names of any dangling tablesync
slots were being logged at the time of the error.
But I am wondering what about where the user might have set slot_name
= NONE *before* the connection is broken. In this scenario, there is
no ERROR, so if there are still (is it possible?) dangling tablesync
slots, their names are never getting logged at all. So how can the
user know what to delete?
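As an aside: given the generated-name convention the patch documents, a
query along these lines, run on the publisher, could at least surface
candidate leftovers. The LIKE pattern is only an assumption based on
that convention; nothing in the patch provides this query.

SELECT slot_name
FROM pg_replication_slots
WHERE slot_type = 'logical'
  AND slot_name LIKE 'pg\_%\_sync\_%' ESCAPE '\';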
~~
Additionally, I have
removed the test because it was creating a slot with the same name as
the tablesync worker's, and the tablesync worker removed it due to the
new logic in LogicalRepSyncTableStart. Earlier, it was not failing because of
the bug in that code which I have fixed in the attached.
Wasn't the point of that test to cause a tablesync slot clash and see
if it could recover? Why not just keep it, and modify the test to make
it work again? Isn't it still valuable, because at least it would
execute the code through the PG_CATCH which otherwise may not get
executed by any other test?
I wonder whether we should restrict creating slots with the prefix pg_
because we are internally creating slots with those names? I think
this was a problem previously also. We already prohibit it for a few
other objects like origins, schemas, etc., see the usage of
IsReservedName.
Yes, we could restrict the create slot API if you really wanted to.
But, IMO it is implausible that a user could "accidentally" clash with
the internal tablesync slot name, so in practice maybe this change
would not help much but it might make it more difficult to test some
scenarios.
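If we did want that restriction, I imagine it would just be a small
check at slot creation time, mirroring the existing IsReservedName
usage. A hypothetical sketch only (the helper name is invented for
illustration; this is not in the posted patch):

static void
check_slot_name_not_reserved(const char *name)
{
	/* IsReservedName returns true for names beginning with "pg_". */
	if (IsReservedName(name))
		ereport(ERROR,
				(errcode(ERRCODE_RESERVED_NAME),
				 errmsg("replication slot name \"%s\" is reserved", name),
				 errdetail("Slot names beginning with \"pg_\" are reserved for internal use.")));
}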
----
Kind Regards,
Peter Smith.
Fujitsu Australia
On Tue, Feb 2, 2021 at 8:29 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Mon, Feb 1, 2021 at 11:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
I have updated the patch to display WARNING for each of the tablesync
slots during DropSubscription. As discussed, I have moved the drop
slot related code towards the end in AlterSubscription_refresh. Apart
from this, I have fixed one more issue in tablesync code wherein,
after catching the exception, we were not clearing the transaction
state on the publisher, see changes in LogicalRepSyncTableStart. I
have also fixed other comments raised by you.

Here are some additional feedback comments about the v24 patch:
~~
ReportSlotConnectionError:
1,2,3,4.

+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Caller needs to ensure that we have appropriate locks so that
+ * relstate doesn't change underneath us.
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = { 0 };
+
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ elog(WARNING, "could not drop tablesync replication slot \"%s\"",
+ syncslotname);
+
+ }
+ }

1. I wonder if "rstates" would be better named something like
"not_ready_rstates", otherwise it is not apparent what states are in
this list
I don't know if that would be better and it is used in the same way in
the existing code. I find the current naming succinct.
2. The comment "/* Only cleanup resources of tablesync workers */" is
not quite correct because there is no cleanup happening here. Maybe
change to:
if (!OidIsValid(relid))
continue; /* not a tablesync worker */
Aren't we trying to clean up the tablesync slots here? So, I don't see
the comment as irrelevant.
3. Maybe the "appropriate locks" comment can say what locks are the
"appropriate" ones?4. Spurious blank line after the elog?
Will fix both the above.
~~
AlterSubscription_refresh:
5.

+ /*
+ * Drop the tablesync slot. This has to be at the end because otherwise if there
+ * is an error while doing the database operations we won't be able to rollback
+ * dropped slot.
+ */

Maybe "Drop the tablesync slot." should say "Drop the tablesync slots
associated with removed tables."
makes sense, will fix.
~~
DropSubscription:
6.

+ /*
+ * Cleanup of tablesync replication origins.
+ *
+ * Any READY-state relations would already have dealt with clean-ups.
+ *
+ * Note that the state can't change because we have already stopped both
+ * the apply and tablesync workers and they can't restart because of
+ * exclusive lock on the subscription.
+ */
+ rstates = GetSubscriptionNotReadyRelations(subid);
+ foreach(lc, rstates)

I wonder if "rstates" would be better named as "not_ready_rstates",
because it is used in several places where not READY is assumed.
Same response as above for similar comment.
7.

+ {
+ if (!slotname)
+ {
+ /* be tidy */
+ list_free(rstates);
+ return;
+ }
+ else
+ {
+ ReportSlotConnectionError(rstates, subid, slotname, err);
+ }
+
+ }

Spurious blank line above?
Will fix.
8.
The new logic of calling the ReportSlotConnectionError seems to be
expecting that the user has encountered some connection error, and
*after* that they have assigned slot_name = NONE as a workaround. In
this scenario the code looks ok since names of any dangling tablesync
slots were being logged at the time of the error.

But I am wondering what about where the user might have set slot_name
= NONE *before* the connection is broken. In this scenario, there is
no ERROR, so if there are still (is it possible?) dangling tablesync
slots, their names are never getting logged at all. So how can the
user know what to delete?
It has been mentioned in the docs that the user is responsible for
cleaning that up manually in such a case. The patch has also described
how the names are generated, which can help the user to remove those.
+ These table synchronization slots have generated names:
+ <quote><literal>pg_%u_sync_%u</literal></quote> (parameters: Subscription
+ <parameter>oid</parameter>, Table <parameter>relid</parameter>)
I think if the user changes the slot_name associated with the
subscription, it would be their responsibility to clean up the
previously associated slot. This is currently the case with the main
subscription slot as well. I think it won't be advisable for the user
to change slot_name except under some rare cases where the system
might be stuck, like the one for which we are giving a WARNING and
providing a hint for setting the slot_name to NONE.
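Since the name generation is deterministic, the expected tablesync slot
names can also be reconstructed from the subscriber's catalog. For
example (a sketch only; it lists the names for relations not yet READY,
while the slots themselves live on the publisher):

SELECT format('pg_%s_sync_%s', srsubid, srrelid) AS tablesync_slot_name
FROM pg_subscription_rel
WHERE srsubstate <> 'r';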
~~
Additionally, I have
removed the test because it was creating a slot with the same name as
the tablesync worker's, and the tablesync worker removed it due to the
new logic in LogicalRepSyncTableStart. Earlier, it was not failing because of
the bug in that code which I have fixed in the attached.

Wasn't the point of that test to cause a tablesync slot clash and see
if it could recover? Why not just keep it, and modify the test to make
it work again?
We can do that, but my other worry was that we might want to reserve
slot names that start with pg_.
Isn't it still valuable because at least it would
execute the code through the PG_CATCH which otherwise may not get
executed by any other test?
It is valuable, but IIRC there was a test (in subscription/004_sync.pl)
where a PK violation happens during the copy, which will lead to
coverage of the code in the CATCH block.
I wonder whether we should restrict creating slots with the prefix pg_
because we are internally creating slots with those names? I think
this was a problem previously also. We already prohibit it for a few
other objects like origins, schemas, etc., see the usage of
IsReservedName.

Yes, we could restrict the create slot API if you really wanted to.
But, IMO it is implausible that a user could "accidentally" clash with
the internal tablesync slot name, so in practice maybe this change
would not help much but it might make it more difficult to test some
scenarios.
Isn't the same true for origins?
--
With Regards,
Amit Kapila.
On Mon, Feb 1, 2021 at 11:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
I have updated the patch to display WARNING for each of the tablesync
slots during DropSubscription. As discussed, I have moved the drop
slot related code towards the end in AlterSubscription_refresh. Apart
from this, I have fixed one more issue in tablesync code wherein,
after catching the exception, we were not clearing the transaction
state on the publisher, see changes in LogicalRepSyncTableStart. I
have also fixed other comments raised by you. Additionally, I have
removed the test because it was creating a slot with the same name as
the tablesync worker's, and the tablesync worker removed it due to the
new logic in LogicalRepSyncTableStart. Earlier, it was not failing because of
the bug in that code which I have fixed in the attached.
I was testing this patch. I had a table on the subscriber which had a
row that would cause a PK constraint violation during the table copy.
This results in the subscriber trying to roll back the table copy and
failing.
2021-02-01 23:28:16.041 EST [23738] LOG: logical replication apply
worker for subscription "tap_sub" has started
2021-02-01 23:28:16.051 EST [23740] LOG: logical replication table
synchronization worker for subscription "tap_sub", table "tab_rep" has
started
2021-02-01 23:28:21.118 EST [23740] ERROR: table copy could not
rollback transaction on publisher
2021-02-01 23:28:21.118 EST [23740] DETAIL: The error was: another
command is already in progress
2021-02-01 23:28:21.122 EST [8028] LOG: background worker "logical
replication worker" (PID 23740) exited with exit code 1
2021-02-01 23:28:21.125 EST [23908] LOG: logical replication table
synchronization worker for subscription "tap_sub", table "tab_rep" has
started
2021-02-01 23:28:21.138 EST [23908] ERROR: could not create
replication slot "pg_16398_sync_16384": ERROR: replication slot
"pg_16398_sync_16384" already exists
2021-02-01 23:28:21.139 EST [8028] LOG: background worker "logical
replication worker" (PID 23908) exited with exit code 1
2021-02-01 23:28:26.168 EST [24048] LOG: logical replication table
synchronization worker for subscription "tap_sub", table "tab_rep" has
started
2021-02-01 23:28:34.244 EST [24048] ERROR: table copy could not
rollback transaction on publisher
2021-02-01 23:28:34.244 EST [24048] DETAIL: The error was: another
command is already in progress
2021-02-01 23:28:34.251 EST [8028] LOG: background worker "logical
replication worker" (PID 24048) exited with exit code 1
2021-02-01 23:28:34.254 EST [24337] LOG: logical replication table
synchronization worker for subscription "tap_sub", table "tab_rep" has
started
2021-02-01 23:28:34.263 EST [24337] ERROR: could not create
replication slot "pg_16398_sync_16384": ERROR: replication slot
"pg_16398_sync_16384" already exists
2021-02-01 23:28:34.264 EST [8028] LOG: background worker "logical
replication worker" (PID 24337) exited with exit code 1
And one more thing I see is that now we error out in PG_CATCH() in
LogicalRepSyncTableStart() with the above error, and as a result, the
tablesync slot is not dropped, causing the slot creation to fail
on the next restart.
I think this can be avoided. We could either attempt a rollback only
on specific failures, or drop the slot prior to erroring out.
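For example, the PG_CATCH block could be reshaped so that the slot drop
still runs (a sketch of the second option only, reusing names from the
posted patch; this is not what the patch currently does):

PG_CATCH();
{
	/*
	 * Demote the publisher-side ROLLBACK failure to a WARNING so that
	 * the slot cleanup below still runs before the original error is
	 * re-thrown.
	 */
	res = walrcv_exec(wrconn, "ROLLBACK", 0, NULL);
	if (res->status != WALRCV_OK_COMMAND)
		ereport(WARNING,
				(errmsg("table copy could not rollback transaction on publisher"),
				 errdetail("The error was: %s", res->err)));
	walrcv_clear_result(res);

	/* Drop the tablesync slot even if the rollback did not succeed. */
	ReplicationSlotDropAtPubNode(wrconn, slotname, true /* missing_ok */ );

	PG_RE_THROW();
}
PG_END_TRY();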
regards,
Ajin Cherian
Fujitsu Australia
Another failure I see in my testing:

On the publisher, create a big enough table:

postgres=# CREATE TABLE tab_rep (a int primary key);
CREATE TABLE
postgres=# INSERT INTO tab_rep SELECT generate_series(1,1000000);
INSERT 0 1000000
postgres=# CREATE PUBLICATION tap_pub FOR ALL TABLES;
CREATE PUBLICATION
Subscriber:
postgres=# CREATE TABLE tab_rep (a int primary key);
CREATE TABLE
Create the subscription but do not enable it:

postgres=# CREATE SUBSCRIPTION tap_sub CONNECTION 'host=localhost
dbname=postgres port=6972' PUBLICATION tap_pub WITH (enabled = false);
The below two commands on the subscriber should be issued quickly with
no delay between them.
postgres=# ALTER SUBSCRIPTION tap_sub enable;
ALTER SUBSCRIPTION
postgres=# ALTER SUBSCRIPTION tap_sub disable;
ALTER SUBSCRIPTION
This leaves the below state for the pg_subscription rel:
postgres=# select * from pg_subscription_rel;
srsubid | srrelid | srsubstate | srsublsn
---------+---------+------------+----------
16395 | 16384 | f |
(1 row)
The rel is in the SUBREL_STATE_FINISHEDCOPY state.
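For readability, the single-letter codes can be decoded with a query
like this (illustrative only; it uses just the state codes defined in
pg_subscription_rel.h, including the new 'f'):

SELECT srsubid, srrelid,
       CASE srsubstate
            WHEN 'i' THEN 'INIT'
            WHEN 'd' THEN 'DATASYNC'
            WHEN 'f' THEN 'FINISHEDCOPY'
            WHEN 's' THEN 'SYNCDONE'
            WHEN 'r' THEN 'READY'
       END AS state
FROM pg_subscription_rel;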
Meanwhile on the publisher, looking at the slots created:
postgres=# select * from pg_replication_slots;
 slot_name           | plugin   | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size
---------------------+----------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------+------------+---------------
 tap_sub             | pgoutput | logical   |  13859 | postgres | f         | f      |            |      |          517 | 0/9303660   | 0/9303698           | reserved   |
 pg_16395_sync_16384 | pgoutput | logical   |  13859 | postgres | f         | f      |            |      |          517 | 0/9303660   | 0/9303698           | reserved   |
(2 rows)
There are two slots, the main slot as well as the tablesync slot. Drop
the table, re-enable the subscription, and then drop the subscription.

Now on the subscriber:
postgres=# drop table tab_rep;
DROP TABLE
postgres=# ALTER SUBSCRIPTION tap_sub enable;
ALTER SUBSCRIPTION
postgres=# drop subscription tap_sub ;
NOTICE: dropped replication slot "tap_sub" on publisher
DROP SUBSCRIPTION
We see the tablesync slot dangling on the publisher:
postgres=# select * from pg_replication_slots;
 slot_name           | plugin   | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size
---------------------+----------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------+------------+---------------
 pg_16395_sync_16384 | pgoutput | logical   |  13859 | postgres | f         | f      |            |      |          517 | 0/9303660   | 0/9303698           | reserved   |
(1 row)
The dropping of the table meant that after the tablesync is
restarted, the worker has no idea about the old slot it created, as
its name uses the relid of the dropped table.
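Until that is addressed, the only recourse seems to be to drop the
dangling slot manually on the publisher, e.g. using the slot name from
the output above:

postgres=# SELECT pg_drop_replication_slot('pg_16395_sync_16384');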
regards,
Ajin Cherian
Fujitsu Australia
On Mon, Feb 1, 2021 at 11:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
I have updated the patch to display WARNING for each of the tablesync
slots during DropSubscription. As discussed, I have moved the drop
slot related code towards the end in AlterSubscription_refresh. Apart
from this, I have fixed one more issue in tablesync code where in
after catching the exception we were not clearing the transaction
state on the publisher, see changes in LogicalRepSyncTableStart.
...
I know that in another email [ac0202] Ajin has reported some problem
he found related to this new (LogicalRepSyncTableStart PG_CATCH) code
for some different use-case, but for my test scenario of a "broken
connection during a table copy" the code did appear to be working
properly.
PSA detailed logs which show the test steps and output for this
""broken connection during a table copy" scenario.
----
[ac0202] /messages/by-id/CAFPTHDaZw5o+wMbv3aveOzuLyz_rqZebXAj59rDKTJbwXFPYgw@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
On Tue, Feb 2, 2021 at 10:34 AM Ajin Cherian <itsajin@gmail.com> wrote:
On Mon, Feb 1, 2021 at 11:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
I have updated the patch to display WARNING for each of the tablesync
slots during DropSubscription. As discussed, I have moved the drop
slot related code towards the end in AlterSubscription_refresh. Apart
from this, I have fixed one more issue in tablesync code wherein,
after catching the exception, we were not clearing the transaction
state on the publisher, see changes in LogicalRepSyncTableStart. I
have also fixed other comments raised by you. Additionally, I have
removed the test because it was creating a slot with the same name as
the tablesync worker's, and the tablesync worker removed it due to the
new logic in LogicalRepSyncTableStart. Earlier, it was not failing because of
the bug in that code which I have fixed in the attached.

I was testing this patch. I had a table on the subscriber which had a
row that would cause a PK constraint violation during the table copy.
This results in the subscriber trying to roll back the table copy and
failing.

I am not getting this error. I have tried the below test:
Publisher
===========
CREATE TABLE mytbl1(id SERIAL PRIMARY KEY, somedata int, text varchar(120));
BEGIN;
INSERT INTO mytbl1(somedata, text) VALUES (1, 1);
INSERT INTO mytbl1(somedata, text) VALUES (1, 2);
COMMIT;
CREATE PUBLICATION mypublication FOR TABLE mytbl1;
Subscriber
=============
CREATE TABLE mytbl1(id SERIAL PRIMARY KEY, somedata int, text varchar(120));
BEGIN;
INSERT INTO mytbl1(somedata, text) VALUES (1, 1);
INSERT INTO mytbl1(somedata, text) VALUES (1, 2);
COMMIT;
CREATE SUBSCRIPTION mysub
CONNECTION 'host=localhost port=5432 dbname=postgres'
PUBLICATION mypublication;
It generates the PK violation the first time and then I removed the
conflicting rows in the subscriber and it passed. See logs below.
2021-02-02 13:51:34.316 IST [20796] LOG: logical replication table
synchronization worker for subscription "mysub", table "mytbl1" has
started
2021-02-02 13:52:43.625 IST [20796] ERROR: duplicate key value
violates unique constraint "mytbl1_pkey"
2021-02-02 13:52:43.625 IST [20796] DETAIL: Key (id)=(1) already exists.
2021-02-02 13:52:43.625 IST [20796] CONTEXT: COPY mytbl1, line 1
2021-02-02 13:52:43.695 IST [27840] LOG: background worker "logical
replication worker" (PID 20796) exited with exit code 1
2021-02-02 13:52:43.884 IST [6260] LOG: logical replication table
synchronization worker for subscription "mysub", table "mytbl1" has
started
2021-02-02 13:53:54.680 IST [6260] LOG: logical replication table
synchronization worker for subscription "mysub", table "mytbl1" has
finished
Also, a similar test exists in 004_sync.pl; is that also failing for
you? Can you please provide detailed steps that led to this failure?
And one more thing I see is that now we error out in PG_CATCH() in
LogicalRepSyncTableStart() with the above error and as a result, the
tablesync slot is not dropped. Hence causing the slot create to fail
in the next restart.
I think this can be avoided. We could either attempt a rollback only
on specific failures, or drop the slot prior to erroring out.
Hmm, we have to first roll back before attempting any other operation
because the transaction on the publisher is in an errored state.
--
With Regards,
Amit Kapila.
After seeing Ajin's test [ac0202] which did a DROP TABLE, I have also
tried a simple test where I do a DROP TABLE with very bad timing for
the tablesync worker. It seems that doing this can cause the sync
worker's MyLogicalRepWorker->relid to become invalid.
In my test this caused a stack trace within some logging, but I
imagine other bad things can happen if the tablesync worker can be
executed with an invalid relid.
Possibly this is an existing PG bug which has just never been seen
before; the ereport which has failed here is not new code.
PSA the log for the test steps and the stack trace details.
----
[ac0202] /messages/by-id/CAFPTHDYzjaNfzsFHpER9idAPB8v5j=SUbWL0AKj5iVy0BKbTpg@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
On Tue, Feb 2, 2021 at 7:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Feb 2, 2021 at 10:34 AM Ajin Cherian <itsajin@gmail.com> wrote:
On Mon, Feb 1, 2021 at 11:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
I have updated the patch to display WARNING for each of the tablesync
slots during DropSubscription. As discussed, I have moved the drop
slot related code towards the end in AlterSubscription_refresh. Apart
from this, I have fixed one more issue in tablesync code wherein,
after catching the exception, we were not clearing the transaction
state on the publisher, see changes in LogicalRepSyncTableStart. I
have also fixed other comments raised by you. Additionally, I have
removed the test because it was creating a slot with the same name as
the tablesync worker's, and the tablesync worker removed it due to the
new logic in LogicalRepSyncTableStart. Earlier, it was not failing because of
the bug in that code which I have fixed in the attached.

I was testing this patch. I had a table on the subscriber which had a
row that would cause a PK constraint violation during the table copy.
This results in the subscriber trying to roll back the table copy and
failing.

I am not getting this error. I have tried the below test:
I am sorry, my above steps were not correct. I think the reason for
the failure I was seeing was some other steps I did prior to this. I
will recreate this and update you with the appropriate steps.
regards,
Ajin Cherian
Fujitsu Australia
On Tue, Feb 2, 2021 at 11:35 AM Ajin Cherian <itsajin@gmail.com> wrote:
Another failure I see in my testing
The problem here is that we are allowing the table to be dropped while
table synchronization is still in progress, and then we don't have any
way to know the corresponding slot or origin. I think we can try to
drop the slot and origin as well, but that is not a good idea because
slots, once dropped, won't be rolled back. So, I have added a fix to
disallow the drop of the table when table synchronization is still in
progress.
Apart from that, I have fixed comments raised by Peter as discussed
above and made some additional changes in comments, code (code changes
are cosmetic), and docs.
Let me know if the reported issue is fixed or not.
--
With Regards,
Amit Kapila.
Attachments:
v25-0001-Tablesync-Solution1.patchapplication/octet-stream; name=v25-0001-Tablesync-Solution1.patchDownload
From 7859b5e48d8c7e725c98095174cf427eab8d9d18 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Sat, 30 Jan 2021 10:21:28 +0530
Subject: [PATCH v25] Tablesync Solution1.
==== Features:
* The tablesync slot is now permanent instead of temporary.
* The tablesync worker now allows multiple transactions instead of a single transaction.
* A new state (SUBREL_STATE_FINISHEDCOPY) is persisted after a successful copy_table in tablesync's LogicalRepSyncTableStart.
* If a re-launched tablesync finds state SUBREL_STATE_FINISHEDCOPY then it will bypass the initial copy_table phase.
* Now tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar to what is done for the apply worker). The origin is advanced when first created.
* Cleanup of tablesync resources:
- The tablesync slot is dropped by the process_syncing_tables_for_sync function.
- The tablesync replication origin tracking is dropped by process_syncing_tables_for_apply.
- DropSubscription/AlterSubscription_refresh also drop tablesync slots/origins
* Updates to PG docs.
Known Issues:
* None.
---
doc/src/sgml/catalogs.sgml | 1 +
doc/src/sgml/logical-replication.sgml | 17 +-
doc/src/sgml/ref/alter_subscription.sgml | 6 +
doc/src/sgml/ref/drop_subscription.sgml | 6 +-
src/backend/access/transam/xact.c | 11 -
src/backend/catalog/pg_subscription.c | 31 ++
src/backend/commands/subscriptioncmds.c | 440 +++++++++++++++-----
src/backend/replication/logical/launcher.c | 147 -------
src/backend/replication/logical/tablesync.c | 297 +++++++++++--
src/backend/replication/logical/worker.c | 18 +-
src/backend/tcop/utility.c | 3 +-
src/include/catalog/pg_subscription_rel.h | 2 +
src/include/commands/subscriptioncmds.h | 2 +-
src/include/replication/logicallauncher.h | 2 -
src/include/replication/slot.h | 3 +
src/include/replication/worker_internal.h | 3 +-
src/tools/pgindent/typedefs.list | 1 -
17 files changed, 665 insertions(+), 325 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 865e826fb0..920a39dfa9 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7665,6 +7665,7 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
State code:
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
+ <literal>f</literal> = finished table copy,
<literal>s</literal> = synchronized,
<literal>r</literal> = ready (normal replication)
</para></entry>
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index a560ad69b4..20cdd5715d 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -248,7 +248,17 @@
<para>
As mentioned earlier, each (active) subscription receives changes from a
- replication slot on the remote (publishing) side. Normally, the remote
+ replication slot on the remote (publishing) side.
+ </para>
+ <para>
+ Additional table synchronization slots are normally transient, created
+ internally and dropped automatically when they are no longer needed.
+ These table synchronization slots have generated names:
+ <quote><literal>pg_%u_sync_%u</literal></quote> (parameters: Subscription
+ <parameter>oid</parameter>, Table <parameter>relid</parameter>)
+ </para>
+ <para>
+ Normally, the remote
replication slot is created automatically when the subscription is created
using <command>CREATE SUBSCRIPTION</command> and it is dropped
automatically when the subscription is dropped using <command>DROP
@@ -294,8 +304,9 @@
using <command>ALTER SUBSCRIPTION</command> before attempting to drop
the subscription. If the remote database instance no longer exists, no
further action is then necessary. If, however, the remote database
- instance is just unreachable, the replication slot should then be
- dropped manually; otherwise it would continue to reserve WAL and might
+ instance is just unreachable, the replication slot (and any still
+ remaining table synchronization slots) should then be
+ dropped manually; otherwise it/they would continue to reserve WAL and might
eventually cause the disk to fill up. Such cases should be carefully
investigated.
</para>
diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml
index db5e59f707..a6ffd6688f 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -48,6 +48,12 @@ ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> RENAME TO <
(Currently, all subscription owners must be superusers, so the owner checks
will be bypassed in practice. But this might change in the future.)
</para>
+
+ <para>
+ Commands <command>ALTER SUBSCRIPTION ... REFRESH ...</command> and
+ <command>ALTER SUBSCRIPTION ... SET PUBLICATION ...</command> with the
+ refresh option set to true cannot be executed inside a transaction block.
+ </para>
</refsect1>
<refsect1>
diff --git a/doc/src/sgml/ref/drop_subscription.sgml b/doc/src/sgml/ref/drop_subscription.sgml
index adbdeafb4e..aee9615546 100644
--- a/doc/src/sgml/ref/drop_subscription.sgml
+++ b/doc/src/sgml/ref/drop_subscription.sgml
@@ -79,7 +79,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
<para>
When dropping a subscription that is associated with a replication slot on
the remote host (the normal state), <command>DROP SUBSCRIPTION</command>
- will connect to the remote host and try to drop the replication slot as
+ will connect to the remote host and try to drop the replication slot (and
+ any remaining table synchronization slots) as
part of its operation. This is necessary so that the resources allocated
for the subscription on the remote host are released. If this fails,
either because the remote host is not reachable or because the remote
@@ -89,7 +90,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
executing <literal>ALTER SUBSCRIPTION ... SET (slot_name = NONE)</literal>.
After that, <command>DROP SUBSCRIPTION</command> will no longer attempt any
actions on a remote host. Note that if the remote replication slot still
- exists, it should then be dropped manually; otherwise it will continue to
+ exists, it (and any related table synchronization slots) should then be
+ dropped manually; otherwise it/they will continue to
reserve WAL and might eventually cause the disk to fill up. See
also <xref linkend="logical-replication-subscription-slot"/>.
</para>
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index a2068e3fd4..3c8b4eb362 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2432,15 +2432,6 @@ PrepareTransaction(void)
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot PREPARE a transaction that has exported snapshots")));
- /*
- * Don't allow PREPARE but for transaction that has/might kill logical
- * replication workers.
- */
- if (XactManipulatesLogicalReplicationWorkers())
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("cannot PREPARE a transaction that has manipulated logical replication workers")));
-
/* Prevent cancel/die interrupt while cleaning up */
HOLD_INTERRUPTS();
@@ -4899,7 +4890,6 @@ CommitSubTransaction(void)
AtEOSubXact_HashTables(true, s->nestingLevel);
AtEOSubXact_PgStat(true, s->nestingLevel);
AtSubCommit_Snapshot(s->nestingLevel);
- AtEOSubXact_ApplyLauncher(true, s->nestingLevel);
/*
* We need to restore the upper transaction's read-only state, in case the
@@ -5059,7 +5049,6 @@ AbortSubTransaction(void)
AtEOSubXact_HashTables(false, s->nestingLevel);
AtEOSubXact_PgStat(false, s->nestingLevel);
AtSubAbort_Snapshot(s->nestingLevel);
- AtEOSubXact_ApplyLauncher(false, s->nestingLevel);
}
/*
diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c
index 44cb285b68..08e339b9b1 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -29,6 +29,7 @@
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
#include "utils/pg_lsn.h"
#include "utils/rel.h"
#include "utils/syscache.h"
@@ -337,6 +338,9 @@ GetSubscriptionRelState(Oid subid, Oid relid, XLogRecPtr *sublsn)
char substate;
bool isnull;
Datum d;
+ Relation rel;
+
+ rel = table_open(SubscriptionRelRelationId, AccessShareLock);
/* Try finding the mapping. */
tup = SearchSysCache2(SUBSCRIPTIONRELMAP,
@@ -363,6 +367,8 @@ GetSubscriptionRelState(Oid subid, Oid relid, XLogRecPtr *sublsn)
/* Cleanup */
ReleaseSysCache(tup);
+ table_close(rel, AccessShareLock);
+
return substate;
}
@@ -403,6 +409,31 @@ RemoveSubscriptionRel(Oid subid, Oid relid)
scan = table_beginscan_catalog(rel, nkeys, skey);
while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
{
+ Form_pg_subscription_rel subrel;
+
+ subrel = (Form_pg_subscription_rel) GETSTRUCT(tup);
+
+ /*
+ * We don't allow dropping the relation mapping when the table
+ * synchronization is in progress unless the caller updates the
+ * corresponding subscription as well. This is to ensure that we don't
+ * leave tablesync slots or origins in the system when the
+ * corresponding table is dropped.
+ */
+ if (!OidIsValid(subid) && subrel->srsubstate != SUBREL_STATE_READY)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("could not drop relation mapping for subscription \"%s\"",
+ get_subscription_name(subrel->srsubid, false)),
+ errdetail("Table synchronization for relation \"%s\" is in progress and is in state \"%c\".",
+ get_rel_name(relid), subrel->srsubstate),
+ /* translator: first %s is a SQL ALTER command and second %s is a SQL DROP command */
+ errhint("Use %s to enable subscription if not already enabled or use %s to drop the subscription.",
+ "ALTER SUBSCRIPTION ... ENABLE",
+ "DROP SUBSCRIPTION ...")));
+ }
+
CatalogTupleDelete(rel, &tup->t_self);
}
table_endscan(scan);
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 082f7855b8..220c7a08d5 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -34,6 +34,7 @@
#include "nodes/makefuncs.h"
#include "replication/logicallauncher.h"
#include "replication/origin.h"
+#include "replication/slot.h"
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "replication/worker_internal.h"
@@ -46,6 +47,8 @@
#include "utils/syscache.h"
static List *fetch_table_list(WalReceiverConn *wrconn, List *publications);
+static void ReportSlotConnectionError(List* rstates, Oid subid, char *slotname, char *err);
+
/*
* Common option parsing function for CREATE and ALTER SUBSCRIPTION commands.
@@ -566,107 +569,191 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
Oid *pubrel_local_oids;
ListCell *lc;
int off;
+ int remove_rel_len;
+ Relation rel = NULL;
+ typedef struct SubRemoveRels
+ {
+ Oid relid;
+ char state;
+ } SubRemoveRels;
+ SubRemoveRels *sub_remove_rels;
/* Load the library providing us libpq calls. */
load_file("libpqwalreceiver", false);
- /* Try to connect to the publisher. */
- wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
- if (!wrconn)
- ereport(ERROR,
- (errmsg("could not connect to the publisher: %s", err)));
-
- /* Get the table list from publisher. */
- pubrel_names = fetch_table_list(wrconn, sub->publications);
+ PG_TRY();
+ {
+ /* Try to connect to the publisher. */
+ wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
+ if (!wrconn)
+ ereport(ERROR,
+ (errmsg("could not connect to the publisher: %s", err)));
- /* We are done with the remote side, close connection. */
- walrcv_disconnect(wrconn);
+ /* Get the table list from publisher. */
+ pubrel_names = fetch_table_list(wrconn, sub->publications);
- /* Get local table list. */
- subrel_states = GetSubscriptionRelations(sub->oid);
+ /* Get local table list. */
+ subrel_states = GetSubscriptionRelations(sub->oid);
- /*
- * Build qsorted array of local table oids for faster lookup. This can
- * potentially contain all tables in the database so speed of lookup is
- * important.
- */
- subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
- off = 0;
- foreach(lc, subrel_states)
- {
- SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
+ /*
+ * Build qsorted array of local table oids for faster lookup. This can
+ * potentially contain all tables in the database so speed of lookup
+ * is important.
+ */
+ subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
+ off = 0;
+ foreach(lc, subrel_states)
+ {
+ SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
- subrel_local_oids[off++] = relstate->relid;
- }
- qsort(subrel_local_oids, list_length(subrel_states),
- sizeof(Oid), oid_cmp);
+ subrel_local_oids[off++] = relstate->relid;
+ }
+ qsort(subrel_local_oids, list_length(subrel_states),
+ sizeof(Oid), oid_cmp);
+
+ sub_remove_rels = palloc(list_length(subrel_states) * sizeof(SubRemoveRels));
+
+ /*
+ * Walk over the remote tables and try to match them to locally known
+ * tables. If the table is not known locally create a new state for
+ * it.
+ *
+ * Also builds array of local oids of remote tables for the next step.
+ */
+ off = 0;
+ pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
+
+ foreach(lc, pubrel_names)
+ {
+ RangeVar *rv = (RangeVar *) lfirst(lc);
+ Oid relid;
- /*
- * Walk over the remote tables and try to match them to locally known
- * tables. If the table is not known locally create a new state for it.
- *
- * Also builds array of local oids of remote tables for the next step.
- */
- off = 0;
- pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
+ relid = RangeVarGetRelid(rv, AccessShareLock, false);
- foreach(lc, pubrel_names)
- {
- RangeVar *rv = (RangeVar *) lfirst(lc);
- Oid relid;
+ /* Check for supported relkind. */
+ CheckSubscriptionRelkind(get_rel_relkind(relid),
+ rv->schemaname, rv->relname);
- relid = RangeVarGetRelid(rv, AccessShareLock, false);
+ pubrel_local_oids[off++] = relid;
- /* Check for supported relkind. */
- CheckSubscriptionRelkind(get_rel_relkind(relid),
- rv->schemaname, rv->relname);
+ if (!bsearch(&relid, subrel_local_oids,
+ list_length(subrel_states), sizeof(Oid), oid_cmp))
+ {
+ AddSubscriptionRelState(sub->oid, relid,
+ copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
+ InvalidXLogRecPtr);
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" added to subscription \"%s\"",
+ rv->schemaname, rv->relname, sub->name)));
+ }
+ }
- pubrel_local_oids[off++] = relid;
+ /*
+ * Next remove state for tables we should not care about anymore using
+ * the data we collected above
+ */
+ qsort(pubrel_local_oids, list_length(pubrel_names),
+ sizeof(Oid), oid_cmp);
- if (!bsearch(&relid, subrel_local_oids,
- list_length(subrel_states), sizeof(Oid), oid_cmp))
+ remove_rel_len = 0;
+ for (off = 0; off < list_length(subrel_states); off++)
{
- AddSubscriptionRelState(sub->oid, relid,
- copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
- InvalidXLogRecPtr);
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" added to subscription \"%s\"",
- rv->schemaname, rv->relname, sub->name)));
- }
- }
+ Oid relid = subrel_local_oids[off];
- /*
- * Next remove state for tables we should not care about anymore using the
- * data we collected above
- */
- qsort(pubrel_local_oids, list_length(pubrel_names),
- sizeof(Oid), oid_cmp);
+ if (!bsearch(&relid, pubrel_local_oids,
+ list_length(pubrel_names), sizeof(Oid), oid_cmp))
+ {
+ char state;
+ XLogRecPtr statelsn;
+
+ /*
+ * Lock pg_subscription_rel with AccessExclusiveLock to prevent any race
+ * conditions with the apply worker re-launching workers at the same time
+ * this code is trying to remove those tables.
+ *
+ * Even if a new worker for this particular rel is restarted, it won't be able
+ * to make any progress as we hold exclusive lock on subscription_rel till
+ * the transaction end. It will simply exit as there is no corresponding
+ * rel entry.
+ *
+ * This locking also ensures that the state of rels won't change till we
+ * are done with this refresh operation.
+ */
+ if (!rel)
+ rel = table_open(SubscriptionRelRelationId, AccessExclusiveLock);
+
+ /* Last known rel state. */
+ state = GetSubscriptionRelState(sub->oid, relid, &statelsn);
+
+ sub_remove_rels[remove_rel_len].relid = relid;
+ sub_remove_rels[remove_rel_len++].state = state;
+
+ RemoveSubscriptionRel(sub->oid, relid);
+
+ logicalrep_worker_stop(sub->oid, relid);
+
+ /*
+ * For READY state, we would have already dropped the tablesync
+ * origin.
+ */
+ if (state != SUBREL_STATE_READY)
+ {
+ /*
+ * Drop the tablesync's origin tracking if it exists.
+ */
+ tablesync_replorigin_drop(sub->oid, relid, false /* nowait */);
+ }
- for (off = 0; off < list_length(subrel_states); off++)
- {
- Oid relid = subrel_local_oids[off];
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
+ get_namespace_name(get_rel_namespace(relid)),
+ get_rel_name(relid),
+ sub->name)));
+ }
+ }
- if (!bsearch(&relid, pubrel_local_oids,
- list_length(pubrel_names), sizeof(Oid), oid_cmp))
+ /*
+ * Drop the tablesync slots associated with removed tables. This has to
+ * be at the end because otherwise if there is an error while doing the
+ * database operations we won't be able to roll back a dropped slot.
+ */
+ for (off = 0; off < remove_rel_len; off++)
{
- RemoveSubscriptionRel(sub->oid, relid);
-
- logicalrep_worker_stop_at_commit(sub->oid, relid);
-
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
- get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid),
- sub->name)));
+ if (sub_remove_rels[off].state != SUBREL_STATE_READY &&
+ sub_remove_rels[off].state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = { 0 };
+
+ /*
+ * For READY/SYNCDONE states we know the tablesync slot has
+ * already been dropped by the tablesync worker.
+ *
+ * For other states, there is no certainty. Maybe the slot
+ * does not exist yet; Maybe the slot is already deleted but
+ * SYNCDONE is not yet set. For this reason we allow
+ * missing_ok = true for the drop.
+ */
+ ReplicationSlotNameForTablesync(sub->oid, sub_remove_rels[off].relid, syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, true /* missing_ok */);
+ }
}
}
+ PG_FINALLY();
+ {
+ if (wrconn)
+ walrcv_disconnect(wrconn);
+ }
+ PG_END_TRY();
+
+ if (rel)
+ table_close(rel, NoLock);
}
/*
* Alter the existing subscription.
*/
ObjectAddress
-AlterSubscription(AlterSubscriptionStmt *stmt)
+AlterSubscription(AlterSubscriptionStmt *stmt, bool isTopLevel)
{
Relation rel;
ObjectAddress myself;
@@ -848,6 +935,8 @@ AlterSubscription(AlterSubscriptionStmt *stmt)
errmsg("ALTER SUBSCRIPTION with refresh is not allowed for disabled subscriptions"),
errhint("Use ALTER SUBSCRIPTION ... SET PUBLICATION ... WITH (refresh = false).")));
+ PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION with refresh");
+
/* Make sure refresh sees the new list of publications. */
sub->publications = stmt->publication;
@@ -877,6 +966,8 @@ AlterSubscription(AlterSubscriptionStmt *stmt)
NULL, NULL, /* no "binary" */
NULL, NULL); /* no "streaming" */
+ PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION ... REFRESH");
+
AlterSubscription_refresh(sub, copy_data);
break;
@@ -928,8 +1019,8 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
char *err = NULL;
RepOriginId originid;
WalReceiverConn *wrconn = NULL;
- StringInfoData cmd;
Form_pg_subscription form;
+ List *rstates;
/*
* Lock pg_subscription with AccessExclusiveLock to ensure that the
@@ -1042,6 +1133,31 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
}
list_free(subworkers);
+ /*
+ * Cleanup of tablesync replication origins.
+ *
+ * Any READY-state relations would already have dealt with clean-ups.
+ *
+ * Note that the state can't change because we have already stopped both
+ * the apply and tablesync workers and they can't restart because of
+ * exclusive lock on the subscription.
+ */
+ rstates = GetSubscriptionNotReadyRelations(subid);
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState* rstate = (SubscriptionRelState*) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Drop the tablesync's origin tracking if it exists.
+ */
+ tablesync_replorigin_drop(subid, relid, false /* nowait */);
+ }
+
/* Clean up dependencies */
deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
@@ -1054,34 +1170,110 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
if (originid != InvalidRepOriginId)
replorigin_drop(originid, false);
+
/*
* If there is no slot associated with the subscription, we can finish
* here.
*/
- if (!slotname)
+ if (!slotname && rstates == NIL)
{
table_close(rel, NoLock);
return;
}
/*
- * Otherwise drop the replication slot at the publisher node using the
- * replication connection.
+ * Try to acquire the connection necessary for dropping slots.
+ *
+ * Note: If the slotname is NONE/NULL then we allow the command to finish
+ * and users need to manually clean up the apply and tablesync worker slots
+ * later.
+ *
+ * This has to be at the end because otherwise if there is an error while
+ * doing the database operations we won't be able to roll back a dropped slot.
*/
load_file("libpqwalreceiver", false);
- initStringInfo(&cmd);
- appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
-
wrconn = walrcv_connect(conninfo, true, subname, &err);
if (wrconn == NULL)
- ereport(ERROR,
- (errmsg("could not connect to publisher when attempting to "
- "drop the replication slot \"%s\"", slotname),
- errdetail("The error was: %s", err),
- /* translator: %s is an SQL ALTER command */
- errhint("Use %s to disassociate the subscription from the slot.",
- "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+ {
+ if (!slotname)
+ {
+ /* be tidy */
+ list_free(rstates);
+ return;
+ }
+ else
+ {
+ ReportSlotConnectionError(rstates, subid, slotname, err);
+ }
+ }
+
+ PG_TRY();
+ {
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Drop the tablesync slots associated with removed tables.
+ *
+ * For SYNCDONE/READY states, the tablesync slot is known to have
+ * already been dropped by the tablesync worker.
+ *
+ * For other states, there is no certainty. Maybe the slot does
+ * not exist yet; Maybe the slot is already deleted but SYNCDONE
+ * is not yet set. For this reason, we allow missing_ok = true for
+ * the drop.
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = {0};
+
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, true /* missing_ok */ );
+ }
+ }
+
+ list_free(rstates);
+
+ /*
+ * If there is a slot associated with the subscription, then drop the
+ * replication slot at the publisher.
+ */
+ if (slotname)
+ ReplicationSlotDropAtPubNode(wrconn, slotname, false /* missing_ok */ );
+
+ }
+ PG_FINALLY();
+ {
+ walrcv_disconnect(wrconn);
+ }
+ PG_END_TRY();
+
+ table_close(rel, NoLock);
+}
+
+/*
+ * Drop the replication slot at the publisher node using the replication connection.
+ *
+ * missing_ok - if true then only issue a WARNING message if the slot cannot be deleted.
+ */
+void
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok)
+{
+ StringInfoData cmd;
+
+ Assert(wrconn);
+
+ load_file("libpqwalreceiver", false);
+
+ initStringInfo(&cmd);
+ appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
PG_TRY();
{
@@ -1089,27 +1281,37 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
res = walrcv_exec(wrconn, cmd.data, 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
+ if (res->status == WALRCV_OK_COMMAND)
+ {
+ /* NOTICE. Success. */
+ ereport(NOTICE,
+ (errmsg("dropped replication slot \"%s\" on publisher",
+ slotname)));
+ }
+ else if (res->status == WALRCV_ERROR && missing_ok)
+ {
+ /* WARNING. Error, but missing_ok = true. */
+ ereport(WARNING,
(errmsg("could not drop the replication slot \"%s\" on publisher",
slotname),
errdetail("The error was: %s", res->err)));
+ }
else
- ereport(NOTICE,
- (errmsg("dropped replication slot \"%s\" on publisher",
- slotname)));
+ {
+ /* ERROR. */
+ ereport(ERROR,
+ (errmsg("could not drop the replication slot \"%s\" on publisher",
+ slotname),
+ errdetail("The error was: %s", res->err)));
+ }
walrcv_clear_result(res);
}
PG_FINALLY();
{
- walrcv_disconnect(wrconn);
+ pfree(cmd.data);
}
PG_END_TRY();
-
- pfree(cmd.data);
-
- table_close(rel, NoLock);
}
/*
@@ -1278,3 +1480,45 @@ fetch_table_list(WalReceiverConn *wrconn, List *publications)
return tablelist;
}
+
+/*
+ * This is to report the connection failure while dropping replication slots.
+ * Here, we report a WARNING for all tablesync slots so that the user can drop
+ * them manually, if required.
+ */
+static void
+ReportSlotConnectionError(List *rstates, Oid subid, char *slotname, char *err)
+{
+ ListCell *lc;
+
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Caller needs to ensure that relstate doesn't change underneath us.
+ * See DropSubscription where we get the relstates.
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = { 0 };
+
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ elog(WARNING, "could not drop tablesync replication slot \"%s\"",
+ syncslotname);
+ }
+ }
+
+ ereport(ERROR,
+ (errmsg("could not connect to publisher when attempting to "
+ "drop the replication slot \"%s\"", slotname),
+ errdetail("The error was: %s", err),
+ /* translator: %s is an SQL ALTER command */
+ errhint("Use %s to disassociate the subscription from the slot.",
+ "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+}
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 186514cd9e..58082dde18 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -73,20 +73,6 @@ typedef struct LogicalRepWorkerId
Oid relid;
} LogicalRepWorkerId;
-typedef struct StopWorkersData
-{
- int nestDepth; /* Sub-transaction nest level */
- List *workers; /* List of LogicalRepWorkerId */
- struct StopWorkersData *parent; /* This need not be an immediate
- * subtransaction parent */
-} StopWorkersData;
-
-/*
- * Stack of StopWorkersData elements. Each stack element contains the workers
- * to be stopped for that subtransaction.
- */
-static StopWorkersData *on_commit_stop_workers = NULL;
-
static void ApplyLauncherWakeup(void);
static void logicalrep_launcher_onexit(int code, Datum arg);
static void logicalrep_worker_onexit(int code, Datum arg);
@@ -546,51 +532,6 @@ logicalrep_worker_stop(Oid subid, Oid relid)
LWLockRelease(LogicalRepWorkerLock);
}
-/*
- * Request worker for specified sub/rel to be stopped on commit.
- */
-void
-logicalrep_worker_stop_at_commit(Oid subid, Oid relid)
-{
- int nestDepth = GetCurrentTransactionNestLevel();
- LogicalRepWorkerId *wid;
- MemoryContext oldctx;
-
- /* Make sure we store the info in context that survives until commit. */
- oldctx = MemoryContextSwitchTo(TopTransactionContext);
-
- /* Check that previous transactions were properly cleaned up. */
- Assert(on_commit_stop_workers == NULL ||
- nestDepth >= on_commit_stop_workers->nestDepth);
-
- /*
- * Push a new stack element if we don't already have one for the current
- * nestDepth.
- */
- if (on_commit_stop_workers == NULL ||
- nestDepth > on_commit_stop_workers->nestDepth)
- {
- StopWorkersData *newdata = palloc(sizeof(StopWorkersData));
-
- newdata->nestDepth = nestDepth;
- newdata->workers = NIL;
- newdata->parent = on_commit_stop_workers;
- on_commit_stop_workers = newdata;
- }
-
- /*
- * Finally add a new worker into the worker list of the current
- * subtransaction.
- */
- wid = palloc(sizeof(LogicalRepWorkerId));
- wid->subid = subid;
- wid->relid = relid;
- on_commit_stop_workers->workers =
- lappend(on_commit_stop_workers->workers, wid);
-
- MemoryContextSwitchTo(oldctx);
-}
-
/*
* Wake up (using latch) any logical replication worker for specified sub/rel.
*/
@@ -819,109 +760,21 @@ ApplyLauncherShmemInit(void)
}
}
-/*
- * Check whether current transaction has manipulated logical replication
- * workers.
- */
-bool
-XactManipulatesLogicalReplicationWorkers(void)
-{
- return (on_commit_stop_workers != NULL);
-}
-
/*
* Wakeup the launcher on commit if requested.
*/
void
AtEOXact_ApplyLauncher(bool isCommit)
{
-
- Assert(on_commit_stop_workers == NULL ||
- (on_commit_stop_workers->nestDepth == 1 &&
- on_commit_stop_workers->parent == NULL));
-
if (isCommit)
{
- ListCell *lc;
-
- if (on_commit_stop_workers != NULL)
- {
- List *workers = on_commit_stop_workers->workers;
-
- foreach(lc, workers)
- {
- LogicalRepWorkerId *wid = lfirst(lc);
-
- logicalrep_worker_stop(wid->subid, wid->relid);
- }
- }
-
if (on_commit_launcher_wakeup)
ApplyLauncherWakeup();
}
- /*
- * No need to pfree on_commit_stop_workers. It was allocated in
- * transaction memory context, which is going to be cleaned soon.
- */
- on_commit_stop_workers = NULL;
on_commit_launcher_wakeup = false;
}
-/*
- * On commit, merge the current on_commit_stop_workers list into the
- * immediate parent, if present.
- * On rollback, discard the current on_commit_stop_workers list.
- * Pop out the stack.
- */
-void
-AtEOSubXact_ApplyLauncher(bool isCommit, int nestDepth)
-{
- StopWorkersData *parent;
-
- /* Exit immediately if there's no work to do at this level. */
- if (on_commit_stop_workers == NULL ||
- on_commit_stop_workers->nestDepth < nestDepth)
- return;
-
- Assert(on_commit_stop_workers->nestDepth == nestDepth);
-
- parent = on_commit_stop_workers->parent;
-
- if (isCommit)
- {
- /*
- * If the upper stack element is not an immediate parent
- * subtransaction, just decrement the notional nesting depth without
- * doing any real work. Else, we need to merge the current workers
- * list into the parent.
- */
- if (!parent || parent->nestDepth < nestDepth - 1)
- {
- on_commit_stop_workers->nestDepth--;
- return;
- }
-
- parent->workers =
- list_concat(parent->workers, on_commit_stop_workers->workers);
- }
- else
- {
- /*
- * Abandon everything that was done at this nesting level. Explicitly
- * free memory to avoid a transaction-lifespan leak.
- */
- list_free_deep(on_commit_stop_workers->workers);
- }
-
- /*
- * We have taken care of the current subtransaction workers list for both
- * abort or commit. So we are ready to pop the stack.
- */
- pfree(on_commit_stop_workers);
- on_commit_stop_workers = parent;
-}
-
/*
* Request wakeup of the launcher on commit of the transaction.
*
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 863d196fd7..4163fa8ad4 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -31,8 +31,11 @@
* table state to INIT.
* - Tablesync worker starts; changes table state from INIT to DATASYNC while
* copying.
- * - Tablesync worker finishes the copy and sets table state to SYNCWAIT;
- * waits for state change.
+ * - Tablesync worker does initial table copy; there is a FINISHEDCOPY (sync
+ * worker specific) state to indicate when the copy phase has completed, so
+ * if the worker crashes with this (non-memory) state then the copy will not
+ * be re-attempted.
+ * - Tablesync worker then sets table state to SYNCWAIT; waits for state change.
* - Apply worker periodically checks for tables in SYNCWAIT state. When
* any appear, it sets the table state to CATCHUP and starts loop-waiting
* until either the table state is set to SYNCDONE or the sync worker
@@ -48,8 +51,8 @@
* point it sets state to READY and stops tracking. Again, there might
* be zero changes in between.
*
- * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * So the state progression is always: INIT -> DATASYNC -> FINISHEDCOPY
+ * -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY.
*
* The catalog pg_subscription_rel is used to keep information about
* subscribed tables and their state. Some transient state during data
@@ -59,6 +62,7 @@
* Example flows look like this:
* - Apply is in front:
* sync:8
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:10
* -> set in memory CATCHUP
@@ -74,6 +78,7 @@
*
* - Sync is in front:
* sync:10
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:8
* -> set in memory CATCHUP
@@ -98,11 +103,16 @@
#include "miscadmin.h"
#include "parser/parse_relation.h"
#include "pgstat.h"
+#include "postmaster/interrupt.h"
#include "replication/logicallauncher.h"
#include "replication/logicalrelation.h"
+#include "replication/logicalworker.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
+#include "storage/lmgr.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -112,6 +122,42 @@ static bool table_states_valid = false;
StringInfo copybuf = NULL;
+/*
+ * Common code to drop the origin of a tablesync worker.
+ *
+ * There is a potential race condition if two processes attempt to call
+ * replorigin_drop for the same originid at the same time. The loser of
+ * that race would give an ERROR saying that it failed to find the
+ * expected originid.
+ *
+ * The TRY/CATCH below suppresses such errors, allowing the tablesync cleanup
+ * code to proceed.
+ */
+void
+tablesync_replorigin_drop(Oid subid, Oid relid, bool nowait)
+{
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (OidIsValid(originid))
+ {
+ PG_TRY();
+ {
+ replorigin_drop(originid, nowait);
+ }
+ PG_CATCH();
+ {
+ ereport(WARNING,
+ errmsg("could not drop replication origin with OID %d, named \"%s\"",
+ originid,
+ originname));
+ }
+ PG_END_TRY();
+ }
+}
+
/*
* Exit routine for synchronization worker.
*/
@@ -270,30 +316,55 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
+ bool sync_done = false;
+ Oid subid = MySubscription->oid;
+ Oid relid = MyLogicalRepWorker->relid;
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
+ sync_done = MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
+ current_lsn >= MyLogicalRepWorker->relstate_lsn;
+ SpinLockRelease(&MyLogicalRepWorker->relmutex);
- if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
- current_lsn >= MyLogicalRepWorker->relstate_lsn)
+ if (sync_done)
{
TimeLineID tli;
+ char syncslotname[NAMEDATALEN] = {0};
+ /* End wal streaming so wrconn can be re-used to drop the slot. */
+ walrcv_endstreaming(wrconn, &tli);
+
+ /*
+ * Clean up the tablesync slot.
+ */
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, false /* missing_ok */ );
+
+ /*
+ * Change state to SYNCDONE.
+ */
+ SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCDONE;
MyLogicalRepWorker->relstate_lsn = current_lsn;
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ {
+ StartTransactionCommand();
+ }
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
MyLogicalRepWorker->relstate_lsn);
- walrcv_endstreaming(wrconn, &tli);
finish_sync_worker();
}
- else
- SpinLockRelease(&MyLogicalRepWorker->relmutex);
}
/*
@@ -412,6 +483,21 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
started_tx = true;
}
+ /*
+ * Remove the tablesync origin tracking if it exists.
+ *
+ * The normal-case origin drop is done here instead of in the
+ * process_syncing_tables_for_sync function because if the
+ * tablesync worker process attempted to drop its own origin,
+ * that would prevent the origin from advancing properly on
+ * the commit of its transaction.
+ */
+ tablesync_replorigin_drop(MyLogicalRepWorker->subid,
+ rstate->relid, false /* nowait */ );
+
+ /*
+ * Update the state to READY only after the origin cleanup.
+ */
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
@@ -807,6 +893,32 @@ copy_table(Relation rel)
logicalrep_rel_close(relmapentry, NoLock);
}
+/*
+ * Determine the tablesync slot name.
+ *
+ * The name must not exceed NAMEDATALEN - 1 because of remote node constraints
+ * on slot name length.
+ *
+ * The returned slot name is either:
+ * - stored in the supplied buffer (syncslotname), or
+ * - palloc'ed in current memory context (if syncslotname = NULL).
+ *
+ * Note: We don't use the subscription slot name as part of tablesync slot name
+ * because we are responsible for cleaning up these slots and it could become
+ * impossible to recalculate what name to clean up if the subscription slot name
+ * had changed.
+ */
+char *
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char syncslotname[NAMEDATALEN])
+{
+ if (syncslotname)
+ sprintf(syncslotname, "pg_%u_sync_%u", suboid, relid);
+ else
+ syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
+
+ return syncslotname;
+}
+
/*
* Start syncing the table in the sync worker.
*
@@ -824,6 +936,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -849,19 +963,10 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
- MySubscription->slotname,
- MySubscription->oid,
- MyLogicalRepWorker->relid);
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(MySubscription->oid,
+ MyLogicalRepWorker->relid,
+ NULL /* use palloc */ );
/*
* Here we use the slot name instead of the subscription name as the
@@ -874,7 +979,33 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY);
+
+ /* Assign the origin tracking record name. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MySubscription->oid, MyLogicalRepWorker->relid);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed
+ * before it was able to finish normally.
+ */
+ StartTransactionCommand();
+
+ /*
+ * The origin tracking name must already exist. It was created the
+ * first time this tablesync was launched.
+ */
+ originid = replorigin_by_name(originname, false /* missing_ok */ );
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ *origin_startpos = replorigin_session_get_progress(false);
+
+ CommitTransactionCommand();
+
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -890,9 +1021,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -918,29 +1046,108 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
walrcv_clear_result(res);
/*
- * Create a new temporary logical decoding slot. This slot will be used
- * for the catchup phase after COPY is done, so tell it to use the
- * snapshot to make the final data consistent.
+ * Be sure to remove the newly created tablesync slot if the COPY fails.
*/
- walrcv_create_slot(wrconn, slotname, true,
- CRS_USE_SNAPSHOT, origin_startpos);
+ PG_TRY();
+ {
+ /*
+ * Create a new permanent logical decoding slot. This slot will be
+ * used for the catchup phase after COPY is done, so tell it to use
+ * the snapshot to make the final data consistent.
+ */
+ walrcv_create_slot(wrconn, slotname, false /* permanent */ ,
+ CRS_USE_SNAPSHOT, origin_startpos);
- /* Now do the initial data copy */
- PushActiveSnapshot(GetTransactionSnapshot());
- copy_table(rel);
- PopActiveSnapshot();
+ /* Now do the initial data copy */
+ PushActiveSnapshot(GetTransactionSnapshot());
+ copy_table(rel);
+ PopActiveSnapshot();
- res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
- (errmsg("table copy could not finish transaction on publisher"),
- errdetail("The error was: %s", res->err)));
- walrcv_clear_result(res);
+ res = walrcv_exec(wrconn, "COMMIT", 0, NULL);
+ if (res->status != WALRCV_OK_COMMAND)
+ ereport(ERROR,
+ (errmsg("table copy could not finish transaction on publisher"),
+ errdetail("The error was: %s", res->err)));
+ walrcv_clear_result(res);
+
+ table_close(rel, NoLock);
+
+ /* Make the copy visible. */
+ CommandCounterIncrement();
+
+ /* Setup replication origin tracking. */
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist, so create it now.
+ *
+ * Then advance to the LSN got from walrcv_create_slot. This is
+ * WAL logged for the purpose of recovery. Locks are to prevent
+ * the replication origin from vanishing while advancing.
+ */
+ originid = replorigin_create(originname);
+
+ LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */ , true /* WAL log */ );
+ UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ }
+ else
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("replication origin \"%s\" already exists",
+ originname)));
+ }
+
+ /*
+ * Update the persisted state to indicate the COPY phase is done; make
+ * it visible to others.
+ */
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_FINISHEDCOPY,
+ MyLogicalRepWorker->relstate_lsn);
+
+ CommitTransactionCommand();
+ }
+ PG_CATCH();
+ {
+ /*
+ * Clean up the transaction state on the publisher before performing
+ * any other operation.
+ */
+ res = walrcv_exec(wrconn, "ROLLBACK", 0, NULL);
+ if (res->status != WALRCV_OK_COMMAND)
+ ereport(ERROR,
+ (errmsg("table copy could not rollback transaction on publisher"),
+ errdetail("The error was: %s", res->err)));
+ walrcv_clear_result(res);
+
+ /*
+ * If something failed during the table copy then clean up the
+ * created slot.
+ */
+ ReplicationSlotDropAtPubNode(wrconn, slotname, false /* missing_ok */ );
+
+ pfree(slotname);
+ slotname = NULL;
+
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
- table_close(rel, NoLock);
+copy_table_done:
- /* Make the copy visible. */
- CommandCounterIncrement();
+ elog(DEBUG1,
+ "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
/*
* We are done with the initial data synchronization, update the state.
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index eb7db89cef..cfc924cd89 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -807,12 +807,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -889,9 +885,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -918,8 +912,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1062,8 +1055,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData *commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 1d81071c35..05bb698cf4 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1786,7 +1786,8 @@ ProcessUtilitySlow(ParseState *pstate,
break;
case T_AlterSubscriptionStmt:
- address = AlterSubscription((AlterSubscriptionStmt *) parsetree);
+ address = AlterSubscription((AlterSubscriptionStmt *) parsetree,
+ isTopLevel);
break;
case T_DropSubscriptionStmt:
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index ab1202cf9b..e04ba83e1e 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,8 @@ DECLARE_UNIQUE_INDEX_PKEY(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_FINISHEDCOPY 'f' /* tablesync copy phase is completed
+ * (sublsn NULL) */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/commands/subscriptioncmds.h b/src/include/commands/subscriptioncmds.h
index a81865079d..3b926f35d7 100644
--- a/src/include/commands/subscriptioncmds.h
+++ b/src/include/commands/subscriptioncmds.h
@@ -20,7 +20,7 @@
extern ObjectAddress CreateSubscription(CreateSubscriptionStmt *stmt,
bool isTopLevel);
-extern ObjectAddress AlterSubscription(AlterSubscriptionStmt *stmt);
+extern ObjectAddress AlterSubscription(AlterSubscriptionStmt *stmt, bool isTopLevel);
extern void DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel);
extern ObjectAddress AlterSubscriptionOwner(const char *name, Oid newOwnerId);
diff --git a/src/include/replication/logicallauncher.h b/src/include/replication/logicallauncher.h
index 421ec1580d..301e494f7b 100644
--- a/src/include/replication/logicallauncher.h
+++ b/src/include/replication/logicallauncher.h
@@ -22,9 +22,7 @@ extern Size ApplyLauncherShmemSize(void);
extern void ApplyLauncherShmemInit(void);
extern void ApplyLauncherWakeupAtCommit(void);
-extern bool XactManipulatesLogicalReplicationWorkers(void);
extern void AtEOXact_ApplyLauncher(bool isCommit);
-extern void AtEOSubXact_ApplyLauncher(bool isCommit, int nestDepth);
extern bool IsLogicalLauncher(void);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 53f636c56f..5f52335f15 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -15,6 +15,7 @@
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "storage/spin.h"
+#include "replication/walreceiver.h"
/*
* Behaviour of replication slots, upon release or crash.
@@ -211,6 +212,8 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname);
+extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
diff --git a/src/include/replication/worker_internal.h b/src/include/replication/worker_internal.h
index d046022e49..4a5c49da7d 100644
--- a/src/include/replication/worker_internal.h
+++ b/src/include/replication/worker_internal.h
@@ -77,13 +77,14 @@ extern List *logicalrep_workers_find(Oid subid, bool only_running);
extern void logicalrep_worker_launch(Oid dbid, Oid subid, const char *subname,
Oid userid, Oid relid);
extern void logicalrep_worker_stop(Oid subid, Oid relid);
-extern void logicalrep_worker_stop_at_commit(Oid subid, Oid relid);
extern void logicalrep_worker_wakeup(Oid subid, Oid relid);
extern void logicalrep_worker_wakeup_ptr(LogicalRepWorker *worker);
extern int logicalrep_sync_worker_count(Oid subid);
extern char *LogicalRepSyncTableStart(XLogRecPtr *origin_startpos);
+extern void tablesync_replorigin_drop(Oid subid, Oid relid, bool nowait);
+
void process_syncing_tables(XLogRecPtr current_lsn);
void invalidate_syncing_table_states(Datum arg, int cacheid,
uint32 hashvalue);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 1d540fe489..5f5c36d8e2 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2397,7 +2397,6 @@ StdAnalyzeData
StdRdOptions
Step
StopList
-StopWorkersData
StrategyNumber
StreamCtl
StreamXidHash
--
2.28.0.windows.1
On Tue, Feb 2, 2021 at 3:31 PM Peter Smith <smithpb2250@gmail.com> wrote:
After seeing Ajin's test [ac0202] which did a DROP TABLE, I have also
tried a simple test where I do a DROP TABLE with very bad timing for
the tablesync worker. It seems that doing this can cause the sync
worker's MyLogicalRepWorker->relid to become invalid.
I think this should be fixed by the latest patch because I have disallowed
dropping a table while its synchronization is in progress. You can check
once and let me know if the issue still exists?
--
With Regards,
Amit Kapila.
On Wed, Feb 3, 2021 at 12:26 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Feb 2, 2021 at 3:31 PM Peter Smith <smithpb2250@gmail.com> wrote:
After seeing Ajin's test [ac0202] which did a DROP TABLE, I have also
tried a simple test where I do a DROP TABLE with very bad timing for
the tablesync worker. It seems that doing this can cause the sync
worker's MyLogicalRepWorker->relid to become invalid.
I think this should be fixed by the latest patch because I have disallowed
dropping a table while its synchronization is in progress. You can check
once and let me know if the issue still exists?
FYI - I confirmed that the problem scenario that I reported yesterday
is no longer possible because now the V25 patch is disallowing the
DROP TABLE while the tablesync is still running.
PSA my test logs showing it is now working as expected.
----
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
On Wed, Feb 3, 2021 at 12:24 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
The problem here is that we are allowing the table to be dropped while
table synchronization is still in progress, after which we don't have
any way to know the corresponding slot or origin. We could try to drop
the slot and origin as well, but that is not a good idea because slots,
once dropped, won't be rolled back. So, I have added a fix to disallow
the drop of the table while table synchronization is still in progress.
Apart from that, I have addressed the comments raised by Peter as
discussed above and made some additional changes in comments, code (the
code changes are cosmetic), and docs.
Let me know if the issue reported is fixed or not?
Yes, the issue is fixed; the table drop now results in an error:
postgres=# drop table tab_rep ;
ERROR: could not drop relation mapping for subscription "tap_sub"
DETAIL: Table synchronization for relation "tab_rep" is in progress
and is in state "f".
HINT: Use ALTER SUBSCRIPTION ... ENABLE to enable subscription if not
already enabled or use DROP SUBSCRIPTION ... to drop the subscription.
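As the hint suggests, either path should then unblock the table drop; a
minimal sketch using the names from this session:
ALTER SUBSCRIPTION tap_sub ENABLE;  -- if it was disabled: let sync reach READY
-- or, if the subscription is no longer wanted:
DROP SUBSCRIPTION tap_sub;
DROP TABLE tab_rep;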
regards,
Ajin Cherian
Fujitsu Australia
On Wed, Feb 3, 2021 at 6:38 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Wed, Feb 3, 2021 at 12:26 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Feb 2, 2021 at 3:31 PM Peter Smith <smithpb2250@gmail.com> wrote:
After seeing Ajin's test [ac0202] which did a DROP TABLE, I have also
tried a simple test where I do a DROP TABLE with very bad timing for
the tablesync worker. It seems that doing this can cause the sync
worker's MyLogicalRepWorker->relid to become invalid.
I think this should be fixed by the latest patch because I have disallowed
dropping a table while its synchronization is in progress. You can check
once and let me know if the issue still exists?
FYI - I confirmed that the problem scenario that I reported yesterday
is no longer possible because now the V25 patch is disallowing the
DROP TABLE while the tablesync is still running.
Thanks for the confirmation. BTW, can you please check if we can
reproduce that problem without this patch? If so, we might want to
apply this fix irrespective of this patch. If not, why not?
--
With Regards,
Amit Kapila.
On Wed, Feb 3, 2021 at 1:34 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Feb 3, 2021 at 6:38 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Wed, Feb 3, 2021 at 12:26 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Feb 2, 2021 at 3:31 PM Peter Smith <smithpb2250@gmail.com> wrote:
After seeing Ajin's test [ac0202] which did a DROP TABLE, I have also
tried a simple test where I do a DROP TABLE with very bad timing for
the tablesync worker. It seems that doing this can cause the sync
worker's MyLogicalRepWorker->relid to become invalid.
I think this should be fixed by the latest patch because I have disallowed
dropping a table while its synchronization is in progress. You can check
once and let me know if the issue still exists?
FYI - I confirmed that the problem scenario that I reported yesterday
is no longer possible because now the V25 patch is disallowing the
DROP TABLE while the tablesync is still running.
Thanks for the confirmation. BTW, can you please check if we can
reproduce that problem without this patch? If so, we might want to
apply this fix irrespective of this patch. If not, why not?
Yes, this was an existing postgres bug. It is independent of the patch.
I can reproduce exactly the same stacktrace using the HEAD src pulled @ 3/Feb.
PSA my test logs showing the details.
----
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
On Tue, Feb 2, 2021 at 9:03 PM Ajin Cherian <itsajin@gmail.com> wrote:
I am sorry, my above steps were not correct. I think the reason for
the failure I was seeing lay in some other steps I did prior to this. I
will recreate this and update you with the appropriate steps.
The correct steps are as follows:
Publisher:
postgres=# CREATE TABLE tab_rep (a int primary key);
CREATE TABLE
postgres=# INSERT INTO tab_rep SELECT generate_series(1,1000000);
INSERT 0 1000000
postgres=# CREATE PUBLICATION tap_pub FOR ALL TABLES;
CREATE PUBLICATION
Subscriber:
postgres=# CREATE TABLE tab_rep (a int primary key);
CREATE TABLE
postgres=# CREATE SUBSCRIPTION tap_sub CONNECTION 'host=localhost
dbname=postgres port=6972' PUBLICATION tap_pub WITH (enabled = false);
NOTICE: created replication slot "tap_sub" on publisher
CREATE SUBSCRIPTION
postgres=# ALTER SUBSCRIPTION tap_sub enable;
ALTER SUBSCRIPTION
Allow the tablesync to complete and then drop the subscription; the
table remains full, so restarting the subscription should fail with a
constraint violation during tablesync, but it does not.
Subscriber:
postgres=# drop subscription tap_sub ;
NOTICE: dropped replication slot "tap_sub" on publisher
DROP SUBSCRIPTION
postgres=# CREATE SUBSCRIPTION tap_sub CONNECTION 'host=localhost
dbname=postgres port=6972' PUBLICATION tap_pub WITH (enabled = false);
NOTICE: created replication slot "tap_sub" on publisher
CREATE SUBSCRIPTION
postgres=# ALTER SUBSCRIPTION tap_sub enable;
ALTER SUBSCRIPTION
This takes the subscriber into an error loop, with no mention of what
the actual error was:
2021-02-02 05:01:34.698 EST [1549] LOG: logical replication table
synchronization worker for subscription "tap_sub", table "tab_rep" has
started
2021-02-02 05:01:34.739 EST [1549] ERROR: table copy could not
rollback transaction on publisher
2021-02-02 05:01:34.739 EST [1549] DETAIL: The error was: another
command is already in progress
2021-02-02 05:01:34.740 EST [8028] LOG: background worker "logical
replication worker" (PID 1549) exited with exit code 1
2021-02-02 05:01:40.107 EST [1711] LOG: logical replication table
synchronization worker for subscription "tap_sub", table "tab_rep" has
started
2021-02-02 05:01:40.121 EST [1711] ERROR: could not create
replication slot "pg_16479_sync_16435": ERROR: replication slot
"pg_16479_sync_16435" already exists
2021-02-02 05:01:40.121 EST [8028] LOG: background worker "logical
replication worker" (PID 1711) exited with exit code 1
2021-02-02 05:01:45.140 EST [1891] LOG: logical replication table
synchronization worker for subscription "tap_sub", table "tab_rep" has
started
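For the record, an orphaned tablesync slot like the one in the log can be
removed manually on the publisher while investigating; an illustrative
command using the slot name reported above:
SELECT pg_drop_replication_slot('pg_16479_sync_16435');  -- run on the publisher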
regards,
Ajin Cherian
Fujitsu Australia
On Wed, Feb 3, 2021 at 2:51 PM Peter Smith <smithpb2250@gmail.com> wrote:
On Wed, Feb 3, 2021 at 1:34 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Feb 3, 2021 at 6:38 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Wed, Feb 3, 2021 at 12:26 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Feb 2, 2021 at 3:31 PM Peter Smith <smithpb2250@gmail.com> wrote:
After seeing Ajin's test [ac0202] which did a DROP TABLE, I have also
tried a simple test where I do a DROP TABLE with very bad timing for
the tablesync worker. It seems that doing this can cause the sync
worker's MyLogicalRepWorker->relid to become invalid.
I think this should be fixed by the latest patch because I have disallowed
dropping a table while its synchronization is in progress. You can check
once and let me know if the issue still exists?
FYI - I confirmed that the problem scenario that I reported yesterday
is no longer possible because now the V25 patch is disallowing the
DROP TABLE while the tablesync is still running.
Thanks for the confirmation. BTW, can you please check if we can
reproduce that problem without this patch? If so, we might want to
apply this fix irrespective of this patch. If not, why not?
Yes, this was an existing postgres bug. It is independent of the patch.
I can reproduce exactly the same stacktrace using the HEAD src pulled @ 3/Feb.
PSA my test logs showing the details.
Since this is an existing PG bug independent of this patch, I spawned
a new thread [ps0202] to deal with this problem.
----
[ps0202] /messages/by-id/CAHut+Pu7Z4a=omo+TvK5Gub2hxcJ7-3+Bu1FO_++fpFTW0oQfQ@mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia
On Wed, Feb 3, 2021 at 1:28 PM Ajin Cherian <itsajin@gmail.com> wrote:
On Tue, Feb 2, 2021 at 9:03 PM Ajin Cherian <itsajin@gmail.com> wrote:
I am sorry, my above steps were not correct. I think the reason for
the failure I was seeing were some other steps I did prior to this. I
will recreate this and update you with the appropriate steps.
The correct steps are as follows:
Publisher:
postgres=# CREATE TABLE tab_rep (a int primary key);
CREATE TABLE
postgres=# INSERT INTO tab_rep SELECT generate_series(1,1000000);
INSERT 0 1000000
postgres=# CREATE PUBLICATION tap_pub FOR ALL TABLES;
CREATE PUBLICATION
Subscriber:
postgres=# CREATE TABLE tab_rep (a int primary key);
CREATE TABLE
postgres=# CREATE SUBSCRIPTION tap_sub CONNECTION 'host=localhost
dbname=postgres port=6972' PUBLICATION tap_pub WITH (enabled = false);
NOTICE: created replication slot "tap_sub" on publisher
CREATE SUBSCRIPTION
postgres=# ALTER SUBSCRIPTION tap_sub enable;
ALTER SUBSCRIPTION
Allow the tablesync to complete and then drop the subscription; the
table remains full, so restarting the subscription should fail with a
constraint violation during tablesync, but it does not.
Subscriber:
postgres=# drop subscription tap_sub ;
NOTICE: dropped replication slot "tap_sub" on publisher
DROP SUBSCRIPTION
postgres=# CREATE SUBSCRIPTION tap_sub CONNECTION 'host=localhost
dbname=postgres port=6972' PUBLICATION tap_pub WITH (enabled = false);
NOTICE: created replication slot "tap_sub" on publisher
CREATE SUBSCRIPTION
postgres=# ALTER SUBSCRIPTION tap_sub enable;
ALTER SUBSCRIPTION
This takes the subscriber into an error loop, with no mention of what
the actual error was:
Thanks for the report. The problem here was that the error occurred
while we were trying to copy a large amount of data. Issuing a rollback
before the entire data had been fetched led to this problem. One
alternative could be to first fetch the entire data when the error
occurs and then issue the subsequent commands. Instead, I have modified
the patch to drop the replication slot at the beginning if the relstate
is DATASYNC. Do let me know if you can think of a better way to fix
this?
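For readers following along, the relstate being tested here is visible
from SQL; an illustrative query (with tab_rep standing in for the
affected table):
SELECT srsubid, srrelid::regclass AS rel, srsubstate
FROM pg_subscription_rel
WHERE srrelid = 'tab_rep'::regclass;
-- srsubstate = 'd' (DATASYNC) means a previous copy attempt did not
-- complete; the modified patch makes a re-launched tablesync drop any
-- leftover tablesync slot first, before creating it again.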
--
With Regards,
Amit Kapila.
Attachments:
v26-0001-Tablesync-Solution1.patch (application/octet-stream)
From c5638db6bc64eecc05d20f4b473cb698011c9b9b Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Sat, 30 Jan 2021 10:21:28 +0530
Subject: [PATCH v26] Tablesync Solution1.
==== Features:
* The tablesync slot is now permanent instead of temporary.
* The tablesync worker is now allowing multiple tx instead of single tx.
* A new state (SUBREL_STATE_FINISHEDCOPY) is persisted after a successful copy_table in tablesync's LogicalRepSyncTableStart.
* If a re-launched tablesync finds state SUBREL_STATE_FINISHEDCOPY then it will bypass the initial copy_table phase.
* Now tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar to what is done for the apply worker). The origin is advanced when first created.
* Cleanup of tablesync resources:
- The tablesync slot is dropped by process_syncing_tables_for_sync functions.
- The tablesync replication origin tracking is dropped by process_syncing_tables_for_apply.
- DropSubscription/AlterSubscription_refresh also drop tablesync slots/origins
* Updates to PG docs.
Known Issues:
* None.
---
doc/src/sgml/catalogs.sgml | 1 +
doc/src/sgml/logical-replication.sgml | 17 +-
doc/src/sgml/ref/alter_subscription.sgml | 6 +
doc/src/sgml/ref/drop_subscription.sgml | 6 +-
src/backend/access/transam/xact.c | 11 -
src/backend/catalog/pg_subscription.c | 35 ++
src/backend/commands/subscriptioncmds.c | 445 +++++++++++++++-----
src/backend/replication/logical/launcher.c | 147 -------
src/backend/replication/logical/tablesync.c | 255 +++++++++--
src/backend/replication/logical/worker.c | 18 +-
src/backend/tcop/utility.c | 3 +-
src/include/catalog/pg_subscription_rel.h | 2 +
src/include/commands/subscriptioncmds.h | 2 +-
src/include/replication/logicallauncher.h | 2 -
src/include/replication/slot.h | 3 +
src/include/replication/worker_internal.h | 3 +-
src/tools/pgindent/typedefs.list | 1 -
17 files changed, 644 insertions(+), 313 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 865e826fb0..920a39dfa9 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7665,6 +7665,7 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
State code:
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
+ <literal>f</literal> = finished table copy,
<literal>s</literal> = synchronized,
<literal>r</literal> = ready (normal replication)
</para></entry>
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index a560ad69b4..20cdd5715d 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -248,7 +248,17 @@
<para>
As mentioned earlier, each (active) subscription receives changes from a
- replication slot on the remote (publishing) side. Normally, the remote
+ replication slot on the remote (publishing) side.
+ </para>
+ <para>
+ Additional table synchronization slots are normally transient, created
+ internally and dropped automatically when they are no longer needed.
+ These table synchronization slots have generated names:
+ <quote><literal>pg_%u_sync_%u</literal></quote> (parameters: Subscription
+ <parameter>oid</parameter>, Table <parameter>relid</parameter>)
+ </para>
+ <para>
+ Normally, the remote
replication slot is created automatically when the subscription is created
using <command>CREATE SUBSCRIPTION</command> and it is dropped
automatically when the subscription is dropped using <command>DROP
@@ -294,8 +304,9 @@
using <command>ALTER SUBSCRIPTION</command> before attempting to drop
the subscription. If the remote database instance no longer exists, no
further action is then necessary. If, however, the remote database
- instance is just unreachable, the replication slot should then be
- dropped manually; otherwise it would continue to reserve WAL and might
+ instance is just unreachable, the replication slot (and any still
+ remaining table synchronization slots) should then be
+ dropped manually; otherwise it/they would continue to reserve WAL and might
eventually cause the disk to fill up. Such cases should be carefully
investigated.
</para>
diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml
index db5e59f707..a6ffd6688f 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -48,6 +48,12 @@ ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> RENAME TO <
(Currently, all subscription owners must be superusers, so the owner checks
will be bypassed in practice. But this might change in the future.)
</para>
+
+ <para>
+ Commands <command>ALTER SUBSCRIPTION ... REFRESH ..</command> and
+ <command>ALTER SUBSCRIPTION ... SET PUBLICATION ..</command> with the
+ refresh option set to true cannot be executed inside a transaction block.
+ </para>
</refsect1>
<refsect1>
diff --git a/doc/src/sgml/ref/drop_subscription.sgml b/doc/src/sgml/ref/drop_subscription.sgml
index adbdeafb4e..aee9615546 100644
--- a/doc/src/sgml/ref/drop_subscription.sgml
+++ b/doc/src/sgml/ref/drop_subscription.sgml
@@ -79,7 +79,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
<para>
When dropping a subscription that is associated with a replication slot on
the remote host (the normal state), <command>DROP SUBSCRIPTION</command>
- will connect to the remote host and try to drop the replication slot as
+ will connect to the remote host and try to drop the replication slot (and
+ any remaining table synchronization slots) as
part of its operation. This is necessary so that the resources allocated
for the subscription on the remote host are released. If this fails,
either because the remote host is not reachable or because the remote
@@ -89,7 +90,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
executing <literal>ALTER SUBSCRIPTION ... SET (slot_name = NONE)</literal>.
After that, <command>DROP SUBSCRIPTION</command> will no longer attempt any
actions on a remote host. Note that if the remote replication slot still
- exists, it should then be dropped manually; otherwise it will continue to
+ exists, it (and any related table synchronization slots) should then be
+ dropped manually; otherwise it/they will continue to
reserve WAL and might eventually cause the disk to fill up. See
also <xref linkend="logical-replication-subscription-slot"/>.
</para>
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index a2068e3fd4..3c8b4eb362 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2432,15 +2432,6 @@ PrepareTransaction(void)
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot PREPARE a transaction that has exported snapshots")));
- /*
- * Don't allow PREPARE but for transaction that has/might kill logical
- * replication workers.
- */
- if (XactManipulatesLogicalReplicationWorkers())
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("cannot PREPARE a transaction that has manipulated logical replication workers")));
-
/* Prevent cancel/die interrupt while cleaning up */
HOLD_INTERRUPTS();
@@ -4899,7 +4890,6 @@ CommitSubTransaction(void)
AtEOSubXact_HashTables(true, s->nestingLevel);
AtEOSubXact_PgStat(true, s->nestingLevel);
AtSubCommit_Snapshot(s->nestingLevel);
- AtEOSubXact_ApplyLauncher(true, s->nestingLevel);
/*
* We need to restore the upper transaction's read-only state, in case the
@@ -5059,7 +5049,6 @@ AbortSubTransaction(void)
AtEOSubXact_HashTables(false, s->nestingLevel);
AtEOSubXact_PgStat(false, s->nestingLevel);
AtSubAbort_Snapshot(s->nestingLevel);
- AtEOSubXact_ApplyLauncher(false, s->nestingLevel);
}
/*
diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c
index 44cb285b68..e422fbffa6 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -29,6 +29,7 @@
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
#include "utils/pg_lsn.h"
#include "utils/rel.h"
#include "utils/syscache.h"
@@ -337,6 +338,13 @@ GetSubscriptionRelState(Oid subid, Oid relid, XLogRecPtr *sublsn)
char substate;
bool isnull;
Datum d;
+ Relation rel;
+
+ /*
+ * This is to avoid the race condition with AlterSubscription which tries
+ * to remove this relstate.
+ */
+ rel = table_open(SubscriptionRelRelationId, AccessShareLock);
/* Try finding the mapping. */
tup = SearchSysCache2(SUBSCRIPTIONRELMAP,
@@ -363,6 +371,8 @@ GetSubscriptionRelState(Oid subid, Oid relid, XLogRecPtr *sublsn)
/* Cleanup */
ReleaseSysCache(tup);
+ table_close(rel, AccessShareLock);
+
return substate;
}
@@ -403,6 +413,31 @@ RemoveSubscriptionRel(Oid subid, Oid relid)
scan = table_beginscan_catalog(rel, nkeys, skey);
while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
{
+ Form_pg_subscription_rel subrel;
+
+ subrel = (Form_pg_subscription_rel) GETSTRUCT(tup);
+
+ /*
+ * We don't allow dropping the relation mapping when the table
+ * synchronization is in progress unless the caller updates the
+ * corresponding subscription as well. This is to ensure that we don't
+ * leave tablesync slots or origins in the system when the
+ * corresponding table is dropped.
+ */
+ if (!OidIsValid(subid) && subrel->srsubstate != SUBREL_STATE_READY)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("could not drop relation mapping for subscription \"%s\"",
+ get_subscription_name(subrel->srsubid, false)),
+ errdetail("Table synchronization for relation \"%s\" is in progress and is in state \"%c\".",
+ get_rel_name(relid), subrel->srsubstate),
+ /* translator: first %s is a SQL ALTER command and second %s is a SQL DROP command */
+ errhint("Use %s to enable subscription if not already enabled or use %s to drop the subscription.",
+ "ALTER SUBSCRIPTION ... ENABLE",
+ "DROP SUBSCRIPTION ...")));
+ }
+
CatalogTupleDelete(rel, &tup->t_self);
}
table_endscan(scan);
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 082f7855b8..81719cb27d 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -34,6 +34,7 @@
#include "nodes/makefuncs.h"
#include "replication/logicallauncher.h"
#include "replication/origin.h"
+#include "replication/slot.h"
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "replication/worker_internal.h"
@@ -46,6 +47,8 @@
#include "utils/syscache.h"
static List *fetch_table_list(WalReceiverConn *wrconn, List *publications);
+static void ReportSlotConnectionError(List* rstates, Oid subid, char *slotname, char *err);
+
/*
* Common option parsing function for CREATE and ALTER SUBSCRIPTION commands.
@@ -566,107 +569,196 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
Oid *pubrel_local_oids;
ListCell *lc;
int off;
+ int remove_rel_len;
+ Relation rel = NULL;
+ typedef struct SubRemoveRels
+ {
+ Oid relid;
+ char state;
+ } SubRemoveRels;
+ SubRemoveRels *sub_remove_rels;
/* Load the library providing us libpq calls. */
load_file("libpqwalreceiver", false);
- /* Try to connect to the publisher. */
- wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
- if (!wrconn)
- ereport(ERROR,
- (errmsg("could not connect to the publisher: %s", err)));
-
- /* Get the table list from publisher. */
- pubrel_names = fetch_table_list(wrconn, sub->publications);
+ PG_TRY();
+ {
+ /* Try to connect to the publisher. */
+ wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
+ if (!wrconn)
+ ereport(ERROR,
+ (errmsg("could not connect to the publisher: %s", err)));
- /* We are done with the remote side, close connection. */
- walrcv_disconnect(wrconn);
+ /* Get the table list from publisher. */
+ pubrel_names = fetch_table_list(wrconn, sub->publications);
- /* Get local table list. */
- subrel_states = GetSubscriptionRelations(sub->oid);
+ /* Get local table list. */
+ subrel_states = GetSubscriptionRelations(sub->oid);
- /*
- * Build qsorted array of local table oids for faster lookup. This can
- * potentially contain all tables in the database so speed of lookup is
- * important.
- */
- subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
- off = 0;
- foreach(lc, subrel_states)
- {
- SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
+ /*
+ * Build qsorted array of local table oids for faster lookup. This can
+ * potentially contain all tables in the database so speed of lookup
+ * is important.
+ */
+ subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
+ off = 0;
+ foreach(lc, subrel_states)
+ {
+ SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
- subrel_local_oids[off++] = relstate->relid;
- }
- qsort(subrel_local_oids, list_length(subrel_states),
- sizeof(Oid), oid_cmp);
+ subrel_local_oids[off++] = relstate->relid;
+ }
+ qsort(subrel_local_oids, list_length(subrel_states),
+ sizeof(Oid), oid_cmp);
+
+ sub_remove_rels = palloc(list_length(subrel_states) * sizeof(SubRemoveRels));
+
+ /*
+ * Walk over the remote tables and try to match them to locally known
+ * tables. If the table is not known locally create a new state for
+ * it.
+ *
+ * Also builds array of local oids of remote tables for the next step.
+ */
+ off = 0;
+ pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
+
+ foreach(lc, pubrel_names)
+ {
+ RangeVar *rv = (RangeVar *) lfirst(lc);
+ Oid relid;
- /*
- * Walk over the remote tables and try to match them to locally known
- * tables. If the table is not known locally create a new state for it.
- *
- * Also builds array of local oids of remote tables for the next step.
- */
- off = 0;
- pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
+ relid = RangeVarGetRelid(rv, AccessShareLock, false);
- foreach(lc, pubrel_names)
- {
- RangeVar *rv = (RangeVar *) lfirst(lc);
- Oid relid;
+ /* Check for supported relkind. */
+ CheckSubscriptionRelkind(get_rel_relkind(relid),
+ rv->schemaname, rv->relname);
- relid = RangeVarGetRelid(rv, AccessShareLock, false);
+ pubrel_local_oids[off++] = relid;
- /* Check for supported relkind. */
- CheckSubscriptionRelkind(get_rel_relkind(relid),
- rv->schemaname, rv->relname);
+ if (!bsearch(&relid, subrel_local_oids,
+ list_length(subrel_states), sizeof(Oid), oid_cmp))
+ {
+ AddSubscriptionRelState(sub->oid, relid,
+ copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
+ InvalidXLogRecPtr);
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" added to subscription \"%s\"",
+ rv->schemaname, rv->relname, sub->name)));
+ }
+ }
- pubrel_local_oids[off++] = relid;
+ /*
+ * Next remove state for tables we should not care about anymore using
+ * the data we collected above
+ */
+ qsort(pubrel_local_oids, list_length(pubrel_names),
+ sizeof(Oid), oid_cmp);
- if (!bsearch(&relid, subrel_local_oids,
- list_length(subrel_states), sizeof(Oid), oid_cmp))
+ remove_rel_len = 0;
+ for (off = 0; off < list_length(subrel_states); off++)
{
- AddSubscriptionRelState(sub->oid, relid,
- copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
- InvalidXLogRecPtr);
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" added to subscription \"%s\"",
- rv->schemaname, rv->relname, sub->name)));
- }
- }
+ Oid relid = subrel_local_oids[off];
- /*
- * Next remove state for tables we should not care about anymore using the
- * data we collected above
- */
- qsort(pubrel_local_oids, list_length(pubrel_names),
- sizeof(Oid), oid_cmp);
+ if (!bsearch(&relid, pubrel_local_oids,
+ list_length(pubrel_names), sizeof(Oid), oid_cmp))
+ {
+ char state;
+ XLogRecPtr statelsn;
+
+ /*
+ * Lock pg_subscription_rel with AccessExclusiveLock to prevent any race
+ * conditions with the apply worker re-launching workers at the same time
+ * this code is trying to remove those tables.
+ *
+ * Even if a new worker for this particular rel is restarted, it won't be
+ * able to make any progress, as we hold an exclusive lock on
+ * pg_subscription_rel till the transaction end. It will simply exit, as
+ * there is no corresponding rel entry.
+ *
+ * This locking also ensures that the state of rels won't change till we
+ * are done with this refresh operation.
+ */
+ if (!rel)
+ rel = table_open(SubscriptionRelRelationId, AccessExclusiveLock);
+
+ /* Last known rel state. */
+ state = GetSubscriptionRelState(sub->oid, relid, &statelsn);
+
+ sub_remove_rels[remove_rel_len].relid = relid;
+ sub_remove_rels[remove_rel_len++].state = state;
+
+ RemoveSubscriptionRel(sub->oid, relid);
+
+ logicalrep_worker_stop(sub->oid, relid);
+
+ /*
+ * For READY state, we would have already dropped the tablesync
+ * origin.
+ */
+ if (state != SUBREL_STATE_READY)
+ {
+ /*
+ * Drop the tablesync's origin tracking if it exists.
+ */
+ tablesync_replorigin_drop(sub->oid, relid, false /* nowait */);
+ }
- for (off = 0; off < list_length(subrel_states); off++)
- {
- Oid relid = subrel_local_oids[off];
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
+ get_namespace_name(get_rel_namespace(relid)),
+ get_rel_name(relid),
+ sub->name)));
+ }
+ }
- if (!bsearch(&relid, pubrel_local_oids,
- list_length(pubrel_names), sizeof(Oid), oid_cmp))
+ /*
+ * Drop the tablesync slots associated with removed tables. This has to
+ * be at the end because otherwise, if there is an error while doing the
+ * database operations, we won't be able to roll back the dropped slot.
+ */
+ for (off = 0; off < remove_rel_len; off++)
{
- RemoveSubscriptionRel(sub->oid, relid);
-
- logicalrep_worker_stop_at_commit(sub->oid, relid);
-
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
- get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid),
- sub->name)));
+ if (sub_remove_rels[off].state != SUBREL_STATE_READY &&
+ sub_remove_rels[off].state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = { 0 };
+
+ /*
+ * For READY/SYNCDONE states we know the tablesync slot has
+ * already been dropped by the tablesync worker.
+ *
+ * For other states, there is no certainty. Maybe the slot
+ * does not exist yet; Maybe the slot is already deleted but
+ * SYNCDONE is not yet set. For this reason we allow
+ * missing_ok = true for the drop.
+ *
+ * XXX If there is a network break down while dropping the slots
+ * then we will give WARNING to the user and they need to manually
+ * remove such slots. This should happen rarely enough not to be a
+ * concern, and we don't have a better way to deal with this.
+ */
+ ReplicationSlotNameForTablesync(sub->oid, sub_remove_rels[off].relid, syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, true /* missing_ok */);
+ }
}
}
+ PG_FINALLY();
+ {
+ if (wrconn)
+ walrcv_disconnect(wrconn);
+ }
+ PG_END_TRY();
+
+ if (rel)
+ table_close(rel, NoLock);
}
/*
* Alter the existing subscription.
*/
ObjectAddress
-AlterSubscription(AlterSubscriptionStmt *stmt)
+AlterSubscription(AlterSubscriptionStmt *stmt, bool isTopLevel)
{
Relation rel;
ObjectAddress myself;
@@ -848,6 +940,8 @@ AlterSubscription(AlterSubscriptionStmt *stmt)
errmsg("ALTER SUBSCRIPTION with refresh is not allowed for disabled subscriptions"),
errhint("Use ALTER SUBSCRIPTION ... SET PUBLICATION ... WITH (refresh = false).")));
+ PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION with refresh");
+
/* Make sure refresh sees the new list of publications. */
sub->publications = stmt->publication;
@@ -877,6 +971,8 @@ AlterSubscription(AlterSubscriptionStmt *stmt)
NULL, NULL, /* no "binary" */
NULL, NULL); /* no "streaming" */
+ PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION ... REFRESH");
+
AlterSubscription_refresh(sub, copy_data);
break;
@@ -928,8 +1024,8 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
char *err = NULL;
RepOriginId originid;
WalReceiverConn *wrconn = NULL;
- StringInfoData cmd;
Form_pg_subscription form;
+ List *rstates;
/*
* Lock pg_subscription with AccessExclusiveLock to ensure that the
@@ -1042,6 +1138,31 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
}
list_free(subworkers);
+ /*
+ * Cleanup of tablesync replication origins.
+ *
+ * Any READY-state relations would already have dealt with clean-ups.
+ *
+ * Note that the state can't change because we have already stopped both
+ * the apply and tablesync workers and they can't restart because of
+ * exclusive lock on the subscription.
+ */
+ rstates = GetSubscriptionNotReadyRelations(subid);
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState* rstate = (SubscriptionRelState*) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Drop the tablesync's origin tracking if it exists.
+ */
+ tablesync_replorigin_drop(subid, relid, false /* nowait */);
+ }
+
/* Clean up dependencies */
deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
@@ -1054,34 +1175,110 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
if (originid != InvalidRepOriginId)
replorigin_drop(originid, false);
+
/*
* If there is no slot associated with the subscription, we can finish
* here.
*/
- if (!slotname)
+ if (!slotname && rstates == NIL)
{
table_close(rel, NoLock);
return;
}
/*
- * Otherwise drop the replication slot at the publisher node using the
- * replication connection.
+ * Try to acquire the connection necessary for dropping slots.
+ *
+ * Note: If the slotname is NONE/NULL then we allow the command to finish
+ * and users need to manually cleanup the apply and tablesync worker slots
+ * later.
+ *
+ * This has to be at the end because otherwise, if there is an error while
+ * doing the database operations, we won't be able to roll back the dropped slot.
*/
load_file("libpqwalreceiver", false);
- initStringInfo(&cmd);
- appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
-
wrconn = walrcv_connect(conninfo, true, subname, &err);
if (wrconn == NULL)
- ereport(ERROR,
- (errmsg("could not connect to publisher when attempting to "
- "drop the replication slot \"%s\"", slotname),
- errdetail("The error was: %s", err),
- /* translator: %s is an SQL ALTER command */
- errhint("Use %s to disassociate the subscription from the slot.",
- "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+ {
+ if (!slotname)
+ {
+ /* be tidy */
+ list_free(rstates);
+ return;
+ }
+ else
+ {
+ ReportSlotConnectionError(rstates, subid, slotname, err);
+ }
+ }
+
+ PG_TRY();
+ {
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Drop the tablesync slots associated with removed tables.
+ *
+ * For SYNCDONE/READY states, the tablesync slot is known to have
+ * already been dropped by the tablesync worker.
+ *
+ * For other states, there is no certainty. Maybe the slot does
+ * not exist yet; Maybe the slot is already deleted but SYNCDONE
+ * is not yet set. For this reason, we allow missing_ok = true for
+ * the drop.
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = {0};
+
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, true /* missing_ok */ );
+ }
+ }
+
+ list_free(rstates);
+
+ /*
+ * If there is a slot associated with the subscription, then drop the
+ * replication slot at the publisher.
+ */
+ if (slotname)
+ ReplicationSlotDropAtPubNode(wrconn, slotname, false /* missing_ok */ );
+
+ }
+ PG_FINALLY();
+ {
+ walrcv_disconnect(wrconn);
+ }
+ PG_END_TRY();
+
+ table_close(rel, NoLock);
+}
+
+/*
+ * Drop the replication slot at the publisher node using the replication connection.
+ *
+ * missing_ok - if true then only issue WARNING message if the slot cannot be deleted.
+ */
+void
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok)
+{
+ StringInfoData cmd;
+
+ Assert(wrconn);
+
+ load_file("libpqwalreceiver", false);
+
+ initStringInfo(&cmd);
+ appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
PG_TRY();
{
@@ -1089,27 +1286,37 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
res = walrcv_exec(wrconn, cmd.data, 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
+ if (res->status == WALRCV_OK_COMMAND)
+ {
+ /* NOTICE. Success. */
+ ereport(NOTICE,
+ (errmsg("dropped replication slot \"%s\" on publisher",
+ slotname)));
+ }
+ else if (res->status == WALRCV_ERROR && missing_ok)
+ {
+ /* WARNING. Error, but missing_ok = true. */
+ ereport(WARNING,
(errmsg("could not drop the replication slot \"%s\" on publisher",
slotname),
errdetail("The error was: %s", res->err)));
+ }
else
- ereport(NOTICE,
- (errmsg("dropped replication slot \"%s\" on publisher",
- slotname)));
+ {
+ /* ERROR. */
+ ereport(ERROR,
+ (errmsg("could not drop the replication slot \"%s\" on publisher",
+ slotname),
+ errdetail("The error was: %s", res->err)));
+ }
walrcv_clear_result(res);
}
PG_FINALLY();
{
- walrcv_disconnect(wrconn);
+ pfree(cmd.data);
}
PG_END_TRY();
-
- pfree(cmd.data);
-
- table_close(rel, NoLock);
}
/*
@@ -1278,3 +1485,45 @@ fetch_table_list(WalReceiverConn *wrconn, List *publications)
return tablelist;
}
+
+/*
+ * This is to report the connection failure while dropping replication slots.
+ * Here, we report the WARNING for all tablesync slots so that user can drop
+ * them manually, if required.
+ */
+static void
+ReportSlotConnectionError(List *rstates, Oid subid, char *slotname, char *err)
+{
+ ListCell *lc;
+
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Caller needs to ensure that relstate doesn't change underneath us.
+ * See DropSubscription where we get the relstates.
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = { 0 };
+
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ elog(WARNING, "could not drop tablesync replication slot \"%s\"",
+ syncslotname);
+ }
+ }
+
+ ereport(ERROR,
+ (errmsg("could not connect to publisher when attempting to "
+ "drop the replication slot \"%s\"", slotname),
+ errdetail("The error was: %s", err),
+ /* translator: %s is an SQL ALTER command */
+ errhint("Use %s to disassociate the subscription from the slot.",
+ "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+}
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 186514cd9e..58082dde18 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -73,20 +73,6 @@ typedef struct LogicalRepWorkerId
Oid relid;
} LogicalRepWorkerId;
-typedef struct StopWorkersData
-{
- int nestDepth; /* Sub-transaction nest level */
- List *workers; /* List of LogicalRepWorkerId */
- struct StopWorkersData *parent; /* This need not be an immediate
- * subtransaction parent */
-} StopWorkersData;
-
-/*
- * Stack of StopWorkersData elements. Each stack element contains the workers
- * to be stopped for that subtransaction.
- */
-static StopWorkersData *on_commit_stop_workers = NULL;
-
static void ApplyLauncherWakeup(void);
static void logicalrep_launcher_onexit(int code, Datum arg);
static void logicalrep_worker_onexit(int code, Datum arg);
@@ -546,51 +532,6 @@ logicalrep_worker_stop(Oid subid, Oid relid)
LWLockRelease(LogicalRepWorkerLock);
}
-/*
- * Request worker for specified sub/rel to be stopped on commit.
- */
-void
-logicalrep_worker_stop_at_commit(Oid subid, Oid relid)
-{
- int nestDepth = GetCurrentTransactionNestLevel();
- LogicalRepWorkerId *wid;
- MemoryContext oldctx;
-
- /* Make sure we store the info in context that survives until commit. */
- oldctx = MemoryContextSwitchTo(TopTransactionContext);
-
- /* Check that previous transactions were properly cleaned up. */
- Assert(on_commit_stop_workers == NULL ||
- nestDepth >= on_commit_stop_workers->nestDepth);
-
- /*
- * Push a new stack element if we don't already have one for the current
- * nestDepth.
- */
- if (on_commit_stop_workers == NULL ||
- nestDepth > on_commit_stop_workers->nestDepth)
- {
- StopWorkersData *newdata = palloc(sizeof(StopWorkersData));
-
- newdata->nestDepth = nestDepth;
- newdata->workers = NIL;
- newdata->parent = on_commit_stop_workers;
- on_commit_stop_workers = newdata;
- }
-
- /*
- * Finally add a new worker into the worker list of the current
- * subtransaction.
- */
- wid = palloc(sizeof(LogicalRepWorkerId));
- wid->subid = subid;
- wid->relid = relid;
- on_commit_stop_workers->workers =
- lappend(on_commit_stop_workers->workers, wid);
-
- MemoryContextSwitchTo(oldctx);
-}
-
/*
* Wake up (using latch) any logical replication worker for specified sub/rel.
*/
@@ -819,109 +760,21 @@ ApplyLauncherShmemInit(void)
}
}
-/*
- * Check whether current transaction has manipulated logical replication
- * workers.
- */
-bool
-XactManipulatesLogicalReplicationWorkers(void)
-{
- return (on_commit_stop_workers != NULL);
-}
-
/*
* Wakeup the launcher on commit if requested.
*/
void
AtEOXact_ApplyLauncher(bool isCommit)
{
-
- Assert(on_commit_stop_workers == NULL ||
- (on_commit_stop_workers->nestDepth == 1 &&
- on_commit_stop_workers->parent == NULL));
-
if (isCommit)
{
- ListCell *lc;
-
- if (on_commit_stop_workers != NULL)
- {
- List *workers = on_commit_stop_workers->workers;
-
- foreach(lc, workers)
- {
- LogicalRepWorkerId *wid = lfirst(lc);
-
- logicalrep_worker_stop(wid->subid, wid->relid);
- }
- }
-
if (on_commit_launcher_wakeup)
ApplyLauncherWakeup();
}
- /*
- * No need to pfree on_commit_stop_workers. It was allocated in
- * transaction memory context, which is going to be cleaned soon.
- */
- on_commit_stop_workers = NULL;
on_commit_launcher_wakeup = false;
}
-/*
- * On commit, merge the current on_commit_stop_workers list into the
- * immediate parent, if present.
- * On rollback, discard the current on_commit_stop_workers list.
- * Pop out the stack.
- */
-void
-AtEOSubXact_ApplyLauncher(bool isCommit, int nestDepth)
-{
- StopWorkersData *parent;
-
- /* Exit immediately if there's no work to do at this level. */
- if (on_commit_stop_workers == NULL ||
- on_commit_stop_workers->nestDepth < nestDepth)
- return;
-
- Assert(on_commit_stop_workers->nestDepth == nestDepth);
-
- parent = on_commit_stop_workers->parent;
-
- if (isCommit)
- {
- /*
- * If the upper stack element is not an immediate parent
- * subtransaction, just decrement the notional nesting depth without
- * doing any real work. Else, we need to merge the current workers
- * list into the parent.
- */
- if (!parent || parent->nestDepth < nestDepth - 1)
- {
- on_commit_stop_workers->nestDepth--;
- return;
- }
-
- parent->workers =
- list_concat(parent->workers, on_commit_stop_workers->workers);
- }
- else
- {
- /*
- * Abandon everything that was done at this nesting level. Explicitly
- * free memory to avoid a transaction-lifespan leak.
- */
- list_free_deep(on_commit_stop_workers->workers);
- }
-
- /*
- * We have taken care of the current subtransaction workers list for both
- * abort or commit. So we are ready to pop the stack.
- */
- pfree(on_commit_stop_workers);
- on_commit_stop_workers = parent;
-}
-
/*
* Request wakeup of the launcher on commit of the transaction.
*
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 863d196fd7..0a51a6a01a 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -31,8 +31,11 @@
* table state to INIT.
* - Tablesync worker starts; changes table state from INIT to DATASYNC while
* copying.
- * - Tablesync worker finishes the copy and sets table state to SYNCWAIT;
- * waits for state change.
+ * - Tablesync worker does initial table copy; there is a FINISHEDCOPY (sync
+ * worker specific) state to indicate when the copy phase has completed, so
+ * if the worker crashes with this (non-memory) state then the copy will not
+ * be re-attempted.
+ * - Tablesync worker then sets table state to SYNCWAIT; waits for state change.
* - Apply worker periodically checks for tables in SYNCWAIT state. When
* any appear, it sets the table state to CATCHUP and starts loop-waiting
* until either the table state is set to SYNCDONE or the sync worker
@@ -48,8 +51,8 @@
* point it sets state to READY and stops tracking. Again, there might
* be zero changes in between.
*
- * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * So the state progression is always: INIT -> DATASYNC -> FINISHEDCOPY
+ * -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY.
*
* The catalog pg_subscription_rel is used to keep information about
* subscribed tables and their state. Some transient state during data
@@ -59,6 +62,7 @@
* Example flows look like this:
* - Apply is in front:
* sync:8
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:10
* -> set in memory CATCHUP
@@ -74,6 +78,7 @@
*
* - Sync is in front:
* sync:10
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:8
* -> set in memory CATCHUP
@@ -98,11 +103,16 @@
#include "miscadmin.h"
#include "parser/parse_relation.h"
#include "pgstat.h"
+#include "postmaster/interrupt.h"
#include "replication/logicallauncher.h"
#include "replication/logicalrelation.h"
+#include "replication/logicalworker.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
+#include "storage/lmgr.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -112,6 +122,42 @@ static bool table_states_valid = false;
StringInfo copybuf = NULL;
+/*
+ * Common code to drop the origin of a tablesync worker.
+ *
+ * There is a potential race condition if two processes attempt to call
+ * replorigin_drop for the same originid at the same time. The loser of
+ * that race would give an ERROR saying that it failed to find the
+ * expected originid.
+ *
+ * The TRY/CATCH below suppresses such errors, allowing the tablesync cleanup
+ * code to proceed.
+ */
+void
+tablesync_replorigin_drop(Oid subid, Oid relid, bool nowait)
+{
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (OidIsValid(originid))
+ {
+ PG_TRY();
+ {
+ replorigin_drop(originid, nowait);
+ }
+ PG_CATCH();
+ {
+ ereport(WARNING,
+ errmsg("could not drop replication origin with OID %d, named \"%s\"",
+ originid,
+ originname));
+ }
+ PG_END_TRY();
+ }
+}
+
/*
* Exit routine for synchronization worker.
*/
@@ -270,30 +316,53 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
+ bool sync_done = false;
+ Oid subid = MySubscription->oid;
+ Oid relid = MyLogicalRepWorker->relid;
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
+ sync_done = MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
+ current_lsn >= MyLogicalRepWorker->relstate_lsn;
+ SpinLockRelease(&MyLogicalRepWorker->relmutex);
- if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
- current_lsn >= MyLogicalRepWorker->relstate_lsn)
+ if (sync_done)
{
TimeLineID tli;
+ char syncslotname[NAMEDATALEN] = {0};
+
+ /* End wal streaming so wrconn can be re-used to drop the slot. */
+ walrcv_endstreaming(wrconn, &tli);
+
+ /*
+ * Cleanup the tablesync slot.
+ */
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, false /* missing_ok */ );
+ /*
+ * Change state to SYNCDONE.
+ */
+ SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCDONE;
MyLogicalRepWorker->relstate_lsn = current_lsn;
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ StartTransactionCommand();
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
MyLogicalRepWorker->relstate_lsn);
- walrcv_endstreaming(wrconn, &tli);
finish_sync_worker();
}
- else
- SpinLockRelease(&MyLogicalRepWorker->relmutex);
}
/*
@@ -412,6 +481,21 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
started_tx = true;
}
+ /*
+ * Remove the tablesync origin tracking if it exists.
+ *
+ * The normal-case origin drop is done here instead of in the
+ * process_syncing_tables_for_sync function because if the
+ * tablesync worker process attempted to drop its own
+ * origin, that would prevent the origin from advancing properly
+ * on the commit of the transaction.
+ */
+ tablesync_replorigin_drop(MyLogicalRepWorker->subid,
+ rstate->relid, false /* nowait */ );
+
+ /*
+ * Update the state to READY only after the origin cleanup.
+ */
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
@@ -807,6 +891,32 @@ copy_table(Relation rel)
logicalrep_rel_close(relmapentry, NoLock);
}
+/*
+ * Determine the tablesync slot name.
+ *
+ * The name must not exceed NAMEDATALEN - 1 because of remote node constraints
+ * on slot name length.
+ *
+ * The returned slot name is either:
+ * - stored in the supplied buffer (syncslotname), or
+ * - palloc'ed in current memory context (if syncslotname = NULL).
+ *
+ * Note: We don't use the subscription slot name as part of tablesync slot name
+ * because we are responsible for cleaning up these slots and it could become
+ * impossible to recalculate what name to cleanup if the subscription slot name
+ * impossible to recalculate what name to clean up if the subscription slot name
+ */
+char *
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char syncslotname[NAMEDATALEN])
+{
+ if (syncslotname)
+ sprintf(syncslotname, "pg_%u_sync_%u", suboid, relid);
+ else
+ syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
+
+ return syncslotname;
+}
+
/*
* Start syncing the table in the sync worker.
*
@@ -824,6 +934,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -849,19 +961,10 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
- MySubscription->slotname,
- MySubscription->oid,
- MyLogicalRepWorker->relid);
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(MySubscription->oid,
+ MyLogicalRepWorker->relid,
+ NULL /* use palloc */ );
/*
* Here we use the slot name instead of the subscription name as the
@@ -874,7 +977,48 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY);
+
+ /* Assign the origin tracking record name. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MySubscription->oid, MyLogicalRepWorker->relid);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC)
+ {
+ /*
+ * We have previously errored out before finishing the copy so the
+ * replication slot might exist. We want to remove the slot if it
+ * already exists and proceed.
+ *
+ * XXX We could also instead have tried to drop the slot last time we
+ * failed, but for that, we might need to clean up the copy state as it
+ * might be in the middle of fetching the rows. Also, if there is a
+ * network breakdown then it wouldn't have succeeded, so trying it next
+ * time seems like a better bet.
+ */
+ ReplicationSlotDropAtPubNode(wrconn, slotname, true /* missing_ok */);
+ }
+ else if (MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed
+ * before it was able to finish normally.
+ */
+ StartTransactionCommand();
+
+ /*
+ * The origin tracking name must already exist. It was created first
+ * time this tablesync was launched.
+ */
+ originid = replorigin_by_name(originname, false /* missing_ok */ );
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ *origin_startpos = replorigin_session_get_progress(false);
+
+ CommitTransactionCommand();
+
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -890,9 +1034,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -918,12 +1059,12 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
walrcv_clear_result(res);
/*
- * Create a new temporary logical decoding slot. This slot will be used
- * for the catchup phase after COPY is done, so tell it to use the
- * snapshot to make the final data consistent.
+ * Create a new permanent logical decoding slot. This slot will be
+ * used for the catchup phase after COPY is done, so tell it to use
+ * the snapshot to make the final data consistent.
*/
- walrcv_create_slot(wrconn, slotname, true,
- CRS_USE_SNAPSHOT, origin_startpos);
+ walrcv_create_slot(wrconn, slotname, false /* permanent */ ,
+ CRS_USE_SNAPSHOT, origin_startpos);
/* Now do the initial data copy */
PushActiveSnapshot(GetTransactionSnapshot());
@@ -934,7 +1075,7 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
if (res->status != WALRCV_OK_COMMAND)
ereport(ERROR,
(errmsg("table copy could not finish transaction on publisher"),
- errdetail("The error was: %s", res->err)));
+ errdetail("The error was: %s", res->err)));
walrcv_clear_result(res);
table_close(rel, NoLock);
@@ -942,6 +1083,54 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
/* Make the copy visible. */
CommandCounterIncrement();
+ /* Setup replication origin tracking. */
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist, so create it now.
+ *
+ * Then advance to the LSN got from walrcv_create_slot. This is
+ * WAL logged for the purpose of recovery. Locks are to prevent
+ * the replication origin from vanishing while advancing.
+ */
+ originid = replorigin_create(originname);
+
+ LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */ , true /* WAL log */ );
+ UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ }
+ else
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("replication origin \"%s\" already exists",
+ originname)));
+ }
+
+ /*
+ * Update the persisted state to indicate the COPY phase is done; make
+ * it visible to others.
+ */
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_FINISHEDCOPY,
+ MyLogicalRepWorker->relstate_lsn);
+
+ CommitTransactionCommand();
+
+copy_table_done:
+
+ elog(DEBUG1,
+ "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
+
/*
* We are done with the initial data synchronization, update the state.
*/
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index eb7db89cef..cfc924cd89 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -807,12 +807,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -889,9 +885,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -918,8 +912,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1062,8 +1055,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData *commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 1d81071c35..05bb698cf4 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1786,7 +1786,8 @@ ProcessUtilitySlow(ParseState *pstate,
break;
case T_AlterSubscriptionStmt:
- address = AlterSubscription((AlterSubscriptionStmt *) parsetree);
+ address = AlterSubscription((AlterSubscriptionStmt *) parsetree,
+ isTopLevel);
break;
case T_DropSubscriptionStmt:
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index ab1202cf9b..e04ba83e1e 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,8 @@ DECLARE_UNIQUE_INDEX_PKEY(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_FINISHEDCOPY 'f' /* tablesync copy phase is completed
+ * (sublsn NULL) */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/commands/subscriptioncmds.h b/src/include/commands/subscriptioncmds.h
index a81865079d..3b926f35d7 100644
--- a/src/include/commands/subscriptioncmds.h
+++ b/src/include/commands/subscriptioncmds.h
@@ -20,7 +20,7 @@
extern ObjectAddress CreateSubscription(CreateSubscriptionStmt *stmt,
bool isTopLevel);
-extern ObjectAddress AlterSubscription(AlterSubscriptionStmt *stmt);
+extern ObjectAddress AlterSubscription(AlterSubscriptionStmt *stmt, bool isTopLevel);
extern void DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel);
extern ObjectAddress AlterSubscriptionOwner(const char *name, Oid newOwnerId);
diff --git a/src/include/replication/logicallauncher.h b/src/include/replication/logicallauncher.h
index 421ec1580d..301e494f7b 100644
--- a/src/include/replication/logicallauncher.h
+++ b/src/include/replication/logicallauncher.h
@@ -22,9 +22,7 @@ extern Size ApplyLauncherShmemSize(void);
extern void ApplyLauncherShmemInit(void);
extern void ApplyLauncherWakeupAtCommit(void);
-extern bool XactManipulatesLogicalReplicationWorkers(void);
extern void AtEOXact_ApplyLauncher(bool isCommit);
-extern void AtEOSubXact_ApplyLauncher(bool isCommit, int nestDepth);
extern bool IsLogicalLauncher(void);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 53f636c56f..5f52335f15 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -15,6 +15,7 @@
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "storage/spin.h"
+#include "replication/walreceiver.h"
/*
* Behaviour of replication slots, upon release or crash.
@@ -211,6 +212,8 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname);
+extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
diff --git a/src/include/replication/worker_internal.h b/src/include/replication/worker_internal.h
index d046022e49..4a5c49da7d 100644
--- a/src/include/replication/worker_internal.h
+++ b/src/include/replication/worker_internal.h
@@ -77,13 +77,14 @@ extern List *logicalrep_workers_find(Oid subid, bool only_running);
extern void logicalrep_worker_launch(Oid dbid, Oid subid, const char *subname,
Oid userid, Oid relid);
extern void logicalrep_worker_stop(Oid subid, Oid relid);
-extern void logicalrep_worker_stop_at_commit(Oid subid, Oid relid);
extern void logicalrep_worker_wakeup(Oid subid, Oid relid);
extern void logicalrep_worker_wakeup_ptr(LogicalRepWorker *worker);
extern int logicalrep_sync_worker_count(Oid subid);
extern char *LogicalRepSyncTableStart(XLogRecPtr *origin_startpos);
+extern void tablesync_replorigin_drop(Oid subid, Oid relid, bool nowait);
+
void process_syncing_tables(XLogRecPtr current_lsn);
void invalidate_syncing_table_states(Datum arg, int cacheid,
uint32 hashvalue);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 1d540fe489..5f5c36d8e2 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2397,7 +2397,6 @@ StdAnalyzeData
StdRdOptions
Step
StopList
-StopWorkersData
StrategyNumber
StreamCtl
StreamXidHash
--
2.28.0.windows.1
On Wed, Feb 3, 2021 at 11:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> Thanks for the report. The problem here was that the error occurred
> while we were trying to copy the large data. Because we issued a
> rollback before fetching the entire data, we ran into this problem.
> One alternative would be to first fetch the entire data when the
> error occurs and then issue the subsequent commands. Instead, I have
> modified the patch to perform 'drop_replication_slot' at the beginning
> if the relstate is datasync. Do let me know if you can think of a
> better way to fix this.
I have verified that the problem is not seen after this patch. I also
agree with the approach taken for the fix.
regards,
Ajin Cherian
Fujitsu Australia
On Thu, Feb 4, 2021 at 9:55 AM Ajin Cherian <itsajin@gmail.com> wrote:
> On Wed, Feb 3, 2021 at 11:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>> Thanks for the report. The problem here was that the error occurred
>> while we were trying to copy the large data. Because we issued a
>> rollback before fetching the entire data, we ran into this problem.
>> One alternative would be to first fetch the entire data when the
>> error occurs and then issue the subsequent commands. Instead, I have
>> modified the patch to perform 'drop_replication_slot' at the beginning
>> if the relstate is datasync. Do let me know if you can think of a
>> better way to fix this.
>
> I have verified that the problem is not seen after this patch. I also
> agree with the approach taken for the fix.
Thanks. I have fixed one of the issues reported by me earlier [1],
wherein the tablesync worker can repeatedly fail if, after dropping the
slot, there is an error while updating the SYNCDONE state in the
database. I have moved the drop of the slot to just before the commit of
the transaction where we mark the state as SYNCDONE. Additionally,
I have removed unnecessary includes in tablesync.c, updated the docs
for ALTER SUBSCRIPTION, and updated the comments at various places in
the patch. I have also updated the commit message this time.
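In outline, the tail of process_syncing_tables_for_sync then looks
roughly like this (a condensed sketch of the ordering described above;
variable names are abbreviated, the exact code is in the attached patch):

    if (sync_done)
    {
        /* End streaming so wrconn can be reused to drop the slot. */
        walrcv_endstreaming(wrconn, &tli);

        /* UpdateSubscriptionRelState must be called inside a transaction. */
        if (!IsTransactionState())
            StartTransactionCommand();

        UpdateSubscriptionRelState(subid, relid, SUBREL_STATE_SYNCDONE,
                                   current_lsn);

        /*
         * Drop the tablesync slot just before the commit performed by
         * finish_sync_worker(); if the drop errors out, the SYNCDONE
         * update is rolled back too, so a relaunched worker can retry
         * rather than failing repeatedly with the state already set.
         */
        ReplicationSlotNameForTablesync(subid, relid, syncslotname);
        ReplicationSlotDropAtPubNode(wrconn, syncslotname, false /* missing_ok */ );

        finish_sync_worker();   /* commits the transaction */
    }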
I am still not very happy with the way we handle concurrent origin
drops, but that will probably be addressed by the other patch Peter is
working on [2].
[1]: /messages/by-id/CAA4eK1JdWv84nMyCpTboBURjj70y3BfO1xdy8SYPRqNxtH7TEA@mail.gmail.com
[2]: /messages/by-id/CAHut+PsW6+7Ucb1sxjSNBaSYPGAVzQFbAEg4y1KsYQiGCnyGeQ@mail.gmail.com
--
With Regards,
Amit Kapila.
Attachments:
v27-0001-Allow-multiple-xacts-during-table-sync-in-logica.patch
From 5780fd2477ba2ba10294f45840bdadd0836e05cf Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Thu, 4 Feb 2021 14:34:00 +0530
Subject: [PATCH v27] Allow multiple xacts during table sync in logical
replication.
For the initial table data synchronization in logical replication, we use
a single transaction to copy the entire table and then synchronize the
position in the stream with the main apply worker.
There are multiple downsides to this approach: (a) we have to perform the
entire copy operation again if there is any error (network breakdown,
error in the database operation, etc.) while we synchronize the WAL
position between the tablesync worker and the apply worker; this will be
onerous especially for large copies, (b) using a single transaction in the
synchronization phase (where we can receive WAL from multiple
transactions) risks exceeding the CID limit, (c) the slot
will hold the WAL till the entire sync is complete because we never commit
till the end.
This patch solves all the above downsides by allowing multiple
transactions during the tablesync phase. The initial copy is done in a
single transaction and after that, we commit each transaction as we
receive. To allow recovery after any error or crash, we use a permanent
slot and origin to track the progress. The slot and origin will be removed
once we finish the synchronization of the table. We also remove slot and
origin of tablesync workers if the user performs DROP SUBSCRIPTION .. or
ALTER SUBSCRIPTION .. REFERESH and some of the table syncs are still not
finished.
This will also open up the path for logical replication of 2PC
transactions on the subscriber side. Previously, we can't do that because
of the requirement of maintaining a single transaction in tablesync
workers.
Author: Peter Smith and Amit Kapila
Reviewed-by: Ajin Cherian, Hou, Zhijie and Amit Kapila
Discussion: https://postgr.es/m/CAA4eK1KHJxaZS-fod-0fey=0tq3=Gkn4ho=8N4-5HWiCfu0H1A@mail.gmail.com
---
doc/src/sgml/catalogs.sgml | 1 +
doc/src/sgml/logical-replication.sgml | 29 +-
doc/src/sgml/ref/alter_subscription.sgml | 17 ++
doc/src/sgml/ref/drop_subscription.sgml | 6 +-
src/backend/access/transam/xact.c | 11 -
src/backend/catalog/pg_subscription.c | 35 +++
src/backend/commands/subscriptioncmds.c | 450 ++++++++++++++++++++++------
src/backend/replication/logical/launcher.c | 147 ---------
src/backend/replication/logical/tablesync.c | 240 +++++++++++++--
src/backend/replication/logical/worker.c | 18 +-
src/backend/tcop/utility.c | 3 +-
src/include/catalog/pg_subscription_rel.h | 2 +
src/include/commands/subscriptioncmds.h | 2 +-
src/include/replication/logicallauncher.h | 2 -
src/include/replication/slot.h | 3 +
src/include/replication/worker_internal.h | 3 +-
src/tools/pgindent/typedefs.list | 1 -
17 files changed, 655 insertions(+), 315 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index ea222c0..692ad65 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7673,6 +7673,7 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
State code:
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
+ <literal>f</literal> = finished table copy,
<literal>s</literal> = synchronized,
<literal>r</literal> = ready (normal replication)
</para></entry>
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index a560ad6..43fe7f7 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -248,13 +248,23 @@
<para>
As mentioned earlier, each (active) subscription receives changes from a
- replication slot on the remote (publishing) side. Normally, the remote
- replication slot is created automatically when the subscription is created
- using <command>CREATE SUBSCRIPTION</command> and it is dropped
- automatically when the subscription is dropped using <command>DROP
- SUBSCRIPTION</command>. In some situations, however, it can be useful or
- necessary to manipulate the subscription and the underlying replication
- slot separately. Here are some scenarios:
+ replication slot on the remote (publishing) side.
+ </para>
+ <para>
+ Additional table synchronization slots are normally transient, created
+ internally to perform initial table synchronization and dropped
+ automatically when they are no longer needed. These table synchronization
+ slots have generated names: <quote><literal>pg_%u_sync_%u</literal></quote>
+ (parameters: Subscription <parameter>oid</parameter>,
+ Table <parameter>relid</parameter>)
+ </para>
+ <para>
+ Normally, the remote replication slot is created automatically when the
+ subscription is created using <command>CREATE SUBSCRIPTION</command> and it
+ is dropped automatically when the subscription is dropped using
+ <command>DROP SUBSCRIPTION</command>. In some situations, however, it can
+ be useful or necessary to manipulate the subscription and the underlying
+ replication slot separately. Here are some scenarios:
<itemizedlist>
<listitem>
@@ -294,8 +304,9 @@
using <command>ALTER SUBSCRIPTION</command> before attempting to drop
the subscription. If the remote database instance no longer exists, no
further action is then necessary. If, however, the remote database
- instance is just unreachable, the replication slot should then be
- dropped manually; otherwise it would continue to reserve WAL and might
+ instance is just unreachable, the replication slot (and any still
+ remaining table synchronization slots) should then be
+ dropped manually; otherwise it/they would continue to reserve WAL and might
eventually cause the disk to fill up. Such cases should be carefully
investigated.
</para>
diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml
index db5e59f..8761f03 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -48,6 +48,23 @@ ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> RENAME TO <
(Currently, all subscription owners must be superusers, so the owner checks
will be bypassed in practice. But this might change in the future.)
</para>
+
+ <para>
+ When refreshing a publication we remove the relations that are no longer
+ part of the publication and we also remove the tablesync slots if there are
+ any. It is necessary to remove tablesync slots so that the resources
+ allocated for the subscription on the remote host are released. If due to
+ network breakdown or some other error, we are not able to remove the slots,
+ we give WARNING and the user needs to manually remove such slots later as
+ otherwise, they will continue to reserve WAL and might eventually cause
+ the disk to fill up. See also <xref linkend="logical-replication-subscription-slot"/>.
+ </para>
+
+ <para>
+ Commands <command>ALTER SUBSCRIPTION ... REFRESH ..</command> and
+ <command>ALTER SUBSCRIPTION ... SET PUBLICATION ..</command> with refresh
+ option as true cannot be executed inside a transaction block.
+ </para>
</refsect1>
<refsect1>
diff --git a/doc/src/sgml/ref/drop_subscription.sgml b/doc/src/sgml/ref/drop_subscription.sgml
index adbdeaf..aee9615 100644
--- a/doc/src/sgml/ref/drop_subscription.sgml
+++ b/doc/src/sgml/ref/drop_subscription.sgml
@@ -79,7 +79,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
<para>
When dropping a subscription that is associated with a replication slot on
the remote host (the normal state), <command>DROP SUBSCRIPTION</command>
- will connect to the remote host and try to drop the replication slot as
+ will connect to the remote host and try to drop the replication slot (and
+ any remaining table synchronization slots) as
part of its operation. This is necessary so that the resources allocated
for the subscription on the remote host are released. If this fails,
either because the remote host is not reachable or because the remote
@@ -89,7 +90,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
executing <literal>ALTER SUBSCRIPTION ... SET (slot_name = NONE)</literal>.
After that, <command>DROP SUBSCRIPTION</command> will no longer attempt any
actions on a remote host. Note that if the remote replication slot still
- exists, it should then be dropped manually; otherwise it will continue to
+ exists, it (and any related table synchronization slots) should then be
+ dropped manually; otherwise it/they will continue to
reserve WAL and might eventually cause the disk to fill up. See
also <xref linkend="logical-replication-subscription-slot"/>.
</para>
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index a2068e3..3c8b4eb 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2432,15 +2432,6 @@ PrepareTransaction(void)
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot PREPARE a transaction that has exported snapshots")));
- /*
- * Don't allow PREPARE but for transaction that has/might kill logical
- * replication workers.
- */
- if (XactManipulatesLogicalReplicationWorkers())
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("cannot PREPARE a transaction that has manipulated logical replication workers")));
-
/* Prevent cancel/die interrupt while cleaning up */
HOLD_INTERRUPTS();
@@ -4899,7 +4890,6 @@ CommitSubTransaction(void)
AtEOSubXact_HashTables(true, s->nestingLevel);
AtEOSubXact_PgStat(true, s->nestingLevel);
AtSubCommit_Snapshot(s->nestingLevel);
- AtEOSubXact_ApplyLauncher(true, s->nestingLevel);
/*
* We need to restore the upper transaction's read-only state, in case the
@@ -5059,7 +5049,6 @@ AbortSubTransaction(void)
AtEOSubXact_HashTables(false, s->nestingLevel);
AtEOSubXact_PgStat(false, s->nestingLevel);
AtSubAbort_Snapshot(s->nestingLevel);
- AtEOSubXact_ApplyLauncher(false, s->nestingLevel);
}
/*
diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c
index 44cb285..e422fbf 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -29,6 +29,7 @@
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
#include "utils/pg_lsn.h"
#include "utils/rel.h"
#include "utils/syscache.h"
@@ -337,6 +338,13 @@ GetSubscriptionRelState(Oid subid, Oid relid, XLogRecPtr *sublsn)
char substate;
bool isnull;
Datum d;
+ Relation rel;
+
+ /*
+ * This is to avoid the race condition with AlterSubscription which tries
+ * to remove this relstate.
+ */
+ rel = table_open(SubscriptionRelRelationId, AccessShareLock);
/* Try finding the mapping. */
tup = SearchSysCache2(SUBSCRIPTIONRELMAP,
@@ -363,6 +371,8 @@ GetSubscriptionRelState(Oid subid, Oid relid, XLogRecPtr *sublsn)
/* Cleanup */
ReleaseSysCache(tup);
+ table_close(rel, AccessShareLock);
+
return substate;
}
@@ -403,6 +413,31 @@ RemoveSubscriptionRel(Oid subid, Oid relid)
scan = table_beginscan_catalog(rel, nkeys, skey);
while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
{
+ Form_pg_subscription_rel subrel;
+
+ subrel = (Form_pg_subscription_rel) GETSTRUCT(tup);
+
+ /*
+ * We don't allow to drop the relation mapping when the table
+ * synchronization is in progress unless the caller updates the
+ * corresponding subscription as well. This is to ensure that we don't
+ * leave tablesync slots or origins in the system when the
+ * corresponding table is dropped.
+ */
+ if (!OidIsValid(subid) && subrel->srsubstate != SUBREL_STATE_READY)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("could not drop relation mapping for subscription \"%s\"",
+ get_subscription_name(subrel->srsubid, false)),
+ errdetail("Table synchronization for relation \"%s\" is in progress and is in state \"%c\".",
+ get_rel_name(relid), subrel->srsubstate),
+ /* translator: first %s is a SQL ALTER command and second %s is a SQL DROP command */
+ errhint("Use %s to enable subscription if not already enabled or use %s to drop the subscription.",
+ "ALTER SUBSCRIPTION ... ENABLE",
+ "DROP SUBSCRIPTION ...")));
+ }
+
CatalogTupleDelete(rel, &tup->t_self);
}
table_endscan(scan);
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 082f785..a50e72a 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -34,6 +34,7 @@
#include "nodes/makefuncs.h"
#include "replication/logicallauncher.h"
#include "replication/origin.h"
+#include "replication/slot.h"
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "replication/worker_internal.h"
@@ -46,6 +47,8 @@
#include "utils/syscache.h"
static List *fetch_table_list(WalReceiverConn *wrconn, List *publications);
+static void ReportSlotConnectionError(List *rstates, Oid subid, char *slotname, char *err);
+
/*
* Common option parsing function for CREATE and ALTER SUBSCRIPTION commands.
@@ -566,107 +569,197 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
Oid *pubrel_local_oids;
ListCell *lc;
int off;
+ int remove_rel_len;
+ Relation rel = NULL;
+ typedef struct SubRemoveRels
+ {
+ Oid relid;
+ char state;
+ } SubRemoveRels;
+ SubRemoveRels *sub_remove_rels;
/* Load the library providing us libpq calls. */
load_file("libpqwalreceiver", false);
- /* Try to connect to the publisher. */
- wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
- if (!wrconn)
- ereport(ERROR,
- (errmsg("could not connect to the publisher: %s", err)));
-
- /* Get the table list from publisher. */
- pubrel_names = fetch_table_list(wrconn, sub->publications);
-
- /* We are done with the remote side, close connection. */
- walrcv_disconnect(wrconn);
-
- /* Get local table list. */
- subrel_states = GetSubscriptionRelations(sub->oid);
-
- /*
- * Build qsorted array of local table oids for faster lookup. This can
- * potentially contain all tables in the database so speed of lookup is
- * important.
- */
- subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
- off = 0;
- foreach(lc, subrel_states)
+ PG_TRY();
{
- SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
+ /* Try to connect to the publisher. */
+ wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
+ if (!wrconn)
+ ereport(ERROR,
+ (errmsg("could not connect to the publisher: %s", err)));
- subrel_local_oids[off++] = relstate->relid;
- }
- qsort(subrel_local_oids, list_length(subrel_states),
- sizeof(Oid), oid_cmp);
+ /* Get the table list from publisher. */
+ pubrel_names = fetch_table_list(wrconn, sub->publications);
- /*
- * Walk over the remote tables and try to match them to locally known
- * tables. If the table is not known locally create a new state for it.
- *
- * Also builds array of local oids of remote tables for the next step.
- */
- off = 0;
- pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
+ /* Get local table list. */
+ subrel_states = GetSubscriptionRelations(sub->oid);
- foreach(lc, pubrel_names)
- {
- RangeVar *rv = (RangeVar *) lfirst(lc);
- Oid relid;
+ /*
+ * Build qsorted array of local table oids for faster lookup. This can
+ * potentially contain all tables in the database so speed of lookup
+ * is important.
+ */
+ subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
+ off = 0;
+ foreach(lc, subrel_states)
+ {
+ SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
- relid = RangeVarGetRelid(rv, AccessShareLock, false);
+ subrel_local_oids[off++] = relstate->relid;
+ }
+ qsort(subrel_local_oids, list_length(subrel_states),
+ sizeof(Oid), oid_cmp);
+
+ /*
+ * Rels that we want to remove from subscription and drop any slots and
+ * origins corresponding to them.
+ */
+ sub_remove_rels = palloc(list_length(subrel_states) * sizeof(SubRemoveRels));
+
+ /*
+ * Walk over the remote tables and try to match them to locally known
+ * tables. If the table is not known locally create a new state for
+ * it.
+ *
+ * Also builds array of local oids of remote tables for the next step.
+ */
+ off = 0;
+ pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
+
+ foreach(lc, pubrel_names)
+ {
+ RangeVar *rv = (RangeVar *) lfirst(lc);
+ Oid relid;
- /* Check for supported relkind. */
- CheckSubscriptionRelkind(get_rel_relkind(relid),
- rv->schemaname, rv->relname);
+ relid = RangeVarGetRelid(rv, AccessShareLock, false);
- pubrel_local_oids[off++] = relid;
+ /* Check for supported relkind. */
+ CheckSubscriptionRelkind(get_rel_relkind(relid),
+ rv->schemaname, rv->relname);
- if (!bsearch(&relid, subrel_local_oids,
- list_length(subrel_states), sizeof(Oid), oid_cmp))
- {
- AddSubscriptionRelState(sub->oid, relid,
- copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
- InvalidXLogRecPtr);
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" added to subscription \"%s\"",
- rv->schemaname, rv->relname, sub->name)));
- }
- }
+ pubrel_local_oids[off++] = relid;
- /*
- * Next remove state for tables we should not care about anymore using the
- * data we collected above
- */
- qsort(pubrel_local_oids, list_length(pubrel_names),
- sizeof(Oid), oid_cmp);
+ if (!bsearch(&relid, subrel_local_oids,
+ list_length(subrel_states), sizeof(Oid), oid_cmp))
+ {
+ AddSubscriptionRelState(sub->oid, relid,
+ copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
+ InvalidXLogRecPtr);
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" added to subscription \"%s\"",
+ rv->schemaname, rv->relname, sub->name)));
+ }
+ }
- for (off = 0; off < list_length(subrel_states); off++)
- {
- Oid relid = subrel_local_oids[off];
+ /*
+ * Next remove state for tables we should not care about anymore using
+ * the data we collected above
+ */
+ qsort(pubrel_local_oids, list_length(pubrel_names),
+ sizeof(Oid), oid_cmp);
- if (!bsearch(&relid, pubrel_local_oids,
- list_length(pubrel_names), sizeof(Oid), oid_cmp))
+ remove_rel_len = 0;
+ for (off = 0; off < list_length(subrel_states); off++)
{
- RemoveSubscriptionRel(sub->oid, relid);
+ Oid relid = subrel_local_oids[off];
- logicalrep_worker_stop_at_commit(sub->oid, relid);
+ if (!bsearch(&relid, pubrel_local_oids,
+ list_length(pubrel_names), sizeof(Oid), oid_cmp))
+ {
+ char state;
+ XLogRecPtr statelsn;
+
+ /*
+ * Lock pg_subscription_rel with AccessExclusiveLock to prevent any race
+ * conditions with the apply worker re-launching workers at the same time
+ * this code is trying to remove those tables.
+ *
+ * Even if new worker for this particular rel is restarted it won't be able
+ * to make any progress as we hold exclusive lock on subscription_rel till
+ * the transaction end. It will simply exit as there is no corresponding
+ * rel entry.
+ *
+ * This locking also ensures that the state of rels won't change till we
+ * are done with this refresh operation.
+ */
+ if (!rel)
+ rel = table_open(SubscriptionRelRelationId, AccessExclusiveLock);
+
+ /* Last known rel state. */
+ state = GetSubscriptionRelState(sub->oid, relid, &statelsn);
+
+ sub_remove_rels[remove_rel_len].relid = relid;
+ sub_remove_rels[remove_rel_len++].state = state;
+
+ RemoveSubscriptionRel(sub->oid, relid);
+
+ logicalrep_worker_stop(sub->oid, relid);
+
+ /*
+ * For READY state, we would have already dropped the tablesync
+ * origin.
+ */
+ if (state != SUBREL_STATE_READY)
+ tablesync_replorigin_drop(sub->oid, relid, false /* nowait */);
+
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
+ get_namespace_name(get_rel_namespace(relid)),
+ get_rel_name(relid),
+ sub->name)));
+ }
+ }
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
- get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid),
- sub->name)));
+ /*
+ * Drop the tablesync slots associated with removed tables. This has to
+ * be at the end because otherwise if there is an error while doing the
+ * database operations we won't be able to rollback dropped slots.
+ */
+ for (off = 0; off < remove_rel_len; off++)
+ {
+ if (sub_remove_rels[off].state != SUBREL_STATE_READY &&
+ sub_remove_rels[off].state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = { 0 };
+
+ /*
+ * For READY/SYNCDONE states we know the tablesync slot has
+ * already been dropped by the tablesync worker.
+ *
+ * For other states, there is no certainty, maybe the slot
+ * does not exist yet. Also, if we fail after removing some of
+ * the slots, next time, it will again try to drop already
+ * dropped slots. For this reason, we allow missing_ok = true
+ * for the drop.
+ *
+ * XXX If there is a network break down while dropping the
+ * slots then we will give a WARNING to the user and they need
+ * to manually remove such slots. This can happen so rarely to
+ * worry about and we don't have any better way to deal with
+ * this.
+ */
+ ReplicationSlotNameForTablesync(sub->oid, sub_remove_rels[off].relid, syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, true /* missing_ok */);
+ }
}
}
+ PG_FINALLY();
+ {
+ if (wrconn)
+ walrcv_disconnect(wrconn);
+ }
+ PG_END_TRY();
+
+ if (rel)
+ table_close(rel, NoLock);
}
/*
* Alter the existing subscription.
*/
ObjectAddress
-AlterSubscription(AlterSubscriptionStmt *stmt)
+AlterSubscription(AlterSubscriptionStmt *stmt, bool isTopLevel)
{
Relation rel;
ObjectAddress myself;
@@ -848,6 +941,8 @@ AlterSubscription(AlterSubscriptionStmt *stmt)
errmsg("ALTER SUBSCRIPTION with refresh is not allowed for disabled subscriptions"),
errhint("Use ALTER SUBSCRIPTION ... SET PUBLICATION ... WITH (refresh = false).")));
+ PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION with refresh");
+
/* Make sure refresh sees the new list of publications. */
sub->publications = stmt->publication;
@@ -877,6 +972,8 @@ AlterSubscription(AlterSubscriptionStmt *stmt)
NULL, NULL, /* no "binary" */
NULL, NULL); /* no "streaming" */
+ PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION ... REFRESH");
+
AlterSubscription_refresh(sub, copy_data);
break;
@@ -928,8 +1025,8 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
char *err = NULL;
RepOriginId originid;
WalReceiverConn *wrconn = NULL;
- StringInfoData cmd;
Form_pg_subscription form;
+ List *rstates;
/*
* Lock pg_subscription with AccessExclusiveLock to ensure that the
@@ -1042,6 +1139,31 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
}
list_free(subworkers);
+ /*
+ * Cleanup of tablesync replication origins.
+ *
+ * Any READY-state relations would already have dealt with clean-ups.
+ *
+ * Note that the state can't change because we have already stopped both
+ * the apply and tablesync workers and they can't restart because of
+ * exclusive lock on the subscription.
+ */
+ rstates = GetSubscriptionNotReadyRelations(subid);
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState* rstate = (SubscriptionRelState*) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Drop the tablesync's origin tracking if exists.
+ */
+ tablesync_replorigin_drop(subid, relid, false /* nowait */);
+ }
+
/* Clean up dependencies */
deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
@@ -1058,30 +1180,108 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
* If there is no slot associated with the subscription, we can finish
* here.
*/
- if (!slotname)
+ if (!slotname && rstates == NIL)
{
table_close(rel, NoLock);
return;
}
/*
- * Otherwise drop the replication slot at the publisher node using the
- * replication connection.
+ * Try to acquire the connection necessary for dropping slots.
+ *
+ * Note: If the slotname is NONE/NULL then we allow the command to finish
+ * and users need to manually cleanup the apply and tablesync worker slots
+ * later.
+ *
+ * This has to be at the end because otherwise if there is an error while
+ * doing the database operations we won't be able to rollback dropped slot.
*/
load_file("libpqwalreceiver", false);
- initStringInfo(&cmd);
- appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
-
wrconn = walrcv_connect(conninfo, true, subname, &err);
if (wrconn == NULL)
- ereport(ERROR,
- (errmsg("could not connect to publisher when attempting to "
- "drop the replication slot \"%s\"", slotname),
- errdetail("The error was: %s", err),
- /* translator: %s is an SQL ALTER command */
- errhint("Use %s to disassociate the subscription from the slot.",
- "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+ {
+ if (!slotname)
+ {
+ /* be tidy */
+ list_free(rstates);
+ return;
+ }
+ else
+ {
+ ReportSlotConnectionError(rstates, subid, slotname, err);
+ }
+ }
+
+ PG_TRY();
+ {
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Drop the tablesync slots associated with removed tables.
+ *
+ * For SYNCDONE/READY states, the tablesync slot is known to have
+ * already been dropped by the tablesync worker.
+ *
+ * For other states, there is no certainty, maybe the slot
+ * does not exist yet. Also, if we fail after removing some of
+ * the slots, next time, it will again try to drop already
+ * dropped slots. For this reason, we allow missing_ok = true
+ * for the drop.
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = {0};
+
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, true /* missing_ok */ );
+ }
+ }
+
+ list_free(rstates);
+
+ /*
+ * If there is a slot associated with the subscription, then drop the
+ * replication slot at the publisher.
+ */
+ if (slotname)
+ ReplicationSlotDropAtPubNode(wrconn, slotname, false /* missing_ok */ );
+
+ }
+ PG_FINALLY();
+ {
+ walrcv_disconnect(wrconn);
+ }
+ PG_END_TRY();
+
+ table_close(rel, NoLock);
+}
+
+/*
+ * Drop the replication slot at the publisher node using the replication
+ * connection.
+ *
+ * missing_ok - if true then only issue WARNING message if the slot cannot be
+ * deleted.
+ */
+void
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok)
+{
+ StringInfoData cmd;
+
+ Assert(wrconn);
+
+ load_file("libpqwalreceiver", false);
+
+ initStringInfo(&cmd);
+ appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
PG_TRY();
{
@@ -1089,27 +1289,37 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
res = walrcv_exec(wrconn, cmd.data, 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
+ if (res->status == WALRCV_OK_COMMAND)
+ {
+ /* NOTICE. Success. */
+ ereport(NOTICE,
+ (errmsg("dropped replication slot \"%s\" on publisher",
+ slotname)));
+ }
+ else if (res->status == WALRCV_ERROR && missing_ok)
+ {
+ /* WARNING. Error, but missing_ok = true. */
+ ereport(WARNING,
(errmsg("could not drop the replication slot \"%s\" on publisher",
slotname),
errdetail("The error was: %s", res->err)));
+ }
else
- ereport(NOTICE,
- (errmsg("dropped replication slot \"%s\" on publisher",
- slotname)));
+ {
+ /* ERROR. */
+ ereport(ERROR,
+ (errmsg("could not drop the replication slot \"%s\" on publisher",
+ slotname),
+ errdetail("The error was: %s", res->err)));
+ }
walrcv_clear_result(res);
}
PG_FINALLY();
{
- walrcv_disconnect(wrconn);
+ pfree(cmd.data);
}
PG_END_TRY();
-
- pfree(cmd.data);
-
- table_close(rel, NoLock);
}
/*
@@ -1278,3 +1488,45 @@ fetch_table_list(WalReceiverConn *wrconn, List *publications)
return tablelist;
}
+
+/*
+ * This is to report the connection failure while dropping replication slots.
+ * Here, we report the WARNING for all tablesync slots so that user can drop
+ * them manually, if required.
+ */
+static void
+ReportSlotConnectionError(List *rstates, Oid subid, char *slotname, char *err)
+{
+ ListCell *lc;
+
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Caller needs to ensure that relstate doesn't change underneath us.
+ * See DropSubscription where we get the relstates.
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = { 0 };
+
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ elog(WARNING, "could not drop tablesync replication slot \"%s\"",
+ syncslotname);
+ }
+ }
+
+ ereport(ERROR,
+ (errmsg("could not connect to publisher when attempting to "
+ "drop the replication slot \"%s\"", slotname),
+ errdetail("The error was: %s", err),
+ /* translator: %s is an SQL ALTER command */
+ errhint("Use %s to disassociate the subscription from the slot.",
+ "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+}
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 186514c..58082dd 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -73,20 +73,6 @@ typedef struct LogicalRepWorkerId
Oid relid;
} LogicalRepWorkerId;
-typedef struct StopWorkersData
-{
- int nestDepth; /* Sub-transaction nest level */
- List *workers; /* List of LogicalRepWorkerId */
- struct StopWorkersData *parent; /* This need not be an immediate
- * subtransaction parent */
-} StopWorkersData;
-
-/*
- * Stack of StopWorkersData elements. Each stack element contains the workers
- * to be stopped for that subtransaction.
- */
-static StopWorkersData *on_commit_stop_workers = NULL;
-
static void ApplyLauncherWakeup(void);
static void logicalrep_launcher_onexit(int code, Datum arg);
static void logicalrep_worker_onexit(int code, Datum arg);
@@ -547,51 +533,6 @@ logicalrep_worker_stop(Oid subid, Oid relid)
}
/*
- * Request worker for specified sub/rel to be stopped on commit.
- */
-void
-logicalrep_worker_stop_at_commit(Oid subid, Oid relid)
-{
- int nestDepth = GetCurrentTransactionNestLevel();
- LogicalRepWorkerId *wid;
- MemoryContext oldctx;
-
- /* Make sure we store the info in context that survives until commit. */
- oldctx = MemoryContextSwitchTo(TopTransactionContext);
-
- /* Check that previous transactions were properly cleaned up. */
- Assert(on_commit_stop_workers == NULL ||
- nestDepth >= on_commit_stop_workers->nestDepth);
-
- /*
- * Push a new stack element if we don't already have one for the current
- * nestDepth.
- */
- if (on_commit_stop_workers == NULL ||
- nestDepth > on_commit_stop_workers->nestDepth)
- {
- StopWorkersData *newdata = palloc(sizeof(StopWorkersData));
-
- newdata->nestDepth = nestDepth;
- newdata->workers = NIL;
- newdata->parent = on_commit_stop_workers;
- on_commit_stop_workers = newdata;
- }
-
- /*
- * Finally add a new worker into the worker list of the current
- * subtransaction.
- */
- wid = palloc(sizeof(LogicalRepWorkerId));
- wid->subid = subid;
- wid->relid = relid;
- on_commit_stop_workers->workers =
- lappend(on_commit_stop_workers->workers, wid);
-
- MemoryContextSwitchTo(oldctx);
-}
-
-/*
* Wake up (using latch) any logical replication worker for specified sub/rel.
*/
void
@@ -820,109 +761,21 @@ ApplyLauncherShmemInit(void)
}
/*
- * Check whether current transaction has manipulated logical replication
- * workers.
- */
-bool
-XactManipulatesLogicalReplicationWorkers(void)
-{
- return (on_commit_stop_workers != NULL);
-}
-
-/*
* Wakeup the launcher on commit if requested.
*/
void
AtEOXact_ApplyLauncher(bool isCommit)
{
-
- Assert(on_commit_stop_workers == NULL ||
- (on_commit_stop_workers->nestDepth == 1 &&
- on_commit_stop_workers->parent == NULL));
-
if (isCommit)
{
- ListCell *lc;
-
- if (on_commit_stop_workers != NULL)
- {
- List *workers = on_commit_stop_workers->workers;
-
- foreach(lc, workers)
- {
- LogicalRepWorkerId *wid = lfirst(lc);
-
- logicalrep_worker_stop(wid->subid, wid->relid);
- }
- }
-
if (on_commit_launcher_wakeup)
ApplyLauncherWakeup();
}
- /*
- * No need to pfree on_commit_stop_workers. It was allocated in
- * transaction memory context, which is going to be cleaned soon.
- */
- on_commit_stop_workers = NULL;
on_commit_launcher_wakeup = false;
}
/*
- * On commit, merge the current on_commit_stop_workers list into the
- * immediate parent, if present.
- * On rollback, discard the current on_commit_stop_workers list.
- * Pop out the stack.
- */
-void
-AtEOSubXact_ApplyLauncher(bool isCommit, int nestDepth)
-{
- StopWorkersData *parent;
-
- /* Exit immediately if there's no work to do at this level. */
- if (on_commit_stop_workers == NULL ||
- on_commit_stop_workers->nestDepth < nestDepth)
- return;
-
- Assert(on_commit_stop_workers->nestDepth == nestDepth);
-
- parent = on_commit_stop_workers->parent;
-
- if (isCommit)
- {
- /*
- * If the upper stack element is not an immediate parent
- * subtransaction, just decrement the notional nesting depth without
- * doing any real work. Else, we need to merge the current workers
- * list into the parent.
- */
- if (!parent || parent->nestDepth < nestDepth - 1)
- {
- on_commit_stop_workers->nestDepth--;
- return;
- }
-
- parent->workers =
- list_concat(parent->workers, on_commit_stop_workers->workers);
- }
- else
- {
- /*
- * Abandon everything that was done at this nesting level. Explicitly
- * free memory to avoid a transaction-lifespan leak.
- */
- list_free_deep(on_commit_stop_workers->workers);
- }
-
- /*
- * We have taken care of the current subtransaction workers list for both
- * abort or commit. So we are ready to pop the stack.
- */
- pfree(on_commit_stop_workers);
- on_commit_stop_workers = parent;
-}
-
-/*
* Request wakeup of the launcher on commit of the transaction.
*
* This is used to send launcher signal to stop sleeping and process the
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index ccbdbcf..0c04e63 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -31,8 +31,11 @@
* table state to INIT.
* - Tablesync worker starts; changes table state from INIT to DATASYNC while
* copying.
- * - Tablesync worker finishes the copy and sets table state to SYNCWAIT;
- * waits for state change.
+ * - Tablesync worker does initial table copy; there is a FINISHEDCOPY (sync
+ * worker specific) state to indicate when the copy phase has completed, so
+ * if the worker crashes with this (non-memory) state then the copy will not
+ * be re-attempted.
+ * - Tablesync worker then sets table state to SYNCWAIT; waits for state change.
* - Apply worker periodically checks for tables in SYNCWAIT state. When
* any appear, it sets the table state to CATCHUP and starts loop-waiting
* until either the table state is set to SYNCDONE or the sync worker
@@ -48,8 +51,8 @@
* point it sets state to READY and stops tracking. Again, there might
* be zero changes in between.
*
- * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * So the state progression is always: INIT -> DATASYNC -> FINISHEDCOPY
+ * -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY.
*
* The catalog pg_subscription_rel is used to keep information about
* subscribed tables and their state. The catalog holds all states
@@ -58,6 +61,7 @@
* Example flows look like this:
* - Apply is in front:
* sync:8
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:10
* -> set in memory CATCHUP
@@ -73,6 +77,7 @@
*
* - Sync is in front:
* sync:10
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:8
* -> set in memory CATCHUP
@@ -101,7 +106,10 @@
#include "replication/logicalrelation.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
+#include "storage/lmgr.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -112,6 +120,42 @@ static bool table_states_valid = false;
StringInfo copybuf = NULL;
/*
+ * Common code to drop the origin of a tablesync worker.
+ *
+ * There is a potential race condition if two processes attempt to call
+ * replorigin_drop for the same originid at the same time. The loser of
+ * that race would give an ERROR saying that it failed to find the
+ * expected originid.
+ *
+ * The TRY/CATCH below suppresses such errors, allowing the tablesync cleanup
+ * code to proceed.
+ */
+void
+tablesync_replorigin_drop(Oid subid, Oid relid, bool nowait)
+{
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (OidIsValid(originid))
+ {
+ PG_TRY();
+ {
+ replorigin_drop(originid, nowait);
+ }
+ PG_CATCH();
+ {
+ ereport(WARNING,
+ errmsg("could not drop replication origin with OID %d, named \"%s\"",
+ originid,
+ originname));
+ }
+ PG_END_TRY();
+ }
+}
+
+/*
* Exit routine for synchronization worker.
*/
static void
@@ -269,26 +313,47 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
-
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
current_lsn >= MyLogicalRepWorker->relstate_lsn)
{
TimeLineID tli;
+ char syncslotname[NAMEDATALEN] = {0};
MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCDONE;
MyLogicalRepWorker->relstate_lsn = current_lsn;
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ StartTransactionCommand();
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
MyLogicalRepWorker->relstate_lsn);
+ /* End wal streaming so wrconn can be re-used to drop the slot. */
walrcv_endstreaming(wrconn, &tli);
+
+ /*
+ * Cleanup the tablesync slot.
+ *
+ * This has to be done after updating the state because otherwise if
+ * there is an error while doing the database operations we won't be
+ * able to rollback dropped slot.
+ */
+ ReplicationSlotNameForTablesync(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ syncslotname);
+
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, false /* missing_ok */);
+
finish_sync_worker();
}
else
@@ -411,6 +476,20 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
started_tx = true;
}
+ /*
+ * Remove the tablesync origin tracking if exists.
+ *
+ * The normal case origin drop is done here instead of in the
+ * process_syncing_tables_for_sync function because we don't
+ * allow to drop the origin till the process owning the origin
+ * is alive.
+ */
+ tablesync_replorigin_drop(MyLogicalRepWorker->subid,
+ rstate->relid, false /* nowait */ );
+
+ /*
+ * Update the state to READY only after the origin cleanup.
+ */
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
@@ -806,6 +885,32 @@ copy_table(Relation rel)
}
/*
+ * Determine the tablesync slot name.
+ *
+ * The name must not exceed NAMEDATALEN -1 because of remote node constraints
+ * on slot name length.
+ *
+ * The returned slot name is either:
+ * - stored in the supplied buffer (syncslotname), or
+ * - palloc'ed in current memory context (if syncslotname = NULL).
+ *
+ * Note: We don't use the subscription slot name as part of tablesync slot name
+ * because we are responsible for cleaning up these slots and it could become
+ * impossible to recalculate what name to cleanup if the subscription slot name
+ * had changed.
+ */
+char *
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char syncslotname[NAMEDATALEN])
+{
+ if (syncslotname)
+ sprintf(syncslotname, "pg_%u_sync_%u", suboid, relid);
+ else
+ syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
+
+ return syncslotname;
+}
+
+/*
* Start syncing the table in the sync worker.
*
* If nothing needs to be done to sync the table, we exit the worker without
@@ -822,6 +927,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -847,19 +954,10 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
- MySubscription->slotname,
- MySubscription->oid,
- MyLogicalRepWorker->relid);
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(MySubscription->oid,
+ MyLogicalRepWorker->relid,
+ NULL /* use palloc */ );
/*
* Here we use the slot name instead of the subscription name as the
@@ -872,7 +970,48 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY);
+
+ /* Assign the origin tracking record name. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MySubscription->oid, MyLogicalRepWorker->relid);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC)
+ {
+ /*
+ * We have previously errored out before finishing the copy so the
+ * replication slot might exist. We want to remove the slot if it
+ * already exists and proceed.
+ *
+ * XXX We could also instead try to drop the slot, last time we failed
+ * but for that, we might need to clean up the copy state as it might
+ * be in the middle of fetching the rows. Also, if there is a network
+ * break down then it wouldn't have succeeded so trying it next time
+ * seems like a better bet.
+ */
+ ReplicationSlotDropAtPubNode(wrconn, slotname, true /* missing_ok */);
+ }
+ else if (MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed
+ * before it was able to finish normally.
+ */
+ StartTransactionCommand();
+
+ /*
+ * The origin tracking name must already exist. It was created first
+ * time this tablesync was launched.
+ */
+ originid = replorigin_by_name(originname, false /* missing_ok */ );
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ *origin_startpos = replorigin_session_get_progress(false);
+
+ CommitTransactionCommand();
+
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -888,9 +1027,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -916,12 +1052,12 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
walrcv_clear_result(res);
/*
- * Create a new temporary logical decoding slot. This slot will be used
- * for the catchup phase after COPY is done, so tell it to use the
- * snapshot to make the final data consistent.
+ * Create a new permanent logical decoding slot. This slot will be
+ * used for the catchup phase after COPY is done, so tell it to use
+ * the snapshot to make the final data consistent.
*/
- walrcv_create_slot(wrconn, slotname, true,
- CRS_USE_SNAPSHOT, origin_startpos);
+ walrcv_create_slot(wrconn, slotname, false /* permanent */ ,
+ CRS_USE_SNAPSHOT, origin_startpos);
/* Now do the initial data copy */
PushActiveSnapshot(GetTransactionSnapshot());
@@ -940,6 +1076,54 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
/* Make the copy visible. */
CommandCounterIncrement();
+ /* Setup replication origin tracking. */
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist, so create it now.
+ *
+ * Then advance to the LSN got from walrcv_create_slot. This is
+ * WAL logged for the purpose of recovery. Locks are to prevent
+ * the replication origin from vanishing while advancing.
+ */
+ originid = replorigin_create(originname);
+
+ LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */ , true /* WAL log */ );
+ UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ }
+ else
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("replication origin \"%s\" already exists",
+ originname)));
+ }
+
+ /*
+ * Update the persisted state to indicate the COPY phase is done; make
+ * it visible to others.
+ */
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_FINISHEDCOPY,
+ MyLogicalRepWorker->relstate_lsn);
+
+ CommitTransactionCommand();
+
+copy_table_done:
+
+ elog(DEBUG1,
+ "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
+
/*
* We are done with the initial data synchronization, update the state.
*/
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index eb7db89..cfc924c 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -807,12 +807,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -889,9 +885,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -918,8 +912,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1062,8 +1055,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData *commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 1d81071..05bb698 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1786,7 +1786,8 @@ ProcessUtilitySlow(ParseState *pstate,
break;
case T_AlterSubscriptionStmt:
- address = AlterSubscription((AlterSubscriptionStmt *) parsetree);
+ address = AlterSubscription((AlterSubscriptionStmt *) parsetree,
+ isTopLevel);
break;
case T_DropSubscriptionStmt:
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index 2bea2c5..ed94f57 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,8 @@ DECLARE_UNIQUE_INDEX_PKEY(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_FINISHEDCOPY 'f' /* tablesync copy phase is completed
+ * (sublsn NULL) */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/commands/subscriptioncmds.h b/src/include/commands/subscriptioncmds.h
index a818650..3b926f3 100644
--- a/src/include/commands/subscriptioncmds.h
+++ b/src/include/commands/subscriptioncmds.h
@@ -20,7 +20,7 @@
extern ObjectAddress CreateSubscription(CreateSubscriptionStmt *stmt,
bool isTopLevel);
-extern ObjectAddress AlterSubscription(AlterSubscriptionStmt *stmt);
+extern ObjectAddress AlterSubscription(AlterSubscriptionStmt *stmt, bool isTopLevel);
extern void DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel);
extern ObjectAddress AlterSubscriptionOwner(const char *name, Oid newOwnerId);
diff --git a/src/include/replication/logicallauncher.h b/src/include/replication/logicallauncher.h
index 421ec15..301e494 100644
--- a/src/include/replication/logicallauncher.h
+++ b/src/include/replication/logicallauncher.h
@@ -22,9 +22,7 @@ extern Size ApplyLauncherShmemSize(void);
extern void ApplyLauncherShmemInit(void);
extern void ApplyLauncherWakeupAtCommit(void);
-extern bool XactManipulatesLogicalReplicationWorkers(void);
extern void AtEOXact_ApplyLauncher(bool isCommit);
-extern void AtEOSubXact_ApplyLauncher(bool isCommit, int nestDepth);
extern bool IsLogicalLauncher(void);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 53f636c..5f52335 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -15,6 +15,7 @@
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "storage/spin.h"
+#include "replication/walreceiver.h"
/*
* Behaviour of replication slots, upon release or crash.
@@ -211,6 +212,8 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname);
+extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
diff --git a/src/include/replication/worker_internal.h b/src/include/replication/worker_internal.h
index d046022..4a5c49d 100644
--- a/src/include/replication/worker_internal.h
+++ b/src/include/replication/worker_internal.h
@@ -77,13 +77,14 @@ extern List *logicalrep_workers_find(Oid subid, bool only_running);
extern void logicalrep_worker_launch(Oid dbid, Oid subid, const char *subname,
Oid userid, Oid relid);
extern void logicalrep_worker_stop(Oid subid, Oid relid);
-extern void logicalrep_worker_stop_at_commit(Oid subid, Oid relid);
extern void logicalrep_worker_wakeup(Oid subid, Oid relid);
extern void logicalrep_worker_wakeup_ptr(LogicalRepWorker *worker);
extern int logicalrep_sync_worker_count(Oid subid);
extern char *LogicalRepSyncTableStart(XLogRecPtr *origin_startpos);
+extern void tablesync_replorigin_drop(Oid subid, Oid relid, bool nowait);
+
void process_syncing_tables(XLogRecPtr current_lsn);
void invalidate_syncing_table_states(Datum arg, int cacheid,
uint32 hashvalue);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 1d540fe..5f5c36d 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2397,7 +2397,6 @@ StdAnalyzeData
StdRdOptions
Step
StopList
-StopWorkersData
StrategyNumber
StreamCtl
StreamXidHash
--
1.8.3.1
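As the documentation changes in this patch note, a tablesync slot that could
not be removed automatically has to be cleaned up by hand on the publisher.
A minimal sketch of what that looks like, assuming a hypothetical
subscription OID 16394 and table relid 16385 (both invented for
illustration; the slot name follows the generated pg_%u_sync_%u pattern):

-- Inspect the leftover tablesync slot on the publisher.
SELECT slot_name, slot_type, active
FROM pg_replication_slots
WHERE slot_name = 'pg_16394_sync_16385';

-- Drop it so that it stops reserving WAL.
SELECT pg_drop_replication_slot('pg_16394_sync_16385');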
On Thu, Feb 4, 2021 at 8:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
...
Thanks. I have fixed one of the issues reported by me earlier [1]
wherein the tablesync worker can repeatedly fail if after dropping the
slot there is an error while updating the SYNCDONE state in the
database. I have moved the drop of the slot just before commit of the
transaction where we are marking the state as SYNCDONE. Additionally,
I have removed unnecessary includes in tablesync.c, updated the docs
for Alter Subscription, and updated the comments at various places in
the patch. I have also updated the commit message this time.
Below are my feedback comments for V27 (nothing functional)
~~
1.
V27 Commit message:
For the initial table data synchronization in logical replication, we use
a single transaction to copy the entire table and then synchronizes the
position in the stream with the main apply worker.
Typo:
"synchronizes" -> "synchronize"
~~
2.
@@ -48,6 +48,23 @@ ALTER SUBSCRIPTION <replaceable
class="parameter">name</replaceable> RENAME TO <
(Currently, all subscription owners must be superusers, so the owner checks
will be bypassed in practice. But this might change in the future.)
</para>
+
+ <para>
+ When refreshing a publication we remove the relations that are no longer
+ part of the publication and we also remove the tablesync slots if there are
+ any. It is necessary to remove tablesync slots so that the resources
+ allocated for the subscription on the remote host are released. If due to
+ network breakdown or some other error, we are not able to remove the slots,
+ we give WARNING and the user needs to manually remove such slots later as
+ otherwise, they will continue to reserve WAL and might eventually cause
+ the disk to fill up. See also <xref
linkend="logical-replication-subscription-slot"/>.
+ </para>
I think the content is good, but the 1st-person wording seemed strange.
e.g.
"we are not able to remove the slots, we give WARNING and the user needs..."
Maybe it should be like:
"... PostgreSQL is unable to remove the slots, so a WARNING is
reported. The user needs... "
~~
3.
@@ -566,107 +569,197 @@ AlterSubscription_refresh(Subscription *sub,
bool copy_data)
...
+ * XXX If there is a network break down while dropping the
"network break down" -> "network breakdown"
~~
4.
@@ -872,7 +970,48 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
...
+ * XXX We could also instead try to drop the slot, last time we failed
+ * but for that, we might need to clean up the copy state as it might
+ * be in the middle of fetching the rows. Also, if there is a network
+ * break down then it wouldn't have succeeded so trying it next time
+ * seems like a better bet.
"network break down" -> "network breakdown"
~~
5.
@@ -269,26 +313,47 @@ invalidate_syncing_table_states(Datum arg, int
cacheid, uint32 hashvalue)
...
+
+ /*
+ * Cleanup the tablesync slot.
+ *
+ * This has to be done after updating the state because otherwise if
+ * there is an error while doing the database operations we won't be
+ * able to rollback dropped slot.
+ */
+ ReplicationSlotNameForTablesync(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ syncslotname);
+
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, false /* missing_ok */);
+
Should this comment also describe why the missing_ok is false for this case?
----
Kind Regards,
Peter Smith.
Fujitsu Australia
On Fri, Feb 5, 2021 at 7:09 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Thu, Feb 4, 2021 at 8:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
...
Thanks. I have fixed one of the issues reported by me earlier [1]
wherein the tablesync worker can repeatedly fail if after dropping the
slot there is an error while updating the SYNCDONE state in the
database. I have moved the drop of the slot just before commit of the
transaction where we are marking the state as SYNCDONE. Additionally,
I have removed unnecessary includes in tablesync.c, updated the docs
for Alter Subscription, and updated the comments at various places in
the patch. I have also updated the commit message this time.
Below are my feedback comments for V27 (nothing functional)
~~
1.
V27 Commit message:
For the initial table data synchronization in logical replication, we use
a single transaction to copy the entire table and then synchronizes the
position in the stream with the main apply worker.
Typo:
"synchronizes" -> "synchronize"
Fixed. Also added a note that the Alter Sub .. Refresh .. command can't be
executed inside a transaction block.
~~
2.
@@ -48,6 +48,23 @@ ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> RENAME TO <
(Currently, all subscription owners must be superusers, so the owner checks
will be bypassed in practice. But this might change in the future.)
</para>
+
+ <para>
+ When refreshing a publication we remove the relations that are no longer
+ part of the publication and we also remove the tablesync slots if there are
+ any. It is necessary to remove tablesync slots so that the resources
+ allocated for the subscription on the remote host are released. If due to
+ network breakdown or some other error, we are not able to remove the slots,
+ we give WARNING and the user needs to manually remove such slots later as
+ otherwise, they will continue to reserve WAL and might eventually cause
+ the disk to fill up. See also <xref linkend="logical-replication-subscription-slot"/>.
+ </para>
I think the content is good, but the 1st-person wording seemed strange.
e.g.
"we are not able to remove the slots, we give WARNING and the user needs..."
Maybe it should be like:
"... PostgreSQL is unable to remove the slots, so a WARNING is
reported. The user needs... "
Changed as per suggestion with a minor tweak.
~~
3.
@@ -566,107 +569,197 @@ AlterSubscription_refresh(Subscription *sub,
bool copy_data)
...
+ * XXX If there is a network break down while dropping the
"network break down" -> "network breakdown"
~~
4.
@@ -872,7 +970,48 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
...
+ * XXX We could also instead try to drop the slot, last time we failed
+ * but for that, we might need to clean up the copy state as it might
+ * be in the middle of fetching the rows. Also, if there is a network
+ * break down then it wouldn't have succeeded so trying it next time
+ * seems like a better bet.
"network break down" -> "network breakdown"
Changed as per suggestion.
~~
5.
@@ -269,26 +313,47 @@ invalidate_syncing_table_states(Datum arg, int
cacheid, uint32 hashvalue)
...
+
+ /*
+ * Cleanup the tablesync slot.
+ *
+ * This has to be done after updating the state because otherwise if
+ * there is an error while doing the database operations we won't be
+ * able to rollback dropped slot.
+ */
+ ReplicationSlotNameForTablesync(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ syncslotname);
+
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, false /* missing_ok */);
+
Should this comment also describe why the missing_ok is false for this case?
Yeah, that makes sense, so I have added a comment.
Additionally, I have changed the error code in RemoveSubscriptionRel and
moved the setup of the origin before copy_table in
LogicalRepSyncTableStart, to avoid redoing the copy if an error occurs
while setting up the origin. I have made a few comment changes as well.
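For anyone following along, the new intermediate state is visible from the
subscriber side; a minimal sketch (srsubstate 'f' is the FINISHEDCOPY state
this patch introduces):

-- Show the sync state of each subscribed table ('i', 'd', 'f', 's', or 'r').
SELECT srrelid::regclass AS relation, srsubstate
FROM pg_subscription_rel;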
--
With Regards,
Amit Kapila.
Attachments:
v28-0001-Allow-multiple-xacts-during-table-sync-in-logica.patchapplication/octet-stream; name=v28-0001-Allow-multiple-xacts-during-table-sync-in-logica.patchDownload
From d6602ac989e60a32b84438c4e5dbfe7ffa2cae56 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Thu, 4 Feb 2021 14:34:00 +0530
Subject: [PATCH v28] Allow multiple xacts during table sync in logical
replication.
For the initial table data synchronization in logical replication, we use
a single transaction to copy the entire table and then synchronize the
position in the stream with the main apply worker.
There are multiple downsides of this approach: (a) We have to perform the
entire copy operation again if there is any error (network breakdown,
error in the database operation, etc.) while we synchronize the WAL
position between tablesync worker and apply worker; this will be onerous
especially for large copies, (b) Using a single transaction in the
synchronization-phase (where we can receive WAL from multiple
transactions) will have the risk of exceeding the CID limit, (c) The slot
will hold the WAL till the entire sync is complete because we never commit
till the end.
This patch solves all the above downsides by allowing multiple
transactions during the tablesync phase. The initial copy is done in a
single transaction and after that, we commit each transaction as we
receive it. To allow recovery after any error or crash, we use a permanent
slot and origin to track the progress. The slot and origin will be removed
once we finish the synchronization of the table. We also remove the slot
and origin of tablesync workers if the user performs DROP SUBSCRIPTION .. or
ALTER SUBSCRIPTION .. REFRESH and some of the table syncs are still not
finished.
The commands ALTER SUBSCRIPTION ... REFRESH .. and
ALTER SUBSCRIPTION ... SET PUBLICATION .. with refresh option as true
cannot be executed inside a transaction block because they can now drop
the slots for which we have no provision to rollback.
This will also open up the path for logical replication of 2PC
transactions on the subscriber side. Previously, we couldn't do that because
of the requirement of maintaining a single transaction in tablesync
workers.
Author: Peter Smith and Amit Kapila
Reviewed-by: Ajin Cherian, Hou, Zhijie and Amit Kapila
Discussion: https://postgr.es/m/CAA4eK1KHJxaZS-fod-0fey=0tq3=Gkn4ho=8N4-5HWiCfu0H1A@mail.gmail.com
---
doc/src/sgml/catalogs.sgml | 1 +
doc/src/sgml/logical-replication.sgml | 29 +-
doc/src/sgml/ref/alter_subscription.sgml | 18 +
doc/src/sgml/ref/drop_subscription.sgml | 6 +-
src/backend/access/transam/xact.c | 11 -
src/backend/catalog/pg_subscription.c | 35 ++
src/backend/commands/subscriptioncmds.c | 450 +++++++++++++++-----
src/backend/replication/logical/launcher.c | 147 -------
src/backend/replication/logical/tablesync.c | 249 +++++++++--
src/backend/replication/logical/worker.c | 18 +-
src/backend/tcop/utility.c | 3 +-
src/include/catalog/pg_subscription_rel.h | 2 +
src/include/commands/subscriptioncmds.h | 2 +-
src/include/replication/logicallauncher.h | 2 -
src/include/replication/slot.h | 3 +
src/include/replication/worker_internal.h | 3 +-
src/tools/pgindent/typedefs.list | 1 -
17 files changed, 665 insertions(+), 315 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index ea222c0464..692ad65de2 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7673,6 +7673,7 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
State code:
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
+ <literal>f</literal> = finished table copy,
<literal>s</literal> = synchronized,
<literal>r</literal> = ready (normal replication)
</para></entry>
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index a560ad69b4..43fe7f7264 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -248,13 +248,23 @@
<para>
As mentioned earlier, each (active) subscription receives changes from a
- replication slot on the remote (publishing) side. Normally, the remote
- replication slot is created automatically when the subscription is created
- using <command>CREATE SUBSCRIPTION</command> and it is dropped
- automatically when the subscription is dropped using <command>DROP
- SUBSCRIPTION</command>. In some situations, however, it can be useful or
- necessary to manipulate the subscription and the underlying replication
- slot separately. Here are some scenarios:
+ replication slot on the remote (publishing) side.
+ </para>
+ <para>
+ Additional table synchronization slots are normally transient, created
+ internally to perform initial table synchronization and dropped
+ automatically when they are no longer needed. These table synchronization
+ slots have generated names: <quote><literal>pg_%u_sync_%u</literal></quote>
+ (parameters: Subscription <parameter>oid</parameter>,
+ Table <parameter>relid</parameter>)
+ </para>
+ <para>
+ Normally, the remote replication slot is created automatically when the
+ subscription is created using <command>CREATE SUBSCRIPTION</command> and it
+ is dropped automatically when the subscription is dropped using
+ <command>DROP SUBSCRIPTION</command>. In some situations, however, it can
+ be useful or necessary to manipulate the subscription and the underlying
+ replication slot separately. Here are some scenarios:
<itemizedlist>
<listitem>
@@ -294,8 +304,9 @@
using <command>ALTER SUBSCRIPTION</command> before attempting to drop
the subscription. If the remote database instance no longer exists, no
further action is then necessary. If, however, the remote database
- instance is just unreachable, the replication slot should then be
- dropped manually; otherwise it would continue to reserve WAL and might
+ instance is just unreachable, the replication slot (and any still
+ remaining table synchronization slots) should then be
+ dropped manually; otherwise it/they would continue to reserve WAL and might
eventually cause the disk to fill up. Such cases should be carefully
investigated.
</para>
diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml
index db5e59f707..9b92ea5f20 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -48,6 +48,24 @@ ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> RENAME TO <
(Currently, all subscription owners must be superusers, so the owner checks
will be bypassed in practice. But this might change in the future.)
</para>
+
+ <para>
+ When refreshing a publication we remove the relations that are no longer
+ part of the publication and we also remove the tablesync slots if there are
+ any. It is necessary to remove tablesync slots so that the resources
+ allocated for the subscription on the remote host are released. If due to
+ network breakdown or some other error, <productname>PostgreSQL</productname>
+ is unable to remove the slots, a WARNING will be reported. The user needs to
+ manually remove such slots later as otherwise, they will continue to reserve
+ WAL and might eventually cause the disk to fill up. See also
+ <xref linkend="logical-replication-subscription-slot"/>.
+ </para>
+
+ <para>
+ Commands <command>ALTER SUBSCRIPTION ... REFRESH ..</command> and
+ <command>ALTER SUBSCRIPTION ... SET PUBLICATION ..</command> with refresh
+ option as true cannot be executed inside a transaction block.
+ </para>
</refsect1>
<refsect1>
diff --git a/doc/src/sgml/ref/drop_subscription.sgml b/doc/src/sgml/ref/drop_subscription.sgml
index adbdeafb4e..aee9615546 100644
--- a/doc/src/sgml/ref/drop_subscription.sgml
+++ b/doc/src/sgml/ref/drop_subscription.sgml
@@ -79,7 +79,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
<para>
When dropping a subscription that is associated with a replication slot on
the remote host (the normal state), <command>DROP SUBSCRIPTION</command>
- will connect to the remote host and try to drop the replication slot as
+ will connect to the remote host and try to drop the replication slot (and
+ any remaining table synchronization slots) as
part of its operation. This is necessary so that the resources allocated
for the subscription on the remote host are released. If this fails,
either because the remote host is not reachable or because the remote
@@ -89,7 +90,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
executing <literal>ALTER SUBSCRIPTION ... SET (slot_name = NONE)</literal>.
After that, <command>DROP SUBSCRIPTION</command> will no longer attempt any
actions on a remote host. Note that if the remote replication slot still
- exists, it should then be dropped manually; otherwise it will continue to
+ exists, it (and any related table synchronization slots) should then be
+ dropped manually; otherwise it/they will continue to
reserve WAL and might eventually cause the disk to fill up. See
also <xref linkend="logical-replication-subscription-slot"/>.
</para>
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index a2068e3fd4..3c8b4eb362 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2432,15 +2432,6 @@ PrepareTransaction(void)
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot PREPARE a transaction that has exported snapshots")));
- /*
- * Don't allow PREPARE but for transaction that has/might kill logical
- * replication workers.
- */
- if (XactManipulatesLogicalReplicationWorkers())
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("cannot PREPARE a transaction that has manipulated logical replication workers")));
-
/* Prevent cancel/die interrupt while cleaning up */
HOLD_INTERRUPTS();
@@ -4899,7 +4890,6 @@ CommitSubTransaction(void)
AtEOSubXact_HashTables(true, s->nestingLevel);
AtEOSubXact_PgStat(true, s->nestingLevel);
AtSubCommit_Snapshot(s->nestingLevel);
- AtEOSubXact_ApplyLauncher(true, s->nestingLevel);
/*
* We need to restore the upper transaction's read-only state, in case the
@@ -5059,7 +5049,6 @@ AbortSubTransaction(void)
AtEOSubXact_HashTables(false, s->nestingLevel);
AtEOSubXact_PgStat(false, s->nestingLevel);
AtSubAbort_Snapshot(s->nestingLevel);
- AtEOSubXact_ApplyLauncher(false, s->nestingLevel);
}
/*
diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c
index 44cb285b68..4f567fd221 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -29,6 +29,7 @@
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
#include "utils/pg_lsn.h"
#include "utils/rel.h"
#include "utils/syscache.h"
@@ -337,6 +338,13 @@ GetSubscriptionRelState(Oid subid, Oid relid, XLogRecPtr *sublsn)
char substate;
bool isnull;
Datum d;
+ Relation rel;
+
+ /*
+ * This is to avoid the race condition with AlterSubscription which tries
+ * to remove this relstate.
+ */
+ rel = table_open(SubscriptionRelRelationId, AccessShareLock);
/* Try finding the mapping. */
tup = SearchSysCache2(SUBSCRIPTIONRELMAP,
@@ -363,6 +371,8 @@ GetSubscriptionRelState(Oid subid, Oid relid, XLogRecPtr *sublsn)
/* Cleanup */
ReleaseSysCache(tup);
+ table_close(rel, AccessShareLock);
+
return substate;
}
@@ -403,6 +413,31 @@ RemoveSubscriptionRel(Oid subid, Oid relid)
scan = table_beginscan_catalog(rel, nkeys, skey);
while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
{
+ Form_pg_subscription_rel subrel;
+
+ subrel = (Form_pg_subscription_rel) GETSTRUCT(tup);
+
+ /*
+ * We don't allow to drop the relation mapping when the table
+ * synchronization is in progress unless the caller updates the
+ * corresponding subscription as well. This is to ensure that we don't
+ * leave tablesync slots or origins in the system when the
+ * corresponding table is dropped.
+ */
+ if (!OidIsValid(subid) && subrel->srsubstate != SUBREL_STATE_READY)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("could not drop relation mapping for subscription \"%s\"",
+ get_subscription_name(subrel->srsubid, false)),
+ errdetail("Table synchronization for relation \"%s\" is in progress and is in state \"%c\".",
+ get_rel_name(relid), subrel->srsubstate),
+ /* translator: first %s is a SQL ALTER command and second %s is a SQL DROP command */
+ errhint("Use %s to enable subscription if not already enabled or use %s to drop the subscription.",
+ "ALTER SUBSCRIPTION ... ENABLE",
+ "DROP SUBSCRIPTION ...")));
+ }
+
CatalogTupleDelete(rel, &tup->t_self);
}
table_endscan(scan);
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 082f7855b8..1d3ca4325c 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -34,6 +34,7 @@
#include "nodes/makefuncs.h"
#include "replication/logicallauncher.h"
#include "replication/origin.h"
+#include "replication/slot.h"
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "replication/worker_internal.h"
@@ -46,6 +47,8 @@
#include "utils/syscache.h"
static List *fetch_table_list(WalReceiverConn *wrconn, List *publications);
+static void ReportSlotConnectionError(List *rstates, Oid subid, char *slotname, char *err);
+
/*
* Common option parsing function for CREATE and ALTER SUBSCRIPTION commands.
@@ -566,107 +569,197 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
Oid *pubrel_local_oids;
ListCell *lc;
int off;
+ int remove_rel_len;
+ Relation rel = NULL;
+ typedef struct SubRemoveRels
+ {
+ Oid relid;
+ char state;
+ } SubRemoveRels;
+ SubRemoveRels *sub_remove_rels;
/* Load the library providing us libpq calls. */
load_file("libpqwalreceiver", false);
- /* Try to connect to the publisher. */
- wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
- if (!wrconn)
- ereport(ERROR,
- (errmsg("could not connect to the publisher: %s", err)));
-
- /* Get the table list from publisher. */
- pubrel_names = fetch_table_list(wrconn, sub->publications);
-
- /* We are done with the remote side, close connection. */
- walrcv_disconnect(wrconn);
-
- /* Get local table list. */
- subrel_states = GetSubscriptionRelations(sub->oid);
-
- /*
- * Build qsorted array of local table oids for faster lookup. This can
- * potentially contain all tables in the database so speed of lookup is
- * important.
- */
- subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
- off = 0;
- foreach(lc, subrel_states)
+ PG_TRY();
{
- SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
+ /* Try to connect to the publisher. */
+ wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
+ if (!wrconn)
+ ereport(ERROR,
+ (errmsg("could not connect to the publisher: %s", err)));
- subrel_local_oids[off++] = relstate->relid;
- }
- qsort(subrel_local_oids, list_length(subrel_states),
- sizeof(Oid), oid_cmp);
+ /* Get the table list from publisher. */
+ pubrel_names = fetch_table_list(wrconn, sub->publications);
- /*
- * Walk over the remote tables and try to match them to locally known
- * tables. If the table is not known locally create a new state for it.
- *
- * Also builds array of local oids of remote tables for the next step.
- */
- off = 0;
- pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
+ /* Get local table list. */
+ subrel_states = GetSubscriptionRelations(sub->oid);
- foreach(lc, pubrel_names)
- {
- RangeVar *rv = (RangeVar *) lfirst(lc);
- Oid relid;
+ /*
+ * Build qsorted array of local table oids for faster lookup. This can
+ * potentially contain all tables in the database so speed of lookup
+ * is important.
+ */
+ subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
+ off = 0;
+ foreach(lc, subrel_states)
+ {
+ SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
- relid = RangeVarGetRelid(rv, AccessShareLock, false);
+ subrel_local_oids[off++] = relstate->relid;
+ }
+ qsort(subrel_local_oids, list_length(subrel_states),
+ sizeof(Oid), oid_cmp);
+
+ /*
+ * Rels that we want to remove from subscription and drop any slots and
+ * origins corresponding to them.
+ */
+ sub_remove_rels = palloc(list_length(subrel_states) * sizeof(SubRemoveRels));
+
+ /*
+ * Walk over the remote tables and try to match them to locally known
+ * tables. If the table is not known locally create a new state for
+ * it.
+ *
+ * Also builds array of local oids of remote tables for the next step.
+ */
+ off = 0;
+ pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
+
+ foreach(lc, pubrel_names)
+ {
+ RangeVar *rv = (RangeVar *) lfirst(lc);
+ Oid relid;
- /* Check for supported relkind. */
- CheckSubscriptionRelkind(get_rel_relkind(relid),
- rv->schemaname, rv->relname);
+ relid = RangeVarGetRelid(rv, AccessShareLock, false);
- pubrel_local_oids[off++] = relid;
+ /* Check for supported relkind. */
+ CheckSubscriptionRelkind(get_rel_relkind(relid),
+ rv->schemaname, rv->relname);
- if (!bsearch(&relid, subrel_local_oids,
- list_length(subrel_states), sizeof(Oid), oid_cmp))
- {
- AddSubscriptionRelState(sub->oid, relid,
- copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
- InvalidXLogRecPtr);
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" added to subscription \"%s\"",
- rv->schemaname, rv->relname, sub->name)));
- }
- }
+ pubrel_local_oids[off++] = relid;
- /*
- * Next remove state for tables we should not care about anymore using the
- * data we collected above
- */
- qsort(pubrel_local_oids, list_length(pubrel_names),
- sizeof(Oid), oid_cmp);
+ if (!bsearch(&relid, subrel_local_oids,
+ list_length(subrel_states), sizeof(Oid), oid_cmp))
+ {
+ AddSubscriptionRelState(sub->oid, relid,
+ copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
+ InvalidXLogRecPtr);
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" added to subscription \"%s\"",
+ rv->schemaname, rv->relname, sub->name)));
+ }
+ }
- for (off = 0; off < list_length(subrel_states); off++)
- {
- Oid relid = subrel_local_oids[off];
+ /*
+ * Next remove state for tables we should not care about anymore using
+ * the data we collected above
+ */
+ qsort(pubrel_local_oids, list_length(pubrel_names),
+ sizeof(Oid), oid_cmp);
- if (!bsearch(&relid, pubrel_local_oids,
- list_length(pubrel_names), sizeof(Oid), oid_cmp))
+ remove_rel_len = 0;
+ for (off = 0; off < list_length(subrel_states); off++)
{
- RemoveSubscriptionRel(sub->oid, relid);
+ Oid relid = subrel_local_oids[off];
- logicalrep_worker_stop_at_commit(sub->oid, relid);
+ if (!bsearch(&relid, pubrel_local_oids,
+ list_length(pubrel_names), sizeof(Oid), oid_cmp))
+ {
+ char state;
+ XLogRecPtr statelsn;
+
+ /*
+ * Lock pg_subscription_rel with AccessExclusiveLock to prevent any race
+ * conditions with the apply worker re-launching workers at the same time
+ * this code is trying to remove those tables.
+ *
+ * Even if new worker for this particular rel is restarted it won't be able
+ * to make any progress as we hold exclusive lock on subscription_rel till
+ * the transaction end. It will simply exit as there is no corresponding
+ * rel entry.
+ *
+ * This locking also ensures that the state of rels won't change till we
+ * are done with this refresh operation.
+ */
+ if (!rel)
+ rel = table_open(SubscriptionRelRelationId, AccessExclusiveLock);
+
+ /* Last known rel state. */
+ state = GetSubscriptionRelState(sub->oid, relid, &statelsn);
+
+ sub_remove_rels[remove_rel_len].relid = relid;
+ sub_remove_rels[remove_rel_len++].state = state;
+
+ RemoveSubscriptionRel(sub->oid, relid);
+
+ logicalrep_worker_stop(sub->oid, relid);
+
+ /*
+ * For READY state, we would have already dropped the tablesync
+ * origin.
+ */
+ if (state != SUBREL_STATE_READY)
+ tablesync_replorigin_drop(sub->oid, relid, false /* nowait */);
+
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
+ get_namespace_name(get_rel_namespace(relid)),
+ get_rel_name(relid),
+ sub->name)));
+ }
+ }
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
- get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid),
- sub->name)));
+ /*
+ * Drop the tablesync slots associated with removed tables. This has to
+ * be at the end because otherwise if there is an error while doing the
+ * database operations we won't be able to rollback dropped slots.
+ */
+ for (off = 0; off < remove_rel_len; off++)
+ {
+ if (sub_remove_rels[off].state != SUBREL_STATE_READY &&
+ sub_remove_rels[off].state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = { 0 };
+
+ /*
+ * For READY/SYNCDONE states we know the tablesync slot has
+ * already been dropped by the tablesync worker.
+ *
+ * For other states, there is no certainty, maybe the slot
+ * does not exist yet. Also, if we fail after removing some of
+ * the slots, next time, it will again try to drop already
+ * dropped slots and fail. For these reasons, we allow
+ * missing_ok = true for the drop.
+ *
+ * XXX If there is a network breakdown while dropping the
+ * slots then we will give a WARNING to the user and they need
+ * to manually remove such slots. This can happen so rarely to
+ * worry about and we don't have any better way to deal with
+ * this.
+ */
+ ReplicationSlotNameForTablesync(sub->oid, sub_remove_rels[off].relid, syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, true /* missing_ok */);
+ }
}
}
+ PG_FINALLY();
+ {
+ if (wrconn)
+ walrcv_disconnect(wrconn);
+ }
+ PG_END_TRY();
+
+ if (rel)
+ table_close(rel, NoLock);
}
/*
* Alter the existing subscription.
*/
ObjectAddress
-AlterSubscription(AlterSubscriptionStmt *stmt)
+AlterSubscription(AlterSubscriptionStmt *stmt, bool isTopLevel)
{
Relation rel;
ObjectAddress myself;
@@ -848,6 +941,8 @@ AlterSubscription(AlterSubscriptionStmt *stmt)
errmsg("ALTER SUBSCRIPTION with refresh is not allowed for disabled subscriptions"),
errhint("Use ALTER SUBSCRIPTION ... SET PUBLICATION ... WITH (refresh = false).")));
+ PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION with refresh");
+
/* Make sure refresh sees the new list of publications. */
sub->publications = stmt->publication;
@@ -877,6 +972,8 @@ AlterSubscription(AlterSubscriptionStmt *stmt)
NULL, NULL, /* no "binary" */
NULL, NULL); /* no "streaming" */
+ PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION ... REFRESH");
+
AlterSubscription_refresh(sub, copy_data);
break;
@@ -928,8 +1025,8 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
char *err = NULL;
RepOriginId originid;
WalReceiverConn *wrconn = NULL;
- StringInfoData cmd;
Form_pg_subscription form;
+ List *rstates;
/*
* Lock pg_subscription with AccessExclusiveLock to ensure that the
@@ -1042,6 +1139,31 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
}
list_free(subworkers);
+ /*
+ * Cleanup of tablesync replication origins.
+ *
+ * Any READY-state relations would already have dealt with clean-ups.
+ *
+ * Note that the state can't change because we have already stopped both
+ * the apply and tablesync workers and they can't restart because of
+ * exclusive lock on the subscription.
+ */
+ rstates = GetSubscriptionNotReadyRelations(subid);
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState* rstate = (SubscriptionRelState*) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Drop the tablesync's origin tracking if exists.
+ */
+ tablesync_replorigin_drop(subid, relid, false /* nowait */);
+ }
+
/* Clean up dependencies */
deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
@@ -1058,30 +1180,108 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
* If there is no slot associated with the subscription, we can finish
* here.
*/
- if (!slotname)
+ if (!slotname && rstates == NIL)
{
table_close(rel, NoLock);
return;
}
/*
- * Otherwise drop the replication slot at the publisher node using the
- * replication connection.
+ * Try to acquire the connection necessary for dropping slots.
+ *
+ * Note: If the slotname is NONE/NULL then we allow the command to finish
+ * and users need to manually cleanup the apply and tablesync worker slots
+ * later.
+ *
+ * This has to be at the end because otherwise if there is an error while
+ * doing the database operations we won't be able to rollback dropped slot.
*/
load_file("libpqwalreceiver", false);
- initStringInfo(&cmd);
- appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
-
wrconn = walrcv_connect(conninfo, true, subname, &err);
if (wrconn == NULL)
- ereport(ERROR,
- (errmsg("could not connect to publisher when attempting to "
- "drop the replication slot \"%s\"", slotname),
- errdetail("The error was: %s", err),
- /* translator: %s is an SQL ALTER command */
- errhint("Use %s to disassociate the subscription from the slot.",
- "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+ {
+ if (!slotname)
+ {
+ /* be tidy */
+ list_free(rstates);
+ return;
+ }
+ else
+ {
+ ReportSlotConnectionError(rstates, subid, slotname, err);
+ }
+ }
+
+ PG_TRY();
+ {
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Drop the tablesync slots associated with removed tables.
+ *
+ * For SYNCDONE/READY states, the tablesync slot is known to have
+ * already been dropped by the tablesync worker.
+ *
+ * For other states, there is no certainty, maybe the slot
+ * does not exist yet. Also, if we fail after removing some of
+ * the slots, next time, it will again try to drop already
+ * dropped slots and fail. For these reasons, we allow
+ * missing_ok = true for the drop.
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = {0};
+
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, true /* missing_ok */ );
+ }
+ }
+
+ list_free(rstates);
+
+ /*
+ * If there is a slot associated with the subscription, then drop the
+ * replication slot at the publisher.
+ */
+ if (slotname)
+ ReplicationSlotDropAtPubNode(wrconn, slotname, false /* missing_ok */ );
+
+ }
+ PG_FINALLY();
+ {
+ walrcv_disconnect(wrconn);
+ }
+ PG_END_TRY();
+
+ table_close(rel, NoLock);
+}
+
+/*
+ * Drop the replication slot at the publisher node using the replication
+ * connection.
+ *
+ * missing_ok - if true then only issue WARNING message if the slot cannot be
+ * deleted.
+ */
+void
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok)
+{
+ StringInfoData cmd;
+
+ Assert(wrconn);
+
+ load_file("libpqwalreceiver", false);
+
+ initStringInfo(&cmd);
+ appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
PG_TRY();
{
@@ -1089,27 +1289,37 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
res = walrcv_exec(wrconn, cmd.data, 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
+ if (res->status == WALRCV_OK_COMMAND)
+ {
+ /* NOTICE. Success. */
+ ereport(NOTICE,
+ (errmsg("dropped replication slot \"%s\" on publisher",
+ slotname)));
+ }
+ else if (res->status == WALRCV_ERROR && missing_ok)
+ {
+ /* WARNING. Error, but missing_ok = true. */
+ ereport(WARNING,
(errmsg("could not drop the replication slot \"%s\" on publisher",
slotname),
errdetail("The error was: %s", res->err)));
+ }
else
- ereport(NOTICE,
- (errmsg("dropped replication slot \"%s\" on publisher",
- slotname)));
+ {
+ /* ERROR. */
+ ereport(ERROR,
+ (errmsg("could not drop the replication slot \"%s\" on publisher",
+ slotname),
+ errdetail("The error was: %s", res->err)));
+ }
walrcv_clear_result(res);
}
PG_FINALLY();
{
- walrcv_disconnect(wrconn);
+ pfree(cmd.data);
}
PG_END_TRY();
-
- pfree(cmd.data);
-
- table_close(rel, NoLock);
}
/*
@@ -1278,3 +1488,45 @@ fetch_table_list(WalReceiverConn *wrconn, List *publications)
return tablelist;
}
+
+/*
+ * This is to report the connection failure while dropping replication slots.
+ * Here, we report the WARNING for all tablesync slots so that user can drop
+ * them manually, if required.
+ */
+static void
+ReportSlotConnectionError(List *rstates, Oid subid, char *slotname, char *err)
+{
+ ListCell *lc;
+
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Caller needs to ensure that relstate doesn't change underneath us.
+ * See DropSubscription where we get the relstates.
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = { 0 };
+
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ elog(WARNING, "could not drop tablesync replication slot \"%s\"",
+ syncslotname);
+ }
+ }
+
+ ereport(ERROR,
+ (errmsg("could not connect to publisher when attempting to "
+ "drop the replication slot \"%s\"", slotname),
+ errdetail("The error was: %s", err),
+ /* translator: %s is an SQL ALTER command */
+ errhint("Use %s to disassociate the subscription from the slot.",
+ "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+}
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 186514cd9e..58082dde18 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -73,20 +73,6 @@ typedef struct LogicalRepWorkerId
Oid relid;
} LogicalRepWorkerId;
-typedef struct StopWorkersData
-{
- int nestDepth; /* Sub-transaction nest level */
- List *workers; /* List of LogicalRepWorkerId */
- struct StopWorkersData *parent; /* This need not be an immediate
- * subtransaction parent */
-} StopWorkersData;
-
-/*
- * Stack of StopWorkersData elements. Each stack element contains the workers
- * to be stopped for that subtransaction.
- */
-static StopWorkersData *on_commit_stop_workers = NULL;
-
static void ApplyLauncherWakeup(void);
static void logicalrep_launcher_onexit(int code, Datum arg);
static void logicalrep_worker_onexit(int code, Datum arg);
@@ -546,51 +532,6 @@ logicalrep_worker_stop(Oid subid, Oid relid)
LWLockRelease(LogicalRepWorkerLock);
}
-/*
- * Request worker for specified sub/rel to be stopped on commit.
- */
-void
-logicalrep_worker_stop_at_commit(Oid subid, Oid relid)
-{
- int nestDepth = GetCurrentTransactionNestLevel();
- LogicalRepWorkerId *wid;
- MemoryContext oldctx;
-
- /* Make sure we store the info in context that survives until commit. */
- oldctx = MemoryContextSwitchTo(TopTransactionContext);
-
- /* Check that previous transactions were properly cleaned up. */
- Assert(on_commit_stop_workers == NULL ||
- nestDepth >= on_commit_stop_workers->nestDepth);
-
- /*
- * Push a new stack element if we don't already have one for the current
- * nestDepth.
- */
- if (on_commit_stop_workers == NULL ||
- nestDepth > on_commit_stop_workers->nestDepth)
- {
- StopWorkersData *newdata = palloc(sizeof(StopWorkersData));
-
- newdata->nestDepth = nestDepth;
- newdata->workers = NIL;
- newdata->parent = on_commit_stop_workers;
- on_commit_stop_workers = newdata;
- }
-
- /*
- * Finally add a new worker into the worker list of the current
- * subtransaction.
- */
- wid = palloc(sizeof(LogicalRepWorkerId));
- wid->subid = subid;
- wid->relid = relid;
- on_commit_stop_workers->workers =
- lappend(on_commit_stop_workers->workers, wid);
-
- MemoryContextSwitchTo(oldctx);
-}
-
/*
* Wake up (using latch) any logical replication worker for specified sub/rel.
*/
@@ -819,109 +760,21 @@ ApplyLauncherShmemInit(void)
}
}
-/*
- * Check whether current transaction has manipulated logical replication
- * workers.
- */
-bool
-XactManipulatesLogicalReplicationWorkers(void)
-{
- return (on_commit_stop_workers != NULL);
-}
-
/*
* Wakeup the launcher on commit if requested.
*/
void
AtEOXact_ApplyLauncher(bool isCommit)
{
-
- Assert(on_commit_stop_workers == NULL ||
- (on_commit_stop_workers->nestDepth == 1 &&
- on_commit_stop_workers->parent == NULL));
-
if (isCommit)
{
- ListCell *lc;
-
- if (on_commit_stop_workers != NULL)
- {
- List *workers = on_commit_stop_workers->workers;
-
- foreach(lc, workers)
- {
- LogicalRepWorkerId *wid = lfirst(lc);
-
- logicalrep_worker_stop(wid->subid, wid->relid);
- }
- }
-
if (on_commit_launcher_wakeup)
ApplyLauncherWakeup();
}
- /*
- * No need to pfree on_commit_stop_workers. It was allocated in
- * transaction memory context, which is going to be cleaned soon.
- */
- on_commit_stop_workers = NULL;
on_commit_launcher_wakeup = false;
}
-/*
- * On commit, merge the current on_commit_stop_workers list into the
- * immediate parent, if present.
- * On rollback, discard the current on_commit_stop_workers list.
- * Pop out the stack.
- */
-void
-AtEOSubXact_ApplyLauncher(bool isCommit, int nestDepth)
-{
- StopWorkersData *parent;
-
- /* Exit immediately if there's no work to do at this level. */
- if (on_commit_stop_workers == NULL ||
- on_commit_stop_workers->nestDepth < nestDepth)
- return;
-
- Assert(on_commit_stop_workers->nestDepth == nestDepth);
-
- parent = on_commit_stop_workers->parent;
-
- if (isCommit)
- {
- /*
- * If the upper stack element is not an immediate parent
- * subtransaction, just decrement the notional nesting depth without
- * doing any real work. Else, we need to merge the current workers
- * list into the parent.
- */
- if (!parent || parent->nestDepth < nestDepth - 1)
- {
- on_commit_stop_workers->nestDepth--;
- return;
- }
-
- parent->workers =
- list_concat(parent->workers, on_commit_stop_workers->workers);
- }
- else
- {
- /*
- * Abandon everything that was done at this nesting level. Explicitly
- * free memory to avoid a transaction-lifespan leak.
- */
- list_free_deep(on_commit_stop_workers->workers);
- }
-
- /*
- * We have taken care of the current subtransaction workers list for both
- * abort or commit. So we are ready to pop the stack.
- */
- pfree(on_commit_stop_workers);
- on_commit_stop_workers = parent;
-}
-
/*
* Request wakeup of the launcher on commit of the transaction.
*
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index ccbdbcf08f..b636976dc4 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -31,8 +31,11 @@
* table state to INIT.
* - Tablesync worker starts; changes table state from INIT to DATASYNC while
* copying.
- * - Tablesync worker finishes the copy and sets table state to SYNCWAIT;
- * waits for state change.
+ * - Tablesync worker does initial table copy; there is a FINISHEDCOPY (sync
+ * worker specific) state to indicate when the copy phase has completed, so
+ * if the worker crashes with this (non-memory) state then the copy will not
+ * be re-attempted.
+ * - Tablesync worker then sets table state to SYNCWAIT; waits for state change.
* - Apply worker periodically checks for tables in SYNCWAIT state. When
* any appear, it sets the table state to CATCHUP and starts loop-waiting
* until either the table state is set to SYNCDONE or the sync worker
@@ -48,8 +51,8 @@
* point it sets state to READY and stops tracking. Again, there might
* be zero changes in between.
*
- * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * So the state progression is always: INIT -> DATASYNC -> FINISHEDCOPY
+ * -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY.
*
* The catalog pg_subscription_rel is used to keep information about
* subscribed tables and their state. The catalog holds all states
@@ -58,6 +61,7 @@
* Example flows look like this:
* - Apply is in front:
* sync:8
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:10
* -> set in memory CATCHUP
@@ -73,6 +77,7 @@
*
* - Sync is in front:
* sync:10
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:8
* -> set in memory CATCHUP
@@ -101,7 +106,10 @@
#include "replication/logicalrelation.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
+#include "storage/lmgr.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -111,6 +119,42 @@ static bool table_states_valid = false;
StringInfo copybuf = NULL;
+/*
+ * Common code to drop the origin of a tablesync worker.
+ *
+ * There is a potential race condition if two processes attempt to call
+ * replorigin_drop for the same originid at the same time. The loser of
+ * that race would give an ERROR saying that it failed to find the
+ * expected originid.
+ *
+ * The TRY/CATCH below suppresses such errors, allowing the tablesync cleanup
+ * code to proceed.
+ */
+void
+tablesync_replorigin_drop(Oid subid, Oid relid, bool nowait)
+{
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+ originid = replorigin_by_name(originname, true);
+ if (OidIsValid(originid))
+ {
+ PG_TRY();
+ {
+ replorigin_drop(originid, nowait);
+ }
+ PG_CATCH();
+ {
+ ereport(WARNING,
+ errmsg("could not drop replication origin with OID %d, named \"%s\"",
+ originid,
+ originname));
+ }
+ PG_END_TRY();
+ }
+}
+
/*
* Exit routine for synchronization worker.
*/
@@ -269,26 +313,52 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
-
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
current_lsn >= MyLogicalRepWorker->relstate_lsn)
{
TimeLineID tli;
+ char syncslotname[NAMEDATALEN] = {0};
MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCDONE;
MyLogicalRepWorker->relstate_lsn = current_lsn;
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ StartTransactionCommand();
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
MyLogicalRepWorker->relstate_lsn);
+ /* End wal streaming so wrconn can be re-used to drop the slot. */
walrcv_endstreaming(wrconn, &tli);
+
+ /*
+ * Cleanup the tablesync slot.
+ *
+ * This has to be done after updating the state because otherwise if
+ * there is an error while doing the database operations we won't be
+ * able to rollback dropped slot.
+ */
+ ReplicationSlotNameForTablesync(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ syncslotname);
+
+ /*
+ * It is important to give an error if we are unable to drop the slot,
+ * otherwise, it won't be dropped till the corresponding subscription
+ * is dropped.
+ */
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, false /* missing_ok */);
+
finish_sync_worker();
}
else
@@ -411,6 +481,20 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
started_tx = true;
}
+ /*
+ * Remove the tablesync origin tracking if exists.
+ *
+ * The normal case origin drop is done here instead of in the
+ * process_syncing_tables_for_sync function because we don't
+ * allow to drop the origin till the process owning the origin
+ * is alive.
+ */
+ tablesync_replorigin_drop(MyLogicalRepWorker->subid,
+ rstate->relid, false /* nowait */ );
+
+ /*
+ * Update the state to READY only after the origin cleanup.
+ */
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
@@ -805,6 +889,32 @@ copy_table(Relation rel)
logicalrep_rel_close(relmapentry, NoLock);
}
+/*
+ * Determine the tablesync slot name.
+ *
+ * The name must not exceed NAMEDATALEN -1 because of remote node constraints
+ * on slot name length.
+ *
+ * The returned slot name is either:
+ * - stored in the supplied buffer (syncslotname), or
+ * - palloc'ed in current memory context (if syncslotname = NULL).
+ *
+ * Note: We don't use the subscription slot name as part of tablesync slot name
+ * because we are responsible for cleaning up these slots and it could become
+ * impossible to recalculate what name to cleanup if the subscription slot name
+ * had changed.
+ */
+char *
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char syncslotname[NAMEDATALEN])
+{
+ if (syncslotname)
+ sprintf(syncslotname, "pg_%u_sync_%u", suboid, relid);
+ else
+ syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
+
+ return syncslotname;
+}
+
/*
* Start syncing the table in the sync worker.
*
@@ -822,6 +932,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -847,19 +959,10 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
- MySubscription->slotname,
- MySubscription->oid,
- MyLogicalRepWorker->relid);
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(MySubscription->oid,
+ MyLogicalRepWorker->relid,
+ NULL /* use palloc */ );
/*
* Here we use the slot name instead of the subscription name as the
@@ -872,7 +975,48 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY);
+
+ /* Assign the origin tracking record name. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MySubscription->oid, MyLogicalRepWorker->relid);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC)
+ {
+ /*
+ * We have previously errored out before finishing the copy so the
+ * replication slot might exist. We want to remove the slot if it
+ * already exists and proceed.
+ *
+ * XXX We could also instead try to drop the slot, last time we failed
+ * but for that, we might need to clean up the copy state as it might
+ * be in the middle of fetching the rows. Also, if there is a network
+ * breakdown then it wouldn't have succeeded so trying it next time
+ * seems like a better bet.
+ */
+ ReplicationSlotDropAtPubNode(wrconn, slotname, true /* missing_ok */);
+ }
+ else if (MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed
+ * before it was able to finish normally.
+ */
+ StartTransactionCommand();
+
+ /*
+ * The origin tracking name must already exist. It was created first
+ * time this tablesync was launched.
+ */
+ originid = replorigin_by_name(originname, false /* missing_ok */ );
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ *origin_startpos = replorigin_session_get_progress(false);
+
+ CommitTransactionCommand();
+
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -888,9 +1032,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -916,12 +1057,45 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
walrcv_clear_result(res);
/*
- * Create a new temporary logical decoding slot. This slot will be used
- * for the catchup phase after COPY is done, so tell it to use the
- * snapshot to make the final data consistent.
+ * Create a new permanent logical decoding slot. This slot will be
+ * used for the catchup phase after COPY is done, so tell it to use
+ * the snapshot to make the final data consistent.
*/
- walrcv_create_slot(wrconn, slotname, true,
- CRS_USE_SNAPSHOT, origin_startpos);
+ walrcv_create_slot(wrconn, slotname, false /* permanent */ ,
+ CRS_USE_SNAPSHOT, origin_startpos);
+
+ /*
+ * Setup replication origin tracking. The purpose of doing this before the
+ * copy is to avoid doing the copy again due to any error in setting up
+ * origin tracking.
+ */
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist, so create it now.
+ *
+ * Then advance to the LSN got from walrcv_create_slot. This is
+ * WAL logged for the purpose of recovery. Locks are to prevent
+ * the replication origin from vanishing while advancing.
+ */
+ originid = replorigin_create(originname);
+
+ LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */, true /* WAL log */);
+ UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ }
+ else
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("replication origin \"%s\" already exists",
+ originname)));
+ }
/* Now do the initial data copy */
PushActiveSnapshot(GetTransactionSnapshot());
@@ -940,6 +1114,25 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
/* Make the copy visible. */
CommandCounterIncrement();
+ /*
+ * Update the persisted state to indicate the COPY phase is done; make
+ * it visible to others.
+ */
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_FINISHEDCOPY,
+ MyLogicalRepWorker->relstate_lsn);
+
+ CommitTransactionCommand();
+
+copy_table_done:
+
+ elog(DEBUG1,
+ "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
+
/*
* We are done with the initial data synchronization, update the state.
*/
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index eb7db89cef..cfc924cd89 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -807,12 +807,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -889,9 +885,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -918,8 +912,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1062,8 +1055,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData *commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 1d81071c35..05bb698cf4 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1786,7 +1786,8 @@ ProcessUtilitySlow(ParseState *pstate,
break;
case T_AlterSubscriptionStmt:
- address = AlterSubscription((AlterSubscriptionStmt *) parsetree);
+ address = AlterSubscription((AlterSubscriptionStmt *) parsetree,
+ isTopLevel);
break;
case T_DropSubscriptionStmt:
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index 2bea2c52aa..ed94f57baa 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,8 @@ DECLARE_UNIQUE_INDEX_PKEY(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_FINISHEDCOPY 'f' /* tablesync copy phase is completed
+ * (sublsn NULL) */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/commands/subscriptioncmds.h b/src/include/commands/subscriptioncmds.h
index a81865079d..3b926f35d7 100644
--- a/src/include/commands/subscriptioncmds.h
+++ b/src/include/commands/subscriptioncmds.h
@@ -20,7 +20,7 @@
extern ObjectAddress CreateSubscription(CreateSubscriptionStmt *stmt,
bool isTopLevel);
-extern ObjectAddress AlterSubscription(AlterSubscriptionStmt *stmt);
+extern ObjectAddress AlterSubscription(AlterSubscriptionStmt *stmt, bool isTopLevel);
extern void DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel);
extern ObjectAddress AlterSubscriptionOwner(const char *name, Oid newOwnerId);
diff --git a/src/include/replication/logicallauncher.h b/src/include/replication/logicallauncher.h
index 421ec1580d..301e494f7b 100644
--- a/src/include/replication/logicallauncher.h
+++ b/src/include/replication/logicallauncher.h
@@ -22,9 +22,7 @@ extern Size ApplyLauncherShmemSize(void);
extern void ApplyLauncherShmemInit(void);
extern void ApplyLauncherWakeupAtCommit(void);
-extern bool XactManipulatesLogicalReplicationWorkers(void);
extern void AtEOXact_ApplyLauncher(bool isCommit);
-extern void AtEOSubXact_ApplyLauncher(bool isCommit, int nestDepth);
extern bool IsLogicalLauncher(void);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 53f636c56f..5f52335f15 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -15,6 +15,7 @@
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "storage/spin.h"
+#include "replication/walreceiver.h"
/*
* Behaviour of replication slots, upon release or crash.
@@ -211,6 +212,8 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname);
+extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
diff --git a/src/include/replication/worker_internal.h b/src/include/replication/worker_internal.h
index d046022e49..4a5c49da7d 100644
--- a/src/include/replication/worker_internal.h
+++ b/src/include/replication/worker_internal.h
@@ -77,13 +77,14 @@ extern List *logicalrep_workers_find(Oid subid, bool only_running);
extern void logicalrep_worker_launch(Oid dbid, Oid subid, const char *subname,
Oid userid, Oid relid);
extern void logicalrep_worker_stop(Oid subid, Oid relid);
-extern void logicalrep_worker_stop_at_commit(Oid subid, Oid relid);
extern void logicalrep_worker_wakeup(Oid subid, Oid relid);
extern void logicalrep_worker_wakeup_ptr(LogicalRepWorker *worker);
extern int logicalrep_sync_worker_count(Oid subid);
extern char *LogicalRepSyncTableStart(XLogRecPtr *origin_startpos);
+extern void tablesync_replorigin_drop(Oid subid, Oid relid, bool nowait);
+
void process_syncing_tables(XLogRecPtr current_lsn);
void invalidate_syncing_table_states(Datum arg, int cacheid,
uint32 hashvalue);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 1d540fe489..5f5c36d8e2 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2397,7 +2397,6 @@ StdAnalyzeData
StdRdOptions
Step
StopList
-StopWorkersData
StrategyNumber
StreamCtl
StreamXidHash
--
2.28.0.windows.1
Hello
On Friday, February 5, 2021 2:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Feb 5, 2021 at 7:09 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Thu, Feb 4, 2021 at 8:33 PM Amit Kapila <amit.kapila16@gmail.com>
wrote:
...
Thanks. I have fixed one of the issues reported by me earlier [1]
wherein the tablesync worker can repeatedly fail if after dropping
the slot there is an error while updating the SYNCDONE state in the
database. I have moved the drop of the slot just before commit of
the transaction where we are marking the state as SYNCDONE.
Additionally, I have removed unnecessary includes in tablesync.c,
updated the docs for Alter Subscription, and updated the comments at
various places in the patch. I have also updated the commit message this time.
Below are my feedback comments for V17 (nothing functional)
~~
1.
V27 Commit message:
For the initial table data synchronization in logical replication, we
use a single transaction to copy the entire table and then
synchronizes the position in the stream with the main apply worker.

Typo: "synchronizes" -> "synchronize"

Fixed, and added a note that the ALTER SUBSCRIPTION .. REFRESH .. command
can't be executed inside a transaction block.
Thank you for the updates.
We need to add some tests to prove the new checks of AlterSubscription() work.
I chose TAP tests as we need to set connect = true for the subscription.
Please utilize this if it can contribute to the development.
I used v28 to check my patch and it works as we expect.
Best Regards,
Takamichi Osumi
Attachments:
AlterSubscription_with_refresh_tests.patch
From f05a759c669d081be8dedbfe8a499e6df89373e0 Mon Sep 17 00:00:00 2001
From: Osumi Takamichi <osumi.takamichi@fujitsu.com>
Date: Fri, 5 Feb 2021 06:25:23 +0000
Subject: [PATCH] new 3 tests for AlterSubscription with refresh.
Check that executing ALTER SUBSCRIPTION
with refresh in a transaction or in a function is not allowed.
Author : Takamichi Osumi <osumi.takamichi@fujitsu.com>
---
src/test/subscription/t/004_sync.pl | 43 ++++++++++++++++++++++++++++++++++++-
1 file changed, 42 insertions(+), 1 deletion(-)
diff --git a/src/test/subscription/t/004_sync.pl b/src/test/subscription/t/004_sync.pl
index e111ab9..e24a676 100644
--- a/src/test/subscription/t/004_sync.pl
+++ b/src/test/subscription/t/004_sync.pl
@@ -3,7 +3,7 @@ use strict;
use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 7;
+use Test::More tests => 10;
# Initialize publisher node
my $node_publisher = get_new_node('publisher');
@@ -151,5 +151,46 @@ is($result, qq(20),
$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+# Executing ALTER SUBSCRIPTION with refresh in a transaction or function is not allowed.
+$node_publisher->safe_psql('postgres',
+ "CREATE PUBLICATION mypub FOR ALL TABLES;");
+
+$node_subscriber->safe_psql('postgres',
+ "CREATE SUBSCRIPTION mysub CONNECTION '$publisher_connstr' PUBLICATION mypub WITH (connect = true);");
+
+my ($cmdret, $stdout, $stderr);
+($cmdret, $stdout, $stderr) =
+ $node_subscriber->psql('postgres', q[
+BEGIN;
+ALTER SUBSCRIPTION mysub SET PUBLICATION mypub WITH (refresh = true);
+END;
+]);
+
+ok($stderr =~
+ qr/ALTER SUBSCRIPTION with refresh cannot run inside a transaction block/,
+ 'should fail to issue ALTER SUBSCRIPTION ... SET PUBLICATION with refresh on inside a transaction block');
+
+($cmdret, $stdout, $stderr) =
+ $node_subscriber->psql('postgres', q[
+BEGIN;
+ALTER SUBSCRIPTION mysub REFRESH PUBLICATION;
+END;
+]);
+
+ok($stderr =~
+ qr/ALTER SUBSCRIPTION ... REFRESH cannot run inside a transaction block/,
+ 'should fail to issue ALTER SUBSCRIPTION ... REFRESH inside a transaction block');
+
+($cmdret, $stdout, $stderr) =
+ $node_subscriber->psql('postgres', q[
+CREATE FUNCTION func() RETURNS VOID AS
+$$ ALTER SUBSCRIPTION mysub SET PUBLICATION mypub WITH (refresh = true) $$ LANGUAGE SQL;
+SELECT func();
+]);
+
+ok($stderr =~
+ qr/ALTER SUBSCRIPTION with refresh cannot be executed from a function/,
+ 'should fail to issue ALTER SUBSCRIPTION ... SET PUBLICATION with refresh on from a function');
+
$node_subscriber->stop('fast');
$node_publisher->stop('fast');
--
2.2.0
On Fri, Feb 5, 2021 at 12:36 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:
We need to add some tests to prove the new checks of AlterSubscription() work.
I chose TAP tests as we need to set connect = true for the subscription.
When it can contribute to the development, please utilize this.
I used v28 to check my patch and works as we expect.
Thanks for writing the tests, but I don't understand why you need to
set connect = true for this test? I have tried the below with 'connect
= false' and it seems to be working:
postgres=# CREATE SUBSCRIPTION mysub
postgres-# CONNECTION 'host=localhost port=5432 dbname=postgres'
postgres-# PUBLICATION mypublication WITH (connect = false);
WARNING: tables were not subscribed, you will have to run ALTER
SUBSCRIPTION ... REFRESH PUBLICATION to subscribe the tables
CREATE SUBSCRIPTION
postgres=# Begin;
BEGIN
postgres=*# Alter Subscription mysub Refresh Publication;
ERROR: ALTER SUBSCRIPTION ... REFRESH is not allowed for disabled subscriptions
So, if possible, let's write this test in src/test/regress/sql/subscription.sql.
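For example, the session above could be turned into a regression test
along these lines (a sketch only; the subscription and publication names
are made up, and the expected error is the one shown in the session above):

-- Possible addition to src/test/regress/sql/subscription.sql
CREATE SUBSCRIPTION mysub CONNECTION 'host=localhost port=5432 dbname=postgres'
    PUBLICATION mypublication WITH (connect = false);

BEGIN;
ALTER SUBSCRIPTION mysub REFRESH PUBLICATION;
-- ERROR:  ALTER SUBSCRIPTION ... REFRESH is not allowed for disabled subscriptions
ROLLBACK;

-- Dissociate the slot first so that DROP does not try to connect.
ALTER SUBSCRIPTION mysub SET (slot_name = NONE);
DROP SUBSCRIPTION mysub;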
I have another idea for a test case: what if we write a test that
fails with a PK violation during the copy and then drops the
subscription? Then check that there is no dangling slot left on the
publisher. This is similar to a test in subscription/t/004_sync.pl;
we can reuse some of that framework but have a separate test for this.
--
With Regards,
Amit Kapila.
I did some basic cross-version testing: publisher on PG13 with
subscriber on PG14, and publisher on PG14 with subscriber on PG13.
I did some basic operations (CREATE, ALTER and STOP subscriptions) and it
seemed to work fine, no errors.
regards,
Ajin Cherian
Fujitsu Australia.
Hi,
We had a bit of a high-level discussion about these patches with Amit
off-list, so I decided to also take a look at the actual code.
My main concern originally was the potential for left-over slots on the
publisher, but I think the state now is relatively okay, with a couple of
corner cases that are documented and don't seem much worse than the main
slot.
I wonder if we should mention the max_slot_wal_keep_size GUC in the
table sync docs though.
Another thing that might need documentation is that the visibility of
changes done by table sync is no longer isolated: the table contents
will show intermediate progress to other backends, rather than switching
from nothing to a state consistent with the rest of the replication.
Some minor comments about code:
+	else if (res->status == WALRCV_ERROR && missing_ok)
+	{
+		/* WARNING. Error, but missing_ok = true. */
+		ereport(WARNING,
I wonder if we need to add an error code to the WalRcvExecResult and
check for the appropriate ones here, because this can for example return
an error because of a timeout, not because the slot is missing. Not sure
if it matters for the current callers though (but then maybe don't call
the param missing_ok?).
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char syncslotname[NAMEDATALEN])
+{
+	if (syncslotname)
+		sprintf(syncslotname, "pg_%u_sync_%u", suboid, relid);
+	else
+		syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
+
+	return syncslotname;
+}
Given that we are now explicitly dropping slots, what happens here if we
have 2 different downstreams that happen to get the same suboid and reloid?
Will one of them drop the slot of the other one? Previously, with the
cleanup being left to the temp slot, we'd at most have gotten an error when
creating it, but with the new logic in LogicalRepSyncTableStart it feels
like we could get into a situation where 2 downstreams are fighting over a
slot, no?
--
Petr
On Sat, Feb 6, 2021 at 2:10 AM Petr Jelinek
<petr.jelinek@enterprisedb.com> wrote:
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char syncslotname[NAMEDATALEN])
+{
+	if (syncslotname)
+		sprintf(syncslotname, "pg_%u_sync_%u", suboid, relid);
+	else
+		syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
+
+	return syncslotname;
+}
Given that we are now explicitly dropping slots, what happens here if we
have 2 different downstreams that happen to get the same suboid and reloid?
Will one of them drop the slot of the other one? Previously, with the
cleanup being left to the temp slot, we'd at most have gotten an error when
creating it, but with the new logic in LogicalRepSyncTableStart it feels
like we could get into a situation where 2 downstreams are fighting over a
slot, no?
The PG docs [1] say "there is only one copy of pg_subscription per
cluster, not one per database". IIUC that means it is not possible for
2 different subscriptions to have the same suboid. And if the suboid
is globally unique then the syncslotname is also unique. Is that
understanding not correct?
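(As a quick sanity check of that claim, pg_class records whether a
catalog is shared:)
-- returns 't', confirming pg_subscription is a shared catalog, so a
-- suboid is unique within a single cluster:
SELECT relisshared FROM pg_class WHERE relname = 'pg_subscription';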
-----
[1]: https://www.postgresql.org/docs/devel/catalog-pg-subscription.html
Kind Regards,
Peter Smith.
Fujitsu Australia
On Sat, Feb 6, 2021 at 6:22 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Sat, Feb 6, 2021 at 2:10 AM Petr Jelinek
<petr.jelinek@enterprisedb.com> wrote:
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char syncslotname[NAMEDATALEN])
+{
+	if (syncslotname)
+		sprintf(syncslotname, "pg_%u_sync_%u", suboid, relid);
+	else
+		syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
+
+	return syncslotname;
+}
Given that we are now explicitly dropping slots, what happens here if we
have 2 different downstreams that happen to get the same suboid and reloid?
Will one of them drop the slot of the other one? Previously, with the
cleanup being left to the temp slot, we'd at most have gotten an error when
creating it, but with the new logic in LogicalRepSyncTableStart it feels
like we could get into a situation where 2 downstreams are fighting over a
slot, no?
I think so. See if the alternative suggested below works, or let me know
if you have any other suggestions for the same.
The PG docs [1] say "there is only one copy of pg_subscription per
cluster, not one per database". IIUC that means it is not possible for
2 different subscriptions to have the same suboid.
I think he is talking about two different clusters having separate
subscriptions but pointing to the same publisher. In different clusters,
we can get the same subid/relid. I think we need a cluster-wide unique
identifier to distinguish among different subscribers. How about using
the system_identifier stored in the control file (we can use
GetSystemIdentifier to retrieve it)? I think one concern could be that
adding that to the slot name could exceed the max length of a slot name
(NAMEDATALEN - 1), but I don't think that is the case here
(pg_%u_sync_%u_UINT64_FORMAT: 3 + 10 + 6 + 10 + 20 + '\0'). Note the
last part is the system_identifier in this scheme.
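To sketch the idea (illustrative only, not the final patch;
GetSystemIdentifier(), UINT64_FORMAT, and NAMEDATALEN are existing
backend symbols, while the function name here is made up):

#include "postgres.h"
#include "access/xlog.h"		/* GetSystemIdentifier() */

/*
 * Illustrative sketch: make the tablesync slot name unique across
 * clusters by appending the subscriber's system identifier.
 * Worst case: 3 + 10 + 6 + 10 + 1 + 20 + 1 = 51 bytes < NAMEDATALEN.
 */
static void
tablesync_slot_name(Oid suboid, Oid relid, char syncslotname[NAMEDATALEN])
{
	snprintf(syncslotname, NAMEDATALEN, "pg_%u_sync_%u_" UINT64_FORMAT,
			 suboid, relid, GetSystemIdentifier());
}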
Do you guys think that works or let me know if you have any other
better idea? Petr, is there a reason why such an identifier is not
considered originally, is there any risk in it?
--
With Regards,
Amit Kapila.
Hi
On Friday, February 5, 2021 5:51 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Feb 5, 2021 at 12:36 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:
We need to add some tests to prove the new checks of AlterSubscription() work.
I chose TAP tests as we need to set connect = true for the subscription.
If it can contribute to the development, please utilize it.
I used v28 to check my patch and it works as we expect.
Thanks for writing the tests, but I don't understand why you need to set
connect = true for this test. I have tried the below with 'connect = false'
and it seems to be working:
postgres=# CREATE SUBSCRIPTION mysub
postgres-# CONNECTION 'host=localhost port=5432 dbname=postgres'
postgres-# PUBLICATION mypublication WITH (connect = false);
WARNING: tables were not subscribed, you will have to run ALTER
SUBSCRIPTION ... REFRESH PUBLICATION to subscribe the tables
CREATE SUBSCRIPTION
postgres=# Begin;
BEGIN
postgres=*# Alter Subscription mysub Refresh Publication;
ERROR: ALTER SUBSCRIPTION ... REFRESH is not allowed for disabled subscriptions
So, if possible, let's write this test in src/test/regress/sql/subscription.sql.
OK. I changed the place where those tests are written.
I have another idea for a test case: What if we write a test that hits
a PK violation during the copy and then drops the subscription, and
then check that there is no dangling slot left on the publisher? This
is similar to a test in subscription/t/004_sync.pl; we can reuse some
of that framework but have a separate test for this.
I've added this PK violation test to the attached tests.
The patch works with v28 and caused no failures during regression tests.
Best Regards,
Takamichi Osumi
Attachments:
refresh_and_pk_violation_testsets.patchapplication/octet-stream; name=refresh_and_pk_violation_testsets.patchDownload
From 5ad17c4ed06179889cc156d747624b060a54dc47 Mon Sep 17 00:00:00 2001
From: Osumi Takamichi <osumi.takamichi@fujitsu.com>
Date: Sat, 6 Feb 2021 07:01:46 +0000
Subject: [PATCH] tests for AlterSubscription and DROP SUBSCRIPTION
Check that executing ALTER SUBSCRIPTION
with refresh inside a transaction or a function is not allowed.
Also, confirm that DROP SUBSCRIPTION while the subscriber
is in an error state (like a PK violation) can
clean up the publisher's slots correctly.
Author : Takamichi Osumi <osumi.takamichi@fujitsu.com>
---
src/test/regress/expected/subscription.out | 20 ++++++++++++++++++++
src/test/regress/sql/subscription.sql | 21 +++++++++++++++++++++
src/test/subscription/t/004_sync.pl | 20 +++++++++++++++++++-
3 files changed, 60 insertions(+), 1 deletion(-)
diff --git a/src/test/regress/expected/subscription.out b/src/test/regress/expected/subscription.out
index 2fa9bce..9dde54b 100644
--- a/src/test/regress/expected/subscription.out
+++ b/src/test/regress/expected/subscription.out
@@ -201,6 +201,26 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
(1 row)
DROP SUBSCRIPTION regress_testsub;
+-- Executing ALTER SUBSCRIPTION with refresh in a transaction or function is not allowed
+CREATE SUBSCRIPTION mytest CONNECTION 'dbname=postgres' PUBLICATION mypub
+ WITH (enabled = true, create_slot = false, copy_data = false);
+BEGIN;
+ALTER SUBSCRIPTION mytest SET PUBLICATION foo WITH (refresh = true);
+ERROR: ALTER SUBSCRIPTION with refresh cannot run inside a transaction block
+END;
+BEGIN;
+ALTER SUBSCRIPTION mytest REFRESH PUBLICATION;
+ERROR: ALTER SUBSCRIPTION ... REFRESH cannot run inside a transaction block
+END;
+CREATE FUNCTION func() RETURNS VOID AS
+$$ ALTER SUBSCRIPTION mytest SET PUBLICATION mypub WITH (refresh = true) $$ LANGUAGE SQL;
+SELECT func();
+ERROR: ALTER SUBSCRIPTION with refresh cannot be executed from a function
+CONTEXT: SQL function "func" statement 1
+ALTER SUBSCRIPTION mytest DISABLE;
+ALTER SUBSCRIPTION mytest SET (slot_name = NONE);
+DROP SUBSCRIPTION mytest;
+DROP FUNCTION func;
RESET SESSION AUTHORIZATION;
DROP ROLE regress_subscription_user;
DROP ROLE regress_subscription_user2;
diff --git a/src/test/regress/sql/subscription.sql b/src/test/regress/sql/subscription.sql
index 14fa0b2..567c451 100644
--- a/src/test/regress/sql/subscription.sql
+++ b/src/test/regress/sql/subscription.sql
@@ -147,6 +147,27 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
DROP SUBSCRIPTION regress_testsub;
+-- Executing ALTER SUBSCRIPTION with refresh in a transaction or function is not allowed
+CREATE SUBSCRIPTION mytest CONNECTION 'dbname=postgres' PUBLICATION mypub
+ WITH (enabled = true, create_slot = false, copy_data = false);
+
+BEGIN;
+ALTER SUBSCRIPTION mytest SET PUBLICATION foo WITH (refresh = true);
+END;
+
+BEGIN;
+ALTER SUBSCRIPTION mytest REFRESH PUBLICATION;
+END;
+
+CREATE FUNCTION func() RETURNS VOID AS
+$$ ALTER SUBSCRIPTION mytest SET PUBLICATION mypub WITH (refresh = true) $$ LANGUAGE SQL;
+SELECT func();
+
+ALTER SUBSCRIPTION mytest DISABLE;
+ALTER SUBSCRIPTION mytest SET (slot_name = NONE);
+DROP SUBSCRIPTION mytest;
+DROP FUNCTION func;
+
RESET SESSION AUTHORIZATION;
DROP ROLE regress_subscription_user;
DROP ROLE regress_subscription_user2;
diff --git a/src/test/subscription/t/004_sync.pl b/src/test/subscription/t/004_sync.pl
index e111ab9..b5e5748 100644
--- a/src/test/subscription/t/004_sync.pl
+++ b/src/test/subscription/t/004_sync.pl
@@ -3,7 +3,7 @@ use strict;
use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 7;
+use Test::More tests => 9;
# Initialize publisher node
my $node_publisher = get_new_node('publisher');
@@ -149,7 +149,25 @@ $result = $node_subscriber->safe_psql('postgres',
is($result, qq(20),
'changes for table added after subscription initialized replicated');
+$node_publisher->safe_psql('postgres', "DROP TABLE tab_rep_next");
+$node_subscriber->safe_psql('postgres', "DROP TABLE tab_rep_next");
+
+# Check if DROP SUBSCRIPTION cleans up slots on the publisher side
+# when the subscriber is stuck on data copy for constraint
$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+$node_subscriber->safe_psql('postgres',
+ "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr' PUBLICATION tap_pub");
+
+$result = $node_subscriber->poll_query_until('postgres', $started_query)
+ or die "Timed out while waiting for subscriber to start sync";
+
+$result = $node_publisher->safe_psql('postgres', "SELECT count(*) FROM pg_replication_slots");
+is($result, qq(2), 'There should be 2 slots on the publisher before dropping the slots');
+
+$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+
+$result = $node_publisher->safe_psql('postgres', "SELECT count(*) FROM pg_replication_slots");
+is($result, qq(0), 'DROP SUBSCRIPTION during error can clean up the slots on the publisher');
$node_subscriber->stop('fast');
$node_publisher->stop('fast');
--
2.2.0
On 06/02/2021 06:07, Amit Kapila wrote:
On Sat, Feb 6, 2021 at 6:22 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Sat, Feb 6, 2021 at 2:10 AM Petr Jelinek
<petr.jelinek@enterprisedb.com> wrote:
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char syncslotname[NAMEDATALEN])
+{
+	if (syncslotname)
+		sprintf(syncslotname, "pg_%u_sync_%u", suboid, relid);
+	else
+		syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
+
+	return syncslotname;
+}
Given that we are now explicitly dropping slots, what happens here if we
have 2 different downstreams that happen to get the same suboid and reloid?
Will one of them drop the slot of the other one? Previously, with the
cleanup being left to the temp slot, we'd at most have gotten an error when
creating it, but with the new logic in LogicalRepSyncTableStart it feels
like we could get into a situation where 2 downstreams are fighting over a
slot, no?
I think so. See if the alternative suggested below works, or let me know
if you have any other suggestions for the same.
The PG docs [1] say "there is only one copy of pg_subscription per
cluster, not one per database". IIUC that means it is not possible for
2 different subscriptions to have the same suboid.
I think he is talking about two different clusters having separate
subscriptions but pointing to the same publisher. In different clusters,
we can get the same subid/relid. I think we need a cluster-wide unique
identifier to distinguish among different subscribers. How about using
the system_identifier stored in the control file (we can use
GetSystemIdentifier to retrieve it)? I think one concern could be that
adding that to the slot name could exceed the max length of a slot name
(NAMEDATALEN - 1), but I don't think that is the case here
(pg_%u_sync_%u_UINT64_FORMAT: 3 + 10 + 6 + 10 + 20 + '\0'). Note the
last part is the system_identifier in this scheme.
Yep, that's what I meant, and system_identifier seems like a good choice to me.
Do you guys think that works or let me know if you have any other
better idea? Petr, is there a reason why such an identifier is not
considered originally, is there any risk in it?
Originally it was not considered, likely because it's all based on
pglogical/BDR work, where ids are hashes of stuff that's unique across a
group of instances, not counter-based like Oids in PostgreSQL, and I
simply didn't realize it could be a problem until reading this patch :)
--
Petr Jelinek
On Sat, Feb 6, 2021 at 2:10 AM Petr Jelinek
<petr.jelinek@enterprisedb.com> wrote:
Hi,
Some minor comments about code:
+	else if (res->status == WALRCV_ERROR && missing_ok)
+	{
+		/* WARNING. Error, but missing_ok = true. */
+		ereport(WARNING,
I wonder if we need to add an error code to the WalRcvExecResult and check
for the appropriate ones here, because this can, for example, return an
error because of a timeout, not because the slot is missing. Not sure if it
matters for current callers though (but then maybe don't call the param
missing_ok?).
You are right. The way we are using this function has evolved beyond
the original intention.
Probably renaming the param to something like "error_ok" would be more
appropriate now.
----
Kind Regards,
Peter Smith.
Fujitsu Australia
On Sun, Feb 7, 2021 at 2:38 PM Peter Smith <smithpb2250@gmail.com> wrote:
On Sat, Feb 6, 2021 at 2:10 AM Petr Jelinek
<petr.jelinek@enterprisedb.com> wrote:
Hi,
Some minor comments about code:
+	else if (res->status == WALRCV_ERROR && missing_ok)
+	{
+		/* WARNING. Error, but missing_ok = true. */
+		ereport(WARNING,
I wonder if we need to add an error code to the WalRcvExecResult and check
for the appropriate ones here, because this can, for example, return an
error because of a timeout, not because the slot is missing. Not sure if it
matters for current callers though (but then maybe don't call the param
missing_ok?).
You are right. The way we are using this function has evolved beyond
the original intention.
Probably renaming the param to something like "error_ok" would be more
appropriate now.
PSA a patch (apply on top of V28) to change the misleading param name.
----
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
v1-0001-ReplicationSlotDropAtPubNode-param.patchapplication/octet-stream; name=v1-0001-ReplicationSlotDropAtPubNode-param.patchDownload
From 46ea93e86c0c4125735a3a115704ad2b954c0af1 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Mon, 8 Feb 2021 11:30:14 +1100
Subject: [PATCH v1] ReplicationSlotDropAtPubNode param.
Apply this patch on top of V28.
This patch only changes the name of the ReplicationSlotDropAtPubNode bool
param which says whether to give a WARNING in place of an ERROR. The
previous name "missing_ok" was misleading because we also want a WARNING
for situations where the slot cannot be dropped for reasons *other* than
just being missing (e.g. unable to drop because the connection is down).
---
src/backend/commands/subscriptioncmds.c | 20 ++++++++++----------
src/backend/replication/logical/tablesync.c | 4 ++--
src/include/replication/slot.h | 2 +-
3 files changed, 13 insertions(+), 13 deletions(-)
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 1d3ca43..691140f 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -731,7 +731,7 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
* does not exist yet. Also, if we fail after removing some of
* the slots, next time, it will again try to drop already
* dropped slots and fail. For these reasons, we allow
- * missing_ok = true for the drop.
+ * error_ok = true for the drop.
*
* XXX If there is a network breakdown while dropping the
* slots then we will give a WARNING to the user and they need
@@ -740,7 +740,7 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
* this.
*/
ReplicationSlotNameForTablesync(sub->oid, sub_remove_rels[off].relid, syncslotname);
- ReplicationSlotDropAtPubNode(wrconn, syncslotname, true /* missing_ok */);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, true /* error_ok */);
}
}
}
@@ -1234,14 +1234,14 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
* does not exist yet. Also, if we fail after removing some of
* the slots, next time, it will again try to drop already
* dropped slots and fail. For these reasons, we allow
- * missing_ok = true for the drop.
+ * error_ok = true for the drop.
*/
if (rstate->state != SUBREL_STATE_SYNCDONE)
{
char syncslotname[NAMEDATALEN] = {0};
ReplicationSlotNameForTablesync(subid, relid, syncslotname);
- ReplicationSlotDropAtPubNode(wrconn, syncslotname, true /* missing_ok */ );
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, true /* error_ok */ );
}
}
@@ -1252,7 +1252,7 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
* replication slot at the publisher.
*/
if (slotname)
- ReplicationSlotDropAtPubNode(wrconn, slotname, false /* missing_ok */ );
+ ReplicationSlotDropAtPubNode(wrconn, slotname, false /* error_ok */ );
}
PG_FINALLY();
@@ -1268,11 +1268,11 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
* Drop the replication slot at the publisher node using the replication
* connection.
*
- * missing_ok - if true then only issue WARNING message if the slot cannot be
- * deleted.
+ * error_ok - if true then only issue a WARNING message instead of ERROR
+ * when the slot cannot be dropped, e.g. because it does not exist, the connection is broken, etc.
*/
void
-ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok)
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool error_ok)
{
StringInfoData cmd;
@@ -1296,9 +1296,9 @@ ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missi
(errmsg("dropped replication slot \"%s\" on publisher",
slotname)));
}
- else if (res->status == WALRCV_ERROR && missing_ok)
+ else if (res->status == WALRCV_ERROR && error_ok)
{
- /* WARNING. Error, but missing_ok = true. */
+ /* WARNING. Error, but error_ok = true. */
ereport(WARNING,
(errmsg("could not drop the replication slot \"%s\" on publisher",
slotname),
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index b636976..4b8ee33 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -357,7 +357,7 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn)
* otherwise, it won't be dropped till the corresponding subscription
* is dropped.
*/
- ReplicationSlotDropAtPubNode(wrconn, syncslotname, false /* missing_ok */);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, false /* error_ok */);
finish_sync_worker();
}
@@ -994,7 +994,7 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
* breakdown then it wouldn't have succeeded so trying it next time
* seems like a better bet.
*/
- ReplicationSlotDropAtPubNode(wrconn, slotname, true /* missing_ok */);
+ ReplicationSlotDropAtPubNode(wrconn, slotname, true /* error_ok */);
}
else if (MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY)
{
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 5f52335..e67942d 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -213,7 +213,7 @@ extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
extern char *ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname);
-extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
+extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool error_ok);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
--
1.8.3.1
On Sat, Feb 6, 2021 at 6:30 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:
Hi
On Friday, February 5, 2021 5:51 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Feb 5, 2021 at 12:36 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:We need to add some tests to prove the new checks of AlterSubscription()
work.
I chose TAP tests as we need to set connect = true for the subscription.
When it can contribute to the development, please utilize this.
I used v28 to check my patch and works as we expect.Thanks for writing the tests but I don't understand why you need to set
connect = true for this test? I have tried below '... with connect = false' and it
seems to be working:
postgres=# CREATE SUBSCRIPTION mysub
postgres-# CONNECTION 'host=localhost port=5432
dbname=postgres'
postgres-# PUBLICATION mypublication WITH (connect = false);
WARNING: tables were not subscribed, you will have to run ALTER
SUBSCRIPTION ... REFRESH PUBLICATION to subscribe the tables CREATE
SUBSCRIPTION postgres=# Begin; BEGIN postgres=*# Alter Subscription
mysub Refresh Publication;
ERROR: ALTER SUBSCRIPTION ... REFRESH is not allowed for disabled
subscriptionsSo, if possible lets write this test in src/test/regress/sql/subscription.sql.
OK. I changed the place to write the tests for those.
I have another idea for a test case: What if we write a test such that it fails PK
violation on copy and then drop the subscription. Then check there shouldn't
be any dangling slot on the publisher? This is similar to a test in
subscription/t/004_sync.pl, we can use some of that framework but have a
separate test for this.I've added this PK violation test to the attached tests.
The patch works with v28 and made no failure during regression tests.
I checked this patch. It applied cleanly on top of V28, and all tests passed OK.
Here are two feedback comments.
1. For the regression test there are 2 x SQL tests and 1 x function test. I
thought that to cover all the combinations there should be another function
test, e.g.:
Tests ALTER … REFRESH
Tests ALTER … (refresh = true)
Tests ALTER … (refresh = true) in a function
Tests ALTER … REFRESH in a function <== this combination is not being
tested ??
2. For the 004 test case I know the test needs some PK constraint violation:
# Check if DROP SUBSCRIPTION cleans up slots on the publisher side
# when the subscriber is stuck on data copy for constraint
But it is not clear to me what was the exact cause of that PK
violation. I think you must be relying on data that is leftover from
some previous test case but I am not sure which one. Can you make the
comment more detailed to say *how* the PK violation is happening - e.g.
something to say which rows, in which table, and inserted by whom?
------
Kind Regards,
Peter Smith.
Fujitsu Australia
On Mon, Feb 8, 2021 at 8:06 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Sat, Feb 6, 2021 at 6:30 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:
I have another idea for a test case: What if we write a test that hits
a PK violation during the copy and then drops the subscription, and
then check that there is no dangling slot left on the publisher? This
is similar to a test in subscription/t/004_sync.pl; we can reuse some
of that framework but have a separate test for this.
I've added this PK violation test to the attached tests.
The patch works with v28 and caused no failures during regression tests.
I checked this patch. It applied cleanly on top of V28, and all tests passed OK.
Here are two feedback comments.
1. For the regression test there are 2 x SQL tests and 1 x function test. I
thought that to cover all the combinations there should be another function
test, e.g.:
Tests ALTER … REFRESH
Tests ALTER … (refresh = true)
Tests ALTER … (refresh = true) in a function
Tests ALTER … REFRESH in a function <== this combination is not being
tested ??
I am not sure whether there is much value in adding more to this set
of negative test cases unless it really covers a different code path,
which I think won't happen if we add more tests here.
--
With Regards,
Amit Kapila.
Hello
On Mon, Feb 8, 2021 12:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Feb 8, 2021 at 8:06 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Sat, Feb 6, 2021 at 6:30 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:
I have another idea for a test case: What if we write a test that hits
a PK violation during the copy and then drops the subscription, and
then check that there is no dangling slot left on the publisher?
This is similar to a test in subscription/t/004_sync.pl; we can
reuse some of that framework but have a separate test for this.
I've added this PK violation test to the attached tests.
The patch works with v28 and caused no failures during regression tests.
I checked this patch. It applied cleanly on top of V28, and all tests
passed OK.
Here are two feedback comments.
1. For the regression test there are 2 x SQL tests and 1 x function test. I
thought that to cover all the combinations there should be another function
test, e.g.:
Tests ALTER … REFRESH
Tests ALTER … (refresh = true)
Tests ALTER … (refresh = true) in a function
Tests ALTER … REFRESH in a function <== this combination is not being
tested ??
I am not sure whether there is much value in adding more to this set
of negative test cases unless it really covers a different code path,
which I think won't happen if we add more tests here.
Yeah, I agree. Accordingly, I didn't fix that part.
On Mon, Feb 8, 2021 11:36 AM Peter Smith <smithpb2250@gmail.com> wrote:
2. For the 004 test case I know the test needs some PK constraint violation:
# Check if DROP SUBSCRIPTION cleans up slots on the publisher side
# when the subscriber is stuck on data copy for constraint
But it is not clear to me what was the exact cause of that PK violation. I
think you must be relying on data that is leftover from some previous test
case but I am not sure which one. Can you make the comment more detailed to
say *how* the PK violation is happening - e.g. something to say which rows,
in which table, and inserted by whom?
I added some comments to clarify how the PK violation happens.
Please have a look.
Best Regards,
Takamichi Osumi
Attachments:
refresh_and_pk_violation_testsets_v02.patchapplication/octet-stream; name=refresh_and_pk_violation_testsets_v02.patchDownload
From 2466de92dc16f42f3a15600de39d47d01472ab98 Mon Sep 17 00:00:00 2001
From: Osumi Takamichi <osumi.takamichi@fujitsu.com>
Date: Mon, 8 Feb 2021 04:21:51 +0000
Subject: [PATCH v02] tests for AlterSubscription and DROP SUBSCRIPTION
Check that executing ALTER SUBSCRIPTION
with refresh inside a transaction or a function is not allowed.
Also, confirm that DROP SUBSCRIPTION while the subscriber
is in an error state (like a PK violation) can
clean up the publisher's slots correctly.
---
src/test/regress/expected/subscription.out | 20 ++++++++++++++++++++
src/test/regress/sql/subscription.sql | 21 +++++++++++++++++++++
src/test/subscription/t/004_sync.pl | 25 ++++++++++++++++++++++++-
3 files changed, 65 insertions(+), 1 deletion(-)
diff --git a/src/test/regress/expected/subscription.out b/src/test/regress/expected/subscription.out
index 2fa9bce..9dde54b 100644
--- a/src/test/regress/expected/subscription.out
+++ b/src/test/regress/expected/subscription.out
@@ -201,6 +201,26 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
(1 row)
DROP SUBSCRIPTION regress_testsub;
+-- Executing ALTER SUBSCRIPTION with refresh in a transaction or function is not allowed
+CREATE SUBSCRIPTION mytest CONNECTION 'dbname=postgres' PUBLICATION mypub
+ WITH (enabled = true, create_slot = false, copy_data = false);
+BEGIN;
+ALTER SUBSCRIPTION mytest SET PUBLICATION foo WITH (refresh = true);
+ERROR: ALTER SUBSCRIPTION with refresh cannot run inside a transaction block
+END;
+BEGIN;
+ALTER SUBSCRIPTION mytest REFRESH PUBLICATION;
+ERROR: ALTER SUBSCRIPTION ... REFRESH cannot run inside a transaction block
+END;
+CREATE FUNCTION func() RETURNS VOID AS
+$$ ALTER SUBSCRIPTION mytest SET PUBLICATION mypub WITH (refresh = true) $$ LANGUAGE SQL;
+SELECT func();
+ERROR: ALTER SUBSCRIPTION with refresh cannot be executed from a function
+CONTEXT: SQL function "func" statement 1
+ALTER SUBSCRIPTION mytest DISABLE;
+ALTER SUBSCRIPTION mytest SET (slot_name = NONE);
+DROP SUBSCRIPTION mytest;
+DROP FUNCTION func;
RESET SESSION AUTHORIZATION;
DROP ROLE regress_subscription_user;
DROP ROLE regress_subscription_user2;
diff --git a/src/test/regress/sql/subscription.sql b/src/test/regress/sql/subscription.sql
index 14fa0b2..567c451 100644
--- a/src/test/regress/sql/subscription.sql
+++ b/src/test/regress/sql/subscription.sql
@@ -147,6 +147,27 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
DROP SUBSCRIPTION regress_testsub;
+-- Executing ALTER SUBSCRIPTION with refresh in a transaction or function is not allowed
+CREATE SUBSCRIPTION mytest CONNECTION 'dbname=postgres' PUBLICATION mypub
+ WITH (enabled = true, create_slot = false, copy_data = false);
+
+BEGIN;
+ALTER SUBSCRIPTION mytest SET PUBLICATION foo WITH (refresh = true);
+END;
+
+BEGIN;
+ALTER SUBSCRIPTION mytest REFRESH PUBLICATION;
+END;
+
+CREATE FUNCTION func() RETURNS VOID AS
+$$ ALTER SUBSCRIPTION mytest SET PUBLICATION mypub WITH (refresh = true) $$ LANGUAGE SQL;
+SELECT func();
+
+ALTER SUBSCRIPTION mytest DISABLE;
+ALTER SUBSCRIPTION mytest SET (slot_name = NONE);
+DROP SUBSCRIPTION mytest;
+DROP FUNCTION func;
+
RESET SESSION AUTHORIZATION;
DROP ROLE regress_subscription_user;
DROP ROLE regress_subscription_user2;
diff --git a/src/test/subscription/t/004_sync.pl b/src/test/subscription/t/004_sync.pl
index e111ab9..3a5f273 100644
--- a/src/test/subscription/t/004_sync.pl
+++ b/src/test/subscription/t/004_sync.pl
@@ -3,7 +3,7 @@ use strict;
use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 7;
+use Test::More tests => 9;
# Initialize publisher node
my $node_publisher = get_new_node('publisher');
@@ -149,7 +149,30 @@ $result = $node_subscriber->safe_psql('postgres',
is($result, qq(20),
'changes for table added after subscription initialized replicated');
+# clean up
+$node_publisher->safe_psql('postgres', "DROP TABLE tab_rep_next");
+$node_subscriber->safe_psql('postgres', "DROP TABLE tab_rep_next");
+
+# Check if DROP SUBSCRIPTION cleans up slots on the publisher side
+# when the subscriber is stuck on data copy for constraint.
+$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+
+# Here, tab_rep on both publisher and subscriber already has the exact same records.
+# Recreate the subscription so that the tablesync worker runs the initial copy again,
+# which violates the unique constraint of tab_rep on the subscriber from the beginning of synchronization.
+$node_subscriber->safe_psql('postgres',
+ "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr' PUBLICATION tap_pub");
+
+$result = $node_subscriber->poll_query_until('postgres', $started_query)
+ or die "Timed out while waiting for subscriber to start sync";
+
+$result = $node_publisher->safe_psql('postgres', "SELECT count(*) FROM pg_replication_slots");
+is($result, qq(2), 'There should be 2 slots on the publisher before dropping the slots');
+
$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+$result = $node_publisher->safe_psql('postgres', "SELECT count(*) FROM pg_replication_slots");
+is($result, qq(0), 'DROP SUBSCRIPTION during error can clean up the slots on the publisher');
+
$node_subscriber->stop('fast');
$node_publisher->stop('fast');
--
2.2.0
On Monday, February 8, 2021 1:44 PM osumi.takamichi@fujitsu.com <osumi.takamichi@fujitsu.com> wrote:
On Mon, Feb 8, 2021 12:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Feb 8, 2021 at 8:06 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Sat, Feb 6, 2021 at 6:30 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:
I have another idea for a test case: What if we write a test that hits
a PK violation during the copy and then drops the subscription, and
then check that there is no dangling slot left on the publisher?
This is similar to a test in subscription/t/004_sync.pl; we can
reuse some of that framework but have a separate test for this.
I've added this PK violation test to the attached tests.
The patch works with v28 and caused no failures during regression tests.
I checked this patch. It applied cleanly on top of V28, and all tests passed OK.
Here are two feedback comments.
1. For the regression test there are 2 x SQL tests and 1 x function test. I
thought that to cover all the combinations there should be another function
test, e.g.:
Tests ALTER … REFRESH
Tests ALTER … (refresh = true)
Tests ALTER … (refresh = true) in a function
Tests ALTER … REFRESH in a function <== this combination is not being
tested ??
I am not sure whether there is much value in adding more to this set
of negative test cases unless it really covers a different code path,
which I think won't happen if we add more tests here.
Yeah, I agree. Accordingly, I didn't fix that part.
On Mon, Feb 8, 2021 11:36 AM Peter Smith <smithpb2250@gmail.com> wrote:
2. For the 004 test case I know the test needs some PK constraint violation:
# Check if DROP SUBSCRIPTION cleans up slots on the publisher side
# when the subscriber is stuck on data copy for constraint
But it is not clear to me what was the exact cause of that PK violation. I
think you must be relying on data that is leftover from some previous test
case but I am not sure which one. Can you make the comment more detailed to
say *how* the PK violation is happening - e.g. something to say which rows,
in which table, and inserted by whom?
I added some comments to clarify how the PK violation happens.
Please have a look.
Sorry, I had one typo in the tests of subscription.sql in v02.
I used 'foo' in the first test ("ALTER SUBSCRIPTION mytest SET PUBLICATION foo WITH (refresh = true)"),
but I should have used 'mypub' to make this test clearly independent from other previous tests.
Attached is the fixed version.
Best Regards,
Takamichi Osumi
Attachments:
refresh_and_pk_violation_testsets_v03.patchapplication/octet-stream; name=refresh_and_pk_violation_testsets_v03.patchDownload
From 706ec27a926818126053113bbe4c04a4895b484d Mon Sep 17 00:00:00 2001
From: Osumi Takamichi <osumi.takamichi@fujitsu.com>
Date: Mon, 8 Feb 2021 06:48:09 +0000
Subject: [PATCH v03] tests for AlterSubscription and DROP SUBSCRIPTION
Check that executing ALTER SUBSCRIPTION
with refresh inside a transaction or a function is not allowed.
Also, confirm that DROP SUBSCRIPTION while the subscriber
is in an error state (like a PK violation) can
clean up the publisher's slots correctly.
---
src/test/regress/expected/subscription.out | 20 ++++++++++++++++++++
src/test/regress/sql/subscription.sql | 21 +++++++++++++++++++++
src/test/subscription/t/004_sync.pl | 25 ++++++++++++++++++++++++-
3 files changed, 65 insertions(+), 1 deletion(-)
diff --git a/src/test/regress/expected/subscription.out b/src/test/regress/expected/subscription.out
index 2fa9bce..0fd1fe4 100644
--- a/src/test/regress/expected/subscription.out
+++ b/src/test/regress/expected/subscription.out
@@ -201,6 +201,26 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
(1 row)
DROP SUBSCRIPTION regress_testsub;
+-- Executing ALTER SUBSCRIPTION with refresh in a transaction or function is not allowed
+CREATE SUBSCRIPTION mytest CONNECTION 'dbname=postgres' PUBLICATION mypub
+ WITH (enabled = true, create_slot = false, copy_data = false);
+BEGIN;
+ALTER SUBSCRIPTION mytest SET PUBLICATION mypub WITH (refresh = true);
+ERROR: ALTER SUBSCRIPTION with refresh cannot run inside a transaction block
+END;
+BEGIN;
+ALTER SUBSCRIPTION mytest REFRESH PUBLICATION;
+ERROR: ALTER SUBSCRIPTION ... REFRESH cannot run inside a transaction block
+END;
+CREATE FUNCTION func() RETURNS VOID AS
+$$ ALTER SUBSCRIPTION mytest SET PUBLICATION mypub WITH (refresh = true) $$ LANGUAGE SQL;
+SELECT func();
+ERROR: ALTER SUBSCRIPTION with refresh cannot be executed from a function
+CONTEXT: SQL function "func" statement 1
+ALTER SUBSCRIPTION mytest DISABLE;
+ALTER SUBSCRIPTION mytest SET (slot_name = NONE);
+DROP SUBSCRIPTION mytest;
+DROP FUNCTION func;
RESET SESSION AUTHORIZATION;
DROP ROLE regress_subscription_user;
DROP ROLE regress_subscription_user2;
diff --git a/src/test/regress/sql/subscription.sql b/src/test/regress/sql/subscription.sql
index 14fa0b2..c3566fc 100644
--- a/src/test/regress/sql/subscription.sql
+++ b/src/test/regress/sql/subscription.sql
@@ -147,6 +147,27 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
DROP SUBSCRIPTION regress_testsub;
+-- Executing ALTER SUBSCRIPTION with refresh in a transaction or function is not allowed
+CREATE SUBSCRIPTION mytest CONNECTION 'dbname=postgres' PUBLICATION mypub
+ WITH (enabled = true, create_slot = false, copy_data = false);
+
+BEGIN;
+ALTER SUBSCRIPTION mytest SET PUBLICATION mypub WITH (refresh = true);
+END;
+
+BEGIN;
+ALTER SUBSCRIPTION mytest REFRESH PUBLICATION;
+END;
+
+CREATE FUNCTION func() RETURNS VOID AS
+$$ ALTER SUBSCRIPTION mytest SET PUBLICATION mypub WITH (refresh = true) $$ LANGUAGE SQL;
+SELECT func();
+
+ALTER SUBSCRIPTION mytest DISABLE;
+ALTER SUBSCRIPTION mytest SET (slot_name = NONE);
+DROP SUBSCRIPTION mytest;
+DROP FUNCTION func;
+
RESET SESSION AUTHORIZATION;
DROP ROLE regress_subscription_user;
DROP ROLE regress_subscription_user2;
diff --git a/src/test/subscription/t/004_sync.pl b/src/test/subscription/t/004_sync.pl
index e111ab9..3a5f273 100644
--- a/src/test/subscription/t/004_sync.pl
+++ b/src/test/subscription/t/004_sync.pl
@@ -3,7 +3,7 @@ use strict;
use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 7;
+use Test::More tests => 9;
# Initialize publisher node
my $node_publisher = get_new_node('publisher');
@@ -149,7 +149,30 @@ $result = $node_subscriber->safe_psql('postgres',
is($result, qq(20),
'changes for table added after subscription initialized replicated');
+# clean up
+$node_publisher->safe_psql('postgres', "DROP TABLE tab_rep_next");
+$node_subscriber->safe_psql('postgres', "DROP TABLE tab_rep_next");
+
+# Check if DROP SUBSCRIPTION cleans up slots on the publisher side
+# when the subscriber is stuck on data copy for constraint.
+$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+
+# Here, tab_rep on both publisher and subscriber already has the exact same records.
+# Recreate the subscription so that the tablesync worker runs the initial copy again,
+# which violates the unique constraint of tab_rep on the subscriber from the beginning of synchronization.
+$node_subscriber->safe_psql('postgres',
+ "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr' PUBLICATION tap_pub");
+
+$result = $node_subscriber->poll_query_until('postgres', $started_query)
+ or die "Timed out while waiting for subscriber to start sync";
+
+$result = $node_publisher->safe_psql('postgres', "SELECT count(*) FROM pg_replication_slots");
+is($result, qq(2), 'There should be 2 slots on the publisher before dropping the slots');
+
$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+$result = $node_publisher->safe_psql('postgres', "SELECT count(*) FROM pg_replication_slots");
+is($result, qq(0), 'DROP SUBSCRIPTION during error can clean up the slots on the publisher');
+
$node_subscriber->stop('fast');
$node_publisher->stop('fast');
--
2.2.0
On Fri, Feb 5, 2021 at 8:40 PM Petr Jelinek
<petr.jelinek@enterprisedb.com> wrote:
Hi,
We had a bit of a high-level discussion about these patches with Amit
off-list, so I decided to also take a look at the actual code.
Thanks for the discussion and a follow-up review.
My main concern originally was the potential for left-over slots on the
publisher, but I think the state now is relatively okay, with a couple of
corner cases that are documented and don't seem much worse than the main
slot.
I wonder if we should mention the max_slot_wal_keep_size GUC in the
table sync docs though.
I have added a reference to this in ALTER SUBSCRIPTION, where we
mention the risk of leftover slots. Let me know if you have
something else in mind.
Another thing that might need documentation is that the visibility of
changes done by table sync is no longer isolated: the table contents will
show intermediate progress to other backends, rather than switching from
nothing to a state consistent with the rest of the replication.
Agreed and updated the docs accordingly.
Some minor comments about code:
+	else if (res->status == WALRCV_ERROR && missing_ok)
+	{
+		/* WARNING. Error, but missing_ok = true. */
+		ereport(WARNING,
I wonder if we need to add an error code to the WalRcvExecResult and check
for the appropriate ones here, because this can, for example, return an
error because of a timeout, not because the slot is missing.
I think there are both pros and cons of distinguishing the error
("slot does not exist" from others). The benefit is that if there is a
network glitch then the user can probably retry the Alter/Drop commands
and they will be successful next time. OTOH, say the network is broken
for a long time and the user wants to proceed; then there won't be any
way to proceed for ALTER SUBSCRIPTION ... REFRESH or the DROP command.
So by giving a WARNING, at least we can provide a way to proceed, and
they can drop such slots later. We have mentioned this in the docs as
well. I think we can go either way here; let me know which way you
think is better.
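As a rough sketch, the distinguishing variant could look something like
this (purely illustrative: it assumes an error-code field, say
res->sqlstate, which WalRcvExecResult does not have in v29; res->err and
ERRCODE_UNDEFINED_OBJECT do exist):

	else if (res->status == WALRCV_ERROR)
	{
		/*
		 * Hypothetical: tolerate only a missing slot; treat timeouts,
		 * broken connections, etc. as hard errors so that the command
		 * can simply be retried once the network recovers.
		 */
		if (missing_ok && res->sqlstate == ERRCODE_UNDEFINED_OBJECT)
			ereport(WARNING,
					(errmsg("replication slot \"%s\" does not exist on publisher",
							slotname)));
		else
			ereport(ERROR,
					(errmsg("could not drop replication slot \"%s\" on publisher: %s",
							slotname, res->err)));
	}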
Not sure if it matters for current callers though (but then maybe don't
call the param missing_ok?).
Sure, if we decide not to change the behavior as suggested by you then
this makes sense.
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char syncslotname[NAMEDATALEN])
+{
+	if (syncslotname)
+		sprintf(syncslotname, "pg_%u_sync_%u", suboid, relid);
+	else
+		syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
+
+	return syncslotname;
+}
Given that we are now explicitly dropping slots, what happens here if we
have 2 different downstreams that happen to get the same suboid and reloid?
Will one of them drop the slot of the other one? Previously, with the
cleanup being left to the temp slot, we'd at most have gotten an error when
creating it, but with the new logic in LogicalRepSyncTableStart it feels
like we could get into a situation where 2 downstreams are fighting over a
slot, no?
As discussed, added system_identifier to distinguish subscriptions
between different clusters.
Apart from fixing the above comment, I have integrated it with the new
replorigin_drop_by_name() API being discussed in the thread [1] and
posted that patch just for ease. I have also integrated Osumi-San's
test case patch with minor modifications.
[1]: /messages/by-id/CAA4eK1L7mLhY=wyCB0qsEGUpfzWfncDSS9_0a4Co+N0GUyNGNQ@mail.gmail.com
--
With Regards,
Amit Kapila.
Attachments:
v29-0001-Make-pg_replication_origin_drop-safe-against-con.patchapplication/octet-stream; name=v29-0001-Make-pg_replication_origin_drop-safe-against-con.patchDownload
From 6516c90d75cc00224009eacdcc07892d12c0dba9 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Fri, 5 Feb 2021 12:20:13 +0530
Subject: [PATCH v29 1/2] Make pg_replication_origin_drop safe against
concurrent drops.
Currently, we get the origin id from the name and then drop the origin by
taking ExclusiveLock on ReplicationOriginRelationId. So, two concurrent
sessions can get the id from the name at the same time and then when they
try to drop the origin, one of the sessions will get either
"tuple concurrently deleted" or "cache lookup failed for replication
origin ..".
To prevent this race condition we do the entire operation under lock. This
obviates the need for the replorigin_drop() API and we have removed it, so
if any extension authors are using it they need to instead use
replorigin_drop_by_name(). See its usage in pg_replication_origin_drop().
Author: Peter Smith
Reviewed-by: Amit Kapila, Euler Taveira, and Petr Jelinek
Discussion: https://www.postgresql.org/message-id/CAHut%2BPuW8DWV5fskkMWWMqzt-x7RPcNQOtJQBp6SdwyRghCk7A%40mail.gmail.com
---
src/backend/commands/subscriptioncmds.c | 5 +--
src/backend/replication/logical/origin.c | 54 ++++++++++++++++++--------------
src/include/replication/origin.h | 2 +-
3 files changed, 33 insertions(+), 28 deletions(-)
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 082f785..5ccbc9d 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -926,7 +926,6 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
ListCell *lc;
char originname[NAMEDATALEN];
char *err = NULL;
- RepOriginId originid;
WalReceiverConn *wrconn = NULL;
StringInfoData cmd;
Form_pg_subscription form;
@@ -1050,9 +1049,7 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
/* Remove the origin tracking if exists. */
snprintf(originname, sizeof(originname), "pg_%u", subid);
- originid = replorigin_by_name(originname, true);
- if (originid != InvalidRepOriginId)
- replorigin_drop(originid, false);
+ replorigin_drop_by_name(originname, true, false);
/*
* If there is no slot associated with the subscription, we can finish
diff --git a/src/backend/replication/logical/origin.c b/src/backend/replication/logical/origin.c
index 9bd761a..d24c3ad 100644
--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -322,27 +322,15 @@ replorigin_create(char *roname)
return roident;
}
-
/*
- * Drop replication origin.
- *
- * Needs to be called in a transaction.
+ * Helper function to drop a replication origin.
*/
-void
-replorigin_drop(RepOriginId roident, bool nowait)
+static void
+replorigin_drop_guts(Relation rel, RepOriginId roident, bool nowait)
{
HeapTuple tuple;
- Relation rel;
int i;
- Assert(IsTransactionState());
-
- /*
- * To interlock against concurrent drops, we hold ExclusiveLock on
- * pg_replication_origin throughout this function.
- */
- rel = table_open(ReplicationOriginRelationId, ExclusiveLock);
-
/*
* First, clean up the slot state info, if there is any matching slot.
*/
@@ -415,11 +403,35 @@ restart:
ReleaseSysCache(tuple);
CommandCounterIncrement();
-
- /* now release lock again */
- table_close(rel, ExclusiveLock);
}
+/*
+ * Drop replication origin (by name).
+ *
+ * Needs to be called in a transaction.
+ */
+void
+replorigin_drop_by_name(char *name, bool missing_ok, bool nowait)
+{
+ RepOriginId roident;
+ Relation rel;
+
+ Assert(IsTransactionState());
+
+ /*
+ * To interlock against concurrent drops, we hold ExclusiveLock on
+ * pg_replication_origin throughout this function.
+ */
+ rel = table_open(ReplicationOriginRelationId, ExclusiveLock);
+
+ roident = replorigin_by_name(name, missing_ok);
+
+ if (OidIsValid(roident))
+ replorigin_drop_guts(rel, roident, nowait);
+
+ /* We keep the lock on pg_replication_origin until commit */
+ table_close(rel, NoLock);
+}
/*
* Lookup replication origin via its oid and return the name.
@@ -1256,16 +1268,12 @@ Datum
pg_replication_origin_drop(PG_FUNCTION_ARGS)
{
char *name;
- RepOriginId roident;
replorigin_check_prerequisites(false, false);
name = text_to_cstring((text *) DatumGetPointer(PG_GETARG_DATUM(0)));
- roident = replorigin_by_name(name, false);
- Assert(OidIsValid(roident));
-
- replorigin_drop(roident, true);
+ replorigin_drop_by_name(name, false, true);
pfree(name);
diff --git a/src/include/replication/origin.h b/src/include/replication/origin.h
index 731445a..d2ed630 100644
--- a/src/include/replication/origin.h
+++ b/src/include/replication/origin.h
@@ -40,7 +40,7 @@ extern PGDLLIMPORT TimestampTz replorigin_session_origin_timestamp;
/* API for querying & manipulating replication origins */
extern RepOriginId replorigin_by_name(char *name, bool missing_ok);
extern RepOriginId replorigin_create(char *name);
-extern void replorigin_drop(RepOriginId roident, bool nowait);
+extern void replorigin_drop_by_name(char *name, bool missing_ok, bool nowait);
extern bool replorigin_by_oid(RepOriginId roident, bool missing_ok,
char **roname);
--
1.8.3.1
v29-0002-Allow-multiple-xacts-during-table-sync-in-logica.patchapplication/octet-stream; name=v29-0002-Allow-multiple-xacts-during-table-sync-in-logica.patchDownload
From 5cb4aa304c55813b491a7a7c21b005619afd5fd6 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Sat, 6 Feb 2021 17:16:19 +0530
Subject: [PATCH v29 2/2] Allow multiple xacts during table sync in logical
replication.
For the initial table data synchronization in logical replication, we use
a single transaction to copy the entire table and then synchronize the
position in the stream with the main apply worker.
There are multiple downsides of this approach: (a) We have to perform the
entire copy operation again if there is any error (network breakdown,
error in the database operation, etc.) while we synchronize the WAL
position between tablesync worker and apply worker; this will be onerous
especially for large copies, (b) Using a single transaction in the
synchronization-phase (where we can receive WAL from multiple
transactions) will have the risk of exceeding the CID limit, (c) The slot
will hold the WAL till the entire sync is complete because we never commit
till the end.
This patch solves all the above downsides by allowing multiple
transactions during the tablesync phase. The initial copy is done in a
single transaction and after that, we commit each transaction as we
receive it. To allow recovery after any error or crash, we use a permanent
slot and origin to track the progress. The slot and origin will be removed
once we finish the synchronization of the table. We also remove slot and
origin of tablesync workers if the user performs DROP SUBSCRIPTION .. or
ALTER SUBSCRIPTION .. REFRESH and some of the table syncs are still not
finished.
The commands ALTER SUBSCRIPTION ... REFRESH .. and
ALTER SUBSCRIPTION ... SET PUBLICATION .. with refresh option as true
cannot be executed inside a transaction block because they can now drop
the slots for which we have no provision to rollback.
This will also open up the path for logical replication of 2PC
transactions on the subscriber side. Previously, we can't do that because
of the requirement of maintaining a single transaction in tablesync
workers.
Author: Peter Smith, Amit Kapila, and Takamichi Osumi
Reviewed-by: Ajin Cherian, Petr Jelinek, Hou Zhijie and Amit Kapila
Discussion: https://postgr.es/m/CAA4eK1KHJxaZS-fod-0fey=0tq3=Gkn4ho=8N4-5HWiCfu0H1A@mail.gmail.com
---
doc/src/sgml/catalogs.sgml | 1 +
doc/src/sgml/logical-replication.sgml | 59 ++--
doc/src/sgml/ref/alter_subscription.sgml | 20 ++
doc/src/sgml/ref/drop_subscription.sgml | 6 +-
src/backend/access/transam/xact.c | 11 -
src/backend/catalog/pg_subscription.c | 35 +++
src/backend/commands/subscriptioncmds.c | 468 ++++++++++++++++++++++------
src/backend/replication/logical/launcher.c | 147 ---------
src/backend/replication/logical/tablesync.c | 226 ++++++++++++--
src/backend/replication/logical/worker.c | 18 +-
src/backend/tcop/utility.c | 3 +-
src/include/catalog/pg_subscription_rel.h | 2 +
src/include/commands/subscriptioncmds.h | 2 +-
src/include/replication/logicallauncher.h | 2 -
src/include/replication/slot.h | 3 +
src/include/replication/worker_internal.h | 2 +-
src/test/regress/expected/subscription.out | 21 ++
src/test/regress/sql/subscription.sql | 22 ++
src/test/subscription/t/004_sync.pl | 21 +-
src/tools/pgindent/typedefs.list | 1 -
20 files changed, 742 insertions(+), 328 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index ea222c0..692ad65 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7673,6 +7673,7 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
State code:
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
+ <literal>f</literal> = finished table copy,
<literal>s</literal> = synchronized,
<literal>r</literal> = ready (normal replication)
</para></entry>
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index a560ad6..d467923 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -186,9 +186,10 @@
<para>
Each subscription will receive changes via one replication slot (see
- <xref linkend="streaming-replication-slots"/>). Additional temporary
- replication slots may be required for the initial data synchronization
- of pre-existing table data.
+ <xref linkend="streaming-replication-slots"/>). Additional replication
+ slots may be required for the initial data synchronization of
+ pre-existing table data and those will be dropped at the end of data
+ synchronization.
</para>
<para>
@@ -248,13 +249,23 @@
<para>
As mentioned earlier, each (active) subscription receives changes from a
- replication slot on the remote (publishing) side. Normally, the remote
- replication slot is created automatically when the subscription is created
- using <command>CREATE SUBSCRIPTION</command> and it is dropped
- automatically when the subscription is dropped using <command>DROP
- SUBSCRIPTION</command>. In some situations, however, it can be useful or
- necessary to manipulate the subscription and the underlying replication
- slot separately. Here are some scenarios:
+ replication slot on the remote (publishing) side.
+ </para>
+ <para>
+ Additional table synchronization slots are normally transient, created
+ internally to perform initial table synchronization and dropped
+ automatically when they are no longer needed. These table synchronization
+ slots have generated names: <quote><literal>pg_%u_sync_%u_%llu</literal></quote>
+ (parameters: Subscription <parameter>oid</parameter>,
+ Table <parameter>relid</parameter>, system identifier <parameter>sysid</parameter>)
+ </para>
+ <para>
+ Normally, the remote replication slot is created automatically when the
+ subscription is created using <command>CREATE SUBSCRIPTION</command> and it
+ is dropped automatically when the subscription is dropped using
+ <command>DROP SUBSCRIPTION</command>. In some situations, however, it can
+ be useful or necessary to manipulate the subscription and the underlying
+ replication slot separately. Here are some scenarios:
<itemizedlist>
<listitem>
@@ -294,8 +305,9 @@
using <command>ALTER SUBSCRIPTION</command> before attempting to drop
the subscription. If the remote database instance no longer exists, no
further action is then necessary. If, however, the remote database
- instance is just unreachable, the replication slot should then be
- dropped manually; otherwise it would continue to reserve WAL and might
+ instance is just unreachable, the replication slot (and any still
+ remaining table synchronization slots) should then be
+ dropped manually; otherwise it/they would continue to reserve WAL and might
eventually cause the disk to fill up. Such cases should be carefully
investigated.
</para>
@@ -468,16 +480,19 @@
<sect2 id="logical-replication-snapshot">
<title>Initial Snapshot</title>
<para>
- The initial data in existing subscribed tables are snapshotted and
- copied in a parallel instance of a special kind of apply process.
- This process will create its own temporary replication slot and
- copy the existing data. Once existing data is copied, the worker
- enters synchronization mode, which ensures that the table is brought
- up to a synchronized state with the main apply process by streaming
- any changes that happened during the initial data copy using standard
- logical replication. Once the synchronization is done, the control
- of the replication of the table is given back to the main apply
- process where the replication continues as normal.
+ The initial data in existing subscribed tables are snapshotted and
+ copied in a parallel instance of a special kind of apply process.
+ This process will create its own replication slot and copy the existing
+ data. As soon as the copy is finished the table contents will become
+ visible to other backends. Once existing data is copied, the worker
+ enters synchronization mode, which ensures that the table is brought
+ up to a synchronized state with the main apply process by streaming
+ any changes that happened during the initial data copy using standard
+ logical replication. During this synchronization phase, the changes
+ are applied and committed in the same order as they happened on the
+ publisher. Once the synchronization is done, the control of the
+ replication of the table is given back to the main apply process where
+ the replication continues as normal.
</para>
</sect2>
</sect1>
diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml
index db5e59f..eb3e09b 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -48,6 +48,26 @@ ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> RENAME TO <
(Currently, all subscription owners must be superusers, so the owner checks
will be bypassed in practice. But this might change in the future.)
</para>
+
+ <para>
+ When refreshing a publication we remove the relations that are no longer
+ part of the publication and we also remove the tablesync slots if there are
+ any. It is necessary to remove tablesync slots so that the resources
+ allocated for the subscription on the remote host are released. If due to
+ network breakdown or some other error, <productname>PostgreSQL</productname>
+ is unable to remove the slots, a WARNING will be reported. The user needs to
+ manually remove such slots later or the
+ <xref linkend="guc-max-slot-wal-keep-size"/> should be configured on the
+ remote host as otherwise, they will continue to reserve WAL and might
+ eventually cause the disk to fill up. See also
+ <xref linkend="logical-replication-subscription-slot"/>.
+ </para>
+
+ <para>
+ Commands <command>ALTER SUBSCRIPTION ... REFRESH ..</command> and
+ <command>ALTER SUBSCRIPTION ... SET PUBLICATION ..</command> with refresh
+ option as true cannot be executed inside a transaction block.
+ </para>
</refsect1>
<refsect1>
diff --git a/doc/src/sgml/ref/drop_subscription.sgml b/doc/src/sgml/ref/drop_subscription.sgml
index adbdeaf..aee9615 100644
--- a/doc/src/sgml/ref/drop_subscription.sgml
+++ b/doc/src/sgml/ref/drop_subscription.sgml
@@ -79,7 +79,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
<para>
When dropping a subscription that is associated with a replication slot on
the remote host (the normal state), <command>DROP SUBSCRIPTION</command>
- will connect to the remote host and try to drop the replication slot as
+ will connect to the remote host and try to drop the replication slot (and
+ any remaining table synchronization slots) as
part of its operation. This is necessary so that the resources allocated
for the subscription on the remote host are released. If this fails,
either because the remote host is not reachable or because the remote
@@ -89,7 +90,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
executing <literal>ALTER SUBSCRIPTION ... SET (slot_name = NONE)</literal>.
After that, <command>DROP SUBSCRIPTION</command> will no longer attempt any
actions on a remote host. Note that if the remote replication slot still
- exists, it should then be dropped manually; otherwise it will continue to
+ exists, it (and any related table synchronization slots) should then be
+ dropped manually; otherwise it/they will continue to
reserve WAL and might eventually cause the disk to fill up. See
also <xref linkend="logical-replication-subscription-slot"/>.
</para>
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index a2068e3..3c8b4eb 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2432,15 +2432,6 @@ PrepareTransaction(void)
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot PREPARE a transaction that has exported snapshots")));
- /*
- * Don't allow PREPARE but for transaction that has/might kill logical
- * replication workers.
- */
- if (XactManipulatesLogicalReplicationWorkers())
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("cannot PREPARE a transaction that has manipulated logical replication workers")));
-
/* Prevent cancel/die interrupt while cleaning up */
HOLD_INTERRUPTS();
@@ -4899,7 +4890,6 @@ CommitSubTransaction(void)
AtEOSubXact_HashTables(true, s->nestingLevel);
AtEOSubXact_PgStat(true, s->nestingLevel);
AtSubCommit_Snapshot(s->nestingLevel);
- AtEOSubXact_ApplyLauncher(true, s->nestingLevel);
/*
* We need to restore the upper transaction's read-only state, in case the
@@ -5059,7 +5049,6 @@ AbortSubTransaction(void)
AtEOSubXact_HashTables(false, s->nestingLevel);
AtEOSubXact_PgStat(false, s->nestingLevel);
AtSubAbort_Snapshot(s->nestingLevel);
- AtEOSubXact_ApplyLauncher(false, s->nestingLevel);
}
/*
diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c
index 44cb285..4f567fd 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -29,6 +29,7 @@
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
#include "utils/pg_lsn.h"
#include "utils/rel.h"
#include "utils/syscache.h"
@@ -337,6 +338,13 @@ GetSubscriptionRelState(Oid subid, Oid relid, XLogRecPtr *sublsn)
char substate;
bool isnull;
Datum d;
+ Relation rel;
+
+ /*
+ * This is to avoid the race condition with AlterSubscription which tries
+ * to remove this relstate.
+ */
+ rel = table_open(SubscriptionRelRelationId, AccessShareLock);
/* Try finding the mapping. */
tup = SearchSysCache2(SUBSCRIPTIONRELMAP,
@@ -363,6 +371,8 @@ GetSubscriptionRelState(Oid subid, Oid relid, XLogRecPtr *sublsn)
/* Cleanup */
ReleaseSysCache(tup);
+ table_close(rel, AccessShareLock);
+
return substate;
}
@@ -403,6 +413,31 @@ RemoveSubscriptionRel(Oid subid, Oid relid)
scan = table_beginscan_catalog(rel, nkeys, skey);
while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
{
+ Form_pg_subscription_rel subrel;
+
+ subrel = (Form_pg_subscription_rel) GETSTRUCT(tup);
+
+ /*
+ * We don't allow to drop the relation mapping when the table
+ * synchronization is in progress unless the caller updates the
+ * corresponding subscription as well. This is to ensure that we don't
+ * leave tablesync slots or origins in the system when the
+ * corresponding table is dropped.
+ */
+ if (!OidIsValid(subid) && subrel->srsubstate != SUBREL_STATE_READY)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("could not drop relation mapping for subscription \"%s\"",
+ get_subscription_name(subrel->srsubid, false)),
+ errdetail("Table synchronization for relation \"%s\" is in progress and is in state \"%c\".",
+ get_rel_name(relid), subrel->srsubstate),
+ /* translator: first %s is a SQL ALTER command and second %s is a SQL DROP command */
+ errhint("Use %s to enable subscription if not already enabled or use %s to drop the subscription.",
+ "ALTER SUBSCRIPTION ... ENABLE",
+ "DROP SUBSCRIPTION ...")));
+ }
+
CatalogTupleDelete(rel, &tup->t_self);
}
table_endscan(scan);
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 5ccbc9d..d856dfd 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -34,6 +34,7 @@
#include "nodes/makefuncs.h"
#include "replication/logicallauncher.h"
#include "replication/origin.h"
+#include "replication/slot.h"
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "replication/worker_internal.h"
@@ -46,6 +47,8 @@
#include "utils/syscache.h"
static List *fetch_table_list(WalReceiverConn *wrconn, List *publications);
+static void ReportSlotConnectionError(List *rstates, Oid subid, char *slotname, char *err);
+
/*
* Common option parsing function for CREATE and ALTER SUBSCRIPTION commands.
@@ -566,107 +569,212 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
Oid *pubrel_local_oids;
ListCell *lc;
int off;
+ int remove_rel_len;
+ Relation rel = NULL;
+ typedef struct SubRemoveRels
+ {
+ Oid relid;
+ char state;
+ } SubRemoveRels;
+ SubRemoveRels *sub_remove_rels;
/* Load the library providing us libpq calls. */
load_file("libpqwalreceiver", false);
- /* Try to connect to the publisher. */
- wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
- if (!wrconn)
- ereport(ERROR,
- (errmsg("could not connect to the publisher: %s", err)));
-
- /* Get the table list from publisher. */
- pubrel_names = fetch_table_list(wrconn, sub->publications);
+ PG_TRY();
+ {
+ /* Try to connect to the publisher. */
+ wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
+ if (!wrconn)
+ ereport(ERROR,
+ (errmsg("could not connect to the publisher: %s", err)));
- /* We are done with the remote side, close connection. */
- walrcv_disconnect(wrconn);
+ /* Get the table list from publisher. */
+ pubrel_names = fetch_table_list(wrconn, sub->publications);
- /* Get local table list. */
- subrel_states = GetSubscriptionRelations(sub->oid);
+ /* Get local table list. */
+ subrel_states = GetSubscriptionRelations(sub->oid);
- /*
- * Build qsorted array of local table oids for faster lookup. This can
- * potentially contain all tables in the database so speed of lookup is
- * important.
- */
- subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
- off = 0;
- foreach(lc, subrel_states)
- {
- SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
+ /*
+ * Build qsorted array of local table oids for faster lookup. This can
+ * potentially contain all tables in the database so speed of lookup
+ * is important.
+ */
+ subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
+ off = 0;
+ foreach(lc, subrel_states)
+ {
+ SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
- subrel_local_oids[off++] = relstate->relid;
- }
- qsort(subrel_local_oids, list_length(subrel_states),
- sizeof(Oid), oid_cmp);
+ subrel_local_oids[off++] = relstate->relid;
+ }
+ qsort(subrel_local_oids, list_length(subrel_states),
+ sizeof(Oid), oid_cmp);
+
+ /*
+ * Rels that we want to remove from subscription and drop any slots and
+ * origins corresponding to them.
+ */
+ sub_remove_rels = palloc(list_length(subrel_states) * sizeof(SubRemoveRels));
+
+ /*
+ * Walk over the remote tables and try to match them to locally known
+ * tables. If the table is not known locally create a new state for
+ * it.
+ *
+ * Also builds array of local oids of remote tables for the next step.
+ */
+ off = 0;
+ pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
+
+ foreach(lc, pubrel_names)
+ {
+ RangeVar *rv = (RangeVar *) lfirst(lc);
+ Oid relid;
- /*
- * Walk over the remote tables and try to match them to locally known
- * tables. If the table is not known locally create a new state for it.
- *
- * Also builds array of local oids of remote tables for the next step.
- */
- off = 0;
- pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
+ relid = RangeVarGetRelid(rv, AccessShareLock, false);
- foreach(lc, pubrel_names)
- {
- RangeVar *rv = (RangeVar *) lfirst(lc);
- Oid relid;
+ /* Check for supported relkind. */
+ CheckSubscriptionRelkind(get_rel_relkind(relid),
+ rv->schemaname, rv->relname);
- relid = RangeVarGetRelid(rv, AccessShareLock, false);
+ pubrel_local_oids[off++] = relid;
- /* Check for supported relkind. */
- CheckSubscriptionRelkind(get_rel_relkind(relid),
- rv->schemaname, rv->relname);
+ if (!bsearch(&relid, subrel_local_oids,
+ list_length(subrel_states), sizeof(Oid), oid_cmp))
+ {
+ AddSubscriptionRelState(sub->oid, relid,
+ copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
+ InvalidXLogRecPtr);
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" added to subscription \"%s\"",
+ rv->schemaname, rv->relname, sub->name)));
+ }
+ }
- pubrel_local_oids[off++] = relid;
+ /*
+ * Next remove state for tables we should not care about anymore using
+ * the data we collected above
+ */
+ qsort(pubrel_local_oids, list_length(pubrel_names),
+ sizeof(Oid), oid_cmp);
- if (!bsearch(&relid, subrel_local_oids,
- list_length(subrel_states), sizeof(Oid), oid_cmp))
+ remove_rel_len = 0;
+ for (off = 0; off < list_length(subrel_states); off++)
{
- AddSubscriptionRelState(sub->oid, relid,
- copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
- InvalidXLogRecPtr);
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" added to subscription \"%s\"",
- rv->schemaname, rv->relname, sub->name)));
- }
- }
+ Oid relid = subrel_local_oids[off];
- /*
- * Next remove state for tables we should not care about anymore using the
- * data we collected above
- */
- qsort(pubrel_local_oids, list_length(pubrel_names),
- sizeof(Oid), oid_cmp);
+ if (!bsearch(&relid, pubrel_local_oids,
+ list_length(pubrel_names), sizeof(Oid), oid_cmp))
+ {
+ char state;
+ XLogRecPtr statelsn;
+
+ /*
+ * Lock pg_subscription_rel with AccessExclusiveLock to prevent any race
+ * conditions with the apply worker re-launching workers at the same time
+ * this code is trying to remove those tables.
+ *
+ * Even if new worker for this particular rel is restarted it won't be able
+ * to make any progress as we hold exclusive lock on subscription_rel till
+ * the transaction end. It will simply exit as there is no corresponding
+ * rel entry.
+ *
+ * This locking also ensures that the state of rels won't change till we
+ * are done with this refresh operation.
+ */
+ if (!rel)
+ rel = table_open(SubscriptionRelRelationId, AccessExclusiveLock);
+
+ /* Last known rel state. */
+ state = GetSubscriptionRelState(sub->oid, relid, &statelsn);
+
+ sub_remove_rels[remove_rel_len].relid = relid;
+ sub_remove_rels[remove_rel_len++].state = state;
+
+ RemoveSubscriptionRel(sub->oid, relid);
+
+ logicalrep_worker_stop(sub->oid, relid);
+
+ /*
+ * For READY state, we would have already dropped the tablesync
+ * origin.
+ */
+ if (state != SUBREL_STATE_READY)
+ {
+ char originname[NAMEDATALEN];
+
+ /*
+ * Drop the tablesync's origin tracking if exists.
+ *
+ * It is possible that the origin is not yet created for
+ * tablesync worker, this can happen for the states before
+ * SUBREL_STATE_FINISHEDCOPY. The apply worker can also
+ * concurrently try to drop the origin and by this time the
+ * origin might be already removed. For these reasons,
+ * passing missing_ok = true from here.
+ */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", sub->oid, relid);
+ replorigin_drop_by_name(originname, true, false);
+ }
- for (off = 0; off < list_length(subrel_states); off++)
- {
- Oid relid = subrel_local_oids[off];
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
+ get_namespace_name(get_rel_namespace(relid)),
+ get_rel_name(relid),
+ sub->name)));
+ }
+ }
- if (!bsearch(&relid, pubrel_local_oids,
- list_length(pubrel_names), sizeof(Oid), oid_cmp))
+ /*
+ * Drop the tablesync slots associated with removed tables. This has to
+ * be at the end because otherwise if there is an error while doing the
+ * database operations we won't be able to rollback dropped slots.
+ */
+ for (off = 0; off < remove_rel_len; off++)
{
- RemoveSubscriptionRel(sub->oid, relid);
-
- logicalrep_worker_stop_at_commit(sub->oid, relid);
-
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
- get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid),
- sub->name)));
+ if (sub_remove_rels[off].state != SUBREL_STATE_READY &&
+ sub_remove_rels[off].state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = { 0 };
+
+ /*
+ * For READY/SYNCDONE states we know the tablesync slot has
+ * already been dropped by the tablesync worker.
+ *
+ * For other states, there is no certainty, maybe the slot
+ * does not exist yet. Also, if we fail after removing some of
+ * the slots, next time, it will again try to drop already
+ * dropped slots and fail. For these reasons, we allow
+ * missing_ok = true for the drop.
+ *
+ * XXX If there is a network breakdown while dropping the
+ * slots then we will give a WARNING to the user and they need
+ * to manually remove such slots. This can happen so rarely to
+ * worry about and we don't have any better way to deal with
+ * this.
+ */
+ ReplicationSlotNameForTablesync(sub->oid, sub_remove_rels[off].relid, syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, true /* missing_ok */);
+ }
}
}
+ PG_FINALLY();
+ {
+ if (wrconn)
+ walrcv_disconnect(wrconn);
+ }
+ PG_END_TRY();
+
+ if (rel)
+ table_close(rel, NoLock);
}
/*
* Alter the existing subscription.
*/
ObjectAddress
-AlterSubscription(AlterSubscriptionStmt *stmt)
+AlterSubscription(AlterSubscriptionStmt *stmt, bool isTopLevel)
{
Relation rel;
ObjectAddress myself;
@@ -848,6 +956,8 @@ AlterSubscription(AlterSubscriptionStmt *stmt)
errmsg("ALTER SUBSCRIPTION with refresh is not allowed for disabled subscriptions"),
errhint("Use ALTER SUBSCRIPTION ... SET PUBLICATION ... WITH (refresh = false).")));
+ PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION with refresh");
+
/* Make sure refresh sees the new list of publications. */
sub->publications = stmt->publication;
@@ -877,6 +987,8 @@ AlterSubscription(AlterSubscriptionStmt *stmt)
NULL, NULL, /* no "binary" */
NULL, NULL); /* no "streaming" */
+ PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION ... REFRESH");
+
AlterSubscription_refresh(sub, copy_data);
break;
@@ -927,8 +1039,8 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
char originname[NAMEDATALEN];
char *err = NULL;
WalReceiverConn *wrconn = NULL;
- StringInfoData cmd;
Form_pg_subscription form;
+ List *rstates;
/*
* Lock pg_subscription with AccessExclusiveLock to ensure that the
@@ -1041,6 +1153,36 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
}
list_free(subworkers);
+ /*
+ * Cleanup of tablesync replication origins.
+ *
+ * Any READY-state relations would already have dealt with clean-ups.
+ *
+ * Note that the state can't change because we have already stopped both
+ * the apply and tablesync workers and they can't restart because of
+ * exclusive lock on the subscription.
+ */
+ rstates = GetSubscriptionNotReadyRelations(subid);
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState* rstate = (SubscriptionRelState*) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Drop the tablesync's origin tracking if exists.
+ *
+ * It is possible that the origin is not yet created for tablesync
+ * worker so passing missing_ok = true from here. This can happen
+ * for the states before SUBREL_STATE_FINISHEDCOPY.
+ */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", subid, relid);
+ replorigin_drop_by_name(originname, true, false);
+ }
+
/* Clean up dependencies */
deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
@@ -1055,30 +1197,108 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
* If there is no slot associated with the subscription, we can finish
* here.
*/
- if (!slotname)
+ if (!slotname && rstates == NIL)
{
table_close(rel, NoLock);
return;
}
/*
- * Otherwise drop the replication slot at the publisher node using the
- * replication connection.
+ * Try to acquire the connection necessary for dropping slots.
+ *
+ * Note: If the slotname is NONE/NULL then we allow the command to finish
+ * and users need to manually cleanup the apply and tablesync worker slots
+ * later.
+ *
+ * This has to be at the end because otherwise if there is an error while
+ * doing the database operations we won't be able to rollback dropped slot.
*/
load_file("libpqwalreceiver", false);
- initStringInfo(&cmd);
- appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
-
wrconn = walrcv_connect(conninfo, true, subname, &err);
if (wrconn == NULL)
- ereport(ERROR,
- (errmsg("could not connect to publisher when attempting to "
- "drop the replication slot \"%s\"", slotname),
- errdetail("The error was: %s", err),
- /* translator: %s is an SQL ALTER command */
- errhint("Use %s to disassociate the subscription from the slot.",
- "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+ {
+ if (!slotname)
+ {
+ /* be tidy */
+ list_free(rstates);
+ return;
+ }
+ else
+ {
+ ReportSlotConnectionError(rstates, subid, slotname, err);
+ }
+ }
+
+ PG_TRY();
+ {
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Drop the tablesync slots associated with removed tables.
+ *
+ * For SYNCDONE/READY states, the tablesync slot is known to have
+ * already been dropped by the tablesync worker.
+ *
+ * For other states, there is no certainty, maybe the slot
+ * does not exist yet. Also, if we fail after removing some of
+ * the slots, next time, it will again try to drop already
+ * dropped slots and fail. For these reasons, we allow
+ * missing_ok = true for the drop.
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = {0};
+
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, true /* missing_ok */ );
+ }
+ }
+
+ list_free(rstates);
+
+ /*
+ * If there is a slot associated with the subscription, then drop the
+ * replication slot at the publisher.
+ */
+ if (slotname)
+ ReplicationSlotDropAtPubNode(wrconn, slotname, false /* missing_ok */ );
+
+ }
+ PG_FINALLY();
+ {
+ walrcv_disconnect(wrconn);
+ }
+ PG_END_TRY();
+
+ table_close(rel, NoLock);
+}
+
+/*
+ * Drop the replication slot at the publisher node using the replication
+ * connection.
+ *
+ * missing_ok - if true then only issue WARNING message if the slot cannot be
+ * deleted.
+ */
+void
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok)
+{
+ StringInfoData cmd;
+
+ Assert(wrconn);
+
+ load_file("libpqwalreceiver", false);
+
+ initStringInfo(&cmd);
+ appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
PG_TRY();
{
@@ -1086,27 +1306,37 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
res = walrcv_exec(wrconn, cmd.data, 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
+ if (res->status == WALRCV_OK_COMMAND)
+ {
+ /* NOTICE. Success. */
+ ereport(NOTICE,
+ (errmsg("dropped replication slot \"%s\" on publisher",
+ slotname)));
+ }
+ else if (res->status == WALRCV_ERROR && missing_ok)
+ {
+ /* WARNING. Error, but missing_ok = true. */
+ ereport(WARNING,
(errmsg("could not drop the replication slot \"%s\" on publisher",
slotname),
errdetail("The error was: %s", res->err)));
+ }
else
- ereport(NOTICE,
- (errmsg("dropped replication slot \"%s\" on publisher",
- slotname)));
+ {
+ /* ERROR. */
+ ereport(ERROR,
+ (errmsg("could not drop the replication slot \"%s\" on publisher",
+ slotname),
+ errdetail("The error was: %s", res->err)));
+ }
walrcv_clear_result(res);
}
PG_FINALLY();
{
- walrcv_disconnect(wrconn);
+ pfree(cmd.data);
}
PG_END_TRY();
-
- pfree(cmd.data);
-
- table_close(rel, NoLock);
}
/*
@@ -1275,3 +1505,45 @@ fetch_table_list(WalReceiverConn *wrconn, List *publications)
return tablelist;
}
+
+/*
+ * This is to report the connection failure while dropping replication slots.
+ * Here, we report the WARNING for all tablesync slots so that user can drop
+ * them manually, if required.
+ */
+static void
+ReportSlotConnectionError(List *rstates, Oid subid, char *slotname, char *err)
+{
+ ListCell *lc;
+
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Caller needs to ensure that relstate doesn't change underneath us.
+ * See DropSubscription where we get the relstates.
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = { 0 };
+
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ elog(WARNING, "could not drop tablesync replication slot \"%s\"",
+ syncslotname);
+ }
+ }
+
+ ereport(ERROR,
+ (errmsg("could not connect to publisher when attempting to "
+ "drop the replication slot \"%s\"", slotname),
+ errdetail("The error was: %s", err),
+ /* translator: %s is an SQL ALTER command */
+ errhint("Use %s to disassociate the subscription from the slot.",
+ "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+}
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 186514c..58082dd 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -73,20 +73,6 @@ typedef struct LogicalRepWorkerId
Oid relid;
} LogicalRepWorkerId;
-typedef struct StopWorkersData
-{
- int nestDepth; /* Sub-transaction nest level */
- List *workers; /* List of LogicalRepWorkerId */
- struct StopWorkersData *parent; /* This need not be an immediate
- * subtransaction parent */
-} StopWorkersData;
-
-/*
- * Stack of StopWorkersData elements. Each stack element contains the workers
- * to be stopped for that subtransaction.
- */
-static StopWorkersData *on_commit_stop_workers = NULL;
-
static void ApplyLauncherWakeup(void);
static void logicalrep_launcher_onexit(int code, Datum arg);
static void logicalrep_worker_onexit(int code, Datum arg);
@@ -547,51 +533,6 @@ logicalrep_worker_stop(Oid subid, Oid relid)
}
/*
- * Request worker for specified sub/rel to be stopped on commit.
- */
-void
-logicalrep_worker_stop_at_commit(Oid subid, Oid relid)
-{
- int nestDepth = GetCurrentTransactionNestLevel();
- LogicalRepWorkerId *wid;
- MemoryContext oldctx;
-
- /* Make sure we store the info in context that survives until commit. */
- oldctx = MemoryContextSwitchTo(TopTransactionContext);
-
- /* Check that previous transactions were properly cleaned up. */
- Assert(on_commit_stop_workers == NULL ||
- nestDepth >= on_commit_stop_workers->nestDepth);
-
- /*
- * Push a new stack element if we don't already have one for the current
- * nestDepth.
- */
- if (on_commit_stop_workers == NULL ||
- nestDepth > on_commit_stop_workers->nestDepth)
- {
- StopWorkersData *newdata = palloc(sizeof(StopWorkersData));
-
- newdata->nestDepth = nestDepth;
- newdata->workers = NIL;
- newdata->parent = on_commit_stop_workers;
- on_commit_stop_workers = newdata;
- }
-
- /*
- * Finally add a new worker into the worker list of the current
- * subtransaction.
- */
- wid = palloc(sizeof(LogicalRepWorkerId));
- wid->subid = subid;
- wid->relid = relid;
- on_commit_stop_workers->workers =
- lappend(on_commit_stop_workers->workers, wid);
-
- MemoryContextSwitchTo(oldctx);
-}
-
-/*
* Wake up (using latch) any logical replication worker for specified sub/rel.
*/
void
@@ -820,109 +761,21 @@ ApplyLauncherShmemInit(void)
}
/*
- * Check whether current transaction has manipulated logical replication
- * workers.
- */
-bool
-XactManipulatesLogicalReplicationWorkers(void)
-{
- return (on_commit_stop_workers != NULL);
-}
-
-/*
* Wakeup the launcher on commit if requested.
*/
void
AtEOXact_ApplyLauncher(bool isCommit)
{
-
- Assert(on_commit_stop_workers == NULL ||
- (on_commit_stop_workers->nestDepth == 1 &&
- on_commit_stop_workers->parent == NULL));
-
if (isCommit)
{
- ListCell *lc;
-
- if (on_commit_stop_workers != NULL)
- {
- List *workers = on_commit_stop_workers->workers;
-
- foreach(lc, workers)
- {
- LogicalRepWorkerId *wid = lfirst(lc);
-
- logicalrep_worker_stop(wid->subid, wid->relid);
- }
- }
-
if (on_commit_launcher_wakeup)
ApplyLauncherWakeup();
}
- /*
- * No need to pfree on_commit_stop_workers. It was allocated in
- * transaction memory context, which is going to be cleaned soon.
- */
- on_commit_stop_workers = NULL;
on_commit_launcher_wakeup = false;
}
/*
- * On commit, merge the current on_commit_stop_workers list into the
- * immediate parent, if present.
- * On rollback, discard the current on_commit_stop_workers list.
- * Pop out the stack.
- */
-void
-AtEOSubXact_ApplyLauncher(bool isCommit, int nestDepth)
-{
- StopWorkersData *parent;
-
- /* Exit immediately if there's no work to do at this level. */
- if (on_commit_stop_workers == NULL ||
- on_commit_stop_workers->nestDepth < nestDepth)
- return;
-
- Assert(on_commit_stop_workers->nestDepth == nestDepth);
-
- parent = on_commit_stop_workers->parent;
-
- if (isCommit)
- {
- /*
- * If the upper stack element is not an immediate parent
- * subtransaction, just decrement the notional nesting depth without
- * doing any real work. Else, we need to merge the current workers
- * list into the parent.
- */
- if (!parent || parent->nestDepth < nestDepth - 1)
- {
- on_commit_stop_workers->nestDepth--;
- return;
- }
-
- parent->workers =
- list_concat(parent->workers, on_commit_stop_workers->workers);
- }
- else
- {
- /*
- * Abandon everything that was done at this nesting level. Explicitly
- * free memory to avoid a transaction-lifespan leak.
- */
- list_free_deep(on_commit_stop_workers->workers);
- }
-
- /*
- * We have taken care of the current subtransaction workers list for both
- * abort or commit. So we are ready to pop the stack.
- */
- pfree(on_commit_stop_workers);
- on_commit_stop_workers = parent;
-}
-
-/*
* Request wakeup of the launcher on commit of the transaction.
*
* This is used to send launcher signal to stop sleeping and process the
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index ccbdbcf..38df892 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -31,8 +31,11 @@
* table state to INIT.
* - Tablesync worker starts; changes table state from INIT to DATASYNC while
* copying.
- * - Tablesync worker finishes the copy and sets table state to SYNCWAIT;
- * waits for state change.
+ * - Tablesync worker does initial table copy; there is a FINISHEDCOPY (sync
+ * worker specific) state to indicate when the copy phase has completed, so
+ * if the worker crashes with this (non-memory) state then the copy will not
+ * be re-attempted.
+ * - Tablesync worker then sets table state to SYNCWAIT; waits for state change.
* - Apply worker periodically checks for tables in SYNCWAIT state. When
* any appear, it sets the table state to CATCHUP and starts loop-waiting
* until either the table state is set to SYNCDONE or the sync worker
@@ -48,8 +51,8 @@
* point it sets state to READY and stops tracking. Again, there might
* be zero changes in between.
*
- * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * So the state progression is always: INIT -> DATASYNC -> FINISHEDCOPY
+ * -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY.
*
* The catalog pg_subscription_rel is used to keep information about
* subscribed tables and their state. The catalog holds all states
@@ -58,6 +61,7 @@
* Example flows look like this:
* - Apply is in front:
* sync:8
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:10
* -> set in memory CATCHUP
@@ -73,6 +77,7 @@
*
* - Sync is in front:
* sync:10
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:8
* -> set in memory CATCHUP
@@ -101,7 +106,10 @@
#include "replication/logicalrelation.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
+#include "storage/lmgr.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -269,26 +277,52 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
-
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
current_lsn >= MyLogicalRepWorker->relstate_lsn)
{
TimeLineID tli;
+ char syncslotname[NAMEDATALEN] = {0};
MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCDONE;
MyLogicalRepWorker->relstate_lsn = current_lsn;
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ StartTransactionCommand();
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
MyLogicalRepWorker->relstate_lsn);
+ /* End wal streaming so wrconn can be re-used to drop the slot. */
walrcv_endstreaming(wrconn, &tli);
+
+ /*
+ * Cleanup the tablesync slot.
+ *
+ * This has to be done after updating the state because otherwise if
+ * there is an error while doing the database operations we won't be
+ * able to rollback dropped slot.
+ */
+ ReplicationSlotNameForTablesync(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ syncslotname);
+
+ /*
+ * It is important to give an error if we are unable to drop the slot,
+ * otherwise, it won't be dropped till the corresponding subscription
+ * is dropped.
+ */
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, false /* missing_ok */);
+
finish_sync_worker();
}
else
@@ -403,6 +437,8 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
*/
if (current_lsn >= rstate->lsn)
{
+ char originname[NAMEDATALEN];
+
rstate->state = SUBREL_STATE_READY;
rstate->lsn = current_lsn;
if (!started_tx)
@@ -411,6 +447,26 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
started_tx = true;
}
+ /*
+ * Remove the tablesync origin tracking if exists.
+ *
+ * The normal case origin drop is done here instead of in the
+ * process_syncing_tables_for_sync function because we don't
+ * allow to drop the origin till the process owning the origin
+ * is alive.
+ *
+ * There is a chance that the user is concurrently performing
+ * refresh for the subscription where we remove the table
+ * state and its origin and by this time the origin might be
+ * already removed. So passing missing_ok = true from here.
+ */
+ snprintf(originname, sizeof(originname), "pg_%u_%u",
+ MyLogicalRepWorker->subid, rstate->relid);
+ replorigin_drop_by_name(originname, true, false);
+
+ /*
+ * Update the state to READY only after the origin cleanup.
+ */
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
@@ -806,6 +862,37 @@ copy_table(Relation rel)
}
/*
+ * Determine the tablesync slot name.
+ *
+ * The name must not exceed NAMEDATALEN - 1 because of remote node constraints
+ * on slot name length. We do append system_identifier to avoid slot_name
+ * collision with subscriptions in other clusters. With current scheme
+ * pg_%u_sync_%u_UINT64_FORMAT (3 + 10 + 6 + 10 + 20 + '\0'), the maximum
+ * length of slot_name will be 50.
+ *
+ * The returned slot name is either:
+ * - stored in the supplied buffer (syncslotname), or
+ * - palloc'ed in current memory context (if syncslotname = NULL).
+ *
+ * Note: We don't use the subscription slot name as part of tablesync slot name
+ * because we are responsible for cleaning up these slots and it could become
+ * impossible to recalculate what name to cleanup if the subscription slot name
+ * had changed.
+ */
+char *
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char syncslotname[NAMEDATALEN])
+{
+ if (syncslotname)
+ sprintf(syncslotname, "pg_%u_sync_%u_" UINT64_FORMAT, suboid, relid,
+ GetSystemIdentifier());
+ else
+ syncslotname = psprintf("pg_%u_sync_%u_" UINT64_FORMAT, suboid, relid,
+ GetSystemIdentifier());
+
+ return syncslotname;
+}
+
+/*
* Start syncing the table in the sync worker.
*
* If nothing needs to be done to sync the table, we exit the worker without
@@ -822,6 +909,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -847,19 +936,10 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
- MySubscription->slotname,
- MySubscription->oid,
- MyLogicalRepWorker->relid);
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(MySubscription->oid,
+ MyLogicalRepWorker->relid,
+ NULL /* use palloc */ );
/*
* Here we use the slot name instead of the subscription name as the
@@ -872,7 +952,48 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY);
+
+ /* Assign the origin tracking record name. */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", MySubscription->oid, MyLogicalRepWorker->relid);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC)
+ {
+ /*
+ * We have previously errored out before finishing the copy so the
+ * replication slot might exist. We want to remove the slot if it
+ * already exists and proceed.
+ *
+ * XXX We could also instead try to drop the slot, last time we failed
+ * but for that, we might need to clean up the copy state as it might
+ * be in the middle of fetching the rows. Also, if there is a network
+ * breakdown then it wouldn't have succeeded so trying it next time
+ * seems like a better bet.
+ */
+ ReplicationSlotDropAtPubNode(wrconn, slotname, true /* missing_ok */);
+ }
+ else if (MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed
+ * before it was able to finish normally.
+ */
+ StartTransactionCommand();
+
+ /*
+ * The origin tracking name must already exist. It was created first
+ * time this tablesync was launched.
+ */
+ originid = replorigin_by_name(originname, false /* missing_ok */ );
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ *origin_startpos = replorigin_session_get_progress(false);
+
+ CommitTransactionCommand();
+
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -888,9 +1009,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -916,12 +1034,45 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
walrcv_clear_result(res);
/*
- * Create a new temporary logical decoding slot. This slot will be used
- * for the catchup phase after COPY is done, so tell it to use the
- * snapshot to make the final data consistent.
+ * Create a new permanent logical decoding slot. This slot will be
+ * used for the catchup phase after COPY is done, so tell it to use
+ * the snapshot to make the final data consistent.
*/
- walrcv_create_slot(wrconn, slotname, true,
- CRS_USE_SNAPSHOT, origin_startpos);
+ walrcv_create_slot(wrconn, slotname, false /* permanent */ ,
+ CRS_USE_SNAPSHOT, origin_startpos);
+
+ /*
+ * Setup replication origin tracking. The purpose of doing this before the
+ * copy is to avoid doing the copy again due to any error in setting up
+ * origin tracking.
+ */
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist, so create it now.
+ *
+ * Then advance to the LSN got from walrcv_create_slot. This is
+ * WAL logged for the purpose of recovery. Locks are to prevent
+ * the replication origin from vanishing while advancing.
+ */
+ originid = replorigin_create(originname);
+
+ LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */, true /* WAL log */);
+ UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ }
+ else
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("replication origin \"%s\" already exists",
+ originname)));
+ }
/* Now do the initial data copy */
PushActiveSnapshot(GetTransactionSnapshot());
@@ -941,6 +1092,25 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommandCounterIncrement();
/*
+ * Update the persisted state to indicate the COPY phase is done; make
+ * it visible to others.
+ */
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_FINISHEDCOPY,
+ MyLogicalRepWorker->relstate_lsn);
+
+ CommitTransactionCommand();
+
+copy_table_done:
+
+ elog(DEBUG1,
+ "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
+
+ /*
* We are done with the initial data synchronization, update the state.
*/
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index eb7db89..cfc924c 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -807,12 +807,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -889,9 +885,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -918,8 +912,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1062,8 +1055,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData *commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 1d81071..05bb698 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1786,7 +1786,8 @@ ProcessUtilitySlow(ParseState *pstate,
break;
case T_AlterSubscriptionStmt:
- address = AlterSubscription((AlterSubscriptionStmt *) parsetree);
+ address = AlterSubscription((AlterSubscriptionStmt *) parsetree,
+ isTopLevel);
break;
case T_DropSubscriptionStmt:
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index 2bea2c5..ed94f57 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,8 @@ DECLARE_UNIQUE_INDEX_PKEY(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_FINISHEDCOPY 'f' /* tablesync copy phase is completed
+ * (sublsn NULL) */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/commands/subscriptioncmds.h b/src/include/commands/subscriptioncmds.h
index a818650..3b926f3 100644
--- a/src/include/commands/subscriptioncmds.h
+++ b/src/include/commands/subscriptioncmds.h
@@ -20,7 +20,7 @@
extern ObjectAddress CreateSubscription(CreateSubscriptionStmt *stmt,
bool isTopLevel);
-extern ObjectAddress AlterSubscription(AlterSubscriptionStmt *stmt);
+extern ObjectAddress AlterSubscription(AlterSubscriptionStmt *stmt, bool isTopLevel);
extern void DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel);
extern ObjectAddress AlterSubscriptionOwner(const char *name, Oid newOwnerId);
diff --git a/src/include/replication/logicallauncher.h b/src/include/replication/logicallauncher.h
index 421ec15..301e494 100644
--- a/src/include/replication/logicallauncher.h
+++ b/src/include/replication/logicallauncher.h
@@ -22,9 +22,7 @@ extern Size ApplyLauncherShmemSize(void);
extern void ApplyLauncherShmemInit(void);
extern void ApplyLauncherWakeupAtCommit(void);
-extern bool XactManipulatesLogicalReplicationWorkers(void);
extern void AtEOXact_ApplyLauncher(bool isCommit);
-extern void AtEOSubXact_ApplyLauncher(bool isCommit, int nestDepth);
extern bool IsLogicalLauncher(void);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 53f636c..5f52335 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -15,6 +15,7 @@
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "storage/spin.h"
+#include "replication/walreceiver.h"
/*
* Behaviour of replication slots, upon release or crash.
@@ -211,6 +212,8 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname);
+extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
diff --git a/src/include/replication/worker_internal.h b/src/include/replication/worker_internal.h
index d046022..f8a37a7 100644
--- a/src/include/replication/worker_internal.h
+++ b/src/include/replication/worker_internal.h
@@ -77,13 +77,13 @@ extern List *logicalrep_workers_find(Oid subid, bool only_running);
extern void logicalrep_worker_launch(Oid dbid, Oid subid, const char *subname,
Oid userid, Oid relid);
extern void logicalrep_worker_stop(Oid subid, Oid relid);
-extern void logicalrep_worker_stop_at_commit(Oid subid, Oid relid);
extern void logicalrep_worker_wakeup(Oid subid, Oid relid);
extern void logicalrep_worker_wakeup_ptr(LogicalRepWorker *worker);
extern int logicalrep_sync_worker_count(Oid subid);
extern char *LogicalRepSyncTableStart(XLogRecPtr *origin_startpos);
+
void process_syncing_tables(XLogRecPtr current_lsn);
void invalidate_syncing_table_states(Datum arg, int cacheid,
uint32 hashvalue);
diff --git a/src/test/regress/expected/subscription.out b/src/test/regress/expected/subscription.out
index 2fa9bce..7802279 100644
--- a/src/test/regress/expected/subscription.out
+++ b/src/test/regress/expected/subscription.out
@@ -201,6 +201,27 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
(1 row)
DROP SUBSCRIPTION regress_testsub;
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=postgres' PUBLICATION mypub
+ WITH (enabled = true, create_slot = false, copy_data = false);
+-- fail - ALTER SUBSCRIPTION with refresh is not allowed in a transaction
+-- block or function
+BEGIN;
+ALTER SUBSCRIPTION regress_testsub SET PUBLICATION mypub WITH (refresh = true);
+ERROR: ALTER SUBSCRIPTION with refresh cannot run inside a transaction block
+END;
+BEGIN;
+ALTER SUBSCRIPTION regress_testsub REFRESH PUBLICATION;
+ERROR: ALTER SUBSCRIPTION ... REFRESH cannot run inside a transaction block
+END;
+CREATE FUNCTION func() RETURNS VOID AS
+$$ ALTER SUBSCRIPTION regress_testsub SET PUBLICATION mypub WITH (refresh = true) $$ LANGUAGE SQL;
+SELECT func();
+ERROR: ALTER SUBSCRIPTION with refresh cannot be executed from a function
+CONTEXT: SQL function "func" statement 1
+ALTER SUBSCRIPTION regress_testsub DISABLE;
+ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
+DROP SUBSCRIPTION regress_testsub;
+DROP FUNCTION func;
RESET SESSION AUTHORIZATION;
DROP ROLE regress_subscription_user;
DROP ROLE regress_subscription_user2;
diff --git a/src/test/regress/sql/subscription.sql b/src/test/regress/sql/subscription.sql
index 14fa0b2..ca0d782 100644
--- a/src/test/regress/sql/subscription.sql
+++ b/src/test/regress/sql/subscription.sql
@@ -147,6 +147,28 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
DROP SUBSCRIPTION regress_testsub;
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=postgres' PUBLICATION mypub
+ WITH (enabled = true, create_slot = false, copy_data = false);
+
+-- fail - ALTER SUBSCRIPTION with refresh is not allowed in a transaction
+-- block or function
+BEGIN;
+ALTER SUBSCRIPTION regress_testsub SET PUBLICATION mypub WITH (refresh = true);
+END;
+
+BEGIN;
+ALTER SUBSCRIPTION regress_testsub REFRESH PUBLICATION;
+END;
+
+CREATE FUNCTION func() RETURNS VOID AS
+$$ ALTER SUBSCRIPTION regress_testsub SET PUBLICATION mypub WITH (refresh = true) $$ LANGUAGE SQL;
+SELECT func();
+
+ALTER SUBSCRIPTION regress_testsub DISABLE;
+ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
+DROP SUBSCRIPTION regress_testsub;
+DROP FUNCTION func;
+
RESET SESSION AUTHORIZATION;
DROP ROLE regress_subscription_user;
DROP ROLE regress_subscription_user2;
diff --git a/src/test/subscription/t/004_sync.pl b/src/test/subscription/t/004_sync.pl
index e111ab9..c792668 100644
--- a/src/test/subscription/t/004_sync.pl
+++ b/src/test/subscription/t/004_sync.pl
@@ -3,7 +3,7 @@ use strict;
use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 7;
+use Test::More tests => 8;
# Initialize publisher node
my $node_publisher = get_new_node('publisher');
@@ -149,7 +149,26 @@ $result = $node_subscriber->safe_psql('postgres',
is($result, qq(20),
'changes for table added after subscription initialized replicated');
+# clean up
+$node_publisher->safe_psql('postgres', "DROP TABLE tab_rep_next");
+$node_subscriber->safe_psql('postgres', "DROP TABLE tab_rep_next");
$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+# Table tap_rep already has the same records on both publisher and subscriber
+# at this time. Recreate the subscription which will do the initial copy of
+# the table again and fails due to unique constraint violation.
+$node_subscriber->safe_psql('postgres',
+ "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr' PUBLICATION tap_pub");
+
+$result = $node_subscriber->poll_query_until('postgres', $started_query)
+ or die "Timed out while waiting for subscriber to start sync";
+
+# DROP SUBSCRIPTION must clean up slots on the publisher side when the
+# subscriber is stuck on data copy for constraint violation.
+$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+
+$result = $node_publisher->safe_psql('postgres', "SELECT count(*) FROM pg_replication_slots");
+is($result, qq(0), 'DROP SUBSCRIPTION during error can clean up the slots on the publisher');
+
$node_subscriber->stop('fast');
$node_publisher->stop('fast');
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 1d540fe..5f5c36d 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2397,7 +2397,6 @@ StdAnalyzeData
StdRdOptions
Step
StopList
-StopWorkersData
StrategyNumber
StreamCtl
StreamXidHash
--
1.8.3.1
On Mon, Feb 8, 2021 at 12:22 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:
On Monday, February 8, 2021 1:44 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:
On Mon, Feb 8, 2021 11:36 AM Peter Smith <smithpb2250@gmail.com> wrote:
2. For the 004 test case I know the test is needing some PK constraint
violation
# Check if DROP SUBSCRIPTION cleans up slots on the publisher side
# when the subscriber is stuck on data copy for constraint
But it is not clear to me what was the exact cause of that PK
violation. I think you must be relying on data that is leftover from
some previous test case but I am not sure which one. Can you make the
comment more detailed to say *how* the PK violation is happening -
e.g. something to say which rows, in which table, and inserted by whom?

I added some comments to clarify how the PK violation happens.
Please have a look.

Sorry, I had a typo in the tests of subscription.sql in v2.
I used 'foo' for the first test of "ALTER SUBSCRIPTION mytest SET
PUBLICATION foo WITH (refresh = true)" in v02, but I should have used
'mypub' to make this test clearly independent from other previous tests.
Attached the fixed version.
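For illustration, the violation being discussed arises roughly as
follows (a sketch: the connection string is elided, and the constraint
name assumes the default naming for tab_rep's primary key):

-- tab_rep already holds identical rows on both publisher and subscriber,
-- so re-creating the subscription re-runs the initial table copy
CREATE SUBSCRIPTION tap_sub CONNECTION '...' PUBLICATION tap_pub;
-- the tablesync COPY of tab_rep then fails on the subscriber with
--   ERROR:  duplicate key value violates unique constraint "tab_rep_pkey"
-- leaving the subscription stuck in the data copy phase, which is the
-- state the test needs in order to exercise slot cleanup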
Thanks. I have integrated this into the main patch with minor
modifications in the comments. The main change I have made is to
remove the test that checked that two slots remain after the initial
sync failure. This is because on restart the tablesync worker again
tries to drop the slot, so we can't guarantee that the tablesync slot
will still be there. I think this is a timing issue, so it might not
have occurred on your machine, but I could reproduce it by repeated
runs of the tests provided by you.
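For clarity, the check that remains asserts only the final state, which
is not timing-sensitive (a sketch of the idea, run on the publisher):

-- after DROP SUBSCRIPTION, no replication slots should remain at all
SELECT count(*) FROM pg_replication_slots;  -- expect 0
-- asserting an exact intermediate count (say, two slots while the copy
-- keeps failing) would be racy, because the restarted tablesync worker
-- retries dropping its own slot at an unpredictable moment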
--
With Regards,
Amit Kapila.
On Mon, Feb 8, 2021 at 11:42 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Sun, Feb 7, 2021 at 2:38 PM Peter Smith <smithpb2250@gmail.com> wrote:
On Sat, Feb 6, 2021 at 2:10 AM Petr Jelinek
<petr.jelinek@enterprisedb.com> wrote:
Hi,
Some minor comments about code:
+ else if (res->status == WALRCV_ERROR && missing_ok)
+ {
+ /* WARNING. Error, but missing_ok = true. */
+ ereport(WARNING,
I wonder if we need to add error code to the WalRcvExecResult and check
for the appropriate ones here. Because this can for example return an
error because of a timeout, not because the slot is missing. Not sure
if it matters for current callers though (but then maybe don't call the
param missing_ok?).

You are right. The way we are using this function has evolved beyond
the original intention.
Probably renaming the param to something like "error_ok" would be more
appropriate now.

PSA a patch (apply on top of V28) to change the misleading param name.
PSA an alternative patch. This one adds a new member to
WalRcvExecResult and so is able to detect the "slot does not exist"
error. This patch also applies on top of V28, if you want it.
------
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
v1-0001-ReplicationSlotDropAtPubNode-detect-slot-does-not.patchapplication/octet-stream; name=v1-0001-ReplicationSlotDropAtPubNode-detect-slot-does-not.patchDownload
From 4fa3b2c59959c1a3a8d878dd62dda7683db7a08e Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Tue, 9 Feb 2021 10:27:14 +1100
Subject: [PATCH v1] ReplicationSlotDropAtPubNode detect slot does not exist.
A new sqlstate member was added to WalRcvExecResult. This allows walrcv_exec calling code to know the detail of the cause of any error. Specifically, here it means the ReplicationSlotDropAtPubNode function can now identify the "slot does not exist" error, and so can handle "missing_ok" more correctly.
---
src/backend/commands/subscriptioncmds.c | 3 ++-
src/backend/replication/libpqwalreceiver/libpqwalreceiver.c | 8 ++++++++
src/include/replication/walreceiver.h | 1 +
3 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 1d3ca43..eee7512 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -1296,7 +1296,8 @@ ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missi
(errmsg("dropped replication slot \"%s\" on publisher",
slotname)));
}
- else if (res->status == WALRCV_ERROR && missing_ok)
+ else if (res->status == WALRCV_ERROR &&
+ missing_ok && res->sqlstate == ERRCODE_UNDEFINED_OBJECT)
{
/* WARNING. Error, but missing_ok = true. */
ereport(WARNING,
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index e958274..7714696 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -982,6 +982,7 @@ libpqrcv_exec(WalReceiverConn *conn, const char *query,
{
PGresult *pgres = NULL;
WalRcvExecResult *walres = palloc0(sizeof(WalRcvExecResult));
+ char *diag_sqlstate;
if (MyDatabaseId == InvalidOid)
ereport(ERROR,
@@ -1025,6 +1026,13 @@ libpqrcv_exec(WalReceiverConn *conn, const char *query,
case PGRES_BAD_RESPONSE:
walres->status = WALRCV_ERROR;
walres->err = pchomp(PQerrorMessage(conn->streamConn));
+ diag_sqlstate = PQresultErrorField(pgres, PG_DIAG_SQLSTATE);
+ if (diag_sqlstate)
+ walres->sqlstate = MAKE_SQLSTATE(diag_sqlstate[0],
+ diag_sqlstate[1],
+ diag_sqlstate[2],
+ diag_sqlstate[3],
+ diag_sqlstate[4]);
break;
}
diff --git a/src/include/replication/walreceiver.h b/src/include/replication/walreceiver.h
index 4313f51..a97a59a 100644
--- a/src/include/replication/walreceiver.h
+++ b/src/include/replication/walreceiver.h
@@ -210,6 +210,7 @@ typedef enum
typedef struct WalRcvExecResult
{
WalRcvExecStatus status;
+ int sqlstate;
char *err;
Tuplestorestate *tuplestore;
TupleDesc tupledesc;
--
1.8.3.1
On Mon, Feb 8, 2021 8:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Feb 8, 2021 at 12:22 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

On Monday, February 8, 2021 1:44 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

On Mon, Feb 8, 2021 11:36 AM Peter Smith <smithpb2250@gmail.com> wrote:

2. For the 004 test case I know the test needs some PK constraint violation:
# Check if DROP SUBSCRIPTION cleans up slots on the publisher side
# when the subscriber is stuck on data copy for constraint
But it is not clear to me what was the exact cause of that PK violation.
I think you must be relying on data that is leftover from some previous
test case but I am not sure which one. Can you make the comment more
detailed to say *how* the PK violation is happening - e.g. something to
say which rows, in which table, and inserted by whom?

I added some comments to clarify how the PK violation happens.
Please have a look.

Sorry, I had one typo in the tests of subscription.sql in v2.
I used 'foo' for the first test of "ALTER SUBSCRIPTION mytest SET
PUBLICATION foo WITH (refresh = true)" in v02, but I should have used
'mypub' to make this test clearly independent from other previous tests.
Attached the fixed version.
Thanks. I have integrated this into the main patch with minor modifications
in the comments. The main change I have done is to remove the test that was
checking that there are two slots remaining after the initial sync failure.
This is because on restart of the tablesync worker we again try to drop the
slot, so we can't guarantee that the tablesync slot will remain. I think
this is a timing issue, so it might not have occurred on your machine, but
I could reproduce it by repeated runs of the tests provided by you.

OK. I understand. Thank you so much for having modified it and integrated
it into the main patch.
Best Regards,
Takamichi Osumi
Here are my feedback comments for the V29 patch.
====
FILE: logical-replication.sgml
+ slots have generated names: <quote><literal>pg_%u_sync_%u_%llu</literal></quote>
+ (parameters: Subscription <parameter>oid</parameter>,
+ Table <parameter>relid</parameter>, system identifier<parameter>sysid</parameter>)
+ </para>
1.
There is a missing space before the sysid parameter.
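For example, that line would then presumably read:

+ Table <parameter>relid</parameter>, system identifier <parameter>sysid</parameter>)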
=====
FILE: subscriptioncmds.c
+ * SUBREL_STATE_FINISHEDCOPY. The apply worker can also
+ * concurrently try to drop the origin and by this time the
+ * origin might be already removed. For these reasons,
+ * passing missing_ok = true from here.
+ */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", sub->oid, relid);
+ replorigin_drop_by_name(originname, true, false);
+ }
2.
Don't really need to say "from here".
(same comment applies multiple places, in this file and in tablesync.c)
3.
Previously the tablesync origin name format was encapsulated in a
common function. IMO it was cleaner/safer how it was before, instead
of the same "pg_%u_%u" cut/paste and scattered in many places.
(same comment applies multiple places, in this file and in tablesync.c)
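To illustrate, a minimal sketch of the kind of wrapper meant here (the
helper name and exact signature below are just an assumption for
illustration; the later v30 patch adds a similar function,
ReplicationOriginNameForTableSync):

/* Keep the tablesync origin name format in exactly one place. */
static void
tablesync_origin_name(Oid suboid, Oid relid, char originname[NAMEDATALEN])
{
	snprintf(originname, NAMEDATALEN, "pg_%u_%u", suboid, relid);
}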
4.
Calls like replorigin_drop_by_name(originname, true, false); make it
unnecessarily hard to read code when the boolean params are neither
named as variables nor commented. I noticed on another thread [et0205]
there was an idea that having no name/comments is fine because anyway
it is not difficult to figure out when using a "modern IDE", but since
my review tools are only "vi" and "meld" I beg to differ with that
justification.
(same comment applies multiple places, in this file and in tablesync.c)
[et0205] /messages/by-id/c1d9833f-eeeb-40d5-89ba-87674e1b7ba3@www.fastmail.com
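For example, one sketch of a way to keep those call sites readable without
extra variables would be inline parameter comments:

replorigin_drop_by_name(originname,
                        true,   /* missing_ok */
                        false); /* nowait */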
=====
FILE: tablesync.c
5.
Previously there was a function tablesync_replorigin_drop which was
encapsulating the tablesync origin name formatting. I thought that was
better than the V29 code which now has the same formatting scattered
over many places.
(same comment applies for worker_internal.h)
+ * Determine the tablesync slot name.
+ *
+ * The name must not exceed NAMEDATALEN - 1 because of remote node constraints
+ * on slot name length. We do append system_identifier to avoid slot_name
+ * collision with subscriptions in other clusters. With current scheme
+ * pg_%u_sync_%u_UINT64_FORMAT (3 + 10 + 6 + 10 + 20 + '\0'), the maximum
+ * length of slot_name will be 50.
+ *
+ * The returned slot name is either:
+ * - stored in the supplied buffer (syncslotname), or
+ * - palloc'ed in current memory context (if syncslotname = NULL).
+ *
+ * Note: We don't use the subscription slot name as part of tablesync slot name
+ * because we are responsible for cleaning up these slots and it could become
+ * impossible to recalculate what name to cleanup if the subscription slot name
+ * had changed.
+ */
+char *
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char syncslotname[NAMEDATALEN])
+{
+ if (syncslotname)
+ sprintf(syncslotname, "pg_%u_sync_%u_" UINT64_FORMAT, suboid, relid,
+ GetSystemIdentifier());
+ else
+ syncslotname = psprintf("pg_%u_sync_%u_" UINT64_FORMAT, suboid, relid,
+ GetSystemIdentifier());
+
+ return syncslotname;
+}
6.
"We do append" --> "We append"
"With current scheme" -> "With the current scheme"
7.
Maybe consider just assigning GetSystemIdentifier() to a static
instead of calling that function for every slot?
static uint64 sysid = GetSystemIdentifier();
IIUC the sysid value is never going to change for a process, right?
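Note, though, that a static local in C can't be initialized with a function
call, so a sketch of this idea (hypothetical helper name) would need lazy
initialization along these lines:

static uint64
cached_system_identifier(void)
{
	static uint64 sysid = 0;

	if (sysid == 0)
		sysid = GetSystemIdentifier();
	return sysid;
}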
------
Kind Regards,
Peter Smith.
Fujitsu Australia
On Mon, Feb 8, 2021 at 9:59 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Feb 5, 2021 at 8:40 PM Petr Jelinek
<petr.jelinek@enterprisedb.com> wrote:

Hi,

We had a bit of a high-level discussion about these patches with Amit
off-list, so I decided to also take a look at the actual code.

Thanks for the discussion and a follow-up review.

My main concern originally was the potential for left-over slots on the
publisher, but I think the state now is relatively okay, with a couple
of corner cases that are documented and don't seem much worse than the
main slot. I wonder if we should mention the max_slot_wal_keep_size GUC
in the table sync docs though.

I have added a reference to this in Alter Subscription where we
mentioned the risk of leftover slots. Let me know if you have something
else in mind.

Another thing that might need documentation is that the visibility of
changes done by table sync is no longer isolated, in that table contents
will show intermediate progress to other backends, rather than switching
from nothing to a state consistent with the rest of replication.

Agreed and updated the docs accordingly.
Some minor comments about code:

+ else if (res->status == WALRCV_ERROR && missing_ok)
+ {
+ /* WARNING. Error, but missing_ok = true. */
+ ereport(WARNING,

I wonder if we need to add an error code to the WalRcvExecResult and check
for the appropriate ones here. Because this can for example return an error
because of a timeout, not because the slot is missing.

I think there are both pros and cons of distinguishing the error ("slot
does not exist" from others). The benefit is that if there is a network
glitch then the user can probably retry the commands Alter/Drop and it
will be successful next time. OTOH, say the network is broken for a long
time and the user wants to proceed, but there won't be any way to proceed
for Alter Subscription ... Refresh or the Drop command. So by giving a
WARNING at least we can provide a way to proceed, and then they can drop
such slots later. We have mentioned this in the docs as well. I think we
can go either way here; let me know what you think is a better way?

Not sure if it matters for current callers though (but then maybe don't
call the param missing_ok?).

Sure, if we decide not to change the behavior as suggested by you then
this makes sense.

+ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char syncslotname[NAMEDATALEN])
+{
+ if (syncslotname)
+ sprintf(syncslotname, "pg_%u_sync_%u", suboid, relid);
+ else
+ syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
+
+ return syncslotname;
+}

Given that we are now explicitly dropping slots, what happens here if we
have 2 different downstreams that happen to get the same suboid and
reloid; will one of them drop the slot of the other one? Previously, with
the cleanup being left to the temp slot, we'd at maximum have got an error
when creating it, but with the new logic in LogicalRepSyncTableStart it
feels like we could get into a situation where 2 downstreams are fighting
over a slot, no?

As discussed, added system_identifier to distinguish subscriptions
between different clusters.

Apart from fixing the above comment, I have integrated it with the new
replorigin_drop_by_name() API being discussed in the thread [1] and
posted that patch just for ease. I have also integrated Osumi-San's
test case patch with minor modifications.

[1] - /messages/by-id/CAA4eK1L7mLhY=wyCB0qsEGUpfzWfncDSS9_0a4Co+N0GUyNGNQ@mail.gmail.com
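As an aside, here is a tiny standalone sketch of the collision scenario
(all OID values and the second system identifier below are made up):

#include <stdio.h>
#include <stdint.h>

int
main(void)
{
	/* Two independent subscriber clusters that happen to allocate the
	 * same subscription OID and relation OID ... */
	unsigned	suboid = 16397, relid = 16389;
	/* ... but, by definition, different system identifiers. */
	uint64_t	sysid_a = 6927117142022745645ULL;
	uint64_t	sysid_b = 7005550123456789012ULL;
	char		a[64], b[64];

	/* Old scheme: both clusters compute the same slot name, so one
	 * cluster's DROP can remove the other's slot on the publisher. */
	snprintf(a, sizeof(a), "pg_%u_sync_%u", suboid, relid);
	snprintf(b, sizeof(b), "pg_%u_sync_%u", suboid, relid);
	printf("old: %s vs %s (collision)\n", a, b);

	/* New scheme: the sysid suffix keeps the names distinct. */
	snprintf(a, sizeof(a), "pg_%u_sync_%u_%llu", suboid, relid,
			 (unsigned long long) sysid_a);
	snprintf(b, sizeof(b), "pg_%u_sync_%u_%llu", suboid, relid,
			 (unsigned long long) sysid_b);
	printf("new: %s vs %s\n", a, b);
	return 0;
}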
--
With Regards,
Amit Kapila.
More V29 Feedback
FILE: alter_subscription.sgml
8.
+ <para>
+ Commands <command>ALTER SUBSCRIPTION ... REFRESH ..</command> and
+ <command>ALTER SUBSCRIPTION ... SET PUBLICATION ..</command> with refresh
+ option as true cannot be executed inside a transaction block.
+ </para>
My guess is those two lots of double dots ("..") were probably meant
to be ellipsis ("...")
----
Kind Regards,
Peter Smith.
Fujitsu Australia
Looking at the V29 style tablesync slot names now they appear like this:
WARNING: could not drop tablesync replication slot
"pg_16397_sync_16389_6927117142022745645"
That is in the order subid + relid + sysid
Now that I see it in a message it seems a bit strange with the sysid
just tacked onto the end like that.
I am wondering if reordering from parent to child might be more natural,
e.g. sysid + subid + relid gives a more intuitive name IMO.
So in this example it would be "pg_sync_6927117142022745645_16397_16389"
Thoughts?
----
Kind Regards,
Peter Smith
Fujitsu Australia
When looking at the DropSubscription code I noticed that there is a
small difference between the HEAD code and the V29 code when slot_name
= NONE.
HEAD does
------
if (!slotname)
{
table_close(rel, NoLock);
return;
}
------
V29 does
------
if (!slotname)
{
/* be tidy */
list_free(rstates);
return;
}
------
Isn't the V29 code missing doing a table_close(rel, NoLock) there?
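Presumably the early return just needs the same close as HEAD, e.g.
(sketch):

if (!slotname)
{
	/* be tidy */
	list_free(rstates);
	table_close(rel, NoLock);
	return;
}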
------
Kind Regards,
Peter Smith.
Fujitsu Australia
On Tue, Feb 9, 2021 at 12:02 PM Peter Smith <smithpb2250@gmail.com> wrote:
Here are my feedback comments for the V29 patch.
Thanks.
3.
Previously the tablesync origin name format was encapsulated in a
common function. IMO it was cleaner/safer how it was before, instead
of the same "pg_%u_%u" cut/paste and scattered in many places.
(same comment applies multiple places, in this file and in tablesync.c)

4.
Calls like replorigin_drop_by_name(originname, true, false); make it
unnecessarily hard to read code when the boolean params are neither
named as variables nor commented. I noticed on another thread [et0205]
there was an idea that having no name/comments is fine because anyway
it is not difficult to figure out when using a "modern IDE", but since
my review tools are only "vi" and "meld" I beg to differ with that
justification.
(same comment applies multiple places, in this file and in tablesync.c)
It would be a bit convenient for you, but for most others, I think it
would be noise. Personally, I find the code more readable without such
name comments; they just break the flow of the code unless you want to
study the value of each param in detail.
[et0205] /messages/by-id/c1d9833f-eeeb-40d5-89ba-87674e1b7ba3@www.fastmail.com
=====
FILE: tablesync.c
5.
Previously there was a function tablesync_replorigin_drop which was
encapsulating the tablesync origin name formatting. I thought that was
better than the V29 code which now has the same formatting scattered
over many places.
(same comment applies for worker_internal.h)
Isn't this the same as what you want to say in point-3?
7.
Maybe consider to just assign GetSystemIdentifier() to a static
instead of calling that function for every slot?
static uint64 sysid = GetSystemIdentifier();
IIUC the sysid value is never going to change for a process, right?
That's right, but I am not sure if there is much value in saving one
call here by introducing an extra variable.
I'll fix other comments raised by you.
--
With Regards,
Amit Kapila.
On Tue, Feb 9, 2021 at 1:37 PM Peter Smith <smithpb2250@gmail.com> wrote:
Looking at the V29 style tablesync slot names now they appear like this:

WARNING: could not drop tablesync replication slot
"pg_16397_sync_16389_6927117142022745645"

That is in the order subid + relid + sysid.

Now that I see it in a message it seems a bit strange with the sysid
just tacked onto the end like that.

I am wondering if reordering from parent to child might be more natural,
e.g. sysid + subid + relid gives a more intuitive name IMO.

So in this example it would be "pg_sync_6927117142022745645_16397_16389"
I have kept the order based on the importance of each parameter. Say,
when the user sees this message in the server log of the subscriber,
either for the purpose of tracking the origin's progress or for errors,
the sysid parameter won't be of much use and they will mostly be
looking at subid and relid. OTOH, if for some reason this parameter
appears in the publisher logs then sysid might be helpful.
Petr, anyone else, do you have any opinion on this matter?
--
With Regards,
Amit Kapila.
On Tue, Feb 9, 2021 at 12:02 PM Peter Smith <smithpb2250@gmail.com> wrote:
Here are my feedback comments for the V29 patch.
====
FILE: logical-replication.sgml
+ slots have generated names: <quote><literal>pg_%u_sync_%u_%llu</literal></quote>
+ (parameters: Subscription <parameter>oid</parameter>,
+ Table <parameter>relid</parameter>, system identifier<parameter>sysid</parameter>)
+ </para>

1.
There is a missing space before the sysid parameter.

=====
FILE: subscriptioncmds.c

+ * SUBREL_STATE_FINISHEDCOPY. The apply worker can also
+ * concurrently try to drop the origin and by this time the
+ * origin might be already removed. For these reasons,
+ * passing missing_ok = true from here.
+ */
+ snprintf(originname, sizeof(originname), "pg_%u_%u", sub->oid, relid);
+ replorigin_drop_by_name(originname, true, false);
+ }

2.
Don't really need to say "from here".
(same comment applies multiple places, in this file and in tablesync.c)

3.
Previously the tablesync origin name format was encapsulated in a
common function. IMO it was cleaner/safer how it was before, instead
of the same "pg_%u_%u" cut/paste and scattered in many places.
(same comment applies multiple places, in this file and in tablesync.c)

Fixed all three of the above comments.
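For example, in the attached v30 the drop sites now go through the common
helper:

+ ReplicationOriginNameForTableSync(sub->oid, relid, originname);
+ replorigin_drop_by_name(originname, true, false);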
4.
Calls like replorigin_drop_by_name(originname, true, false); make it
unnecessarily hard to read code when the boolean params are neither
named as variables nor commented. I noticed on another thread [et0205]
there was an idea that having no name/comments is fine because anyway
it is not difficult to figure out when using a "modern IDE", but since
my review tools are only "vi" and "meld" I beg to differ with that
justification.
(same comment applies multiple places, in this file and in tablesync.c)
Already responded to it separately. I went ahead and removed such
comments from other places in the patch.
[et0205] /messages/by-id/c1d9833f-eeeb-40d5-89ba-87674e1b7ba3@www.fastmail.com
=====
FILE: tablesync.c
5.
Previously there was a function tablesync_replorigin_drop which was
encapsulating the tablesync origin name formatting. I thought that was
better than the V29 code which now has the same formatting scattered
over many places.
(same comment applies for worker_internal.h)
I am not sure what you are expecting here that is different from point-3?
+ * Determine the tablesync slot name.
+ *
+ * The name must not exceed NAMEDATALEN - 1 because of remote node constraints
+ * on slot name length. We do append system_identifier to avoid slot_name
+ * collision with subscriptions in other clusters. With current scheme
+ * pg_%u_sync_%u_UINT64_FORMAT (3 + 10 + 6 + 10 + 20 + '\0'), the maximum
+ * length of slot_name will be 50.
+ *
+ * The returned slot name is either:
+ * - stored in the supplied buffer (syncslotname), or
+ * - palloc'ed in current memory context (if syncslotname = NULL).
+ *
+ * Note: We don't use the subscription slot name as part of tablesync slot name
+ * because we are responsible for cleaning up these slots and it could become
+ * impossible to recalculate what name to cleanup if the subscription slot name
+ * had changed.
+ */
+char *
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char syncslotname[NAMEDATALEN])
+{
+ if (syncslotname)
+ sprintf(syncslotname, "pg_%u_sync_%u_" UINT64_FORMAT, suboid, relid,
+ GetSystemIdentifier());
+ else
+ syncslotname = psprintf("pg_%u_sync_%u_" UINT64_FORMAT, suboid, relid,
+ GetSystemIdentifier());
+
+ return syncslotname;
+}

6.
"We do append" --> "We append"
"With current scheme" -> "With the current scheme"
Fixed.
7.
Maybe consider just assigning GetSystemIdentifier() to a static
instead of calling that function for every slot?
static uint64 sysid = GetSystemIdentifier();
IIUC the sysid value is never going to change for a process, right?
Already responded.
FILE: alter_subscription.sgml
8.
+ <para>
+ Commands <command>ALTER SUBSCRIPTION ... REFRESH ..</command> and
+ <command>ALTER SUBSCRIPTION ... SET PUBLICATION ..</command> with refresh
+ option as true cannot be executed inside a transaction block.
+ </para>

My guess is those two lots of double dots ("..") were probably meant
to be ellipsis ("...")
Fixed; for the first one I completed the command by adding PUBLICATION.
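So in v30 those lines now read:

+ Commands <command>ALTER SUBSCRIPTION ... REFRESH PUBLICATION</command> and
+ <command>ALTER SUBSCRIPTION ... SET PUBLICATION ...</command> with refresh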
When looking at the DropSubscription code I noticed that there is a
small difference between the HEAD code and the V29 code when slot_name
= NONE.

HEAD does
------
if (!slotname)
{
table_close(rel, NoLock);
return;
}
------

V29 does
------
if (!slotname)
{
/* be tidy */
list_free(rstates);
return;
}
------

Isn't the V29 code missing doing a table_close(rel, NoLock) there?
Yes, good catch. Fixed.
--
With Regards,
Amit Kapila.
Attachments:
v30-0001-Make-pg_replication_origin_drop-safe-against-con.patchapplication/octet-stream; name=v30-0001-Make-pg_replication_origin_drop-safe-against-con.patchDownload
From ed0e6a09150220762517441d90173c4b85d61687 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Fri, 5 Feb 2021 12:20:13 +0530
Subject: [PATCH v30 1/2] Make pg_replication_origin_drop safe against
concurrent drops.
Currently, we get the origin id from the name and then drop the origin by
taking ExclusiveLock on ReplicationOriginRelationId. So, two concurrent
sessions can get the id from the name at the same time and then when they
try to drop the origin, one of the sessions will get the either
"tuple concurrently deleted" or "cache lookup failed for replication
origin ..".
To prevent this race condition we do the entire operation under lock. This
obviates the need for replorigin_drop() API and we have removed it so if
any extension authors are using it they need to instead use
replorigin_drop_by_name. See it's usage in pg_replication_origin_drop().
Author: Peter Smith
Reviewed-by: Amit Kapila, Euler Taveira, Petr Jelinek, and Alvaro
Herrera
Discussion: https://www.postgresql.org/message-id/CAHut%2BPuW8DWV5fskkMWWMqzt-x7RPcNQOtJQBp6SdwyRghCk7A%40mail.gmail.com
---
src/backend/commands/subscriptioncmds.c | 5 +-
src/backend/replication/logical/origin.c | 59 +++++++++++++++---------
src/include/replication/origin.h | 2 +-
3 files changed, 38 insertions(+), 28 deletions(-)
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 082f7855b8..5ccbc9dd50 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -926,7 +926,6 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
ListCell *lc;
char originname[NAMEDATALEN];
char *err = NULL;
- RepOriginId originid;
WalReceiverConn *wrconn = NULL;
StringInfoData cmd;
Form_pg_subscription form;
@@ -1050,9 +1049,7 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
/* Remove the origin tracking if exists. */
snprintf(originname, sizeof(originname), "pg_%u", subid);
- originid = replorigin_by_name(originname, true);
- if (originid != InvalidRepOriginId)
- replorigin_drop(originid, false);
+ replorigin_drop_by_name(originname, true, false);
/*
* If there is no slot associated with the subscription, we can finish
diff --git a/src/backend/replication/logical/origin.c b/src/backend/replication/logical/origin.c
index 9bd761a426..685eaa6134 100644
--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -322,27 +322,15 @@ replorigin_create(char *roname)
return roident;
}
-
/*
- * Drop replication origin.
- *
- * Needs to be called in a transaction.
+ * Helper function to drop a replication origin.
*/
-void
-replorigin_drop(RepOriginId roident, bool nowait)
+static void
+replorigin_drop_guts(Relation rel, RepOriginId roident, bool nowait)
{
HeapTuple tuple;
- Relation rel;
int i;
- Assert(IsTransactionState());
-
- /*
- * To interlock against concurrent drops, we hold ExclusiveLock on
- * pg_replication_origin throughout this function.
- */
- rel = table_open(ReplicationOriginRelationId, ExclusiveLock);
-
/*
* First, clean up the slot state info, if there is any matching slot.
*/
@@ -415,11 +403,40 @@ restart:
ReleaseSysCache(tuple);
CommandCounterIncrement();
-
- /* now release lock again */
- table_close(rel, ExclusiveLock);
}
+/*
+ * Drop replication origin (by name).
+ *
+ * Needs to be called in a transaction.
+ */
+void
+replorigin_drop_by_name(char *name, bool missing_ok, bool nowait)
+{
+ RepOriginId roident;
+ Relation rel;
+
+ Assert(IsTransactionState());
+
+ /*
+ * To interlock against concurrent drops, we hold ExclusiveLock on
+ * pg_replication_origin till xact commit.
+ *
+ * XXX We can optimize this by acquiring the lock on a specific origin by
+ * using LockSharedObject if required. However, for that, we first need to
+ * acquire a lock on ReplicationOriginRelationId, get the origin_id, lock
+ * the specific origin and then re-check if the origin still exists.
+ */
+ rel = table_open(ReplicationOriginRelationId, ExclusiveLock);
+
+ roident = replorigin_by_name(name, missing_ok);
+
+ if (OidIsValid(roident))
+ replorigin_drop_guts(rel, roident, nowait);
+
+ /* We keep the lock on pg_replication_origin until commit */
+ table_close(rel, NoLock);
+}
/*
* Lookup replication origin via its oid and return the name.
@@ -1256,16 +1273,12 @@ Datum
pg_replication_origin_drop(PG_FUNCTION_ARGS)
{
char *name;
- RepOriginId roident;
replorigin_check_prerequisites(false, false);
name = text_to_cstring((text *) DatumGetPointer(PG_GETARG_DATUM(0)));
- roident = replorigin_by_name(name, false);
- Assert(OidIsValid(roident));
-
- replorigin_drop(roident, true);
+ replorigin_drop_by_name(name, false, true);
pfree(name);
diff --git a/src/include/replication/origin.h b/src/include/replication/origin.h
index 731445ae8f..d2ed6305fe 100644
--- a/src/include/replication/origin.h
+++ b/src/include/replication/origin.h
@@ -40,7 +40,7 @@ extern PGDLLIMPORT TimestampTz replorigin_session_origin_timestamp;
/* API for querying & manipulating replication origins */
extern RepOriginId replorigin_by_name(char *name, bool missing_ok);
extern RepOriginId replorigin_create(char *name);
-extern void replorigin_drop(RepOriginId roident, bool nowait);
+extern void replorigin_drop_by_name(char *name, bool missing_ok, bool nowait);
extern bool replorigin_by_oid(RepOriginId roident, bool missing_ok,
char **roname);
--
2.28.0.windows.1
v30-0002-Allow-multiple-xacts-during-table-sync-in-logica.patchapplication/octet-stream; name=v30-0002-Allow-multiple-xacts-during-table-sync-in-logica.patchDownload
From 5d10aba91b4de33d65dfcf5ba436df3987557758 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Sat, 6 Feb 2021 17:16:19 +0530
Subject: [PATCH v30 2/2] Allow multiple xacts during table sync in logical
replication.
For the initial table data synchronization in logical replication, we use
a single transaction to copy the entire table and then synchronize the
position in the stream with the main apply worker.
There are multiple downsides of this approach: (a) We have to perform the
entire copy operation again if there is any error (network breakdown,
error in the database operation, etc.) while we synchronize the WAL
position between tablesync worker and apply worker; this will be onerous
especially for large copies, (b) Using a single transaction in the
synchronization-phase (where we can receive WAL from multiple
transactions) will have the risk of exceeding the CID limit, (c) The slot
will hold the WAL till the entire sync is complete because we never commit
till the end.
This patch solves all the above downsides by allowing multiple
transactions during the tablesync phase. The initial copy is done in a
single transaction and after that, we commit each transaction as we
receive. To allow recovery after any error or crash, we use a permanent
slot and origin to track the progress. The slot and origin will be removed
once we finish the synchronization of the table. We also remove slot and
origin of tablesync workers if the user performs DROP SUBSCRIPTION .. or
ALTER SUBSCRIPTION .. REFRESH and some of the table syncs are still not
finished.
The commands ALTER SUBSCRIPTION ... REFRESH .. and
ALTER SUBSCRIPTION ... SET PUBLICATION .. with refresh option as true
cannot be executed inside a transaction block because they can now drop
the slots for which we have no provision to rollback.
This will also open up the path for logical replication of 2PC
transactions on the subscriber side. Previously, we couldn't do that because
of the requirement of maintaining a single transaction in tablesync
workers.
Author: Peter Smith, Amit Kapila, and Takamichi Osumi
Reviewed-by: Ajin Cherian, Petr Jelinek, Hou Zhijie and Amit Kapila
Discussion: https://postgr.es/m/CAA4eK1KHJxaZS-fod-0fey=0tq3=Gkn4ho=8N4-5HWiCfu0H1A@mail.gmail.com
---
doc/src/sgml/catalogs.sgml | 1 +
doc/src/sgml/logical-replication.sgml | 59 ++-
doc/src/sgml/ref/alter_subscription.sgml | 20 +
doc/src/sgml/ref/drop_subscription.sgml | 6 +-
src/backend/access/transam/xact.c | 11 -
src/backend/catalog/pg_subscription.c | 35 ++
src/backend/commands/subscriptioncmds.c | 469 ++++++++++++++++----
src/backend/replication/logical/launcher.c | 147 ------
src/backend/replication/logical/tablesync.c | 242 ++++++++--
src/backend/replication/logical/worker.c | 18 +-
src/backend/tcop/utility.c | 3 +-
src/include/catalog/pg_subscription_rel.h | 2 +
src/include/commands/subscriptioncmds.h | 2 +-
src/include/replication/logicallauncher.h | 2 -
src/include/replication/slot.h | 3 +
src/include/replication/worker_internal.h | 3 +-
src/test/regress/expected/subscription.out | 21 +
src/test/regress/sql/subscription.sql | 22 +
src/test/subscription/t/004_sync.pl | 21 +-
src/tools/pgindent/typedefs.list | 1 -
20 files changed, 760 insertions(+), 328 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index ea222c0464..692ad65de2 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7673,6 +7673,7 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
State code:
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
+ <literal>f</literal> = finished table copy,
<literal>s</literal> = synchronized,
<literal>r</literal> = ready (normal replication)
</para></entry>
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index a560ad69b4..d0742f2c52 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -186,9 +186,10 @@
<para>
Each subscription will receive changes via one replication slot (see
- <xref linkend="streaming-replication-slots"/>). Additional temporary
- replication slots may be required for the initial data synchronization
- of pre-existing table data.
+ <xref linkend="streaming-replication-slots"/>). Additional replication
+ slots may be required for the initial data synchronization of
+ pre-existing table data and those will be dropped at the end of data
+ synchronization.
</para>
<para>
@@ -248,13 +249,23 @@
<para>
As mentioned earlier, each (active) subscription receives changes from a
- replication slot on the remote (publishing) side. Normally, the remote
- replication slot is created automatically when the subscription is created
- using <command>CREATE SUBSCRIPTION</command> and it is dropped
- automatically when the subscription is dropped using <command>DROP
- SUBSCRIPTION</command>. In some situations, however, it can be useful or
- necessary to manipulate the subscription and the underlying replication
- slot separately. Here are some scenarios:
+ replication slot on the remote (publishing) side.
+ </para>
+ <para>
+ Additional table synchronization slots are normally transient, created
+ internally to perform initial table synchronization and dropped
+ automatically when they are no longer needed. These table synchronization
+ slots have generated names: <quote><literal>pg_%u_sync_%u_%llu</literal></quote>
+ (parameters: Subscription <parameter>oid</parameter>,
+ Table <parameter>relid</parameter>, system identifier <parameter>sysid</parameter>)
+ </para>
+ <para>
+ Normally, the remote replication slot is created automatically when the
+ subscription is created using <command>CREATE SUBSCRIPTION</command> and it
+ is dropped automatically when the subscription is dropped using
+ <command>DROP SUBSCRIPTION</command>. In some situations, however, it can
+ be useful or necessary to manipulate the subscription and the underlying
+ replication slot separately. Here are some scenarios:
<itemizedlist>
<listitem>
@@ -294,8 +305,9 @@
using <command>ALTER SUBSCRIPTION</command> before attempting to drop
the subscription. If the remote database instance no longer exists, no
further action is then necessary. If, however, the remote database
- instance is just unreachable, the replication slot should then be
- dropped manually; otherwise it would continue to reserve WAL and might
+ instance is just unreachable, the replication slot (and any still
+ remaining table synchronization slots) should then be
+ dropped manually; otherwise it/they would continue to reserve WAL and might
eventually cause the disk to fill up. Such cases should be carefully
investigated.
</para>
@@ -468,16 +480,19 @@
<sect2 id="logical-replication-snapshot">
<title>Initial Snapshot</title>
<para>
- The initial data in existing subscribed tables are snapshotted and
- copied in a parallel instance of a special kind of apply process.
- This process will create its own temporary replication slot and
- copy the existing data. Once existing data is copied, the worker
- enters synchronization mode, which ensures that the table is brought
- up to a synchronized state with the main apply process by streaming
- any changes that happened during the initial data copy using standard
- logical replication. Once the synchronization is done, the control
- of the replication of the table is given back to the main apply
- process where the replication continues as normal.
+ The initial data in existing subscribed tables are snapshotted and
+ copied in a parallel instance of a special kind of apply process.
+ This process will create its own replication slot and copy the existing
+ data. As soon as the copy is finished the table contents will become
+ visible to other backends. Once existing data is copied, the worker
+ enters synchronization mode, which ensures that the table is brought
+ up to a synchronized state with the main apply process by streaming
+ any changes that happened during the initial data copy using standard
+ logical replication. During this synchronization phase, the changes
+ are applied and committed in the same order as they happened on the
+ publisher. Once the synchronization is done, the control of the
+ replication of the table is given back to the main apply process where
+ the replication continues as normal.
</para>
</sect2>
</sect1>
diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml
index db5e59f707..1ca2437b48 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -48,6 +48,26 @@ ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> RENAME TO <
(Currently, all subscription owners must be superusers, so the owner checks
will be bypassed in practice. But this might change in the future.)
</para>
+
+ <para>
+ When refreshing a publication we remove the relations that are no longer
+ part of the publication and we also remove the tablesync slots if there are
+ any. It is necessary to remove tablesync slots so that the resources
+ allocated for the subscription on the remote host are released. If due to
+ network breakdown or some other error, <productname>PostgreSQL</productname>
+ is unable to remove the slots, a WARNING will be reported. The user needs to
+ manually remove such slots later or the
+ <xref linkend="guc-max-slot-wal-keep-size"/> should be configured on the
+ remote host as otherwise, they will continue to reserve WAL and might
+ eventually cause the disk to fill up. See also
+ <xref linkend="logical-replication-subscription-slot"/>.
+ </para>
+
+ <para>
+ Commands <command>ALTER SUBSCRIPTION ... REFRESH PUBLICATION</command> and
+ <command>ALTER SUBSCRIPTION ... SET PUBLICATION ...</command> with refresh
+ option as true cannot be executed inside a transaction block.
+ </para>
</refsect1>
<refsect1>
diff --git a/doc/src/sgml/ref/drop_subscription.sgml b/doc/src/sgml/ref/drop_subscription.sgml
index adbdeafb4e..aee9615546 100644
--- a/doc/src/sgml/ref/drop_subscription.sgml
+++ b/doc/src/sgml/ref/drop_subscription.sgml
@@ -79,7 +79,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
<para>
When dropping a subscription that is associated with a replication slot on
the remote host (the normal state), <command>DROP SUBSCRIPTION</command>
- will connect to the remote host and try to drop the replication slot as
+ will connect to the remote host and try to drop the replication slot (and
+ any remaining table synchronization slots) as
part of its operation. This is necessary so that the resources allocated
for the subscription on the remote host are released. If this fails,
either because the remote host is not reachable or because the remote
@@ -89,7 +90,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
executing <literal>ALTER SUBSCRIPTION ... SET (slot_name = NONE)</literal>.
After that, <command>DROP SUBSCRIPTION</command> will no longer attempt any
actions on a remote host. Note that if the remote replication slot still
- exists, it should then be dropped manually; otherwise it will continue to
+ exists, it (and any related table synchronization slots) should then be
+ dropped manually; otherwise it/they will continue to
reserve WAL and might eventually cause the disk to fill up. See
also <xref linkend="logical-replication-subscription-slot"/>.
</para>
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index a2068e3fd4..3c8b4eb362 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2432,15 +2432,6 @@ PrepareTransaction(void)
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot PREPARE a transaction that has exported snapshots")));
- /*
- * Don't allow PREPARE but for transaction that has/might kill logical
- * replication workers.
- */
- if (XactManipulatesLogicalReplicationWorkers())
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("cannot PREPARE a transaction that has manipulated logical replication workers")));
-
/* Prevent cancel/die interrupt while cleaning up */
HOLD_INTERRUPTS();
@@ -4899,7 +4890,6 @@ CommitSubTransaction(void)
AtEOSubXact_HashTables(true, s->nestingLevel);
AtEOSubXact_PgStat(true, s->nestingLevel);
AtSubCommit_Snapshot(s->nestingLevel);
- AtEOSubXact_ApplyLauncher(true, s->nestingLevel);
/*
* We need to restore the upper transaction's read-only state, in case the
@@ -5059,7 +5049,6 @@ AbortSubTransaction(void)
AtEOSubXact_HashTables(false, s->nestingLevel);
AtEOSubXact_PgStat(false, s->nestingLevel);
AtSubAbort_Snapshot(s->nestingLevel);
- AtEOSubXact_ApplyLauncher(false, s->nestingLevel);
}
/*
diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c
index 44cb285b68..4f567fd221 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -29,6 +29,7 @@
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
#include "utils/pg_lsn.h"
#include "utils/rel.h"
#include "utils/syscache.h"
@@ -337,6 +338,13 @@ GetSubscriptionRelState(Oid subid, Oid relid, XLogRecPtr *sublsn)
char substate;
bool isnull;
Datum d;
+ Relation rel;
+
+ /*
+ * This is to avoid the race condition with AlterSubscription which tries
+ * to remove this relstate.
+ */
+ rel = table_open(SubscriptionRelRelationId, AccessShareLock);
/* Try finding the mapping. */
tup = SearchSysCache2(SUBSCRIPTIONRELMAP,
@@ -363,6 +371,8 @@ GetSubscriptionRelState(Oid subid, Oid relid, XLogRecPtr *sublsn)
/* Cleanup */
ReleaseSysCache(tup);
+ table_close(rel, AccessShareLock);
+
return substate;
}
@@ -403,6 +413,31 @@ RemoveSubscriptionRel(Oid subid, Oid relid)
scan = table_beginscan_catalog(rel, nkeys, skey);
while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
{
+ Form_pg_subscription_rel subrel;
+
+ subrel = (Form_pg_subscription_rel) GETSTRUCT(tup);
+
+ /*
+ * We don't allow to drop the relation mapping when the table
+ * synchronization is in progress unless the caller updates the
+ * corresponding subscription as well. This is to ensure that we don't
+ * leave tablesync slots or origins in the system when the
+ * corresponding table is dropped.
+ */
+ if (!OidIsValid(subid) && subrel->srsubstate != SUBREL_STATE_READY)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("could not drop relation mapping for subscription \"%s\"",
+ get_subscription_name(subrel->srsubid, false)),
+ errdetail("Table synchronization for relation \"%s\" is in progress and is in state \"%c\".",
+ get_rel_name(relid), subrel->srsubstate),
+ /* translator: first %s is a SQL ALTER command and second %s is a SQL DROP command */
+ errhint("Use %s to enable subscription if not already enabled or use %s to drop the subscription.",
+ "ALTER SUBSCRIPTION ... ENABLE",
+ "DROP SUBSCRIPTION ...")));
+ }
+
CatalogTupleDelete(rel, &tup->t_self);
}
table_endscan(scan);
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 5ccbc9dd50..8046153371 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -34,6 +34,7 @@
#include "nodes/makefuncs.h"
#include "replication/logicallauncher.h"
#include "replication/origin.h"
+#include "replication/slot.h"
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "replication/worker_internal.h"
@@ -46,6 +47,8 @@
#include "utils/syscache.h"
static List *fetch_table_list(WalReceiverConn *wrconn, List *publications);
+static void ReportSlotConnectionError(List *rstates, Oid subid, char *slotname, char *err);
+
/*
* Common option parsing function for CREATE and ALTER SUBSCRIPTION commands.
@@ -566,107 +569,212 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
Oid *pubrel_local_oids;
ListCell *lc;
int off;
+ int remove_rel_len;
+ Relation rel = NULL;
+ typedef struct SubRemoveRels
+ {
+ Oid relid;
+ char state;
+ } SubRemoveRels;
+ SubRemoveRels *sub_remove_rels;
/* Load the library providing us libpq calls. */
load_file("libpqwalreceiver", false);
- /* Try to connect to the publisher. */
- wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
- if (!wrconn)
- ereport(ERROR,
- (errmsg("could not connect to the publisher: %s", err)));
-
- /* Get the table list from publisher. */
- pubrel_names = fetch_table_list(wrconn, sub->publications);
+ PG_TRY();
+ {
+ /* Try to connect to the publisher. */
+ wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
+ if (!wrconn)
+ ereport(ERROR,
+ (errmsg("could not connect to the publisher: %s", err)));
- /* We are done with the remote side, close connection. */
- walrcv_disconnect(wrconn);
+ /* Get the table list from publisher. */
+ pubrel_names = fetch_table_list(wrconn, sub->publications);
- /* Get local table list. */
- subrel_states = GetSubscriptionRelations(sub->oid);
+ /* Get local table list. */
+ subrel_states = GetSubscriptionRelations(sub->oid);
- /*
- * Build qsorted array of local table oids for faster lookup. This can
- * potentially contain all tables in the database so speed of lookup is
- * important.
- */
- subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
- off = 0;
- foreach(lc, subrel_states)
- {
- SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
+ /*
+ * Build qsorted array of local table oids for faster lookup. This can
+ * potentially contain all tables in the database so speed of lookup
+ * is important.
+ */
+ subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
+ off = 0;
+ foreach(lc, subrel_states)
+ {
+ SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
- subrel_local_oids[off++] = relstate->relid;
- }
- qsort(subrel_local_oids, list_length(subrel_states),
- sizeof(Oid), oid_cmp);
+ subrel_local_oids[off++] = relstate->relid;
+ }
+ qsort(subrel_local_oids, list_length(subrel_states),
+ sizeof(Oid), oid_cmp);
+
+ /*
+ * Rels that we want to remove from subscription and drop any slots and
+ * origins corresponding to them.
+ */
+ sub_remove_rels = palloc(list_length(subrel_states) * sizeof(SubRemoveRels));
+
+ /*
+ * Walk over the remote tables and try to match them to locally known
+ * tables. If the table is not known locally create a new state for
+ * it.
+ *
+ * Also builds array of local oids of remote tables for the next step.
+ */
+ off = 0;
+ pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
+
+ foreach(lc, pubrel_names)
+ {
+ RangeVar *rv = (RangeVar *) lfirst(lc);
+ Oid relid;
- /*
- * Walk over the remote tables and try to match them to locally known
- * tables. If the table is not known locally create a new state for it.
- *
- * Also builds array of local oids of remote tables for the next step.
- */
- off = 0;
- pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
+ relid = RangeVarGetRelid(rv, AccessShareLock, false);
- foreach(lc, pubrel_names)
- {
- RangeVar *rv = (RangeVar *) lfirst(lc);
- Oid relid;
+ /* Check for supported relkind. */
+ CheckSubscriptionRelkind(get_rel_relkind(relid),
+ rv->schemaname, rv->relname);
- relid = RangeVarGetRelid(rv, AccessShareLock, false);
+ pubrel_local_oids[off++] = relid;
- /* Check for supported relkind. */
- CheckSubscriptionRelkind(get_rel_relkind(relid),
- rv->schemaname, rv->relname);
+ if (!bsearch(&relid, subrel_local_oids,
+ list_length(subrel_states), sizeof(Oid), oid_cmp))
+ {
+ AddSubscriptionRelState(sub->oid, relid,
+ copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
+ InvalidXLogRecPtr);
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" added to subscription \"%s\"",
+ rv->schemaname, rv->relname, sub->name)));
+ }
+ }
- pubrel_local_oids[off++] = relid;
+ /*
+ * Next remove state for tables we should not care about anymore using
+ * the data we collected above
+ */
+ qsort(pubrel_local_oids, list_length(pubrel_names),
+ sizeof(Oid), oid_cmp);
- if (!bsearch(&relid, subrel_local_oids,
- list_length(subrel_states), sizeof(Oid), oid_cmp))
+ remove_rel_len = 0;
+ for (off = 0; off < list_length(subrel_states); off++)
{
- AddSubscriptionRelState(sub->oid, relid,
- copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
- InvalidXLogRecPtr);
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" added to subscription \"%s\"",
- rv->schemaname, rv->relname, sub->name)));
- }
- }
+ Oid relid = subrel_local_oids[off];
- /*
- * Next remove state for tables we should not care about anymore using the
- * data we collected above
- */
- qsort(pubrel_local_oids, list_length(pubrel_names),
- sizeof(Oid), oid_cmp);
+ if (!bsearch(&relid, pubrel_local_oids,
+ list_length(pubrel_names), sizeof(Oid), oid_cmp))
+ {
+ char state;
+ XLogRecPtr statelsn;
+
+ /*
+ * Lock pg_subscription_rel with AccessExclusiveLock to prevent any race
+ * conditions with the apply worker re-launching workers at the same time
+ * this code is trying to remove those tables.
+ *
+ * Even if new worker for this particular rel is restarted it won't be able
+ * to make any progress as we hold exclusive lock on subscription_rel till
+ * the transaction end. It will simply exit as there is no corresponding
+ * rel entry.
+ *
+ * This locking also ensures that the state of rels won't change till we
+ * are done with this refresh operation.
+ */
+ if (!rel)
+ rel = table_open(SubscriptionRelRelationId, AccessExclusiveLock);
+
+ /* Last known rel state. */
+ state = GetSubscriptionRelState(sub->oid, relid, &statelsn);
+
+ sub_remove_rels[remove_rel_len].relid = relid;
+ sub_remove_rels[remove_rel_len++].state = state;
+
+ RemoveSubscriptionRel(sub->oid, relid);
+
+ logicalrep_worker_stop(sub->oid, relid);
+
+ /*
+ * For READY state, we would have already dropped the tablesync
+ * origin.
+ */
+ if (state != SUBREL_STATE_READY)
+ {
+ char originname[NAMEDATALEN];
+
+ /*
+ * Drop the tablesync's origin tracking if exists.
+ *
+ * It is possible that the origin is not yet created for
+ * tablesync worker, this can happen for the states before
+ * SUBREL_STATE_FINISHEDCOPY. The apply worker can also
+ * concurrently try to drop the origin and by this time the
+ * origin might be already removed. For these reasons,
+ * passing missing_ok = true.
+ */
+ ReplicationOriginNameForTableSync(sub->oid, relid, originname);
+ replorigin_drop_by_name(originname, true, false);
+ }
- for (off = 0; off < list_length(subrel_states); off++)
- {
- Oid relid = subrel_local_oids[off];
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
+ get_namespace_name(get_rel_namespace(relid)),
+ get_rel_name(relid),
+ sub->name)));
+ }
+ }
- if (!bsearch(&relid, pubrel_local_oids,
- list_length(pubrel_names), sizeof(Oid), oid_cmp))
+ /*
+ * Drop the tablesync slots associated with removed tables. This has to
+ * be at the end because otherwise if there is an error while doing the
+ * database operations we won't be able to rollback dropped slots.
+ */
+ for (off = 0; off < remove_rel_len; off++)
{
- RemoveSubscriptionRel(sub->oid, relid);
-
- logicalrep_worker_stop_at_commit(sub->oid, relid);
-
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
- get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid),
- sub->name)));
+ if (sub_remove_rels[off].state != SUBREL_STATE_READY &&
+ sub_remove_rels[off].state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = { 0 };
+
+ /*
+ * For READY/SYNCDONE states we know the tablesync slot has
+ * already been dropped by the tablesync worker.
+ *
+ * For other states, there is no certainty, maybe the slot
+ * does not exist yet. Also, if we fail after removing some of
+ * the slots, next time, it will again try to drop already
+ * dropped slots and fail. For these reasons, we allow
+ * missing_ok = true for the drop.
+ *
+ * XXX If there is a network breakdown while dropping the
+ * slots then we will give a WARNING to the user and they need
+ * to manually remove such slots. This can happen so rarely to
+ * worry about and we don't have any better way to deal with
+ * this.
+ */
+ ReplicationSlotNameForTablesync(sub->oid, sub_remove_rels[off].relid, syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, true);
+ }
}
}
+ PG_FINALLY();
+ {
+ if (wrconn)
+ walrcv_disconnect(wrconn);
+ }
+ PG_END_TRY();
+
+ if (rel)
+ table_close(rel, NoLock);
}
/*
* Alter the existing subscription.
*/
ObjectAddress
-AlterSubscription(AlterSubscriptionStmt *stmt)
+AlterSubscription(AlterSubscriptionStmt *stmt, bool isTopLevel)
{
Relation rel;
ObjectAddress myself;
@@ -848,6 +956,8 @@ AlterSubscription(AlterSubscriptionStmt *stmt)
errmsg("ALTER SUBSCRIPTION with refresh is not allowed for disabled subscriptions"),
errhint("Use ALTER SUBSCRIPTION ... SET PUBLICATION ... WITH (refresh = false).")));
+ PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION with refresh");
+
/* Make sure refresh sees the new list of publications. */
sub->publications = stmt->publication;
@@ -877,6 +987,8 @@ AlterSubscription(AlterSubscriptionStmt *stmt)
NULL, NULL, /* no "binary" */
NULL, NULL); /* no "streaming" */
+ PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION ... REFRESH");
+
AlterSubscription_refresh(sub, copy_data);
break;
@@ -927,8 +1039,8 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
char originname[NAMEDATALEN];
char *err = NULL;
WalReceiverConn *wrconn = NULL;
- StringInfoData cmd;
Form_pg_subscription form;
+ List *rstates;
/*
* Lock pg_subscription with AccessExclusiveLock to ensure that the
@@ -1041,6 +1153,36 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
}
list_free(subworkers);
+ /*
+ * Cleanup of tablesync replication origins.
+ *
+ * Any READY-state relations would already have dealt with clean-ups.
+ *
+ * Note that the state can't change because we have already stopped both
+ * the apply and tablesync workers and they can't restart because of
+ * exclusive lock on the subscription.
+ */
+ rstates = GetSubscriptionNotReadyRelations(subid);
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState* rstate = (SubscriptionRelState*) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Drop the tablesync's origin tracking if exists.
+ *
+ * It is possible that the origin is not yet created for tablesync
+ * worker so passing missing_ok = true. This can happen for the states
+ * before SUBREL_STATE_FINISHEDCOPY.
+ */
+ ReplicationOriginNameForTableSync(subid, relid, originname);
+ replorigin_drop_by_name(originname, true, false);
+ }
+
/* Clean up dependencies */
deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
@@ -1055,30 +1197,109 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
* If there is no slot associated with the subscription, we can finish
* here.
*/
- if (!slotname)
+ if (!slotname && rstates == NIL)
{
table_close(rel, NoLock);
return;
}
/*
- * Otherwise drop the replication slot at the publisher node using the
- * replication connection.
+ * Try to acquire the connection necessary for dropping slots.
+ *
+ * Note: If the slotname is NONE/NULL then we allow the command to finish
+ * and users need to manually cleanup the apply and tablesync worker slots
+ * later.
+ *
+ * This has to be at the end because otherwise if there is an error while
+ * doing the database operations we won't be able to rollback dropped slot.
*/
load_file("libpqwalreceiver", false);
- initStringInfo(&cmd);
- appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
-
wrconn = walrcv_connect(conninfo, true, subname, &err);
if (wrconn == NULL)
- ereport(ERROR,
- (errmsg("could not connect to publisher when attempting to "
- "drop the replication slot \"%s\"", slotname),
- errdetail("The error was: %s", err),
- /* translator: %s is an SQL ALTER command */
- errhint("Use %s to disassociate the subscription from the slot.",
- "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+ {
+ if (!slotname)
+ {
+ /* be tidy */
+ list_free(rstates);
+ table_close(rel, NoLock);
+ return;
+ }
+ else
+ {
+ ReportSlotConnectionError(rstates, subid, slotname, err);
+ }
+ }
+
+ PG_TRY();
+ {
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Drop the tablesync slots associated with removed tables.
+ *
+ * For SYNCDONE/READY states, the tablesync slot is known to have
+ * already been dropped by the tablesync worker.
+ *
+ * For other states, there is no certainty, maybe the slot
+ * does not exist yet. Also, if we fail after removing some of
+ * the slots, next time, it will again try to drop already
+ * dropped slots and fail. For these reasons, we allow
+ * missing_ok = true for the drop.
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = {0};
+
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, true);
+ }
+ }
+
+ list_free(rstates);
+
+ /*
+ * If there is a slot associated with the subscription, then drop the
+ * replication slot at the publisher.
+ */
+ if (slotname)
+ ReplicationSlotDropAtPubNode(wrconn, slotname, false);
+
+ }
+ PG_FINALLY();
+ {
+ walrcv_disconnect(wrconn);
+ }
+ PG_END_TRY();
+
+ table_close(rel, NoLock);
+}
+
+/*
+ * Drop the replication slot at the publisher node using the replication
+ * connection.
+ *
+ * missing_ok - if true then only issue WARNING message if the slot cannot be
+ * deleted.
+ */
+void
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok)
+{
+ StringInfoData cmd;
+
+ Assert(wrconn);
+
+ load_file("libpqwalreceiver", false);
+
+ initStringInfo(&cmd);
+ appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
PG_TRY();
{
@@ -1086,27 +1307,37 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
res = walrcv_exec(wrconn, cmd.data, 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
+ if (res->status == WALRCV_OK_COMMAND)
+ {
+ /* NOTICE. Success. */
+ ereport(NOTICE,
+ (errmsg("dropped replication slot \"%s\" on publisher",
+ slotname)));
+ }
+ else if (res->status == WALRCV_ERROR && missing_ok)
+ {
+ /* WARNING. Error, but missing_ok = true. */
+ ereport(WARNING,
(errmsg("could not drop the replication slot \"%s\" on publisher",
slotname),
errdetail("The error was: %s", res->err)));
+ }
else
- ereport(NOTICE,
- (errmsg("dropped replication slot \"%s\" on publisher",
- slotname)));
+ {
+ /* ERROR. */
+ ereport(ERROR,
+ (errmsg("could not drop the replication slot \"%s\" on publisher",
+ slotname),
+ errdetail("The error was: %s", res->err)));
+ }
walrcv_clear_result(res);
}
PG_FINALLY();
{
- walrcv_disconnect(wrconn);
+ pfree(cmd.data);
}
PG_END_TRY();
-
- pfree(cmd.data);
-
- table_close(rel, NoLock);
}
/*
@@ -1275,3 +1506,45 @@ fetch_table_list(WalReceiverConn *wrconn, List *publications)
return tablelist;
}
+
+/*
+ * This is to report the connection failure while dropping replication slots.
+ * Here, we report the WARNING for all tablesync slots so that user can drop
+ * them manually, if required.
+ */
+static void
+ReportSlotConnectionError(List *rstates, Oid subid, char *slotname, char *err)
+{
+ ListCell *lc;
+
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Caller needs to ensure that relstate doesn't change underneath us.
+ * See DropSubscription where we get the relstates.
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = { 0 };
+
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ elog(WARNING, "could not drop tablesync replication slot \"%s\"",
+ syncslotname);
+ }
+ }
+
+ ereport(ERROR,
+ (errmsg("could not connect to publisher when attempting to "
+ "drop the replication slot \"%s\"", slotname),
+ errdetail("The error was: %s", err),
+ /* translator: %s is an SQL ALTER command */
+ errhint("Use %s to disassociate the subscription from the slot.",
+ "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+}
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 186514cd9e..58082dde18 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -73,20 +73,6 @@ typedef struct LogicalRepWorkerId
Oid relid;
} LogicalRepWorkerId;
-typedef struct StopWorkersData
-{
- int nestDepth; /* Sub-transaction nest level */
- List *workers; /* List of LogicalRepWorkerId */
- struct StopWorkersData *parent; /* This need not be an immediate
- * subtransaction parent */
-} StopWorkersData;
-
-/*
- * Stack of StopWorkersData elements. Each stack element contains the workers
- * to be stopped for that subtransaction.
- */
-static StopWorkersData *on_commit_stop_workers = NULL;
-
static void ApplyLauncherWakeup(void);
static void logicalrep_launcher_onexit(int code, Datum arg);
static void logicalrep_worker_onexit(int code, Datum arg);
@@ -546,51 +532,6 @@ logicalrep_worker_stop(Oid subid, Oid relid)
LWLockRelease(LogicalRepWorkerLock);
}
-/*
- * Request worker for specified sub/rel to be stopped on commit.
- */
-void
-logicalrep_worker_stop_at_commit(Oid subid, Oid relid)
-{
- int nestDepth = GetCurrentTransactionNestLevel();
- LogicalRepWorkerId *wid;
- MemoryContext oldctx;
-
- /* Make sure we store the info in context that survives until commit. */
- oldctx = MemoryContextSwitchTo(TopTransactionContext);
-
- /* Check that previous transactions were properly cleaned up. */
- Assert(on_commit_stop_workers == NULL ||
- nestDepth >= on_commit_stop_workers->nestDepth);
-
- /*
- * Push a new stack element if we don't already have one for the current
- * nestDepth.
- */
- if (on_commit_stop_workers == NULL ||
- nestDepth > on_commit_stop_workers->nestDepth)
- {
- StopWorkersData *newdata = palloc(sizeof(StopWorkersData));
-
- newdata->nestDepth = nestDepth;
- newdata->workers = NIL;
- newdata->parent = on_commit_stop_workers;
- on_commit_stop_workers = newdata;
- }
-
- /*
- * Finally add a new worker into the worker list of the current
- * subtransaction.
- */
- wid = palloc(sizeof(LogicalRepWorkerId));
- wid->subid = subid;
- wid->relid = relid;
- on_commit_stop_workers->workers =
- lappend(on_commit_stop_workers->workers, wid);
-
- MemoryContextSwitchTo(oldctx);
-}
-
/*
* Wake up (using latch) any logical replication worker for specified sub/rel.
*/
@@ -819,109 +760,21 @@ ApplyLauncherShmemInit(void)
}
}
-/*
- * Check whether current transaction has manipulated logical replication
- * workers.
- */
-bool
-XactManipulatesLogicalReplicationWorkers(void)
-{
- return (on_commit_stop_workers != NULL);
-}
-
/*
* Wakeup the launcher on commit if requested.
*/
void
AtEOXact_ApplyLauncher(bool isCommit)
{
-
- Assert(on_commit_stop_workers == NULL ||
- (on_commit_stop_workers->nestDepth == 1 &&
- on_commit_stop_workers->parent == NULL));
-
if (isCommit)
{
- ListCell *lc;
-
- if (on_commit_stop_workers != NULL)
- {
- List *workers = on_commit_stop_workers->workers;
-
- foreach(lc, workers)
- {
- LogicalRepWorkerId *wid = lfirst(lc);
-
- logicalrep_worker_stop(wid->subid, wid->relid);
- }
- }
-
if (on_commit_launcher_wakeup)
ApplyLauncherWakeup();
}
- /*
- * No need to pfree on_commit_stop_workers. It was allocated in
- * transaction memory context, which is going to be cleaned soon.
- */
- on_commit_stop_workers = NULL;
on_commit_launcher_wakeup = false;
}
-/*
- * On commit, merge the current on_commit_stop_workers list into the
- * immediate parent, if present.
- * On rollback, discard the current on_commit_stop_workers list.
- * Pop out the stack.
- */
-void
-AtEOSubXact_ApplyLauncher(bool isCommit, int nestDepth)
-{
- StopWorkersData *parent;
-
- /* Exit immediately if there's no work to do at this level. */
- if (on_commit_stop_workers == NULL ||
- on_commit_stop_workers->nestDepth < nestDepth)
- return;
-
- Assert(on_commit_stop_workers->nestDepth == nestDepth);
-
- parent = on_commit_stop_workers->parent;
-
- if (isCommit)
- {
- /*
- * If the upper stack element is not an immediate parent
- * subtransaction, just decrement the notional nesting depth without
- * doing any real work. Else, we need to merge the current workers
- * list into the parent.
- */
- if (!parent || parent->nestDepth < nestDepth - 1)
- {
- on_commit_stop_workers->nestDepth--;
- return;
- }
-
- parent->workers =
- list_concat(parent->workers, on_commit_stop_workers->workers);
- }
- else
- {
- /*
- * Abandon everything that was done at this nesting level. Explicitly
- * free memory to avoid a transaction-lifespan leak.
- */
- list_free_deep(on_commit_stop_workers->workers);
- }
-
- /*
- * We have taken care of the current subtransaction workers list for both
- * abort or commit. So we are ready to pop the stack.
- */
- pfree(on_commit_stop_workers);
- on_commit_stop_workers = parent;
-}
-
/*
* Request wakeup of the launcher on commit of the transaction.
*
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index ccbdbcf08f..e89e522118 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -31,8 +31,11 @@
* table state to INIT.
* - Tablesync worker starts; changes table state from INIT to DATASYNC while
* copying.
- * - Tablesync worker finishes the copy and sets table state to SYNCWAIT;
- * waits for state change.
+ * - Tablesync worker does initial table copy; there is a FINISHEDCOPY (sync
+ * worker specific) state to indicate when the copy phase has completed, so
+ * if the worker crashes with this (non-memory) state then the copy will not
+ * be re-attempted.
+ * - Tablesync worker then sets table state to SYNCWAIT; waits for state change.
* - Apply worker periodically checks for tables in SYNCWAIT state. When
* any appear, it sets the table state to CATCHUP and starts loop-waiting
* until either the table state is set to SYNCDONE or the sync worker
@@ -48,8 +51,8 @@
* point it sets state to READY and stops tracking. Again, there might
* be zero changes in between.
*
- * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * So the state progression is always: INIT -> DATASYNC -> FINISHEDCOPY
+ * -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY.
*
* The catalog pg_subscription_rel is used to keep information about
* subscribed tables and their state. The catalog holds all states
@@ -58,6 +61,7 @@
* Example flows look like this:
* - Apply is in front:
* sync:8
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:10
* -> set in memory CATCHUP
@@ -73,6 +77,7 @@
*
* - Sync is in front:
* sync:10
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:8
* -> set in memory CATCHUP
@@ -101,7 +106,10 @@
#include "replication/logicalrelation.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
+#include "storage/lmgr.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -269,26 +277,52 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
-
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
current_lsn >= MyLogicalRepWorker->relstate_lsn)
{
TimeLineID tli;
+ char syncslotname[NAMEDATALEN] = {0};
MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCDONE;
MyLogicalRepWorker->relstate_lsn = current_lsn;
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ StartTransactionCommand();
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
MyLogicalRepWorker->relstate_lsn);
+ /* End wal streaming so wrconn can be re-used to drop the slot. */
walrcv_endstreaming(wrconn, &tli);
+
+ /*
+ * Cleanup the tablesync slot.
+ *
+ * This has to be done after updating the state because otherwise if
+ * there is an error while doing the database operations we won't be
+ * able to rollback dropped slot.
+ */
+ ReplicationSlotNameForTablesync(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ syncslotname);
+
+ /*
+ * It is important to give an error if we are unable to drop the slot,
+ * otherwise, it won't be dropped till the corresponding subscription
+ * is dropped. So passing missing_ok = false.
+ */
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, false);
+
finish_sync_worker();
}
else
@@ -403,6 +437,8 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
*/
if (current_lsn >= rstate->lsn)
{
+ char originname[NAMEDATALEN];
+
rstate->state = SUBREL_STATE_READY;
rstate->lsn = current_lsn;
if (!started_tx)
@@ -411,6 +447,27 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
started_tx = true;
}
+ /*
+ * Remove the tablesync origin tracking if exists.
+ *
+ * The normal case origin drop is done here instead of in the
+ * process_syncing_tables_for_sync function because we don't
+ * allow to drop the origin till the process owning the origin
+ * is alive.
+ *
+ * There is a chance that the user is concurrently performing
+ * refresh for the subscription where we remove the table
+ * state and its origin and by this time the origin might be
+ * already removed. So passing missing_ok = true.
+ */
+ ReplicationOriginNameForTableSync(MyLogicalRepWorker->subid,
+ rstate->relid,
+ originname);
+ replorigin_drop_by_name(originname, true, false);
+
+ /*
+ * Update the state to READY only after the origin cleanup.
+ */
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
@@ -805,6 +862,50 @@ copy_table(Relation rel)
logicalrep_rel_close(relmapentry, NoLock);
}
+/*
+ * Determine the tablesync slot name.
+ *
+ * The name must not exceed NAMEDATALEN - 1 because of remote node constraints
+ * on slot name length. We append system_identifier to avoid slot_name
+ * collision with subscriptions in other clusters. With the current scheme
+ * pg_%u_sync_%u_UINT64_FORMAT (3 + 10 + 6 + 10 + 20 + '\0'), the maximum
+ * length of slot_name will be 50.
+ *
+ * The returned slot name is either:
+ * - stored in the supplied buffer (syncslotname), or
+ * - palloc'ed in current memory context (if syncslotname = NULL).
+ *
+ * Note: We don't use the subscription slot name as part of tablesync slot name
+ * because we are responsible for cleaning up these slots and it could become
+ * impossible to recalculate what name to cleanup if the subscription slot name
+ * had changed.
+ */
+char *
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid,
+ char syncslotname[NAMEDATALEN])
+{
+ if (syncslotname)
+ sprintf(syncslotname, "pg_%u_sync_%u_" UINT64_FORMAT, suboid, relid,
+ GetSystemIdentifier());
+ else
+ syncslotname = psprintf("pg_%u_sync_%u_" UINT64_FORMAT, suboid, relid,
+ GetSystemIdentifier());
+
+ return syncslotname;
+}
+
+/*
+ * Form the origin name for tablesync.
+ *
+ * Return the name in the supplied buffer.
+ */
+void
+ReplicationOriginNameForTableSync(Oid suboid, Oid relid,
+ char originname[NAMEDATALEN])
+{
+ snprintf(originname, NAMEDATALEN, "pg_%u_%u", suboid, relid);
+}
+
/*
* Start syncing the table in the sync worker.
*
@@ -822,6 +923,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -847,19 +950,10 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
- MySubscription->slotname,
- MySubscription->oid,
- MyLogicalRepWorker->relid);
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(MySubscription->oid,
+ MyLogicalRepWorker->relid,
+ NULL /* use palloc */ );
/*
* Here we use the slot name instead of the subscription name as the
@@ -872,7 +966,50 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY);
+
+ /* Assign the origin tracking record name. */
+ ReplicationOriginNameForTableSync(MySubscription->oid,
+ MyLogicalRepWorker->relid,
+ originname);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC)
+ {
+ /*
+ * We have previously errored out before finishing the copy so the
+ * replication slot might exist. We want to remove the slot if it
+ * already exists and proceed.
+ *
+ * XXX We could also instead try to drop the slot, last time we failed
+ * but for that, we might need to clean up the copy state as it might
+ * be in the middle of fetching the rows. Also, if there is a network
+ * breakdown then it wouldn't have succeeded so trying it next time
+ * seems like a better bet.
+ */
+ ReplicationSlotDropAtPubNode(wrconn, slotname, true);
+ }
+ else if (MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed
+ * before it was able to finish normally.
+ */
+ StartTransactionCommand();
+
+ /*
+ * The origin tracking name must already exist. It was created first
+ * time this tablesync was launched.
+ */
+ originid = replorigin_by_name(originname, false);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ *origin_startpos = replorigin_session_get_progress(false);
+
+ CommitTransactionCommand();
+
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -888,9 +1025,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -916,12 +1050,45 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
walrcv_clear_result(res);
/*
- * Create a new temporary logical decoding slot. This slot will be used
- * for the catchup phase after COPY is done, so tell it to use the
- * snapshot to make the final data consistent.
+ * Create a new permanent logical decoding slot. This slot will be
+ * used for the catchup phase after COPY is done, so tell it to use
+ * the snapshot to make the final data consistent.
+ */
+ walrcv_create_slot(wrconn, slotname, false /* permanent */ ,
+ CRS_USE_SNAPSHOT, origin_startpos);
+
+ /*
+ * Setup replication origin tracking. The purpose of doing this before the
+ * copy is to avoid doing the copy again due to any error in setting up
+ * origin tracking.
*/
- walrcv_create_slot(wrconn, slotname, true,
- CRS_USE_SNAPSHOT, origin_startpos);
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist, so create it now.
+ *
+ * Then advance to the LSN got from walrcv_create_slot. This is
+ * WAL logged for the purpose of recovery. Locks are to prevent
+ * the replication origin from vanishing while advancing.
+ */
+ originid = replorigin_create(originname);
+
+ LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */, true /* WAL log */);
+ UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ }
+ else
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("replication origin \"%s\" already exists",
+ originname)));
+ }
/* Now do the initial data copy */
PushActiveSnapshot(GetTransactionSnapshot());
@@ -940,6 +1107,25 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
/* Make the copy visible. */
CommandCounterIncrement();
+ /*
+ * Update the persisted state to indicate the COPY phase is done; make
+ * it visible to others.
+ */
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_FINISHEDCOPY,
+ MyLogicalRepWorker->relstate_lsn);
+
+ CommitTransactionCommand();
+
+copy_table_done:
+
+ elog(DEBUG1,
+ "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
+
/*
* We are done with the initial data synchronization, update the state.
*/
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index eb7db89cef..cfc924cd89 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -807,12 +807,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -889,9 +885,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -918,8 +912,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1062,8 +1055,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData *commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 1d81071c35..05bb698cf4 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1786,7 +1786,8 @@ ProcessUtilitySlow(ParseState *pstate,
break;
case T_AlterSubscriptionStmt:
- address = AlterSubscription((AlterSubscriptionStmt *) parsetree);
+ address = AlterSubscription((AlterSubscriptionStmt *) parsetree,
+ isTopLevel);
break;
case T_DropSubscriptionStmt:
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index 2bea2c52aa..ed94f57baa 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,8 @@ DECLARE_UNIQUE_INDEX_PKEY(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_FINISHEDCOPY 'f' /* tablesync copy phase is completed
+ * (sublsn NULL) */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/commands/subscriptioncmds.h b/src/include/commands/subscriptioncmds.h
index a81865079d..3b926f35d7 100644
--- a/src/include/commands/subscriptioncmds.h
+++ b/src/include/commands/subscriptioncmds.h
@@ -20,7 +20,7 @@
extern ObjectAddress CreateSubscription(CreateSubscriptionStmt *stmt,
bool isTopLevel);
-extern ObjectAddress AlterSubscription(AlterSubscriptionStmt *stmt);
+extern ObjectAddress AlterSubscription(AlterSubscriptionStmt *stmt, bool isTopLevel);
extern void DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel);
extern ObjectAddress AlterSubscriptionOwner(const char *name, Oid newOwnerId);
diff --git a/src/include/replication/logicallauncher.h b/src/include/replication/logicallauncher.h
index 421ec1580d..301e494f7b 100644
--- a/src/include/replication/logicallauncher.h
+++ b/src/include/replication/logicallauncher.h
@@ -22,9 +22,7 @@ extern Size ApplyLauncherShmemSize(void);
extern void ApplyLauncherShmemInit(void);
extern void ApplyLauncherWakeupAtCommit(void);
-extern bool XactManipulatesLogicalReplicationWorkers(void);
extern void AtEOXact_ApplyLauncher(bool isCommit);
-extern void AtEOSubXact_ApplyLauncher(bool isCommit, int nestDepth);
extern bool IsLogicalLauncher(void);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 53f636c56f..5f52335f15 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -15,6 +15,7 @@
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "storage/spin.h"
+#include "replication/walreceiver.h"
/*
* Behaviour of replication slots, upon release or crash.
@@ -211,6 +212,8 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname);
+extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
diff --git a/src/include/replication/worker_internal.h b/src/include/replication/worker_internal.h
index d046022e49..8d9cc4f596 100644
--- a/src/include/replication/worker_internal.h
+++ b/src/include/replication/worker_internal.h
@@ -77,13 +77,14 @@ extern List *logicalrep_workers_find(Oid subid, bool only_running);
extern void logicalrep_worker_launch(Oid dbid, Oid subid, const char *subname,
Oid userid, Oid relid);
extern void logicalrep_worker_stop(Oid subid, Oid relid);
-extern void logicalrep_worker_stop_at_commit(Oid subid, Oid relid);
extern void logicalrep_worker_wakeup(Oid subid, Oid relid);
extern void logicalrep_worker_wakeup_ptr(LogicalRepWorker *worker);
extern int logicalrep_sync_worker_count(Oid subid);
+extern void ReplicationOriginNameForTableSync(Oid suboid, Oid relid, char *originname);
extern char *LogicalRepSyncTableStart(XLogRecPtr *origin_startpos);
+
void process_syncing_tables(XLogRecPtr current_lsn);
void invalidate_syncing_table_states(Datum arg, int cacheid,
uint32 hashvalue);
diff --git a/src/test/regress/expected/subscription.out b/src/test/regress/expected/subscription.out
index 2fa9bce66a..7802279cb2 100644
--- a/src/test/regress/expected/subscription.out
+++ b/src/test/regress/expected/subscription.out
@@ -201,6 +201,27 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
(1 row)
DROP SUBSCRIPTION regress_testsub;
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=postgres' PUBLICATION mypub
+ WITH (enabled = true, create_slot = false, copy_data = false);
+-- fail - ALTER SUBSCRIPTION with refresh is not allowed in a transaction
+-- block or function
+BEGIN;
+ALTER SUBSCRIPTION regress_testsub SET PUBLICATION mypub WITH (refresh = true);
+ERROR: ALTER SUBSCRIPTION with refresh cannot run inside a transaction block
+END;
+BEGIN;
+ALTER SUBSCRIPTION regress_testsub REFRESH PUBLICATION;
+ERROR: ALTER SUBSCRIPTION ... REFRESH cannot run inside a transaction block
+END;
+CREATE FUNCTION func() RETURNS VOID AS
+$$ ALTER SUBSCRIPTION regress_testsub SET PUBLICATION mypub WITH (refresh = true) $$ LANGUAGE SQL;
+SELECT func();
+ERROR: ALTER SUBSCRIPTION with refresh cannot be executed from a function
+CONTEXT: SQL function "func" statement 1
+ALTER SUBSCRIPTION regress_testsub DISABLE;
+ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
+DROP SUBSCRIPTION regress_testsub;
+DROP FUNCTION func;
RESET SESSION AUTHORIZATION;
DROP ROLE regress_subscription_user;
DROP ROLE regress_subscription_user2;
diff --git a/src/test/regress/sql/subscription.sql b/src/test/regress/sql/subscription.sql
index 14fa0b247e..ca0d782742 100644
--- a/src/test/regress/sql/subscription.sql
+++ b/src/test/regress/sql/subscription.sql
@@ -147,6 +147,28 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
DROP SUBSCRIPTION regress_testsub;
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=postgres' PUBLICATION mypub
+ WITH (enabled = true, create_slot = false, copy_data = false);
+
+-- fail - ALTER SUBSCRIPTION with refresh is not allowed in a transaction
+-- block or function
+BEGIN;
+ALTER SUBSCRIPTION regress_testsub SET PUBLICATION mypub WITH (refresh = true);
+END;
+
+BEGIN;
+ALTER SUBSCRIPTION regress_testsub REFRESH PUBLICATION;
+END;
+
+CREATE FUNCTION func() RETURNS VOID AS
+$$ ALTER SUBSCRIPTION regress_testsub SET PUBLICATION mypub WITH (refresh = true) $$ LANGUAGE SQL;
+SELECT func();
+
+ALTER SUBSCRIPTION regress_testsub DISABLE;
+ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
+DROP SUBSCRIPTION regress_testsub;
+DROP FUNCTION func;
+
RESET SESSION AUTHORIZATION;
DROP ROLE regress_subscription_user;
DROP ROLE regress_subscription_user2;
diff --git a/src/test/subscription/t/004_sync.pl b/src/test/subscription/t/004_sync.pl
index e111ab9181..c7926681b6 100644
--- a/src/test/subscription/t/004_sync.pl
+++ b/src/test/subscription/t/004_sync.pl
@@ -3,7 +3,7 @@ use strict;
use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 7;
+use Test::More tests => 8;
# Initialize publisher node
my $node_publisher = get_new_node('publisher');
@@ -149,7 +149,26 @@ $result = $node_subscriber->safe_psql('postgres',
is($result, qq(20),
'changes for table added after subscription initialized replicated');
+# clean up
+$node_publisher->safe_psql('postgres', "DROP TABLE tab_rep_next");
+$node_subscriber->safe_psql('postgres', "DROP TABLE tab_rep_next");
$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+# Table tap_rep already has the same records on both publisher and subscriber
+# at this time. Recreate the subscription which will do the initial copy of
+# the table again and fails due to unique constraint violation.
+$node_subscriber->safe_psql('postgres',
+ "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr' PUBLICATION tap_pub");
+
+$result = $node_subscriber->poll_query_until('postgres', $started_query)
+ or die "Timed out while waiting for subscriber to start sync";
+
+# DROP SUBSCRIPTION must clean up slots on the publisher side when the
+# subscriber is stuck on data copy for constraint violation.
+$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+
+$result = $node_publisher->safe_psql('postgres', "SELECT count(*) FROM pg_replication_slots");
+is($result, qq(0), 'DROP SUBSCRIPTION during error can clean up the slots on the publisher');
+
$node_subscriber->stop('fast');
$node_publisher->stop('fast');
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 1d540fe489..5f5c36d8e2 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2397,7 +2397,6 @@ StdAnalyzeData
StdRdOptions
Step
StopList
-StopWorkersData
StrategyNumber
StreamCtl
StreamXidHash
--
2.28.0.windows.1
On Tue, Feb 9, 2021 at 8:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Feb 9, 2021 at 12:02 PM Peter Smith <smithpb2250@gmail.com> wrote:
Here are my feedback comments for the V29 patch.
Thanks.
3.
Previously the tablesync origin name format was encapsulated in a
common function. IMO it was cleaner/safer how it was before, instead
of the same "pg_%u_%u" format being cut/pasted and scattered in many places.
(the same comment applies in multiple places, in this file and in tablesync.c)
OK. I confirmed it is fixed in V30.
But I noticed that the new function name is not quite consistent with
existing function for slot name. e.g.
ReplicationSlotNameForTablesync versus
ReplicationOriginNameForTableSync (see "TableSync" instead of
"Tablesync")
------
Kind Regards,
Peter Smith.
Fujitsu Australia
On Tue, Feb 9, 2021 at 10:38 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Mon, Feb 8, 2021 at 11:42 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Sun, Feb 7, 2021 at 2:38 PM Peter Smith <smithpb2250@gmail.com> wrote:
On Sat, Feb 6, 2021 at 2:10 AM Petr Jelinek
<petr.jelinek@enterprisedb.com> wrote:
Hi,
Some minor comments about the code:
+ else if (res->status == WALRCV_ERROR && missing_ok)
+ {
+ /* WARNING. Error, but missing_ok = true. */
+ ereport(WARNING,
I wonder if we need to add an error code to the WalRcvExecResult and check
for the appropriate ones here. Because this can for example return an error
because of a timeout, not because the slot is missing. Not sure if it matters
for the current callers though (but then maybe don't call the param
missing_ok?).
You are right. The way we are using this function has evolved beyond
the original intention.
Probably renaming the param to something like "error_ok" would be more
appropriate now.
PSA a patch (apply on top of V28) to change the misleading param name.
PSA an alternative patch. This one adds a new member to
WalRcvExecResult and so is able to detect the "slot does not exist"
error. This patch also applies on top of V28, if you want it.
PSA v2 of this WalRcvExecResult patch (it is the same as v1 but includes
some PG doc updates).
This applies OK on top of v30 of the main patch.
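To spell out the decision logic the new member enables, a minimal
standalone sketch follows (the types, the integer ERRCODE value, and
drop_slot_outcome() are simplified stand-ins for the real definitions in
the attached patch, which uses ereport() and MAKE_SQLSTATE()):

#include <stdio.h>
#include <stdbool.h>

/* Stand-ins for the walreceiver definitions the v2 patch touches. */
typedef enum { WALRCV_OK_COMMAND, WALRCV_ERROR } WalRcvExecStatus;

typedef struct
{
    WalRcvExecStatus status;
    int         sqlstate;       /* the new member */
    const char *err;
} WalRcvExecResult;

/* Illustrative stand-in; the real value is MAKE_SQLSTATE('4','2','7','0','4'). */
#define ERRCODE_UNDEFINED_OBJECT 42704

static void
drop_slot_outcome(const WalRcvExecResult *res, bool missing_ok, const char *slotname)
{
    if (res->status == WALRCV_OK_COMMAND)
        printf("NOTICE: dropped replication slot \"%s\" on publisher\n", slotname);
    else if (res->status == WALRCV_ERROR &&
             missing_ok && res->sqlstate == ERRCODE_UNDEFINED_OBJECT)
        /* Only the "slot does not exist" error is downgraded to a WARNING. */
        printf("WARNING: could not drop the replication slot \"%s\" on publisher (%s)\n",
               slotname, res->err);
    else
        /* Anything else (e.g. a timeout) still fails, even with missing_ok. */
        printf("ERROR: could not drop the replication slot \"%s\" on publisher (%s)\n",
               slotname, res->err);
}

int main(void)
{
    WalRcvExecResult gone = {WALRCV_ERROR, ERRCODE_UNDEFINED_OBJECT, "slot does not exist"};
    WalRcvExecResult timeout = {WALRCV_ERROR, 0, "timeout"};

    drop_slot_outcome(&gone, true, "pg_16401_sync_16389_6927117142022745645");
    drop_slot_outcome(&timeout, true, "pg_16401_sync_16389_6927117142022745645");
    return 0;
}

Note that with missing_ok = true an unrelated failure such as a timeout
still escalates to an ERROR, which is what the concern above was about.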
------
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
v2-0001-ReplicationSlotDropAtPubNode-detect-slot-does-not.patch
From c092a3cf55085725924417af1bb5c11dbcba31e2 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.b.smith@fujitsu.com>
Date: Wed, 10 Feb 2021 13:02:06 +1100
Subject: [PATCH v2] ReplicationSlotDropAtPubNode detect slot does not exist.
A new sqlstate member was added to WalRcvExecResult.
This allows walrcv_exec calling code to know the detailed cause of any error. Specifically, here it means the ReplicationSlotDropAtPubNode function can now identify the "slot does not exist" error, and so can handle "missing_ok" more correctly.
Also, minor updates to PG docs now that ALTER SUBSCRIPTION may give ERROR instead of WARNING.
---
doc/src/sgml/ref/alter_subscription.sgml | 7 +------
src/backend/commands/subscriptioncmds.c | 3 ++-
src/backend/replication/libpqwalreceiver/libpqwalreceiver.c | 8 ++++++++
src/include/replication/walreceiver.h | 1 +
4 files changed, 12 insertions(+), 7 deletions(-)
diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml
index 1ca2437..bd81ea4 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -55,12 +55,7 @@ ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> RENAME TO <
any. It is necessary to remove tablesync slots so that the resources
allocated for the subscription on the remote host are released. If due to
network breakdown or some other error, <productname>PostgreSQL</productname>
- is unable to remove the slots, a WARNING will be reported. The user needs to
- manually remove such slots later or the
- <xref linkend="guc-max-slot-wal-keep-size"/> should be configured on the
- remote host as otherwise, they will continue to reserve WAL and might
- eventually cause the disk to fill up. See also
- <xref linkend="logical-replication-subscription-slot"/>.
+ is unable to remove the slots, an ERROR will be reported.
</para>
<para>
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 8046153..ff295c7 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -1314,7 +1314,8 @@ ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missi
(errmsg("dropped replication slot \"%s\" on publisher",
slotname)));
}
- else if (res->status == WALRCV_ERROR && missing_ok)
+ else if (res->status == WALRCV_ERROR &&
+ missing_ok && res->sqlstate == ERRCODE_UNDEFINED_OBJECT)
{
/* WARNING. Error, but missing_ok = true. */
ereport(WARNING,
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index e958274..7714696 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -982,6 +982,7 @@ libpqrcv_exec(WalReceiverConn *conn, const char *query,
{
PGresult *pgres = NULL;
WalRcvExecResult *walres = palloc0(sizeof(WalRcvExecResult));
+ char *diag_sqlstate;
if (MyDatabaseId == InvalidOid)
ereport(ERROR,
@@ -1025,6 +1026,13 @@ libpqrcv_exec(WalReceiverConn *conn, const char *query,
case PGRES_BAD_RESPONSE:
walres->status = WALRCV_ERROR;
walres->err = pchomp(PQerrorMessage(conn->streamConn));
+ diag_sqlstate = PQresultErrorField(pgres, PG_DIAG_SQLSTATE);
+ if (diag_sqlstate)
+ walres->sqlstate = MAKE_SQLSTATE(diag_sqlstate[0],
+ diag_sqlstate[1],
+ diag_sqlstate[2],
+ diag_sqlstate[3],
+ diag_sqlstate[4]);
break;
}
diff --git a/src/include/replication/walreceiver.h b/src/include/replication/walreceiver.h
index 4313f51..a97a59a 100644
--- a/src/include/replication/walreceiver.h
+++ b/src/include/replication/walreceiver.h
@@ -210,6 +210,7 @@ typedef enum
typedef struct WalRcvExecResult
{
WalRcvExecStatus status;
+ int sqlstate;
char *err;
Tuplestorestate *tuplestore;
TupleDesc tupledesc;
--
1.8.3.1
On Tue, Feb 9, 2021 at 10:38 AM Peter Smith <smithpb2250@gmail.com> wrote:
PSA an alternative patch. This one adds a new member to
WalRcvExecResult and so is able to detect the "slot does not exist"
error. This patch also applies on top of V28, if you want it.
I did some testing with this patch on top of v29. I could see that now,
while dropping the subscription, if the tablesync slot does not exist
on the publisher, it gives a warning but the command does not fail.
postgres=# CREATE SUBSCRIPTION tap_sub CONNECTION 'host=localhost
dbname=postgres port=6972' PUBLICATION tap_pub WITH (enabled = false);
NOTICE: created replication slot "tap_sub" on publisher
CREATE SUBSCRIPTION
postgres=# ALTER SUBSCRIPTION tap_sub enable;
ALTER SUBSCRIPTION
postgres=# ALTER SUBSCRIPTION tap_sub disable;
ALTER SUBSCRIPTION
=== here, the tablesync slot exists on the publisher but I go and
=== manually drop it.
postgres=# drop subscription tap_sub;
WARNING: could not drop the replication slot
"pg_16401_sync_16389_6927117142022745645" on publisher
DETAIL: The error was: ERROR: replication slot
"pg_16401_sync_16389_6927117142022745645" does not exist
NOTICE: dropped replication slot "tap_sub" on publisher
DROP SUBSCRIPTION
I have a minor comment on the error message: the "The error was:"
prefix seems a bit redundant here. Maybe remove it, so that it looks like:
WARNING: could not drop the replication slot
"pg_16401_sync_16389_6927117142022745645" on publisher
DETAIL: ERROR: replication slot
"pg_16401_sync_16389_6927117142022745645" does not exist
regards,
Ajin Cherian
Fujitsu Australia
On Wed, Feb 10, 2021 at 7:41 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Tue, Feb 9, 2021 at 10:38 AM Peter Smith <smithpb2250@gmail.com> wrote:
PSA v2 of this WalRcvExecResult patch (it is the same as v1 but includes
some PG doc updates).
This applies OK on top of v30 of the main patch.
Thanks, I have integrated these changes into the main patch and
additionally made some changes to comments and docs. I have also fixed
the function name inconsistency issue you reported and ran pgindent.
--
With Regards,
Amit Kapila.
Attachments:
v31-0001-Allow-multiple-xacts-during-table-sync-in-logica.patch
From 31691bef5e895ad251325e30ea9e6d8379672695 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Wed, 10 Feb 2021 09:20:31 +0530
Subject: [PATCH v31] Allow multiple xacts during table sync in logical
replication.
For the initial table data synchronization in logical replication, we use
a single transaction to copy the entire table and then synchronize the
position in the stream with the main apply worker.
There are multiple downsides of this approach: (a) We have to perform the
entire copy operation again if there is any error (network breakdown,
error in the database operation, etc.) while we synchronize the WAL
position between tablesync worker and apply worker; this will be onerous
especially for large copies, (b) Using a single transaction in the
synchronization-phase (where we can receive WAL from multiple
transactions) will have the risk of exceeding the CID limit, (c) The slot
will hold the WAL till the entire sync is complete because we never commit
till the end.
This patch solves all the above downsides by allowing multiple
transactions during the tablesync phase. The initial copy is done in a
single transaction and after that, we commit each transaction as we
receive it. To allow recovery after any error or crash, we use a permanent
slot and origin to track the progress. The slot and origin will be removed
once we finish the synchronization of the table. We also remove slot and
origin of tablesync workers if the user performs DROP SUBSCRIPTION .. or
ALTER SUBSCRIPTION .. REFRESH and some of the table syncs are still not
finished.
The commands ALTER SUBSCRIPTION ... REFRESH PUBLICATION and
ALTER SUBSCRIPTION ... SET PUBLICATION ... with refresh option as true
cannot be executed inside a transaction block because they can now drop
the slots for which we have no provision to rollback.
This will also open up the path for logical replication of 2PC
transactions on the subscriber side. Previously, we can't do that because
of the requirement of maintaining a single transaction in tablesync
workers.
Author: Peter Smith, Amit Kapila, and Takamichi Osumi
Reviewed-by: Ajin Cherian, Petr Jelinek, Hou Zhijie and Amit Kapila
Discussion: https://postgr.es/m/CAA4eK1KHJxaZS-fod-0fey=0tq3=Gkn4ho=8N4-5HWiCfu0H1A@mail.gmail.com
---
doc/src/sgml/catalogs.sgml | 1 +
doc/src/sgml/logical-replication.sgml | 59 ++-
doc/src/sgml/ref/alter_subscription.sgml | 18 +
doc/src/sgml/ref/drop_subscription.sgml | 6 +-
src/backend/access/transam/xact.c | 11 -
src/backend/catalog/pg_subscription.c | 39 ++
src/backend/commands/subscriptioncmds.c | 467 ++++++++++++++++-----
.../libpqwalreceiver/libpqwalreceiver.c | 8 +
src/backend/replication/logical/launcher.c | 147 -------
src/backend/replication/logical/tablesync.c | 236 +++++++++--
src/backend/replication/logical/worker.c | 18 +-
src/backend/tcop/utility.c | 3 +-
src/include/catalog/pg_subscription_rel.h | 2 +
src/include/commands/subscriptioncmds.h | 2 +-
src/include/replication/logicallauncher.h | 2 -
src/include/replication/slot.h | 3 +
src/include/replication/walreceiver.h | 1 +
src/include/replication/worker_internal.h | 3 +-
src/test/regress/expected/subscription.out | 21 +
src/test/regress/sql/subscription.sql | 22 +
src/test/subscription/t/004_sync.pl | 21 +-
src/tools/pgindent/typedefs.list | 2 +-
22 files changed, 767 insertions(+), 325 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index ea222c0..692ad65 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7673,6 +7673,7 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
State code:
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
+ <literal>f</literal> = finished table copy,
<literal>s</literal> = synchronized,
<literal>r</literal> = ready (normal replication)
</para></entry>
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index a560ad6..d0742f2 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -186,9 +186,10 @@
<para>
Each subscription will receive changes via one replication slot (see
- <xref linkend="streaming-replication-slots"/>). Additional temporary
- replication slots may be required for the initial data synchronization
- of pre-existing table data.
+ <xref linkend="streaming-replication-slots"/>). Additional replication
+ slots may be required for the initial data synchronization of
+ pre-existing table data and those will be dropped at the end of data
+ synchronization.
</para>
<para>
@@ -248,13 +249,23 @@
<para>
As mentioned earlier, each (active) subscription receives changes from a
- replication slot on the remote (publishing) side. Normally, the remote
- replication slot is created automatically when the subscription is created
- using <command>CREATE SUBSCRIPTION</command> and it is dropped
- automatically when the subscription is dropped using <command>DROP
- SUBSCRIPTION</command>. In some situations, however, it can be useful or
- necessary to manipulate the subscription and the underlying replication
- slot separately. Here are some scenarios:
+ replication slot on the remote (publishing) side.
+ </para>
+ <para>
+ Additional table synchronization slots are normally transient, created
+ internally to perform initial table synchronization and dropped
+ automatically when they are no longer needed. These table synchronization
+ slots have generated names: <quote><literal>pg_%u_sync_%u_%llu</literal></quote>
+ (parameters: Subscription <parameter>oid</parameter>,
+ Table <parameter>relid</parameter>, system identifier <parameter>sysid</parameter>)
+ </para>
+ <para>
+ Normally, the remote replication slot is created automatically when the
+ subscription is created using <command>CREATE SUBSCRIPTION</command> and it
+ is dropped automatically when the subscription is dropped using
+ <command>DROP SUBSCRIPTION</command>. In some situations, however, it can
+ be useful or necessary to manipulate the subscription and the underlying
+ replication slot separately. Here are some scenarios:
<itemizedlist>
<listitem>
@@ -294,8 +305,9 @@
using <command>ALTER SUBSCRIPTION</command> before attempting to drop
the subscription. If the remote database instance no longer exists, no
further action is then necessary. If, however, the remote database
- instance is just unreachable, the replication slot should then be
- dropped manually; otherwise it would continue to reserve WAL and might
+ instance is just unreachable, the replication slot (and any still
+ remaining table synchronization slots) should then be
+ dropped manually; otherwise it/they would continue to reserve WAL and might
eventually cause the disk to fill up. Such cases should be carefully
investigated.
</para>
@@ -468,16 +480,19 @@
<sect2 id="logical-replication-snapshot">
<title>Initial Snapshot</title>
<para>
- The initial data in existing subscribed tables are snapshotted and
- copied in a parallel instance of a special kind of apply process.
- This process will create its own temporary replication slot and
- copy the existing data. Once existing data is copied, the worker
- enters synchronization mode, which ensures that the table is brought
- up to a synchronized state with the main apply process by streaming
- any changes that happened during the initial data copy using standard
- logical replication. Once the synchronization is done, the control
- of the replication of the table is given back to the main apply
- process where the replication continues as normal.
+ The initial data in existing subscribed tables are snapshotted and
+ copied in a parallel instance of a special kind of apply process.
+ This process will create its own replication slot and copy the existing
+ data. As soon as the copy is finished the table contents will become
+ visible to other backends. Once existing data is copied, the worker
+ enters synchronization mode, which ensures that the table is brought
+ up to a synchronized state with the main apply process by streaming
+ any changes that happened during the initial data copy using standard
+ logical replication. During this synchronization phase, the changes
+ are applied and committed in the same order as they happened on the
+ publisher. Once the synchronization is done, the control of the
+ replication of the table is given back to the main apply process where
+ the replication continues as normal.
</para>
</sect2>
</sect1>
diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml
index db5e59f..bcb0acf 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -48,6 +48,24 @@ ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> RENAME TO <
(Currently, all subscription owners must be superusers, so the owner checks
will be bypassed in practice. But this might change in the future.)
</para>
+
+ <para>
+ When refreshing a publication we remove the relations that are no longer
+ part of the publication and we also remove the tablesync slots if there are
+ any. It is necessary to remove tablesync slots so that the resources
+ allocated for the subscription on the remote host are released. If due to
+ network breakdown or some other error, <productname>PostgreSQL</productname>
+ is unable to remove the slots, an ERROR will be reported. To proceed in this
+ situation, either the user need to retry the operation or disassociate the
+ slot from the subscription and drop the subscription as explained in
+ <xref linkend="sql-dropsubscription"/>.
+ </para>
+
+ <para>
+ Commands <command>ALTER SUBSCRIPTION ... REFRESH PUBLICATION</command> and
+ <command>ALTER SUBSCRIPTION ... SET PUBLICATION ...</command> with refresh
+ option as true cannot be executed inside a transaction block.
+ </para>
</refsect1>
<refsect1>
diff --git a/doc/src/sgml/ref/drop_subscription.sgml b/doc/src/sgml/ref/drop_subscription.sgml
index adbdeaf..aee9615 100644
--- a/doc/src/sgml/ref/drop_subscription.sgml
+++ b/doc/src/sgml/ref/drop_subscription.sgml
@@ -79,7 +79,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
<para>
When dropping a subscription that is associated with a replication slot on
the remote host (the normal state), <command>DROP SUBSCRIPTION</command>
- will connect to the remote host and try to drop the replication slot as
+ will connect to the remote host and try to drop the replication slot (and
+ any remaining table synchronization slots) as
part of its operation. This is necessary so that the resources allocated
for the subscription on the remote host are released. If this fails,
either because the remote host is not reachable or because the remote
@@ -89,7 +90,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
executing <literal>ALTER SUBSCRIPTION ... SET (slot_name = NONE)</literal>.
After that, <command>DROP SUBSCRIPTION</command> will no longer attempt any
actions on a remote host. Note that if the remote replication slot still
- exists, it should then be dropped manually; otherwise it will continue to
+ exists, it (and any related table synchronization slots) should then be
+ dropped manually; otherwise it/they will continue to
reserve WAL and might eventually cause the disk to fill up. See
also <xref linkend="logical-replication-subscription-slot"/>.
</para>
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index a2068e3..3c8b4eb 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2432,15 +2432,6 @@ PrepareTransaction(void)
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot PREPARE a transaction that has exported snapshots")));
- /*
- * Don't allow PREPARE but for transaction that has/might kill logical
- * replication workers.
- */
- if (XactManipulatesLogicalReplicationWorkers())
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("cannot PREPARE a transaction that has manipulated logical replication workers")));
-
/* Prevent cancel/die interrupt while cleaning up */
HOLD_INTERRUPTS();
@@ -4899,7 +4890,6 @@ CommitSubTransaction(void)
AtEOSubXact_HashTables(true, s->nestingLevel);
AtEOSubXact_PgStat(true, s->nestingLevel);
AtSubCommit_Snapshot(s->nestingLevel);
- AtEOSubXact_ApplyLauncher(true, s->nestingLevel);
/*
* We need to restore the upper transaction's read-only state, in case the
@@ -5059,7 +5049,6 @@ AbortSubTransaction(void)
AtEOSubXact_HashTables(false, s->nestingLevel);
AtEOSubXact_PgStat(false, s->nestingLevel);
AtSubAbort_Snapshot(s->nestingLevel);
- AtEOSubXact_ApplyLauncher(false, s->nestingLevel);
}
/*
diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c
index 44cb285..750ec2a 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -29,6 +29,7 @@
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
#include "utils/pg_lsn.h"
#include "utils/rel.h"
#include "utils/syscache.h"
@@ -337,6 +338,13 @@ GetSubscriptionRelState(Oid subid, Oid relid, XLogRecPtr *sublsn)
char substate;
bool isnull;
Datum d;
+ Relation rel;
+
+ /*
+ * This is to avoid the race condition with AlterSubscription which tries
+ * to remove this relstate.
+ */
+ rel = table_open(SubscriptionRelRelationId, AccessShareLock);
/* Try finding the mapping. */
tup = SearchSysCache2(SUBSCRIPTIONRELMAP,
@@ -363,6 +371,8 @@ GetSubscriptionRelState(Oid subid, Oid relid, XLogRecPtr *sublsn)
/* Cleanup */
ReleaseSysCache(tup);
+ table_close(rel, AccessShareLock);
+
return substate;
}
@@ -403,6 +413,35 @@ RemoveSubscriptionRel(Oid subid, Oid relid)
scan = table_beginscan_catalog(rel, nkeys, skey);
while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
{
+ Form_pg_subscription_rel subrel;
+
+ subrel = (Form_pg_subscription_rel) GETSTRUCT(tup);
+
+ /*
+ * We don't allow to drop the relation mapping when the table
+ * synchronization is in progress unless the caller updates the
+ * corresponding subscription as well. This is to ensure that we don't
+ * leave tablesync slots or origins in the system when the
+ * corresponding table is dropped.
+ */
+ if (!OidIsValid(subid) && subrel->srsubstate != SUBREL_STATE_READY)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("could not drop relation mapping for subscription \"%s\"",
+ get_subscription_name(subrel->srsubid, false)),
+ errdetail("Table synchronization for relation \"%s\" is in progress and is in state \"%c\".",
+ get_rel_name(relid), subrel->srsubstate),
+
+ /*
+ * translator: first %s is a SQL ALTER command and second %s is a
+ * SQL DROP command
+ */
+ errhint("Use %s to enable subscription if not already enabled or use %s to drop the subscription.",
+ "ALTER SUBSCRIPTION ... ENABLE",
+ "DROP SUBSCRIPTION ...")));
+ }
+
CatalogTupleDelete(rel, &tup->t_self);
}
table_endscan(scan);
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 5ccbc9d..7996f84 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -34,6 +34,7 @@
#include "nodes/makefuncs.h"
#include "replication/logicallauncher.h"
#include "replication/origin.h"
+#include "replication/slot.h"
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "replication/worker_internal.h"
@@ -46,6 +47,8 @@
#include "utils/syscache.h"
static List *fetch_table_list(WalReceiverConn *wrconn, List *publications);
+static void ReportSlotConnectionError(List *rstates, Oid subid, char *slotname, char *err);
+
/*
* Common option parsing function for CREATE and ALTER SUBSCRIPTION commands.
@@ -566,107 +569,207 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
Oid *pubrel_local_oids;
ListCell *lc;
int off;
+ int remove_rel_len;
+ Relation rel = NULL;
+ typedef struct SubRemoveRels
+ {
+ Oid relid;
+ char state;
+ } SubRemoveRels;
+ SubRemoveRels *sub_remove_rels;
/* Load the library providing us libpq calls. */
load_file("libpqwalreceiver", false);
- /* Try to connect to the publisher. */
- wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
- if (!wrconn)
- ereport(ERROR,
- (errmsg("could not connect to the publisher: %s", err)));
-
- /* Get the table list from publisher. */
- pubrel_names = fetch_table_list(wrconn, sub->publications);
+ PG_TRY();
+ {
+ /* Try to connect to the publisher. */
+ wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
+ if (!wrconn)
+ ereport(ERROR,
+ (errmsg("could not connect to the publisher: %s", err)));
- /* We are done with the remote side, close connection. */
- walrcv_disconnect(wrconn);
+ /* Get the table list from publisher. */
+ pubrel_names = fetch_table_list(wrconn, sub->publications);
- /* Get local table list. */
- subrel_states = GetSubscriptionRelations(sub->oid);
+ /* Get local table list. */
+ subrel_states = GetSubscriptionRelations(sub->oid);
- /*
- * Build qsorted array of local table oids for faster lookup. This can
- * potentially contain all tables in the database so speed of lookup is
- * important.
- */
- subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
- off = 0;
- foreach(lc, subrel_states)
- {
- SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
+ /*
+ * Build qsorted array of local table oids for faster lookup. This can
+ * potentially contain all tables in the database so speed of lookup
+ * is important.
+ */
+ subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
+ off = 0;
+ foreach(lc, subrel_states)
+ {
+ SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
- subrel_local_oids[off++] = relstate->relid;
- }
- qsort(subrel_local_oids, list_length(subrel_states),
- sizeof(Oid), oid_cmp);
+ subrel_local_oids[off++] = relstate->relid;
+ }
+ qsort(subrel_local_oids, list_length(subrel_states),
+ sizeof(Oid), oid_cmp);
+
+ /*
+ * Rels that we want to remove from subscription and drop any slots
+ * and origins corresponding to them.
+ */
+ sub_remove_rels = palloc(list_length(subrel_states) * sizeof(SubRemoveRels));
+
+ /*
+ * Walk over the remote tables and try to match them to locally known
+ * tables. If the table is not known locally create a new state for
+ * it.
+ *
+ * Also builds array of local oids of remote tables for the next step.
+ */
+ off = 0;
+ pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
+
+ foreach(lc, pubrel_names)
+ {
+ RangeVar *rv = (RangeVar *) lfirst(lc);
+ Oid relid;
- /*
- * Walk over the remote tables and try to match them to locally known
- * tables. If the table is not known locally create a new state for it.
- *
- * Also builds array of local oids of remote tables for the next step.
- */
- off = 0;
- pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
+ relid = RangeVarGetRelid(rv, AccessShareLock, false);
- foreach(lc, pubrel_names)
- {
- RangeVar *rv = (RangeVar *) lfirst(lc);
- Oid relid;
+ /* Check for supported relkind. */
+ CheckSubscriptionRelkind(get_rel_relkind(relid),
+ rv->schemaname, rv->relname);
- relid = RangeVarGetRelid(rv, AccessShareLock, false);
+ pubrel_local_oids[off++] = relid;
- /* Check for supported relkind. */
- CheckSubscriptionRelkind(get_rel_relkind(relid),
- rv->schemaname, rv->relname);
+ if (!bsearch(&relid, subrel_local_oids,
+ list_length(subrel_states), sizeof(Oid), oid_cmp))
+ {
+ AddSubscriptionRelState(sub->oid, relid,
+ copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
+ InvalidXLogRecPtr);
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" added to subscription \"%s\"",
+ rv->schemaname, rv->relname, sub->name)));
+ }
+ }
- pubrel_local_oids[off++] = relid;
+ /*
+ * Next remove state for tables we should not care about anymore using
+ * the data we collected above
+ */
+ qsort(pubrel_local_oids, list_length(pubrel_names),
+ sizeof(Oid), oid_cmp);
- if (!bsearch(&relid, subrel_local_oids,
- list_length(subrel_states), sizeof(Oid), oid_cmp))
+ remove_rel_len = 0;
+ for (off = 0; off < list_length(subrel_states); off++)
{
- AddSubscriptionRelState(sub->oid, relid,
- copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
- InvalidXLogRecPtr);
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" added to subscription \"%s\"",
- rv->schemaname, rv->relname, sub->name)));
- }
- }
+ Oid relid = subrel_local_oids[off];
- /*
- * Next remove state for tables we should not care about anymore using the
- * data we collected above
- */
- qsort(pubrel_local_oids, list_length(pubrel_names),
- sizeof(Oid), oid_cmp);
+ if (!bsearch(&relid, pubrel_local_oids,
+ list_length(pubrel_names), sizeof(Oid), oid_cmp))
+ {
+ char state;
+ XLogRecPtr statelsn;
+
+ /*
+ * Lock pg_subscription_rel with AccessExclusiveLock to
+ * prevent any race conditions with the apply worker
+ * re-launching workers at the same time this code is trying
+ * to remove those tables.
+ *
+ * Even if a new worker for this particular rel is restarted, it
+ * won't be able to make any progress, as we hold an exclusive
+ * lock on subscription_rel till the transaction end. It will
+ * simply exit, as there is no corresponding rel entry.
+ *
+ * This locking also ensures that the state of rels won't
+ * change till we are done with this refresh operation.
+ */
+ if (!rel)
+ rel = table_open(SubscriptionRelRelationId, AccessExclusiveLock);
+
+ /* Last known rel state. */
+ state = GetSubscriptionRelState(sub->oid, relid, &statelsn);
+
+ sub_remove_rels[remove_rel_len].relid = relid;
+ sub_remove_rels[remove_rel_len++].state = state;
+
+ RemoveSubscriptionRel(sub->oid, relid);
+
+ logicalrep_worker_stop(sub->oid, relid);
+
+ /*
+ * For READY state, we would have already dropped the
+ * tablesync origin.
+ */
+ if (state != SUBREL_STATE_READY)
+ {
+ char originname[NAMEDATALEN];
+
+ /*
+ * Drop the tablesync's origin tracking if it exists.
+ *
+ * It is possible that the origin has not yet been created for
+ * the tablesync worker; this can happen for the states before
+ * SUBREL_STATE_FINISHEDCOPY. The apply worker can also
+ * concurrently try to drop the origin, and by this time the
+ * origin might already be removed. For these reasons, we pass
+ * missing_ok = true.
+ */
+ ReplicationOriginNameForTablesync(sub->oid, relid, originname);
+ replorigin_drop_by_name(originname, true, false);
+ }
- for (off = 0; off < list_length(subrel_states); off++)
- {
- Oid relid = subrel_local_oids[off];
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
+ get_namespace_name(get_rel_namespace(relid)),
+ get_rel_name(relid),
+ sub->name)));
+ }
+ }
- if (!bsearch(&relid, pubrel_local_oids,
- list_length(pubrel_names), sizeof(Oid), oid_cmp))
+ /*
+ * Drop the tablesync slots associated with removed tables. This has
+ * to be done at the end because otherwise, if there is an error while
+ * doing the database operations, we won't be able to roll back dropped
+ * slots.
+ */
+ for (off = 0; off < remove_rel_len; off++)
{
- RemoveSubscriptionRel(sub->oid, relid);
-
- logicalrep_worker_stop_at_commit(sub->oid, relid);
-
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
- get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid),
- sub->name)));
+ if (sub_remove_rels[off].state != SUBREL_STATE_READY &&
+ sub_remove_rels[off].state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = {0};
+
+ /*
+ * For READY/SYNCDONE states we know the tablesync slot has
+ * already been dropped by the tablesync worker.
+ *
+ * For other states, there is no certainty; the slot may not
+ * exist yet. Also, if we fail after removing some of the slots,
+ * the next attempt will again try to drop already-dropped slots
+ * and fail. For these reasons, we allow missing_ok = true for
+ * the drop.
+ */
+ ReplicationSlotNameForTablesync(sub->oid, sub_remove_rels[off].relid, syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, true);
+ }
}
}
+ PG_FINALLY();
+ {
+ if (wrconn)
+ walrcv_disconnect(wrconn);
+ }
+ PG_END_TRY();
+
+ if (rel)
+ table_close(rel, NoLock);
}
/*
* Alter the existing subscription.
*/
ObjectAddress
-AlterSubscription(AlterSubscriptionStmt *stmt)
+AlterSubscription(AlterSubscriptionStmt *stmt, bool isTopLevel)
{
Relation rel;
ObjectAddress myself;
@@ -848,6 +951,8 @@ AlterSubscription(AlterSubscriptionStmt *stmt)
errmsg("ALTER SUBSCRIPTION with refresh is not allowed for disabled subscriptions"),
errhint("Use ALTER SUBSCRIPTION ... SET PUBLICATION ... WITH (refresh = false).")));
+ PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION with refresh");
+
/* Make sure refresh sees the new list of publications. */
sub->publications = stmt->publication;
@@ -877,6 +982,8 @@ AlterSubscription(AlterSubscriptionStmt *stmt)
NULL, NULL, /* no "binary" */
NULL, NULL); /* no "streaming" */
+ PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION ... REFRESH");
+
AlterSubscription_refresh(sub, copy_data);
break;
@@ -927,8 +1034,8 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
char originname[NAMEDATALEN];
char *err = NULL;
WalReceiverConn *wrconn = NULL;
- StringInfoData cmd;
Form_pg_subscription form;
+ List *rstates;
/*
* Lock pg_subscription with AccessExclusiveLock to ensure that the
@@ -1041,6 +1148,36 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
}
list_free(subworkers);
+ /*
+ * Cleanup of tablesync replication origins.
+ *
+ * Any READY-state relations would already have dealt with clean-ups.
+ *
+ * Note that the state can't change because we have already stopped both
+ * the apply and tablesync workers and they can't restart because of
+ * exclusive lock on the subscription.
+ */
+ rstates = GetSubscriptionNotReadyRelations(subid);
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Drop the tablesync's origin tracking if it exists.
+ *
+ * It is possible that the origin has not yet been created for the
+ * tablesync worker, so we pass missing_ok = true. This can happen for
+ * the states before SUBREL_STATE_FINISHEDCOPY.
+ */
+ ReplicationOriginNameForTablesync(subid, relid, originname);
+ replorigin_drop_by_name(originname, true, false);
+ }
+
/* Clean up dependencies */
deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
@@ -1055,30 +1192,110 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
* If there is no slot associated with the subscription, we can finish
* here.
*/
- if (!slotname)
+ if (!slotname && rstates == NIL)
{
table_close(rel, NoLock);
return;
}
/*
- * Otherwise drop the replication slot at the publisher node using the
- * replication connection.
+ * Try to acquire the connection necessary for dropping slots.
+ *
+ * Note: If the slotname is NONE/NULL then we allow the command to finish
+ * and users need to manually clean up the apply and tablesync worker
+ * slots later.
+ *
+ * This has to be done at the end because otherwise, if there is an error
+ * while doing the database operations, we won't be able to roll back the
+ * dropped slots.
*/
load_file("libpqwalreceiver", false);
- initStringInfo(&cmd);
- appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
-
wrconn = walrcv_connect(conninfo, true, subname, &err);
if (wrconn == NULL)
- ereport(ERROR,
- (errmsg("could not connect to publisher when attempting to "
- "drop the replication slot \"%s\"", slotname),
- errdetail("The error was: %s", err),
- /* translator: %s is an SQL ALTER command */
- errhint("Use %s to disassociate the subscription from the slot.",
- "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+ {
+ if (!slotname)
+ {
+ /* be tidy */
+ list_free(rstates);
+ table_close(rel, NoLock);
+ return;
+ }
+ else
+ {
+ ReportSlotConnectionError(rstates, subid, slotname, err);
+ }
+ }
+
+ PG_TRY();
+ {
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Drop the tablesync slots associated with removed tables.
+ *
+ * For SYNCDONE/READY states, the tablesync slot is known to have
+ * already been dropped by the tablesync worker.
+ *
+ * For other states, there is no certainty; the slot may not exist
+ * yet. Also, if we fail after removing some of the slots, the next
+ * attempt will again try to drop already-dropped slots and fail.
+ * For these reasons, we allow missing_ok = true for the drop.
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = {0};
+
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, true);
+ }
+ }
+
+ list_free(rstates);
+
+ /*
+ * If there is a slot associated with the subscription, then drop the
+ * replication slot at the publisher.
+ */
+ if (slotname)
+ ReplicationSlotDropAtPubNode(wrconn, slotname, false);
+
+ }
+ PG_FINALLY();
+ {
+ walrcv_disconnect(wrconn);
+ }
+ PG_END_TRY();
+
+ table_close(rel, NoLock);
+}
+
+/*
+ * Drop the replication slot at the publisher node using the replication
+ * connection.
+ *
+ * missing_ok - if true then only issue a WARNING message if the slot doesn't
+ * exist.
+ */
+void
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok)
+{
+ StringInfoData cmd;
+
+ Assert(wrconn);
+
+ load_file("libpqwalreceiver", false);
+
+ initStringInfo(&cmd);
+ appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
PG_TRY();
{
@@ -1086,27 +1303,39 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
res = walrcv_exec(wrconn, cmd.data, 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
+ if (res->status == WALRCV_OK_COMMAND)
+ {
+ /* NOTICE. Success. */
+ ereport(NOTICE,
+ (errmsg("dropped replication slot \"%s\" on publisher",
+ slotname)));
+ }
+ else if (res->status == WALRCV_ERROR &&
+ missing_ok &&
+ res->sqlstate == ERRCODE_UNDEFINED_OBJECT)
+ {
+ /* WARNING. Error, but missing_ok = true. */
+ ereport(WARNING,
(errmsg("could not drop the replication slot \"%s\" on publisher",
slotname),
errdetail("The error was: %s", res->err)));
+ }
else
- ereport(NOTICE,
- (errmsg("dropped replication slot \"%s\" on publisher",
- slotname)));
+ {
+ /* ERROR. */
+ ereport(ERROR,
+ (errmsg("could not drop the replication slot \"%s\" on publisher",
+ slotname),
+ errdetail("The error was: %s", res->err)));
+ }
walrcv_clear_result(res);
}
PG_FINALLY();
{
- walrcv_disconnect(wrconn);
+ pfree(cmd.data);
}
PG_END_TRY();
-
- pfree(cmd.data);
-
- table_close(rel, NoLock);
}
/*
@@ -1275,3 +1504,45 @@ fetch_table_list(WalReceiverConn *wrconn, List *publications)
return tablelist;
}
+
+/*
+ * This is to report the connection failure while dropping replication slots.
+ * Here, we report a WARNING for each tablesync slot so that the user
+ * can drop them manually, if required.
+ */
+static void
+ReportSlotConnectionError(List *rstates, Oid subid, char *slotname, char *err)
+{
+ ListCell *lc;
+
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Caller needs to ensure that relstate doesn't change underneath us.
+ * See DropSubscription where we get the relstates.
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = {0};
+
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ elog(WARNING, "could not drop tablesync replication slot \"%s\"",
+ syncslotname);
+ }
+ }
+
+ ereport(ERROR,
+ (errmsg("could not connect to publisher when attempting to "
+ "drop the replication slot \"%s\"", slotname),
+ errdetail("The error was: %s", err),
+ /* translator: %s is an SQL ALTER command */
+ errhint("Use %s to disassociate the subscription from the slot.",
+ "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+}
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index e958274..7714696 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -982,6 +982,7 @@ libpqrcv_exec(WalReceiverConn *conn, const char *query,
{
PGresult *pgres = NULL;
WalRcvExecResult *walres = palloc0(sizeof(WalRcvExecResult));
+ char *diag_sqlstate;
if (MyDatabaseId == InvalidOid)
ereport(ERROR,
@@ -1025,6 +1026,13 @@ libpqrcv_exec(WalReceiverConn *conn, const char *query,
case PGRES_BAD_RESPONSE:
walres->status = WALRCV_ERROR;
walres->err = pchomp(PQerrorMessage(conn->streamConn));
+ diag_sqlstate = PQresultErrorField(pgres, PG_DIAG_SQLSTATE);
+ if (diag_sqlstate)
+ walres->sqlstate = MAKE_SQLSTATE(diag_sqlstate[0],
+ diag_sqlstate[1],
+ diag_sqlstate[2],
+ diag_sqlstate[3],
+ diag_sqlstate[4]);
break;
}
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 186514c..58082dd 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -73,20 +73,6 @@ typedef struct LogicalRepWorkerId
Oid relid;
} LogicalRepWorkerId;
-typedef struct StopWorkersData
-{
- int nestDepth; /* Sub-transaction nest level */
- List *workers; /* List of LogicalRepWorkerId */
- struct StopWorkersData *parent; /* This need not be an immediate
- * subtransaction parent */
-} StopWorkersData;
-
-/*
- * Stack of StopWorkersData elements. Each stack element contains the workers
- * to be stopped for that subtransaction.
- */
-static StopWorkersData *on_commit_stop_workers = NULL;
-
static void ApplyLauncherWakeup(void);
static void logicalrep_launcher_onexit(int code, Datum arg);
static void logicalrep_worker_onexit(int code, Datum arg);
@@ -547,51 +533,6 @@ logicalrep_worker_stop(Oid subid, Oid relid)
}
/*
- * Request worker for specified sub/rel to be stopped on commit.
- */
-void
-logicalrep_worker_stop_at_commit(Oid subid, Oid relid)
-{
- int nestDepth = GetCurrentTransactionNestLevel();
- LogicalRepWorkerId *wid;
- MemoryContext oldctx;
-
- /* Make sure we store the info in context that survives until commit. */
- oldctx = MemoryContextSwitchTo(TopTransactionContext);
-
- /* Check that previous transactions were properly cleaned up. */
- Assert(on_commit_stop_workers == NULL ||
- nestDepth >= on_commit_stop_workers->nestDepth);
-
- /*
- * Push a new stack element if we don't already have one for the current
- * nestDepth.
- */
- if (on_commit_stop_workers == NULL ||
- nestDepth > on_commit_stop_workers->nestDepth)
- {
- StopWorkersData *newdata = palloc(sizeof(StopWorkersData));
-
- newdata->nestDepth = nestDepth;
- newdata->workers = NIL;
- newdata->parent = on_commit_stop_workers;
- on_commit_stop_workers = newdata;
- }
-
- /*
- * Finally add a new worker into the worker list of the current
- * subtransaction.
- */
- wid = palloc(sizeof(LogicalRepWorkerId));
- wid->subid = subid;
- wid->relid = relid;
- on_commit_stop_workers->workers =
- lappend(on_commit_stop_workers->workers, wid);
-
- MemoryContextSwitchTo(oldctx);
-}
-
-/*
* Wake up (using latch) any logical replication worker for specified sub/rel.
*/
void
@@ -820,109 +761,21 @@ ApplyLauncherShmemInit(void)
}
/*
- * Check whether current transaction has manipulated logical replication
- * workers.
- */
-bool
-XactManipulatesLogicalReplicationWorkers(void)
-{
- return (on_commit_stop_workers != NULL);
-}
-
-/*
* Wakeup the launcher on commit if requested.
*/
void
AtEOXact_ApplyLauncher(bool isCommit)
{
-
- Assert(on_commit_stop_workers == NULL ||
- (on_commit_stop_workers->nestDepth == 1 &&
- on_commit_stop_workers->parent == NULL));
-
if (isCommit)
{
- ListCell *lc;
-
- if (on_commit_stop_workers != NULL)
- {
- List *workers = on_commit_stop_workers->workers;
-
- foreach(lc, workers)
- {
- LogicalRepWorkerId *wid = lfirst(lc);
-
- logicalrep_worker_stop(wid->subid, wid->relid);
- }
- }
-
if (on_commit_launcher_wakeup)
ApplyLauncherWakeup();
}
- /*
- * No need to pfree on_commit_stop_workers. It was allocated in
- * transaction memory context, which is going to be cleaned soon.
- */
- on_commit_stop_workers = NULL;
on_commit_launcher_wakeup = false;
}
/*
- * On commit, merge the current on_commit_stop_workers list into the
- * immediate parent, if present.
- * On rollback, discard the current on_commit_stop_workers list.
- * Pop out the stack.
- */
-void
-AtEOSubXact_ApplyLauncher(bool isCommit, int nestDepth)
-{
- StopWorkersData *parent;
-
- /* Exit immediately if there's no work to do at this level. */
- if (on_commit_stop_workers == NULL ||
- on_commit_stop_workers->nestDepth < nestDepth)
- return;
-
- Assert(on_commit_stop_workers->nestDepth == nestDepth);
-
- parent = on_commit_stop_workers->parent;
-
- if (isCommit)
- {
- /*
- * If the upper stack element is not an immediate parent
- * subtransaction, just decrement the notional nesting depth without
- * doing any real work. Else, we need to merge the current workers
- * list into the parent.
- */
- if (!parent || parent->nestDepth < nestDepth - 1)
- {
- on_commit_stop_workers->nestDepth--;
- return;
- }
-
- parent->workers =
- list_concat(parent->workers, on_commit_stop_workers->workers);
- }
- else
- {
- /*
- * Abandon everything that was done at this nesting level. Explicitly
- * free memory to avoid a transaction-lifespan leak.
- */
- list_free_deep(on_commit_stop_workers->workers);
- }
-
- /*
- * We have taken care of the current subtransaction workers list for both
- * abort or commit. So we are ready to pop the stack.
- */
- pfree(on_commit_stop_workers);
- on_commit_stop_workers = parent;
-}
-
-/*
* Request wakeup of the launcher on commit of the transaction.
*
* This is used to send launcher signal to stop sleeping and process the
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index ccbdbcf..19cc804 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -31,8 +31,11 @@
* table state to INIT.
* - Tablesync worker starts; changes table state from INIT to DATASYNC while
* copying.
- * - Tablesync worker finishes the copy and sets table state to SYNCWAIT;
- * waits for state change.
+ * - Tablesync worker does initial table copy; there is a FINISHEDCOPY (sync
+ * worker specific) state to indicate when the copy phase has completed, so
+ * if the worker crashes with this (non-memory) state then the copy will not
+ * be re-attempted.
+ * - Tablesync worker then sets table state to SYNCWAIT; waits for state change.
* - Apply worker periodically checks for tables in SYNCWAIT state. When
* any appear, it sets the table state to CATCHUP and starts loop-waiting
* until either the table state is set to SYNCDONE or the sync worker
@@ -48,8 +51,8 @@
* point it sets state to READY and stops tracking. Again, there might
* be zero changes in between.
*
- * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * So the state progression is always: INIT -> DATASYNC -> FINISHEDCOPY
+ * -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY.
*
* The catalog pg_subscription_rel is used to keep information about
* subscribed tables and their state. The catalog holds all states
@@ -58,6 +61,7 @@
* Example flows look like this:
* - Apply is in front:
* sync:8
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:10
* -> set in memory CATCHUP
@@ -73,6 +77,7 @@
*
* - Sync is in front:
* sync:10
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:8
* -> set in memory CATCHUP
@@ -101,7 +106,10 @@
#include "replication/logicalrelation.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
+#include "storage/lmgr.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -269,26 +277,52 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
-
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
current_lsn >= MyLogicalRepWorker->relstate_lsn)
{
TimeLineID tli;
+ char syncslotname[NAMEDATALEN] = {0};
MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCDONE;
MyLogicalRepWorker->relstate_lsn = current_lsn;
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ StartTransactionCommand();
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
MyLogicalRepWorker->relstate_lsn);
+ /* End wal streaming so wrconn can be re-used to drop the slot. */
walrcv_endstreaming(wrconn, &tli);
+
+ /*
+ * Clean up the tablesync slot.
+ *
+ * This has to be done after updating the state because otherwise, if
+ * there is an error while doing the database operations, we won't be
+ * able to roll back the dropped slot.
+ */
+ ReplicationSlotNameForTablesync(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ syncslotname);
+
+ /*
+ * It is important to raise an error if we are unable to drop the slot;
+ * otherwise, it won't be dropped till the corresponding subscription
+ * is dropped. So we pass missing_ok = false.
+ */
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, false);
+
finish_sync_worker();
}
else
@@ -403,6 +437,8 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
*/
if (current_lsn >= rstate->lsn)
{
+ char originname[NAMEDATALEN];
+
rstate->state = SUBREL_STATE_READY;
rstate->lsn = current_lsn;
if (!started_tx)
@@ -411,6 +447,27 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
started_tx = true;
}
+ /*
+ * Remove the tablesync origin tracking if it exists.
+ *
+ * In the normal case, the origin drop is done here instead of in
+ * process_syncing_tables_for_sync because an origin cannot be
+ * dropped while the process owning the origin is still alive.
+ *
+ * There is a chance that the user is concurrently performing a
+ * refresh of the subscription, which removes the table state and
+ * its origin, and by this time the origin might already be
+ * removed. So we pass missing_ok = true.
+ */
+ ReplicationOriginNameForTablesync(MyLogicalRepWorker->subid,
+ rstate->relid,
+ originname);
+ replorigin_drop_by_name(originname, true, false);
+
+ /*
+ * Update the state to READY only after the origin cleanup.
+ */
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
@@ -806,6 +863,50 @@ copy_table(Relation rel)
}
/*
+ * Determine the tablesync slot name.
+ *
+ * The name must not exceed NAMEDATALEN - 1 because of remote node constraints
+ * on slot name length. We append system_identifier to avoid slot_name
+ * collision with subscriptions in other clusters. With the current scheme
+ * pg_%u_sync_%u_UINT64_FORMAT (3 + 10 + 6 + 10 + 20 + '\0'), the maximum
+ * length of slot_name will be 50.
+ *
+ * The returned slot name is either:
+ * - stored in the supplied buffer (syncslotname), or
+ * - palloc'ed in current memory context (if syncslotname = NULL).
+ *
+ * Note: We don't use the subscription slot name as part of tablesync slot name
+ * because we are responsible for cleaning up these slots and it could become
+ * impossible to recalculate what name to clean up if the subscription slot name
+ * had changed.
+ */
+char *
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid,
+ char syncslotname[NAMEDATALEN])
+{
+ if (syncslotname)
+ sprintf(syncslotname, "pg_%u_sync_%u_" UINT64_FORMAT, suboid, relid,
+ GetSystemIdentifier());
+ else
+ syncslotname = psprintf("pg_%u_sync_%u_" UINT64_FORMAT, suboid, relid,
+ GetSystemIdentifier());
+
+ return syncslotname;
+}
+
+/*
+ * Form the origin name for tablesync.
+ *
+ * Return the name in the supplied buffer.
+ */
+void
+ReplicationOriginNameForTablesync(Oid suboid, Oid relid,
+ char originname[NAMEDATALEN])
+{
+ snprintf(originname, NAMEDATALEN, "pg_%u_%u", suboid, relid);
+}
+
+/*
* Start syncing the table in the sync worker.
*
* If nothing needs to be done to sync the table, we exit the worker without
@@ -822,6 +923,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -847,19 +950,10 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
- MySubscription->slotname,
- MySubscription->oid,
- MyLogicalRepWorker->relid);
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(MySubscription->oid,
+ MyLogicalRepWorker->relid,
+ NULL /* use palloc */ );
/*
* Here we use the slot name instead of the subscription name as the
@@ -872,7 +966,50 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY);
+
+ /* Assign the origin tracking record name. */
+ ReplicationOriginNameForTablesync(MySubscription->oid,
+ MyLogicalRepWorker->relid,
+ originname);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC)
+ {
+ /*
+ * We have previously errored out before finishing the copy so the
+ * replication slot might exist. We want to remove the slot if it
+ * already exists and proceed.
+ *
+ * XXX We could also instead try to drop the slot, last time we failed
+ * but for that, we might need to clean up the copy state as it might
+ * be in the middle of fetching the rows. Also, if there is a network
+ * breakdown then it wouldn't have succeeded so trying it next time
+ * seems like a better bet.
+ */
+ ReplicationSlotDropAtPubNode(wrconn, slotname, true);
+ }
+ else if (MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed
+ * before it was able to finish normally.
+ */
+ StartTransactionCommand();
+
+ /*
+ * The origin tracking name must already exist. It was created first
+ * time this tablesync was launched.
+ */
+ originid = replorigin_by_name(originname, false);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ *origin_startpos = replorigin_session_get_progress(false);
+
+ CommitTransactionCommand();
+
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -888,9 +1025,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -916,13 +1050,46 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
walrcv_clear_result(res);
/*
- * Create a new temporary logical decoding slot. This slot will be used
+ * Create a new permanent logical decoding slot. This slot will be used
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
- walrcv_create_slot(wrconn, slotname, true,
+ walrcv_create_slot(wrconn, slotname, false /* permanent */ ,
CRS_USE_SNAPSHOT, origin_startpos);
+ /*
+ * Set up replication origin tracking. The purpose of doing this before the
+ * copy is to avoid doing the copy again due to any error in setting up
+ * origin tracking.
+ */
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist, so create it now.
+ *
+ * Then advance to the LSN got from walrcv_create_slot. This is WAL
+ * logged for the purpose of recovery. Locks are to prevent the
+ * replication origin from vanishing while advancing.
+ */
+ originid = replorigin_create(originname);
+
+ LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */ , true /* WAL log */ );
+ UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ }
+ else
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("replication origin \"%s\" already exists",
+ originname)));
+ }
+
/* Now do the initial data copy */
PushActiveSnapshot(GetTransactionSnapshot());
copy_table(rel);
@@ -941,6 +1108,25 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommandCounterIncrement();
/*
+ * Update the persisted state to indicate the COPY phase is done; make it
+ * visible to others.
+ */
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_FINISHEDCOPY,
+ MyLogicalRepWorker->relstate_lsn);
+
+ CommitTransactionCommand();
+
+copy_table_done:
+
+ elog(DEBUG1,
+ "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
+
+ /*
* We are done with the initial data synchronization, update the state.
*/
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index eb7db89..cfc924c 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -807,12 +807,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -889,9 +885,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -918,8 +912,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1062,8 +1055,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData *commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 1d81071..05bb698 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1786,7 +1786,8 @@ ProcessUtilitySlow(ParseState *pstate,
break;
case T_AlterSubscriptionStmt:
- address = AlterSubscription((AlterSubscriptionStmt *) parsetree);
+ address = AlterSubscription((AlterSubscriptionStmt *) parsetree,
+ isTopLevel);
break;
case T_DropSubscriptionStmt:
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index 2bea2c5..ed94f57 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,8 @@ DECLARE_UNIQUE_INDEX_PKEY(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_FINISHEDCOPY 'f' /* tablesync copy phase is completed
+ * (sublsn NULL) */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/commands/subscriptioncmds.h b/src/include/commands/subscriptioncmds.h
index a818650..3b926f3 100644
--- a/src/include/commands/subscriptioncmds.h
+++ b/src/include/commands/subscriptioncmds.h
@@ -20,7 +20,7 @@
extern ObjectAddress CreateSubscription(CreateSubscriptionStmt *stmt,
bool isTopLevel);
-extern ObjectAddress AlterSubscription(AlterSubscriptionStmt *stmt);
+extern ObjectAddress AlterSubscription(AlterSubscriptionStmt *stmt, bool isTopLevel);
extern void DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel);
extern ObjectAddress AlterSubscriptionOwner(const char *name, Oid newOwnerId);
diff --git a/src/include/replication/logicallauncher.h b/src/include/replication/logicallauncher.h
index 421ec15..301e494 100644
--- a/src/include/replication/logicallauncher.h
+++ b/src/include/replication/logicallauncher.h
@@ -22,9 +22,7 @@ extern Size ApplyLauncherShmemSize(void);
extern void ApplyLauncherShmemInit(void);
extern void ApplyLauncherWakeupAtCommit(void);
-extern bool XactManipulatesLogicalReplicationWorkers(void);
extern void AtEOXact_ApplyLauncher(bool isCommit);
-extern void AtEOSubXact_ApplyLauncher(bool isCommit, int nestDepth);
extern bool IsLogicalLauncher(void);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 53f636c..5f52335 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -15,6 +15,7 @@
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "storage/spin.h"
+#include "replication/walreceiver.h"
/*
* Behaviour of replication slots, upon release or crash.
@@ -211,6 +212,8 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname);
+extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
diff --git a/src/include/replication/walreceiver.h b/src/include/replication/walreceiver.h
index 4313f51..a97a59a 100644
--- a/src/include/replication/walreceiver.h
+++ b/src/include/replication/walreceiver.h
@@ -210,6 +210,7 @@ typedef enum
typedef struct WalRcvExecResult
{
WalRcvExecStatus status;
+ int sqlstate;
char *err;
Tuplestorestate *tuplestore;
TupleDesc tupledesc;
diff --git a/src/include/replication/worker_internal.h b/src/include/replication/worker_internal.h
index d046022..4a5adc2 100644
--- a/src/include/replication/worker_internal.h
+++ b/src/include/replication/worker_internal.h
@@ -77,13 +77,14 @@ extern List *logicalrep_workers_find(Oid subid, bool only_running);
extern void logicalrep_worker_launch(Oid dbid, Oid subid, const char *subname,
Oid userid, Oid relid);
extern void logicalrep_worker_stop(Oid subid, Oid relid);
-extern void logicalrep_worker_stop_at_commit(Oid subid, Oid relid);
extern void logicalrep_worker_wakeup(Oid subid, Oid relid);
extern void logicalrep_worker_wakeup_ptr(LogicalRepWorker *worker);
extern int logicalrep_sync_worker_count(Oid subid);
+extern void ReplicationOriginNameForTablesync(Oid suboid, Oid relid, char *originname);
extern char *LogicalRepSyncTableStart(XLogRecPtr *origin_startpos);
+
void process_syncing_tables(XLogRecPtr current_lsn);
void invalidate_syncing_table_states(Datum arg, int cacheid,
uint32 hashvalue);
diff --git a/src/test/regress/expected/subscription.out b/src/test/regress/expected/subscription.out
index 2fa9bce..7802279 100644
--- a/src/test/regress/expected/subscription.out
+++ b/src/test/regress/expected/subscription.out
@@ -201,6 +201,27 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
(1 row)
DROP SUBSCRIPTION regress_testsub;
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=postgres' PUBLICATION mypub
+ WITH (enabled = true, create_slot = false, copy_data = false);
+-- fail - ALTER SUBSCRIPTION with refresh is not allowed in a transaction
+-- block or function
+BEGIN;
+ALTER SUBSCRIPTION regress_testsub SET PUBLICATION mypub WITH (refresh = true);
+ERROR: ALTER SUBSCRIPTION with refresh cannot run inside a transaction block
+END;
+BEGIN;
+ALTER SUBSCRIPTION regress_testsub REFRESH PUBLICATION;
+ERROR: ALTER SUBSCRIPTION ... REFRESH cannot run inside a transaction block
+END;
+CREATE FUNCTION func() RETURNS VOID AS
+$$ ALTER SUBSCRIPTION regress_testsub SET PUBLICATION mypub WITH (refresh = true) $$ LANGUAGE SQL;
+SELECT func();
+ERROR: ALTER SUBSCRIPTION with refresh cannot be executed from a function
+CONTEXT: SQL function "func" statement 1
+ALTER SUBSCRIPTION regress_testsub DISABLE;
+ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
+DROP SUBSCRIPTION regress_testsub;
+DROP FUNCTION func;
RESET SESSION AUTHORIZATION;
DROP ROLE regress_subscription_user;
DROP ROLE regress_subscription_user2;
diff --git a/src/test/regress/sql/subscription.sql b/src/test/regress/sql/subscription.sql
index 14fa0b2..ca0d782 100644
--- a/src/test/regress/sql/subscription.sql
+++ b/src/test/regress/sql/subscription.sql
@@ -147,6 +147,28 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
DROP SUBSCRIPTION regress_testsub;
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=postgres' PUBLICATION mypub
+ WITH (enabled = true, create_slot = false, copy_data = false);
+
+-- fail - ALTER SUBSCRIPTION with refresh is not allowed in a transaction
+-- block or function
+BEGIN;
+ALTER SUBSCRIPTION regress_testsub SET PUBLICATION mypub WITH (refresh = true);
+END;
+
+BEGIN;
+ALTER SUBSCRIPTION regress_testsub REFRESH PUBLICATION;
+END;
+
+CREATE FUNCTION func() RETURNS VOID AS
+$$ ALTER SUBSCRIPTION regress_testsub SET PUBLICATION mypub WITH (refresh = true) $$ LANGUAGE SQL;
+SELECT func();
+
+ALTER SUBSCRIPTION regress_testsub DISABLE;
+ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
+DROP SUBSCRIPTION regress_testsub;
+DROP FUNCTION func;
+
RESET SESSION AUTHORIZATION;
DROP ROLE regress_subscription_user;
DROP ROLE regress_subscription_user2;
diff --git a/src/test/subscription/t/004_sync.pl b/src/test/subscription/t/004_sync.pl
index e111ab9..c792668 100644
--- a/src/test/subscription/t/004_sync.pl
+++ b/src/test/subscription/t/004_sync.pl
@@ -3,7 +3,7 @@ use strict;
use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 7;
+use Test::More tests => 8;
# Initialize publisher node
my $node_publisher = get_new_node('publisher');
@@ -149,7 +149,26 @@ $result = $node_subscriber->safe_psql('postgres',
is($result, qq(20),
'changes for table added after subscription initialized replicated');
+# clean up
+$node_publisher->safe_psql('postgres', "DROP TABLE tab_rep_next");
+$node_subscriber->safe_psql('postgres', "DROP TABLE tab_rep_next");
$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+# Table tab_rep already has the same records on both publisher and subscriber
+# at this time. Recreate the subscription; it will attempt the initial copy of
+# the table again and fail due to a unique constraint violation.
+$node_subscriber->safe_psql('postgres',
+ "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr' PUBLICATION tap_pub");
+
+$result = $node_subscriber->poll_query_until('postgres', $started_query)
+ or die "Timed out while waiting for subscriber to start sync";
+
+# DROP SUBSCRIPTION must clean up slots on the publisher side when the
+# subscriber is stuck on the data copy due to a constraint violation.
+$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+
+$result = $node_publisher->safe_psql('postgres', "SELECT count(*) FROM pg_replication_slots");
+is($result, qq(0), 'DROP SUBSCRIPTION during error can clean up the slots on the publisher');
+
$node_subscriber->stop('fast');
$node_publisher->stop('fast');
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 1d540fe..bab4f3a 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2397,7 +2397,6 @@ StdAnalyzeData
StdRdOptions
Step
StopList
-StopWorkersData
StrategyNumber
StreamCtl
StreamXidHash
@@ -2408,6 +2407,7 @@ SubLink
SubLinkType
SubPlan
SubPlanState
+SubRemoveRels
SubTransactionId
SubXactCallback
SubXactCallbackItem
--
1.8.3.1
I have reviewed the latest patch (V31) again.
I found only a few minor nitpick issues, not worth listing.
Then I ran the subscription TAP tests 50x in a loop as a kind of
stress test. That ran for 2.5 hours, and all 50 runs ended with
'Result: PASS'.
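(A loop of roughly this shape reproduces such a run; a sketch assuming an
in-tree build configured with --enable-tap-tests, not necessarily the
exact invocation used:)

    for i in $(seq 1 50); do
        make -C src/test/subscription check || break
    done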
So V31 looks good to me.
------
Kind Regards,
Peter Smith.
Fujitsu Australia
On 10 Feb 2021, at 06:32, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Feb 10, 2021 at 7:41 AM Peter Smith <smithpb2250@gmail.com> wrote:
On Tue, Feb 9, 2021 at 10:38 AM Peter Smith <smithpb2250@gmail.com> wrote:
PSA v2 of this WalRcvExceResult patch (it is same as v1 but includes
some PG doc updates).
This applies OK on top of v30 of the main patch.

Thanks, I have integrated these changes into the main patch and
additionally made some changes to comments and docs. I have also fixed
the function name inconsistency issue you reported and ran pgindent.
One thing:
+        else if (res->status == WALRCV_ERROR &&
+                 missing_ok &&
+                 res->sqlstate == ERRCODE_UNDEFINED_OBJECT)
+        {
+            /* WARNING. Error, but missing_ok = true. */
+            ereport(WARNING,
+                    (errmsg("could not drop the replication slot \"%s\" on publisher",
+                            slotname),
+                     errdetail("The error was: %s", res->err)));
Hmm, why is this WARNING, we mostly call it with missing_ok = true when the slot is not expected to be there, so it does not seem correct to report it as warning?
--
Petr
On Thu, Feb 11, 2021 at 1:51 PM Petr Jelinek
<petr.jelinek@enterprisedb.com> wrote:
Hmm, why is this WARNING, we mostly call it with missing_ok = true when
the slot is not expected to be there, so it does not seem correct to
report it as warning?
WARNING is for the cases where we don't always expect slots to exist
and we don't want to stop the operation due to it. For example, in
DropSubscription, for some of the rel states like (SUBREL_STATE_INIT
and SUBREL_STATE_DATASYNC), the slot won't exist. Similarly, say if we
fail (due to network error) after removing some of the slots, next
time, it will again try to drop already dropped slots and fail. For
these reasons, we need to use WARNING. Similarly for tablesync workers
when we are trying to initially drop the slot there is no certainty
that it exists, so we can't throw ERROR and stop the operation there.
There are other cases like when the table sync worker has finished
syncing the table, there we will raise an ERROR if the slot doesn't
exist. Does this make sense?
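Reduced to a sketch, the intended mapping looks like this (a hypothetical
helper, not code from the patch; the PostgreSQL types and macros are
assumed from the usual headers):

    /*
     * Sketch only: how the drop-slot outcome maps to a message level in
     * ReplicationSlotDropAtPubNode-style logic.
     */
    static int
    drop_slot_elevel(WalRcvExecResult *res, bool missing_ok)
    {
        if (res->status == WALRCV_OK_COMMAND)
            return NOTICE;      /* slot existed and was dropped */
        else if (missing_ok && res->sqlstate == ERRCODE_UNDEFINED_OBJECT)
            return WARNING;     /* slot absent, tolerated when missing_ok */
        else
            return ERROR;       /* unexpected failure; stop the operation */
    }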
--
With Regards,
Amit Kapila.
On 11 Feb 2021, at 10:42, Amit Kapila <amit.kapila16@gmail.com> wrote:
WARNING is for the cases where we don't always expect slots to exist
and we don't want to stop the operation due to it. [...] Does this
make sense?
Well, I was thinking it could be NOTICE or LOG to be honest, WARNING seems unnecessarily scary for those usecases to me.
—
Petr
On Thu, Feb 11, 2021 at 3:20 PM Petr Jelinek
<petr.jelinek@enterprisedb.com> wrote:
Well, I was thinking it could be NOTICE or LOG to be honest, WARNING
seems unnecessarily scary for those usecases to me.
I am fine with LOG and will make that change. Do you have any more
comments or want to spend more time on this patch before we call it
good?
--
With Regards,
Amit Kapila.
On 11 Feb 2021, at 10:56, Amit Kapila <amit.kapila16@gmail.com> wrote:
I am fine with LOG and will make that change. Do you have any more
comments or want to spend more time on this patch before we call it
good?
I am good, thanks!
—
Petr
On Thu, Feb 11, 2021 at 3:32 PM Petr Jelinek
<petr.jelinek@enterprisedb.com> wrote:
I am good, thanks!
Okay, attached an updated patch with only that change.
--
With Regards,
Amit Kapila.
Attachments:
v32-0001-Allow-multiple-xacts-during-table-sync-in-logica.patch (application/octet-stream)
From 53f6eecba9733df6eb7041813caba5789ad3c037 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Wed, 10 Feb 2021 09:20:31 +0530
Subject: [PATCH v32] Allow multiple xacts during table sync in logical
replication.
For the initial table data synchronization in logical replication, we use
a single transaction to copy the entire table and then synchronize the
position in the stream with the main apply worker.
There are multiple downsides of this approach: (a) We have to perform the
entire copy operation again if there is any error (network breakdown,
error in the database operation, etc.) while we synchronize the WAL
position between tablesync worker and apply worker; this will be onerous
especially for large copies, (b) Using a single transaction in the
synchronization-phase (where we can receive WAL from multiple
transactions) will have the risk of exceeding the CID limit, (c) The slot
will hold the WAL till the entire sync is complete because we never commit
till the end.
This patch solves all the above downsides by allowing multiple
transactions during the tablesync phase. The initial copy is done in a
single transaction and after that, we commit each transaction as we
receive it. To allow recovery after any error or crash, we use a permanent
slot and origin to track the progress. The slot and origin will be removed
once we finish the synchronization of the table. We also remove the slots
and origins of tablesync workers if the user performs DROP SUBSCRIPTION ..
or ALTER SUBSCRIPTION .. REFRESH while some of the table syncs are still
not finished.
The commands ALTER SUBSCRIPTION ... REFRESH PUBLICATION and
ALTER SUBSCRIPTION ... SET PUBLICATION ... with refresh option as true
cannot be executed inside a transaction block because they can now drop
slots for which we have no provision to roll back.
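For illustration (subscription name hypothetical; the error text matches
the regression test added below):

    BEGIN;
    ALTER SUBSCRIPTION mysub REFRESH PUBLICATION;
    ERROR:  ALTER SUBSCRIPTION ... REFRESH cannot run inside a transaction block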
This will also open up the path for logical replication of 2PC
transactions on the subscriber side. Previously, we couldn't do that because
of the requirement of maintaining a single transaction in tablesync
workers.
Author: Peter Smith, Amit Kapila, and Takamichi Osumi
Reviewed-by: Ajin Cherian, Petr Jelinek, Hou Zhijie and Amit Kapila
Discussion: https://postgr.es/m/CAA4eK1KHJxaZS-fod-0fey=0tq3=Gkn4ho=8N4-5HWiCfu0H1A@mail.gmail.com
---
doc/src/sgml/catalogs.sgml | 1 +
doc/src/sgml/logical-replication.sgml | 59 ++-
doc/src/sgml/ref/alter_subscription.sgml | 18 +
doc/src/sgml/ref/drop_subscription.sgml | 6 +-
src/backend/access/transam/xact.c | 11 -
src/backend/catalog/pg_subscription.c | 39 ++
src/backend/commands/subscriptioncmds.c | 467 ++++++++++++++----
.../libpqwalreceiver/libpqwalreceiver.c | 8 +
src/backend/replication/logical/launcher.c | 147 ------
src/backend/replication/logical/tablesync.c | 236 ++++++++-
src/backend/replication/logical/worker.c | 18 +-
src/backend/tcop/utility.c | 3 +-
src/include/catalog/pg_subscription_rel.h | 2 +
src/include/commands/subscriptioncmds.h | 2 +-
src/include/replication/logicallauncher.h | 2 -
src/include/replication/slot.h | 3 +
src/include/replication/walreceiver.h | 1 +
src/include/replication/worker_internal.h | 3 +-
src/test/regress/expected/subscription.out | 21 +
src/test/regress/sql/subscription.sql | 22 +
src/test/subscription/t/004_sync.pl | 21 +-
src/tools/pgindent/typedefs.list | 2 +-
22 files changed, 767 insertions(+), 325 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index ea222c0464..692ad65de2 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7673,6 +7673,7 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
State code:
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
+ <literal>f</literal> = finished table copy,
<literal>s</literal> = synchronized,
<literal>r</literal> = ready (normal replication)
</para></entry>
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index a560ad69b4..d0742f2c52 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -186,9 +186,10 @@
<para>
Each subscription will receive changes via one replication slot (see
- <xref linkend="streaming-replication-slots"/>). Additional temporary
- replication slots may be required for the initial data synchronization
- of pre-existing table data.
+ <xref linkend="streaming-replication-slots"/>). Additional replication
+ slots may be required for the initial data synchronization of
+ pre-existing table data and those will be dropped at the end of data
+ synchronization.
</para>
<para>
@@ -248,13 +249,23 @@
<para>
As mentioned earlier, each (active) subscription receives changes from a
- replication slot on the remote (publishing) side. Normally, the remote
- replication slot is created automatically when the subscription is created
- using <command>CREATE SUBSCRIPTION</command> and it is dropped
- automatically when the subscription is dropped using <command>DROP
- SUBSCRIPTION</command>. In some situations, however, it can be useful or
- necessary to manipulate the subscription and the underlying replication
- slot separately. Here are some scenarios:
+ replication slot on the remote (publishing) side.
+ </para>
+ <para>
+ Additional table synchronization slots are normally transient, created
+ internally to perform initial table synchronization and dropped
+ automatically when they are no longer needed. These table synchronization
+ slots have generated names: <quote><literal>pg_%u_sync_%u_%llu</literal></quote>
+ (parameters: Subscription <parameter>oid</parameter>,
+ Table <parameter>relid</parameter>, system identifier <parameter>sysid</parameter>)
+ </para>
+ <para>
+ Normally, the remote replication slot is created automatically when the
+ subscription is created using <command>CREATE SUBSCRIPTION</command> and it
+ is dropped automatically when the subscription is dropped using
+ <command>DROP SUBSCRIPTION</command>. In some situations, however, it can
+ be useful or necessary to manipulate the subscription and the underlying
+ replication slot separately. Here are some scenarios:
<itemizedlist>
<listitem>
@@ -294,8 +305,9 @@
using <command>ALTER SUBSCRIPTION</command> before attempting to drop
the subscription. If the remote database instance no longer exists, no
further action is then necessary. If, however, the remote database
- instance is just unreachable, the replication slot should then be
- dropped manually; otherwise it would continue to reserve WAL and might
+ instance is just unreachable, the replication slot (and any still
+ remaining table synchronization slots) should then be
+ dropped manually; otherwise it/they would continue to reserve WAL and might
eventually cause the disk to fill up. Such cases should be carefully
investigated.
</para>
@@ -468,16 +480,19 @@
<sect2 id="logical-replication-snapshot">
<title>Initial Snapshot</title>
<para>
- The initial data in existing subscribed tables are snapshotted and
- copied in a parallel instance of a special kind of apply process.
- This process will create its own temporary replication slot and
- copy the existing data. Once existing data is copied, the worker
- enters synchronization mode, which ensures that the table is brought
- up to a synchronized state with the main apply process by streaming
- any changes that happened during the initial data copy using standard
- logical replication. Once the synchronization is done, the control
- of the replication of the table is given back to the main apply
- process where the replication continues as normal.
+ The initial data in existing subscribed tables are snapshotted and
+ copied in a parallel instance of a special kind of apply process.
+ This process will create its own replication slot and copy the existing
+ data. As soon as the copy is finished the table contents will become
+ visible to other backends. Once existing data is copied, the worker
+ enters synchronization mode, which ensures that the table is brought
+ up to a synchronized state with the main apply process by streaming
+ any changes that happened during the initial data copy using standard
+ logical replication. During this synchronization phase, the changes
+ are applied and committed in the same order as they happened on the
+ publisher. Once the synchronization is done, the control of the
+ replication of the table is given back to the main apply process where
+ the replication continues as normal.
</para>
</sect2>
</sect1>
diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml
index db5e59f707..bcb0acf28d 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -48,6 +48,24 @@ ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> RENAME TO <
(Currently, all subscription owners must be superusers, so the owner checks
will be bypassed in practice. But this might change in the future.)
</para>
+
+ <para>
+ When refreshing a publication we remove the relations that are no longer
+ part of the publication and we also remove the tablesync slots if there are
+ any. It is necessary to remove tablesync slots so that the resources
+ allocated for the subscription on the remote host are released. If due to
+ network breakdown or some other error, <productname>PostgreSQL</productname>
+ is unable to remove the slots, an ERROR will be reported. To proceed in this
+ situation, the user needs to either retry the operation or disassociate the
+ slot from the subscription and drop the subscription as explained in
+ <xref linkend="sql-dropsubscription"/>.
+ </para>
+
+ <para>
+ Commands <command>ALTER SUBSCRIPTION ... REFRESH PUBLICATION</command> and
+ <command>ALTER SUBSCRIPTION ... SET PUBLICATION ...</command> with refresh
+ option as true cannot be executed inside a transaction block.
+ </para>
</refsect1>
<refsect1>
diff --git a/doc/src/sgml/ref/drop_subscription.sgml b/doc/src/sgml/ref/drop_subscription.sgml
index adbdeafb4e..aee9615546 100644
--- a/doc/src/sgml/ref/drop_subscription.sgml
+++ b/doc/src/sgml/ref/drop_subscription.sgml
@@ -79,7 +79,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
<para>
When dropping a subscription that is associated with a replication slot on
the remote host (the normal state), <command>DROP SUBSCRIPTION</command>
- will connect to the remote host and try to drop the replication slot as
+ will connect to the remote host and try to drop the replication slot (and
+ any remaining table synchronization slots) as
part of its operation. This is necessary so that the resources allocated
for the subscription on the remote host are released. If this fails,
either because the remote host is not reachable or because the remote
@@ -89,7 +90,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
executing <literal>ALTER SUBSCRIPTION ... SET (slot_name = NONE)</literal>.
After that, <command>DROP SUBSCRIPTION</command> will no longer attempt any
actions on a remote host. Note that if the remote replication slot still
- exists, it should then be dropped manually; otherwise it will continue to
+ exists, it (and any related table synchronization slots) should then be
+ dropped manually; otherwise it/they will continue to
reserve WAL and might eventually cause the disk to fill up. See
also <xref linkend="logical-replication-subscription-slot"/>.
</para>
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index a2068e3fd4..3c8b4eb362 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2432,15 +2432,6 @@ PrepareTransaction(void)
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot PREPARE a transaction that has exported snapshots")));
- /*
- * Don't allow PREPARE but for transaction that has/might kill logical
- * replication workers.
- */
- if (XactManipulatesLogicalReplicationWorkers())
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("cannot PREPARE a transaction that has manipulated logical replication workers")));
-
/* Prevent cancel/die interrupt while cleaning up */
HOLD_INTERRUPTS();
@@ -4899,7 +4890,6 @@ CommitSubTransaction(void)
AtEOSubXact_HashTables(true, s->nestingLevel);
AtEOSubXact_PgStat(true, s->nestingLevel);
AtSubCommit_Snapshot(s->nestingLevel);
- AtEOSubXact_ApplyLauncher(true, s->nestingLevel);
/*
* We need to restore the upper transaction's read-only state, in case the
@@ -5059,7 +5049,6 @@ AbortSubTransaction(void)
AtEOSubXact_HashTables(false, s->nestingLevel);
AtEOSubXact_PgStat(false, s->nestingLevel);
AtSubAbort_Snapshot(s->nestingLevel);
- AtEOSubXact_ApplyLauncher(false, s->nestingLevel);
}
/*
diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c
index 44cb285b68..750ec2ac17 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -29,6 +29,7 @@
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
#include "utils/pg_lsn.h"
#include "utils/rel.h"
#include "utils/syscache.h"
@@ -337,6 +338,13 @@ GetSubscriptionRelState(Oid subid, Oid relid, XLogRecPtr *sublsn)
char substate;
bool isnull;
Datum d;
+ Relation rel;
+
+ /*
+ * This is to avoid the race condition with AlterSubscription which tries
+ * to remove this relstate.
+ */
+ rel = table_open(SubscriptionRelRelationId, AccessShareLock);
/* Try finding the mapping. */
tup = SearchSysCache2(SUBSCRIPTIONRELMAP,
@@ -363,6 +371,8 @@ GetSubscriptionRelState(Oid subid, Oid relid, XLogRecPtr *sublsn)
/* Cleanup */
ReleaseSysCache(tup);
+ table_close(rel, AccessShareLock);
+
return substate;
}
@@ -403,6 +413,35 @@ RemoveSubscriptionRel(Oid subid, Oid relid)
scan = table_beginscan_catalog(rel, nkeys, skey);
while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
{
+ Form_pg_subscription_rel subrel;
+
+ subrel = (Form_pg_subscription_rel) GETSTRUCT(tup);
+
+ /*
+ * We don't allow to drop the relation mapping when the table
+ * synchronization is in progress unless the caller updates the
+ * corresponding subscription as well. This is to ensure that we don't
+ * leave tablesync slots or origins in the system when the
+ * corresponding table is dropped.
+ */
+ if (!OidIsValid(subid) && subrel->srsubstate != SUBREL_STATE_READY)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("could not drop relation mapping for subscription \"%s\"",
+ get_subscription_name(subrel->srsubid, false)),
+ errdetail("Table synchronization for relation \"%s\" is in progress and is in state \"%c\".",
+ get_rel_name(relid), subrel->srsubstate),
+
+ /*
+ * translator: first %s is a SQL ALTER command and second %s is a
+ * SQL DROP command
+ */
+ errhint("Use %s to enable subscription if not already enabled or use %s to drop the subscription.",
+ "ALTER SUBSCRIPTION ... ENABLE",
+ "DROP SUBSCRIPTION ...")));
+ }
+
CatalogTupleDelete(rel, &tup->t_self);
}
table_endscan(scan);
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 5ccbc9dd50..5cf874e0b4 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -34,6 +34,7 @@
#include "nodes/makefuncs.h"
#include "replication/logicallauncher.h"
#include "replication/origin.h"
+#include "replication/slot.h"
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "replication/worker_internal.h"
@@ -46,6 +47,8 @@
#include "utils/syscache.h"
static List *fetch_table_list(WalReceiverConn *wrconn, List *publications);
+static void ReportSlotConnectionError(List *rstates, Oid subid, char *slotname, char *err);
+
/*
* Common option parsing function for CREATE and ALTER SUBSCRIPTION commands.
@@ -566,107 +569,207 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
Oid *pubrel_local_oids;
ListCell *lc;
int off;
+ int remove_rel_len;
+ Relation rel = NULL;
+ typedef struct SubRemoveRels
+ {
+ Oid relid;
+ char state;
+ } SubRemoveRels;
+ SubRemoveRels *sub_remove_rels;
/* Load the library providing us libpq calls. */
load_file("libpqwalreceiver", false);
- /* Try to connect to the publisher. */
- wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
- if (!wrconn)
- ereport(ERROR,
- (errmsg("could not connect to the publisher: %s", err)));
-
- /* Get the table list from publisher. */
- pubrel_names = fetch_table_list(wrconn, sub->publications);
+ PG_TRY();
+ {
+ /* Try to connect to the publisher. */
+ wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
+ if (!wrconn)
+ ereport(ERROR,
+ (errmsg("could not connect to the publisher: %s", err)));
- /* We are done with the remote side, close connection. */
- walrcv_disconnect(wrconn);
+ /* Get the table list from publisher. */
+ pubrel_names = fetch_table_list(wrconn, sub->publications);
- /* Get local table list. */
- subrel_states = GetSubscriptionRelations(sub->oid);
+ /* Get local table list. */
+ subrel_states = GetSubscriptionRelations(sub->oid);
- /*
- * Build qsorted array of local table oids for faster lookup. This can
- * potentially contain all tables in the database so speed of lookup is
- * important.
- */
- subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
- off = 0;
- foreach(lc, subrel_states)
- {
- SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
+ /*
+ * Build qsorted array of local table oids for faster lookup. This can
+ * potentially contain all tables in the database so speed of lookup
+ * is important.
+ */
+ subrel_local_oids = palloc(list_length(subrel_states) * sizeof(Oid));
+ off = 0;
+ foreach(lc, subrel_states)
+ {
+ SubscriptionRelState *relstate = (SubscriptionRelState *) lfirst(lc);
- subrel_local_oids[off++] = relstate->relid;
- }
- qsort(subrel_local_oids, list_length(subrel_states),
- sizeof(Oid), oid_cmp);
+ subrel_local_oids[off++] = relstate->relid;
+ }
+ qsort(subrel_local_oids, list_length(subrel_states),
+ sizeof(Oid), oid_cmp);
+
+ /*
+ * Rels that we want to remove from subscription and drop any slots
+ * and origins corresponding to them.
+ */
+ sub_remove_rels = palloc(list_length(subrel_states) * sizeof(SubRemoveRels));
+
+ /*
+ * Walk over the remote tables and try to match them to locally known
+ * tables. If the table is not known locally create a new state for
+ * it.
+ *
+ * Also builds array of local oids of remote tables for the next step.
+ */
+ off = 0;
+ pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
+
+ foreach(lc, pubrel_names)
+ {
+ RangeVar *rv = (RangeVar *) lfirst(lc);
+ Oid relid;
- /*
- * Walk over the remote tables and try to match them to locally known
- * tables. If the table is not known locally create a new state for it.
- *
- * Also builds array of local oids of remote tables for the next step.
- */
- off = 0;
- pubrel_local_oids = palloc(list_length(pubrel_names) * sizeof(Oid));
+ relid = RangeVarGetRelid(rv, AccessShareLock, false);
- foreach(lc, pubrel_names)
- {
- RangeVar *rv = (RangeVar *) lfirst(lc);
- Oid relid;
+ /* Check for supported relkind. */
+ CheckSubscriptionRelkind(get_rel_relkind(relid),
+ rv->schemaname, rv->relname);
- relid = RangeVarGetRelid(rv, AccessShareLock, false);
+ pubrel_local_oids[off++] = relid;
- /* Check for supported relkind. */
- CheckSubscriptionRelkind(get_rel_relkind(relid),
- rv->schemaname, rv->relname);
+ if (!bsearch(&relid, subrel_local_oids,
+ list_length(subrel_states), sizeof(Oid), oid_cmp))
+ {
+ AddSubscriptionRelState(sub->oid, relid,
+ copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
+ InvalidXLogRecPtr);
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" added to subscription \"%s\"",
+ rv->schemaname, rv->relname, sub->name)));
+ }
+ }
- pubrel_local_oids[off++] = relid;
+ /*
+ * Next remove state for tables we should not care about anymore using
+ * the data we collected above
+ */
+ qsort(pubrel_local_oids, list_length(pubrel_names),
+ sizeof(Oid), oid_cmp);
- if (!bsearch(&relid, subrel_local_oids,
- list_length(subrel_states), sizeof(Oid), oid_cmp))
+ remove_rel_len = 0;
+ for (off = 0; off < list_length(subrel_states); off++)
{
- AddSubscriptionRelState(sub->oid, relid,
- copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
- InvalidXLogRecPtr);
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" added to subscription \"%s\"",
- rv->schemaname, rv->relname, sub->name)));
- }
- }
+ Oid relid = subrel_local_oids[off];
- /*
- * Next remove state for tables we should not care about anymore using the
- * data we collected above
- */
- qsort(pubrel_local_oids, list_length(pubrel_names),
- sizeof(Oid), oid_cmp);
+ if (!bsearch(&relid, pubrel_local_oids,
+ list_length(pubrel_names), sizeof(Oid), oid_cmp))
+ {
+ char state;
+ XLogRecPtr statelsn;
+
+ /*
+ * Lock pg_subscription_rel with AccessExclusiveLock to
+ * prevent any race conditions with the apply worker
+ * re-launching workers at the same time this code is trying
+ * to remove those tables.
+ *
+ * Even if new worker for this particular rel is restarted it
+ * won't be able to make any progress as we hold exclusive
+ * lock on subscription_rel till the transaction end. It will
+ * simply exit as there is no corresponding rel entry.
+ *
+ * This locking also ensures that the state of rels won't
+ * change till we are done with this refresh operation.
+ */
+ if (!rel)
+ rel = table_open(SubscriptionRelRelationId, AccessExclusiveLock);
+
+ /* Last known rel state. */
+ state = GetSubscriptionRelState(sub->oid, relid, &statelsn);
+
+ sub_remove_rels[remove_rel_len].relid = relid;
+ sub_remove_rels[remove_rel_len++].state = state;
+
+ RemoveSubscriptionRel(sub->oid, relid);
+
+ logicalrep_worker_stop(sub->oid, relid);
+
+ /*
+ * For READY state, we would have already dropped the
+ * tablesync origin.
+ */
+ if (state != SUBREL_STATE_READY)
+ {
+ char originname[NAMEDATALEN];
+
+ /*
+ * Drop the tablesync's origin tracking if exists.
+ *
+ * It is possible that the origin is not yet created for
+ * tablesync worker, this can happen for the states before
+ * SUBREL_STATE_FINISHEDCOPY. The apply worker can also
+ * concurrently try to drop the origin and by this time
+ * the origin might be already removed. For these reasons,
+ * passing missing_ok = true.
+ */
+ ReplicationOriginNameForTablesync(sub->oid, relid, originname);
+ replorigin_drop_by_name(originname, true, false);
+ }
- for (off = 0; off < list_length(subrel_states); off++)
- {
- Oid relid = subrel_local_oids[off];
+ ereport(DEBUG1,
+ (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
+ get_namespace_name(get_rel_namespace(relid)),
+ get_rel_name(relid),
+ sub->name)));
+ }
+ }
- if (!bsearch(&relid, pubrel_local_oids,
- list_length(pubrel_names), sizeof(Oid), oid_cmp))
+ /*
+ * Drop the tablesync slots associated with removed tables. This has
+ * to be at the end because otherwise if there is an error while doing
+ * the database operations we won't be able to rollback dropped slots.
+ */
+ for (off = 0; off < remove_rel_len; off++)
{
- RemoveSubscriptionRel(sub->oid, relid);
-
- logicalrep_worker_stop_at_commit(sub->oid, relid);
-
- ereport(DEBUG1,
- (errmsg("table \"%s.%s\" removed from subscription \"%s\"",
- get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid),
- sub->name)));
+ if (sub_remove_rels[off].state != SUBREL_STATE_READY &&
+ sub_remove_rels[off].state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = {0};
+
+ /*
+ * For READY/SYNCDONE states we know the tablesync slot has
+ * already been dropped by the tablesync worker.
+ *
+ * For other states, there is no certainty, maybe the slot
+ * does not exist yet. Also, if we fail after removing some of
+ * the slots, next time, it will again try to drop already
+ * dropped slots and fail. For these reasons, we allow
+ * missing_ok = true for the drop.
+ */
+ ReplicationSlotNameForTablesync(sub->oid, sub_remove_rels[off].relid, syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, true);
+ }
}
}
+ PG_FINALLY();
+ {
+ if (wrconn)
+ walrcv_disconnect(wrconn);
+ }
+ PG_END_TRY();
+
+ if (rel)
+ table_close(rel, NoLock);
}
/*
* Alter the existing subscription.
*/
ObjectAddress
-AlterSubscription(AlterSubscriptionStmt *stmt)
+AlterSubscription(AlterSubscriptionStmt *stmt, bool isTopLevel)
{
Relation rel;
ObjectAddress myself;
@@ -848,6 +951,8 @@ AlterSubscription(AlterSubscriptionStmt *stmt)
errmsg("ALTER SUBSCRIPTION with refresh is not allowed for disabled subscriptions"),
errhint("Use ALTER SUBSCRIPTION ... SET PUBLICATION ... WITH (refresh = false).")));
+ PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION with refresh");
+
/* Make sure refresh sees the new list of publications. */
sub->publications = stmt->publication;
@@ -877,6 +982,8 @@ AlterSubscription(AlterSubscriptionStmt *stmt)
NULL, NULL, /* no "binary" */
NULL, NULL); /* no "streaming" */
+ PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION ... REFRESH");
+
AlterSubscription_refresh(sub, copy_data);
break;
@@ -927,8 +1034,8 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
char originname[NAMEDATALEN];
char *err = NULL;
WalReceiverConn *wrconn = NULL;
- StringInfoData cmd;
Form_pg_subscription form;
+ List *rstates;
/*
* Lock pg_subscription with AccessExclusiveLock to ensure that the
@@ -1041,6 +1148,36 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
}
list_free(subworkers);
+ /*
+ * Cleanup of tablesync replication origins.
+ *
+ * Any READY-state relations would already have dealt with clean-ups.
+ *
+ * Note that the state can't change because we have already stopped both
+ * the apply and tablesync workers and they can't restart because of
+ * exclusive lock on the subscription.
+ */
+ rstates = GetSubscriptionNotReadyRelations(subid);
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Drop the tablesync's origin tracking if exists.
+ *
+ * It is possible that the origin is not yet created for tablesync
+ * worker so passing missing_ok = true. This can happen for the states
+ * before SUBREL_STATE_FINISHEDCOPY.
+ */
+ ReplicationOriginNameForTablesync(subid, relid, originname);
+ replorigin_drop_by_name(originname, true, false);
+ }
+
/* Clean up dependencies */
deleteSharedDependencyRecordsFor(SubscriptionRelationId, subid, 0);
@@ -1055,30 +1192,110 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
* If there is no slot associated with the subscription, we can finish
* here.
*/
- if (!slotname)
+ if (!slotname && rstates == NIL)
{
table_close(rel, NoLock);
return;
}
/*
- * Otherwise drop the replication slot at the publisher node using the
- * replication connection.
+ * Try to acquire the connection necessary for dropping slots.
+ *
+ * Note: If the slotname is NONE/NULL then we allow the command to finish
+ * and users need to manually cleanup the apply and tablesync worker slots
+ * later.
+ *
+ * This has to be at the end because otherwise if there is an error while
+ * doing the database operations we won't be able to rollback dropped
+ * slot.
*/
load_file("libpqwalreceiver", false);
- initStringInfo(&cmd);
- appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
-
wrconn = walrcv_connect(conninfo, true, subname, &err);
if (wrconn == NULL)
- ereport(ERROR,
- (errmsg("could not connect to publisher when attempting to "
- "drop the replication slot \"%s\"", slotname),
- errdetail("The error was: %s", err),
- /* translator: %s is an SQL ALTER command */
- errhint("Use %s to disassociate the subscription from the slot.",
- "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+ {
+ if (!slotname)
+ {
+ /* be tidy */
+ list_free(rstates);
+ table_close(rel, NoLock);
+ return;
+ }
+ else
+ {
+ ReportSlotConnectionError(rstates, subid, slotname, err);
+ }
+ }
+
+ PG_TRY();
+ {
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Drop the tablesync slots associated with removed tables.
+ *
+ * For SYNCDONE/READY states, the tablesync slot is known to have
+ * already been dropped by the tablesync worker.
+ *
+ * For other states, there is no certainty, maybe the slot does
+ * not exist yet. Also, if we fail after removing some of the
+ * slots, next time, it will again try to drop already dropped
+ * slots and fail. For these reasons, we allow missing_ok = true
+ * for the drop.
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = {0};
+
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, true);
+ }
+ }
+
+ list_free(rstates);
+
+ /*
+ * If there is a slot associated with the subscription, then drop the
+ * replication slot at the publisher.
+ */
+ if (slotname)
+ ReplicationSlotDropAtPubNode(wrconn, slotname, false);
+
+ }
+ PG_FINALLY();
+ {
+ walrcv_disconnect(wrconn);
+ }
+ PG_END_TRY();
+
+ table_close(rel, NoLock);
+}
+
+/*
+ * Drop the replication slot at the publisher node using the replication
+ * connection.
+ *
+ * missing_ok - if true then only issue a LOG message if the slot doesn't
+ * exist.
+ */
+void
+ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok)
+{
+ StringInfoData cmd;
+
+ Assert(wrconn);
+
+ load_file("libpqwalreceiver", false);
+
+ initStringInfo(&cmd);
+ appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
PG_TRY();
{
@@ -1086,27 +1303,39 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
res = walrcv_exec(wrconn, cmd.data, 0, NULL);
- if (res->status != WALRCV_OK_COMMAND)
- ereport(ERROR,
+ if (res->status == WALRCV_OK_COMMAND)
+ {
+ /* NOTICE. Success. */
+ ereport(NOTICE,
+ (errmsg("dropped replication slot \"%s\" on publisher",
+ slotname)));
+ }
+ else if (res->status == WALRCV_ERROR &&
+ missing_ok &&
+ res->sqlstate == ERRCODE_UNDEFINED_OBJECT)
+ {
+ /* LOG. Error, but missing_ok = true. */
+ ereport(LOG,
(errmsg("could not drop the replication slot \"%s\" on publisher",
slotname),
errdetail("The error was: %s", res->err)));
+ }
else
- ereport(NOTICE,
- (errmsg("dropped replication slot \"%s\" on publisher",
- slotname)));
+ {
+ /* ERROR. */
+ ereport(ERROR,
+ (errmsg("could not drop the replication slot \"%s\" on publisher",
+ slotname),
+ errdetail("The error was: %s", res->err)));
+ }
walrcv_clear_result(res);
}
PG_FINALLY();
{
- walrcv_disconnect(wrconn);
+ pfree(cmd.data);
}
PG_END_TRY();
-
- pfree(cmd.data);
-
- table_close(rel, NoLock);
}
/*
@@ -1275,3 +1504,45 @@ fetch_table_list(WalReceiverConn *wrconn, List *publications)
return tablelist;
}
+
+/*
+ * This is to report the connection failure while dropping replication slots.
+ * Here, we report the WARNING for all tablesync slots so that user can drop
+ * them manually, if required.
+ */
+static void
+ReportSlotConnectionError(List *rstates, Oid subid, char *slotname, char *err)
+{
+ ListCell *lc;
+
+ foreach(lc, rstates)
+ {
+ SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
+ Oid relid = rstate->relid;
+
+ /* Only cleanup resources of tablesync workers */
+ if (!OidIsValid(relid))
+ continue;
+
+ /*
+ * Caller needs to ensure that relstate doesn't change underneath us.
+ * See DropSubscription where we get the relstates.
+ */
+ if (rstate->state != SUBREL_STATE_SYNCDONE)
+ {
+ char syncslotname[NAMEDATALEN] = {0};
+
+ ReplicationSlotNameForTablesync(subid, relid, syncslotname);
+ elog(WARNING, "could not drop tablesync replication slot \"%s\"",
+ syncslotname);
+ }
+ }
+
+ ereport(ERROR,
+ (errmsg("could not connect to publisher when attempting to "
+ "drop the replication slot \"%s\"", slotname),
+ errdetail("The error was: %s", err),
+ /* translator: %s is an SQL ALTER command */
+ errhint("Use %s to disassociate the subscription from the slot.",
+ "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
+}
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index e958274861..7714696140 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -982,6 +982,7 @@ libpqrcv_exec(WalReceiverConn *conn, const char *query,
{
PGresult *pgres = NULL;
WalRcvExecResult *walres = palloc0(sizeof(WalRcvExecResult));
+ char *diag_sqlstate;
if (MyDatabaseId == InvalidOid)
ereport(ERROR,
@@ -1025,6 +1026,13 @@ libpqrcv_exec(WalReceiverConn *conn, const char *query,
case PGRES_BAD_RESPONSE:
walres->status = WALRCV_ERROR;
walres->err = pchomp(PQerrorMessage(conn->streamConn));
+ diag_sqlstate = PQresultErrorField(pgres, PG_DIAG_SQLSTATE);
+ if (diag_sqlstate)
+ walres->sqlstate = MAKE_SQLSTATE(diag_sqlstate[0],
+ diag_sqlstate[1],
+ diag_sqlstate[2],
+ diag_sqlstate[3],
+ diag_sqlstate[4]);
break;
}
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 186514cd9e..58082dde18 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -73,20 +73,6 @@ typedef struct LogicalRepWorkerId
Oid relid;
} LogicalRepWorkerId;
-typedef struct StopWorkersData
-{
- int nestDepth; /* Sub-transaction nest level */
- List *workers; /* List of LogicalRepWorkerId */
- struct StopWorkersData *parent; /* This need not be an immediate
- * subtransaction parent */
-} StopWorkersData;
-
-/*
- * Stack of StopWorkersData elements. Each stack element contains the workers
- * to be stopped for that subtransaction.
- */
-static StopWorkersData *on_commit_stop_workers = NULL;
-
static void ApplyLauncherWakeup(void);
static void logicalrep_launcher_onexit(int code, Datum arg);
static void logicalrep_worker_onexit(int code, Datum arg);
@@ -546,51 +532,6 @@ logicalrep_worker_stop(Oid subid, Oid relid)
LWLockRelease(LogicalRepWorkerLock);
}
-/*
- * Request worker for specified sub/rel to be stopped on commit.
- */
-void
-logicalrep_worker_stop_at_commit(Oid subid, Oid relid)
-{
- int nestDepth = GetCurrentTransactionNestLevel();
- LogicalRepWorkerId *wid;
- MemoryContext oldctx;
-
- /* Make sure we store the info in context that survives until commit. */
- oldctx = MemoryContextSwitchTo(TopTransactionContext);
-
- /* Check that previous transactions were properly cleaned up. */
- Assert(on_commit_stop_workers == NULL ||
- nestDepth >= on_commit_stop_workers->nestDepth);
-
- /*
- * Push a new stack element if we don't already have one for the current
- * nestDepth.
- */
- if (on_commit_stop_workers == NULL ||
- nestDepth > on_commit_stop_workers->nestDepth)
- {
- StopWorkersData *newdata = palloc(sizeof(StopWorkersData));
-
- newdata->nestDepth = nestDepth;
- newdata->workers = NIL;
- newdata->parent = on_commit_stop_workers;
- on_commit_stop_workers = newdata;
- }
-
- /*
- * Finally add a new worker into the worker list of the current
- * subtransaction.
- */
- wid = palloc(sizeof(LogicalRepWorkerId));
- wid->subid = subid;
- wid->relid = relid;
- on_commit_stop_workers->workers =
- lappend(on_commit_stop_workers->workers, wid);
-
- MemoryContextSwitchTo(oldctx);
-}
-
/*
* Wake up (using latch) any logical replication worker for specified sub/rel.
*/
@@ -819,109 +760,21 @@ ApplyLauncherShmemInit(void)
}
}
-/*
- * Check whether current transaction has manipulated logical replication
- * workers.
- */
-bool
-XactManipulatesLogicalReplicationWorkers(void)
-{
- return (on_commit_stop_workers != NULL);
-}
-
/*
* Wakeup the launcher on commit if requested.
*/
void
AtEOXact_ApplyLauncher(bool isCommit)
{
-
- Assert(on_commit_stop_workers == NULL ||
- (on_commit_stop_workers->nestDepth == 1 &&
- on_commit_stop_workers->parent == NULL));
-
if (isCommit)
{
- ListCell *lc;
-
- if (on_commit_stop_workers != NULL)
- {
- List *workers = on_commit_stop_workers->workers;
-
- foreach(lc, workers)
- {
- LogicalRepWorkerId *wid = lfirst(lc);
-
- logicalrep_worker_stop(wid->subid, wid->relid);
- }
- }
-
if (on_commit_launcher_wakeup)
ApplyLauncherWakeup();
}
- /*
- * No need to pfree on_commit_stop_workers. It was allocated in
- * transaction memory context, which is going to be cleaned soon.
- */
- on_commit_stop_workers = NULL;
on_commit_launcher_wakeup = false;
}
-/*
- * On commit, merge the current on_commit_stop_workers list into the
- * immediate parent, if present.
- * On rollback, discard the current on_commit_stop_workers list.
- * Pop out the stack.
- */
-void
-AtEOSubXact_ApplyLauncher(bool isCommit, int nestDepth)
-{
- StopWorkersData *parent;
-
- /* Exit immediately if there's no work to do at this level. */
- if (on_commit_stop_workers == NULL ||
- on_commit_stop_workers->nestDepth < nestDepth)
- return;
-
- Assert(on_commit_stop_workers->nestDepth == nestDepth);
-
- parent = on_commit_stop_workers->parent;
-
- if (isCommit)
- {
- /*
- * If the upper stack element is not an immediate parent
- * subtransaction, just decrement the notional nesting depth without
- * doing any real work. Else, we need to merge the current workers
- * list into the parent.
- */
- if (!parent || parent->nestDepth < nestDepth - 1)
- {
- on_commit_stop_workers->nestDepth--;
- return;
- }
-
- parent->workers =
- list_concat(parent->workers, on_commit_stop_workers->workers);
- }
- else
- {
- /*
- * Abandon everything that was done at this nesting level. Explicitly
- * free memory to avoid a transaction-lifespan leak.
- */
- list_free_deep(on_commit_stop_workers->workers);
- }
-
- /*
- * We have taken care of the current subtransaction workers list for both
- * abort or commit. So we are ready to pop the stack.
- */
- pfree(on_commit_stop_workers);
- on_commit_stop_workers = parent;
-}
-
/*
* Request wakeup of the launcher on commit of the transaction.
*
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index ccbdbcf08f..19cc804678 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -31,8 +31,11 @@
* table state to INIT.
* - Tablesync worker starts; changes table state from INIT to DATASYNC while
* copying.
- * - Tablesync worker finishes the copy and sets table state to SYNCWAIT;
- * waits for state change.
+ * - Tablesync worker does initial table copy; there is a FINISHEDCOPY (sync
+ * worker specific) state to indicate when the copy phase has completed, so
+ * if the worker crashes with this (non-memory) state then the copy will not
+ * be re-attempted.
+ * - Tablesync worker then sets table state to SYNCWAIT; waits for state change.
* - Apply worker periodically checks for tables in SYNCWAIT state. When
* any appear, it sets the table state to CATCHUP and starts loop-waiting
* until either the table state is set to SYNCDONE or the sync worker
@@ -48,8 +51,8 @@
* point it sets state to READY and stops tracking. Again, there might
* be zero changes in between.
*
- * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- * CATCHUP -> SYNCDONE -> READY.
+ * So the state progression is always: INIT -> DATASYNC -> FINISHEDCOPY
+ * -> SYNCWAIT -> CATCHUP -> SYNCDONE -> READY.
*
* The catalog pg_subscription_rel is used to keep information about
* subscribed tables and their state. The catalog holds all states
@@ -58,6 +61,7 @@
* Example flows look like this:
* - Apply is in front:
* sync:8
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:10
* -> set in memory CATCHUP
@@ -73,6 +77,7 @@
*
* - Sync is in front:
* sync:10
+ * -> set in catalog FINISHEDCOPY
* -> set in memory SYNCWAIT
* apply:8
* -> set in memory CATCHUP
@@ -101,7 +106,10 @@
#include "replication/logicalrelation.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
+#include "replication/slot.h"
+#include "replication/origin.h"
#include "storage/ipc.h"
+#include "storage/lmgr.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -269,26 +277,52 @@ invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
static void
process_syncing_tables_for_sync(XLogRecPtr current_lsn)
{
- Assert(IsTransactionState());
-
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
if (MyLogicalRepWorker->relstate == SUBREL_STATE_CATCHUP &&
current_lsn >= MyLogicalRepWorker->relstate_lsn)
{
TimeLineID tli;
+ char syncslotname[NAMEDATALEN] = {0};
MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCDONE;
MyLogicalRepWorker->relstate_lsn = current_lsn;
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /*
+ * UpdateSubscriptionRelState must be called within a transaction.
+ * That transaction will be ended within the finish_sync_worker().
+ */
+ if (!IsTransactionState())
+ StartTransactionCommand();
+
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
MyLogicalRepWorker->relstate_lsn);
+ /* End wal streaming so wrconn can be re-used to drop the slot. */
walrcv_endstreaming(wrconn, &tli);
+
+ /*
+ * Cleanup the tablesync slot.
+ *
+ * This has to be done after updating the state because otherwise if
+ * there is an error while doing the database operations we won't be
+ * able to rollback dropped slot.
+ */
+ ReplicationSlotNameForTablesync(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ syncslotname);
+
+ /*
+ * It is important to give an error if we are unable to drop the slot,
+ * otherwise, it won't be dropped till the corresponding subscription
+ * is dropped. So passing missing_ok = false.
+ */
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname, false);
+
finish_sync_worker();
}
else
@@ -403,6 +437,8 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
*/
if (current_lsn >= rstate->lsn)
{
+ char originname[NAMEDATALEN];
+
rstate->state = SUBREL_STATE_READY;
rstate->lsn = current_lsn;
if (!started_tx)
@@ -411,6 +447,27 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
started_tx = true;
}
+ /*
+ * Remove the tablesync origin tracking if exists.
+ *
+ * The normal case origin drop is done here instead of in the
+ * process_syncing_tables_for_sync function because we don't
+ * allow to drop the origin till the process owning the origin
+ * is alive.
+ *
+ * There is a chance that the user is concurrently performing
+ * refresh for the subscription where we remove the table
+ * state and its origin and by this time the origin might be
+ * already removed. So passing missing_ok = true.
+ */
+ ReplicationOriginNameForTablesync(MyLogicalRepWorker->subid,
+ rstate->relid,
+ originname);
+ replorigin_drop_by_name(originname, true, false);
+
+ /*
+ * Update the state to READY only after the origin cleanup.
+ */
UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
rstate->lsn);
@@ -805,6 +862,50 @@ copy_table(Relation rel)
logicalrep_rel_close(relmapentry, NoLock);
}
+/*
+ * Determine the tablesync slot name.
+ *
+ * The name must not exceed NAMEDATALEN - 1 because of remote node constraints
+ * on slot name length. We append system_identifier to avoid slot_name
+ * collision with subscriptions in other clusters. With the current scheme
+ * pg_%u_sync_%u_UINT64_FORMAT (3 + 10 + 6 + 10 + 20 + '\0'), the maximum
+ * length of slot_name will be 50.
+ *
+ * The returned slot name is either:
+ * - stored in the supplied buffer (syncslotname), or
+ * - palloc'ed in current memory context (if syncslotname = NULL).
+ *
+ * Note: We don't use the subscription slot name as part of tablesync slot name
+ * because we are responsible for cleaning up these slots and it could become
+ * impossible to recalculate what name to cleanup if the subscription slot name
+ * had changed.
+ */
+char *
+ReplicationSlotNameForTablesync(Oid suboid, Oid relid,
+ char syncslotname[NAMEDATALEN])
+{
+ if (syncslotname)
+ sprintf(syncslotname, "pg_%u_sync_%u_" UINT64_FORMAT, suboid, relid,
+ GetSystemIdentifier());
+ else
+ syncslotname = psprintf("pg_%u_sync_%u_" UINT64_FORMAT, suboid, relid,
+ GetSystemIdentifier());
+
+ return syncslotname;
+}
+
+/*
+ * Form the origin name for tablesync.
+ *
+ * Return the name in the supplied buffer.
+ */
+void
+ReplicationOriginNameForTablesync(Oid suboid, Oid relid,
+ char originname[NAMEDATALEN])
+{
+ snprintf(originname, NAMEDATALEN, "pg_%u_%u", suboid, relid);
+}
+
/*
* Start syncing the table in the sync worker.
*
@@ -822,6 +923,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
XLogRecPtr relstate_lsn;
Relation rel;
WalRcvExecResult *res;
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
/* Check the state of the table synchronization. */
StartTransactionCommand();
@@ -847,19 +950,10 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
finish_sync_worker(); /* doesn't return */
}
- /*
- * To build a slot name for the sync work, we are limited to NAMEDATALEN -
- * 1 characters. We cut the original slot name to NAMEDATALEN - 28 chars
- * and append _%u_sync_%u (1 + 10 + 6 + 10 + '\0'). (It's actually the
- * NAMEDATALEN on the remote that matters, but this scheme will also work
- * reasonably if that is different.)
- */
- StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */
- slotname = psprintf("%.*s_%u_sync_%u",
- NAMEDATALEN - 28,
- MySubscription->slotname,
- MySubscription->oid,
- MyLogicalRepWorker->relid);
+ /* Calculate the name of the tablesync slot. */
+ slotname = ReplicationSlotNameForTablesync(MySubscription->oid,
+ MyLogicalRepWorker->relid,
+ NULL /* use palloc */ );
/*
* Here we use the slot name instead of the subscription name as the
@@ -872,7 +966,50 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
(errmsg("could not connect to the publisher: %s", err)));
Assert(MyLogicalRepWorker->relstate == SUBREL_STATE_INIT ||
- MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC);
+ MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
+ MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY);
+
+ /* Assign the origin tracking record name. */
+ ReplicationOriginNameForTablesync(MySubscription->oid,
+ MyLogicalRepWorker->relid,
+ originname);
+
+ if (MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC)
+ {
+ /*
+ * We have previously errored out before finishing the copy so the
+ * replication slot might exist. We want to remove the slot if it
+ * already exists and proceed.
+ *
+ * XXX We could also instead try to drop the slot, last time we failed
+ * but for that, we might need to clean up the copy state as it might
+ * be in the middle of fetching the rows. Also, if there is a network
+ * breakdown then it wouldn't have succeeded so trying it next time
+ * seems like a better bet.
+ */
+ ReplicationSlotDropAtPubNode(wrconn, slotname, true);
+ }
+ else if (MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY)
+ {
+ /*
+ * The COPY phase was previously done, but tablesync then crashed
+ * before it was able to finish normally.
+ */
+ StartTransactionCommand();
+
+ /*
+ * The origin tracking name must already exist. It was created first
+ * time this tablesync was launched.
+ */
+ originid = replorigin_by_name(originname, false);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ *origin_startpos = replorigin_session_get_progress(false);
+
+ CommitTransactionCommand();
+
+ goto copy_table_done;
+ }
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_DATASYNC;
@@ -888,9 +1025,6 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
CommitTransactionCommand();
pgstat_report_stat(false);
- /*
- * We want to do the table data sync in a single transaction.
- */
StartTransactionCommand();
/*
@@ -916,13 +1050,46 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
walrcv_clear_result(res);
/*
- * Create a new temporary logical decoding slot. This slot will be used
+ * Create a new permanent logical decoding slot. This slot will be used
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*/
- walrcv_create_slot(wrconn, slotname, true,
+ walrcv_create_slot(wrconn, slotname, false /* permanent */ ,
CRS_USE_SNAPSHOT, origin_startpos);
+ /*
+ * Setup replication origin tracking. The purpose of doing this before the
+ * copy is to avoid doing the copy again due to any error in setting up
+ * origin tracking.
+ */
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist, so create it now.
+ *
+ * Then advance to the LSN got from walrcv_create_slot. This is WAL
+ * logged for the purpose of recovery. Locks are to prevent the
+ * replication origin from vanishing while advancing.
+ */
+ originid = replorigin_create(originname);
+
+ LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+ true /* go backward */ , true /* WAL log */ );
+ UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ }
+ else
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("replication origin \"%s\" already exists",
+ originname)));
+ }
+
/* Now do the initial data copy */
PushActiveSnapshot(GetTransactionSnapshot());
copy_table(rel);
@@ -940,6 +1107,25 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
/* Make the copy visible. */
CommandCounterIncrement();
+ /*
+ * Update the persisted state to indicate the COPY phase is done; make it
+ * visible to others.
+ */
+ UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+ MyLogicalRepWorker->relid,
+ SUBREL_STATE_FINISHEDCOPY,
+ MyLogicalRepWorker->relstate_lsn);
+
+ CommitTransactionCommand();
+
+copy_table_done:
+
+ elog(DEBUG1,
+ "LogicalRepSyncTableStart: '%s' origin_startpos lsn %X/%X",
+ originname,
+ (uint32) (*origin_startpos >> 32),
+ (uint32) *origin_startpos);
+
/*
* We are done with the initial data synchronization, update the state.
*/
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index eb7db89cef..cfc924cd89 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -807,12 +807,8 @@ apply_handle_stream_stop(StringInfo s)
/* We must be in a valid transaction state */
Assert(IsTransactionState());
- /* The synchronization worker runs in single transaction. */
- if (!am_tablesync_worker())
- {
- /* Commit the per-stream transaction */
- CommitTransactionCommand();
- }
+ /* Commit the per-stream transaction */
+ CommitTransactionCommand();
in_streamed_transaction = false;
@@ -889,9 +885,7 @@ apply_handle_stream_abort(StringInfo s)
/* Cleanup the subxact info */
cleanup_subxact_info();
- /* The synchronization worker runs in single transaction */
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
return;
}
@@ -918,8 +912,7 @@ apply_handle_stream_abort(StringInfo s)
/* write the updated subxact list */
subxact_info_write(MyLogicalRepWorker->subid, xid);
- if (!am_tablesync_worker())
- CommitTransactionCommand();
+ CommitTransactionCommand();
}
}
@@ -1062,8 +1055,7 @@ apply_handle_stream_commit(StringInfo s)
static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData *commit_data)
{
- /* The synchronization worker runs in single transaction. */
- if (IsTransactionState() && !am_tablesync_worker())
+ if (IsTransactionState())
{
/*
* Update origin state so we can restart streaming from correct
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 1d81071c35..05bb698cf4 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1786,7 +1786,8 @@ ProcessUtilitySlow(ParseState *pstate,
break;
case T_AlterSubscriptionStmt:
- address = AlterSubscription((AlterSubscriptionStmt *) parsetree);
+ address = AlterSubscription((AlterSubscriptionStmt *) parsetree,
+ isTopLevel);
break;
case T_DropSubscriptionStmt:
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index 2bea2c52aa..ed94f57baa 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -61,6 +61,8 @@ DECLARE_UNIQUE_INDEX_PKEY(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg
#define SUBREL_STATE_INIT 'i' /* initializing (sublsn NULL) */
#define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn
* NULL) */
+#define SUBREL_STATE_FINISHEDCOPY 'f' /* tablesync copy phase is completed
+ * (sublsn NULL) */
#define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of
* apply (sublsn set) */
#define SUBREL_STATE_READY 'r' /* ready (sublsn set) */
diff --git a/src/include/commands/subscriptioncmds.h b/src/include/commands/subscriptioncmds.h
index a81865079d..3b926f35d7 100644
--- a/src/include/commands/subscriptioncmds.h
+++ b/src/include/commands/subscriptioncmds.h
@@ -20,7 +20,7 @@
extern ObjectAddress CreateSubscription(CreateSubscriptionStmt *stmt,
bool isTopLevel);
-extern ObjectAddress AlterSubscription(AlterSubscriptionStmt *stmt);
+extern ObjectAddress AlterSubscription(AlterSubscriptionStmt *stmt, bool isTopLevel);
extern void DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel);
extern ObjectAddress AlterSubscriptionOwner(const char *name, Oid newOwnerId);
diff --git a/src/include/replication/logicallauncher.h b/src/include/replication/logicallauncher.h
index 421ec1580d..301e494f7b 100644
--- a/src/include/replication/logicallauncher.h
+++ b/src/include/replication/logicallauncher.h
@@ -22,9 +22,7 @@ extern Size ApplyLauncherShmemSize(void);
extern void ApplyLauncherShmemInit(void);
extern void ApplyLauncherWakeupAtCommit(void);
-extern bool XactManipulatesLogicalReplicationWorkers(void);
extern void AtEOXact_ApplyLauncher(bool isCommit);
-extern void AtEOSubXact_ApplyLauncher(bool isCommit, int nestDepth);
extern bool IsLogicalLauncher(void);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 53f636c56f..5f52335f15 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -15,6 +15,7 @@
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "storage/spin.h"
+#include "replication/walreceiver.h"
/*
* Behaviour of replication slots, upon release or crash.
@@ -211,6 +212,8 @@ extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
extern void ReplicationSlotsDropDBSlots(Oid dboid);
extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern char *ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname);
+extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
extern void StartupReplicationSlots(void);
extern void CheckPointReplicationSlots(void);
diff --git a/src/include/replication/walreceiver.h b/src/include/replication/walreceiver.h
index 4313f516d3..a97a59a6a3 100644
--- a/src/include/replication/walreceiver.h
+++ b/src/include/replication/walreceiver.h
@@ -210,6 +210,7 @@ typedef enum
typedef struct WalRcvExecResult
{
WalRcvExecStatus status;
+ int sqlstate;
char *err;
Tuplestorestate *tuplestore;
TupleDesc tupledesc;
diff --git a/src/include/replication/worker_internal.h b/src/include/replication/worker_internal.h
index d046022e49..4a5adc2fda 100644
--- a/src/include/replication/worker_internal.h
+++ b/src/include/replication/worker_internal.h
@@ -77,13 +77,14 @@ extern List *logicalrep_workers_find(Oid subid, bool only_running);
extern void logicalrep_worker_launch(Oid dbid, Oid subid, const char *subname,
Oid userid, Oid relid);
extern void logicalrep_worker_stop(Oid subid, Oid relid);
-extern void logicalrep_worker_stop_at_commit(Oid subid, Oid relid);
extern void logicalrep_worker_wakeup(Oid subid, Oid relid);
extern void logicalrep_worker_wakeup_ptr(LogicalRepWorker *worker);
extern int logicalrep_sync_worker_count(Oid subid);
+extern void ReplicationOriginNameForTablesync(Oid suboid, Oid relid, char *originname);
extern char *LogicalRepSyncTableStart(XLogRecPtr *origin_startpos);
+
void process_syncing_tables(XLogRecPtr current_lsn);
void invalidate_syncing_table_states(Datum arg, int cacheid,
uint32 hashvalue);
diff --git a/src/test/regress/expected/subscription.out b/src/test/regress/expected/subscription.out
index 2fa9bce66a..7802279cb2 100644
--- a/src/test/regress/expected/subscription.out
+++ b/src/test/regress/expected/subscription.out
@@ -201,6 +201,27 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
(1 row)
DROP SUBSCRIPTION regress_testsub;
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=postgres' PUBLICATION mypub
+ WITH (enabled = true, create_slot = false, copy_data = false);
+-- fail - ALTER SUBSCRIPTION with refresh is not allowed in a transaction
+-- block or function
+BEGIN;
+ALTER SUBSCRIPTION regress_testsub SET PUBLICATION mypub WITH (refresh = true);
+ERROR: ALTER SUBSCRIPTION with refresh cannot run inside a transaction block
+END;
+BEGIN;
+ALTER SUBSCRIPTION regress_testsub REFRESH PUBLICATION;
+ERROR: ALTER SUBSCRIPTION ... REFRESH cannot run inside a transaction block
+END;
+CREATE FUNCTION func() RETURNS VOID AS
+$$ ALTER SUBSCRIPTION regress_testsub SET PUBLICATION mypub WITH (refresh = true) $$ LANGUAGE SQL;
+SELECT func();
+ERROR: ALTER SUBSCRIPTION with refresh cannot be executed from a function
+CONTEXT: SQL function "func" statement 1
+ALTER SUBSCRIPTION regress_testsub DISABLE;
+ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
+DROP SUBSCRIPTION regress_testsub;
+DROP FUNCTION func;
RESET SESSION AUTHORIZATION;
DROP ROLE regress_subscription_user;
DROP ROLE regress_subscription_user2;
diff --git a/src/test/regress/sql/subscription.sql b/src/test/regress/sql/subscription.sql
index 14fa0b247e..ca0d782742 100644
--- a/src/test/regress/sql/subscription.sql
+++ b/src/test/regress/sql/subscription.sql
@@ -147,6 +147,28 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
DROP SUBSCRIPTION regress_testsub;
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=postgres' PUBLICATION mypub
+ WITH (enabled = true, create_slot = false, copy_data = false);
+
+-- fail - ALTER SUBSCRIPTION with refresh is not allowed in a transaction
+-- block or function
+BEGIN;
+ALTER SUBSCRIPTION regress_testsub SET PUBLICATION mypub WITH (refresh = true);
+END;
+
+BEGIN;
+ALTER SUBSCRIPTION regress_testsub REFRESH PUBLICATION;
+END;
+
+CREATE FUNCTION func() RETURNS VOID AS
+$$ ALTER SUBSCRIPTION regress_testsub SET PUBLICATION mypub WITH (refresh = true) $$ LANGUAGE SQL;
+SELECT func();
+
+ALTER SUBSCRIPTION regress_testsub DISABLE;
+ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
+DROP SUBSCRIPTION regress_testsub;
+DROP FUNCTION func;
+
RESET SESSION AUTHORIZATION;
DROP ROLE regress_subscription_user;
DROP ROLE regress_subscription_user2;
diff --git a/src/test/subscription/t/004_sync.pl b/src/test/subscription/t/004_sync.pl
index e111ab9181..c7926681b6 100644
--- a/src/test/subscription/t/004_sync.pl
+++ b/src/test/subscription/t/004_sync.pl
@@ -3,7 +3,7 @@ use strict;
use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 7;
+use Test::More tests => 8;
# Initialize publisher node
my $node_publisher = get_new_node('publisher');
@@ -149,7 +149,26 @@ $result = $node_subscriber->safe_psql('postgres',
is($result, qq(20),
'changes for table added after subscription initialized replicated');
+# clean up
+$node_publisher->safe_psql('postgres', "DROP TABLE tab_rep_next");
+$node_subscriber->safe_psql('postgres', "DROP TABLE tab_rep_next");
$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+# Table tab_rep already has the same records on both publisher and subscriber
+# at this time. Recreate the subscription which will do the initial copy of
+# the table again and fail due to a unique constraint violation.
+$node_subscriber->safe_psql('postgres',
+ "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr' PUBLICATION tap_pub");
+
+$result = $node_subscriber->poll_query_until('postgres', $started_query)
+ or die "Timed out while waiting for subscriber to start sync";
+
+# DROP SUBSCRIPTION must clean up slots on the publisher side when the
+# subscriber is stuck on data copy for constraint violation.
+$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+
+$result = $node_publisher->safe_psql('postgres', "SELECT count(*) FROM pg_replication_slots");
+is($result, qq(0), 'DROP SUBSCRIPTION during error can clean up the slots on the publisher');
+
$node_subscriber->stop('fast');
$node_publisher->stop('fast');
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 1d540fe489..bab4f3adb3 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2397,7 +2397,6 @@ StdAnalyzeData
StdRdOptions
Step
StopList
-StopWorkersData
StrategyNumber
StreamCtl
StreamXidHash
@@ -2408,6 +2407,7 @@ SubLink
SubLinkType
SubPlan
SubPlanState
+SubRemoveRels
SubTransactionId
SubXactCallback
SubXactCallbackItem
--
2.28.0.windows.1
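For readers skimming the patch, here is a small SQL sketch of what changes
at the user level (the subscription name "mysub" is hypothetical; the 'f'
state and the pg_%u_sync_%u_%llu slot naming are taken from the patch above):

-- On the subscriber: the new 'f' (finished table copy) state shows up here.
SELECT srrelid::regclass, srsubstate FROM pg_subscription_rel;

-- On the publisher: tablesync slots are now permanent and follow
-- pg_<suboid>_sync_<relid>_<sysid>, so any leftovers are easy to spot.
SELECT slot_name FROM pg_replication_slots
WHERE slot_name LIKE 'pg\_%\_sync\_%';

-- The new restriction: refresh can no longer run inside a transaction block.
BEGIN;
ALTER SUBSCRIPTION mysub REFRESH PUBLICATION;  -- now fails with ERROR
ROLLBACK;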
On Thu, Feb 11, 2021 at 10:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Okay, attached an updated patch with only that change.
I ran Erik's test suite [1] on this patch overnight and found no
errors. No more comments from me. The patch looks good.
regards,
Ajin Cherian
Fujitsu Australia
[1]: /messages/by-id/93d02794068482f96d31b002e0eb248d@xs4all.nl
On Fri, Feb 12, 2021 at 7:18 AM Ajin Cherian <itsajin@gmail.com> wrote:
On Thu, Feb 11, 2021 at 10:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Okay, attached an updated patch with only that change.
I ran Erik's test suite [1] on this patch overnight and found no
errors. No more comments from me. The patch looks good.
Thanks, I have pushed the patch, but I am getting one failure:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2021-02-12%2002%3A28%3A12
The reason seems to be that we are trying to connect and
max_wal_senders is set to zero. I think we can write this without
trying to connect. The attached patch fixes the problem for me. What
do you think?
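To spell out the idea in plain SQL (this is just the net effect of the
attached patch, shown without the diff markup): create the subscription
without connecting, so that no walsender is needed, and enable it
afterwards:

CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUBLICATION mypub
    WITH (connect = false, create_slot = false, copy_data = false);
ALTER SUBSCRIPTION regress_testsub ENABLE;

With connect = false, CREATE SUBSCRIPTION doesn't open a replication
connection at all, so the test passes even when max_wal_senders = 0.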
--
With Regards,
Amit Kapila.
Attachments:
fix_subs_test_1.patch
diff --git a/src/test/regress/expected/subscription.out b/src/test/regress/expected/subscription.out
index 7802279..14a4302 100644
--- a/src/test/regress/expected/subscription.out
+++ b/src/test/regress/expected/subscription.out
@@ -201,8 +201,10 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
(1 row)
DROP SUBSCRIPTION regress_testsub;
-CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=postgres' PUBLICATION mypub
- WITH (enabled = true, create_slot = false, copy_data = false);
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUBLICATION mypub
+ WITH (connect = false, create_slot = false, copy_data = false);
+WARNING: tables were not subscribed, you will have to run ALTER SUBSCRIPTION ... REFRESH PUBLICATION to subscribe the tables
+ALTER SUBSCRIPTION regress_testsub ENABLE;
-- fail - ALTER SUBSCRIPTION with refresh is not allowed in a transaction
-- block or function
BEGIN;
diff --git a/src/test/regress/sql/subscription.sql b/src/test/regress/sql/subscription.sql
index ca0d782..81e65e5 100644
--- a/src/test/regress/sql/subscription.sql
+++ b/src/test/regress/sql/subscription.sql
@@ -147,8 +147,10 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
DROP SUBSCRIPTION regress_testsub;
-CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=postgres' PUBLICATION mypub
- WITH (enabled = true, create_slot = false, copy_data = false);
+CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUBLICATION mypub
+ WITH (connect = false, create_slot = false, copy_data = false);
+
+ALTER SUBSCRIPTION regress_testsub ENABLE;
-- fail - ALTER SUBSCRIPTION with refresh is not allowed in a transaction
-- block or function
On Fri, Feb 12, 2021 at 2:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Thanks, I have pushed the patch, but I am getting one failure:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2021-02-12%2002%3A28%3A12
The reason seems to be that we are trying to connect and
max_wal_senders is set to zero. I think we can write this without
trying to connect. The attached patch fixes the problem for me. What
do you think?
Verified this with installcheck after modifying the configuration to have
wal_level = minimal and max_wal_senders = 0.
Tests passed. The changes look good to me.
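For anyone re-running that check, those are the only non-default settings
involved; in postgresql.conf this is simply:

wal_level = minimal
max_wal_senders = 0

(wal_level = minimal requires max_wal_senders = 0 anyway; the server
refuses to start otherwise.)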
regards,
Ajin Cherian
Fujitsu Australia
On Fri, Feb 12, 2021 at 10:08 AM Ajin Cherian <itsajin@gmail.com> wrote:
On Fri, Feb 12, 2021 at 2:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Thanks, I have pushed the patch, but I am getting one failure:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2021-02-12%2002%3A28%3A12
The reason seems to be that we are trying to connect and
max_wal_senders is set to zero. I think we can write this without
trying to connect. The attached patch fixes the problem for me. What
do you think?
Verified this with installcheck after modifying the configuration to have
wal_level = minimal and max_wal_senders = 0.
Tests passed. The changes look good to me.
Thanks, I have pushed the fix and the latest run of 'thorntail' has passed.
--
With Regards,
Amit Kapila.
On Fri, Feb 12, 2021 at 2:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Feb 12, 2021 at 10:08 AM Ajin Cherian <itsajin@gmail.com> wrote:
On Fri, Feb 12, 2021 at 2:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Thanks, I have pushed the patch, but I am getting one failure:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2021-02-12%2002%3A28%3A12
The reason seems to be that we are trying to connect and
max_wal_senders is set to zero. I think we can write this without
trying to connect. The attached patch fixes the problem for me. What
do you think?
Verified this with installcheck after modifying the configuration to have
wal_level = minimal and max_wal_senders = 0.
Tests passed. The changes look good to me.
Thanks, I have pushed the fix and the latest run of 'thorntail' has passed.
I got the following WARNING message from a logical replication apply worker:
WARNING: relcache reference leak: relation "pg_subscription_rel" not closed
The cause of this is that GetSubscriptionRelState() doesn't close the
relation in the SUBREL_STATE_UNKNOWN case. It seems that commit ce0fdbfe9
forgot to close it. I've attached the patch to fix this issue.
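For readers following along, here is roughly what the function looks like
with the fix applied (an abbreviated sketch, not the exact source; the
attached patch only adds the table_close() call in the early-return
branch):

char
GetSubscriptionRelState(Oid subid, Oid relid, XLogRecPtr *sublsn)
{
	Relation	rel;
	HeapTuple	tup;
	char		substate;
	Datum		d;
	bool		isnull;

	/* Opened to guard against a concurrent ALTER SUBSCRIPTION removing the state. */
	rel = table_open(SubscriptionRelRelationId, AccessShareLock);

	/* Try finding the mapping. */
	tup = SearchSysCache2(SUBSCRIPTIONRELMAP,
						  ObjectIdGetDatum(relid),
						  ObjectIdGetDatum(subid));

	if (!HeapTupleIsValid(tup))
	{
		/* The fix: close the relation on this early-return path too. */
		table_close(rel, AccessShareLock);
		*sublsn = InvalidXLogRecPtr;
		return SUBREL_STATE_UNKNOWN;
	}

	/* Get the state. */
	substate = ((Form_pg_subscription_rel) GETSTRUCT(tup))->srsubstate;

	/* Get the LSN. */
	d = SysCacheGetAttr(SUBSCRIPTIONRELMAP, tup,
						Anum_pg_subscription_rel_srsublsn, &isnull);
	*sublsn = isnull ? InvalidXLogRecPtr : DatumGetLSN(d);

	/* Cleanup: release the syscache entry and close the relation. */
	ReleaseSysCache(tup);
	table_close(rel, AccessShareLock);

	return substate;
}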
Here are the steps to reproduce it:
1. On both publisher and subscriber:
create table test (a int primary key);
2. On publisher:
create publication test_pub for table test;
3. On subscriber:
create subscription test_sub connection 'dbname=postgres' publication test_pub;
-- wait until table sync has finished
drop table test;
create table test (a int primary key);
From this point on, you will get the WARNING message whenever you do an
insert/update/delete/truncate on the 'test' table on the publisher.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Attachments:
fix_relcache_leak.patch
diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c
index c32fc8137d..4039768865 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -353,6 +353,7 @@ GetSubscriptionRelState(Oid subid, Oid relid, XLogRecPtr *sublsn)
if (!HeapTupleIsValid(tup))
{
+ table_close(rel, AccessShareLock);
*sublsn = InvalidXLogRecPtr;
return SUBREL_STATE_UNKNOWN;
}
On Wed, Feb 24, 2021 at 12:47 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Fri, Feb 12, 2021 at 2:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Thanks, I have pushed the fix and the latest run of 'thorntail' has passed.
I got the following WARNING message from a logical replication apply worker:
WARNING: relcache reference leak: relation "pg_subscription_rel" not closed
The cause of this is that GetSubscriptionRelState() doesn't close the
relation in the SUBREL_STATE_UNKNOWN case. It seems that commit ce0fdbfe9
forgot to close it. I've attached the patch to fix this issue.
Thanks for the report and fix. Your patch LGTM. I'll push it tomorrow.
--
With Regards,
Amit Kapila.
On Wed, Feb 24, 2021 at 5:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Feb 24, 2021 at 12:47 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Fri, Feb 12, 2021 at 2:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Thanks, I have pushed the fix and the latest run of 'thorntail' has passed.
I got the following WARNING message from a logical replication apply worker:
WARNING: relcache reference leak: relation "pg_subscription_rel" not closed
The cause of this is that GetSubscriptionRelState() doesn't close the
relation in the SUBREL_STATE_UNKNOWN case. It seems that commit ce0fdbfe9
forgot to close it. I've attached the patch to fix this issue.
Thanks for the report and fix. Your patch LGTM. I'll push it tomorrow.
Pushed!
--
With Regards,
Amit Kapila.
On Thu, Feb 25, 2021 at 1:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Feb 24, 2021 at 5:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Feb 24, 2021 at 12:47 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Fri, Feb 12, 2021 at 2:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Thanks, I have pushed the fix and the latest run of 'thorntail' has passed.
I got the following WARNING message from a logical replication apply worker:
WARNING: relcache reference leak: relation "pg_subscription_rel" not closed
The cause of this is that GetSubscriptionRelState() doesn't close the
relation in the SUBREL_STATE_UNKNOWN case. It seems that commit ce0fdbfe9
forgot to close it. I've attached the patch to fix this issue.
Thanks for the report and fix. Your patch LGTM. I'll push it tomorrow.
Pushed!
Thank you!
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/