Reviving the "Stopping logical replication protocol" patch from Vladimir Gordichuk

Started by Dave Cramerover 7 years ago16 messageshackers
Jump to latest
#1Dave Cramer
pg@fastcrypt.com

Back in 2016 a patch was proposed that seems to have died on the vine. See
/messages/by-id/CAFgjRd3hdYOa33m69TbeOfNNer2BZbwa8FFjt2V5VFzTBvUU3w@mail.gmail.com

for the history and https://commitfest.postgresql.org/10/621/ for the
commitfest entry.

I've rebased the patches and attached them for consideration.

JDBC tests here
https://github.com/pgjdbc/pgjdbc/blob/master/pgjdbc/src/test/java/org/postgresql/replication/LogicalReplicationTest.java
all pass

Regards,

Dave Cramer

Attachments:

0004-Add-test-for-pg_recvlogical-to-stop-replication.patchapplication/octet-stream; name=0004-Add-test-for-pg_recvlogical-to-stop-replication.patchDownload+226-1
0003-Add-ability-for-pg_recvlogical-to-stop-replication-f.patchapplication/octet-stream; name=0003-Add-ability-for-pg_recvlogical-to-stop-replication-f.patchDownload+290-201
0002-Client-initiated-CopyDone-during-transaction-decodin.patchapplication/octet-stream; name=0002-Client-initiated-CopyDone-during-transaction-decodin.patchDownload+65-28
0001-Respect-client-initiated-CopyDone-in-walsender.patchapplication/octet-stream; name=0001-Respect-client-initiated-CopyDone-in-walsender.patchDownload+30-7
#2Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Dave Cramer (#1)
Re: Reviving the "Stopping logical replication protocol" patch from Vladimir Gordichuk

On Tue, Jul 24, 2018 at 5:52 PM Dave Cramer <davecramer@gmail.com> wrote:

Back in 2016 a patch was proposed that seems to have died on the vine. See /messages/by-id/CAFgjRd3hdYOa33m69TbeOfNNer2BZbwa8FFjt2V5VFzTBvUU3w@mail.gmail.com
for the history and https://commitfest.postgresql.org/10/621/ for the commitfest entry.
I've rebased the patches and attached them for consideration.
JDBC tests here https://github.com/pgjdbc/pgjdbc/blob/master/pgjdbc/src/test/java/org/postgresql/replication/LogicalReplicationTest.java all pass

Unfortunately the second patch from the series can't be applied anymore, could
you please rebase it one more time? Other than that it looks strange for me
that the corresponding discussion stopped when it was quite close to be in a
good shape, bouncing from "rejected with feedback" to "needs review". Can
someone from involved people clarify what's the current status of this patch?

#3Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Dmitry Dolgov (#2)
Re: Reviving the "Stopping logical replication protocol" patch from Vladimir Gordichuk

On Mon, Nov 19, 2018 at 4:58 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:

On Tue, Jul 24, 2018 at 5:52 PM Dave Cramer <davecramer@gmail.com> wrote:

Back in 2016 a patch was proposed that seems to have died on the vine. See /messages/by-id/CAFgjRd3hdYOa33m69TbeOfNNer2BZbwa8FFjt2V5VFzTBvUU3w@mail.gmail.com
for the history and https://commitfest.postgresql.org/10/621/ for the commitfest entry.
I've rebased the patches and attached them for consideration.
JDBC tests here https://github.com/pgjdbc/pgjdbc/blob/master/pgjdbc/src/test/java/org/postgresql/replication/LogicalReplicationTest.java all pass

Unfortunately the second patch from the series can't be applied anymore, could
you please rebase it one more time? Other than that it looks strange for me
that the corresponding discussion stopped when it was quite close to be in a
good shape, bouncing from "rejected with feedback" to "needs review". Can
someone from involved people clarify what's the current status of this patch?

Looks like I'm a failure as a neuromancer, since revival didn't happen. I'm
marking it as returned with feedback.

#4Dave Cramer
pg@fastcrypt.com
In reply to: Dmitry Dolgov (#3)
Re: Reviving the "Stopping logical replication protocol" patch from Vladimir Gordichuk

Why is this being closed? I did not see the first email looking for
clarification.

The history is the original author dropped off the planet (no idea where he
is)

I can certainly rebase it.

Dave Cramer

On Fri, 30 Nov 2018 at 18:00, Dmitry Dolgov <9erthalion6@gmail.com> wrote:

Show quoted text

On Mon, Nov 19, 2018 at 4:58 PM Dmitry Dolgov <9erthalion6@gmail.com>

wrote:

On Tue, Jul 24, 2018 at 5:52 PM Dave Cramer <davecramer@gmail.com>

wrote:

Back in 2016 a patch was proposed that seems to have died on the vine.

See
/messages/by-id/CAFgjRd3hdYOa33m69TbeOfNNer2BZbwa8FFjt2V5VFzTBvUU3w@mail.gmail.com

for the history and https://commitfest.postgresql.org/10/621/ for the

commitfest entry.

I've rebased the patches and attached them for consideration.
JDBC tests here

https://github.com/pgjdbc/pgjdbc/blob/master/pgjdbc/src/test/java/org/postgresql/replication/LogicalReplicationTest.java
all pass

Unfortunately the second patch from the series can't be applied anymore,

could

you please rebase it one more time? Other than that it looks strange for

me

that the corresponding discussion stopped when it was quite close to be

in a

good shape, bouncing from "rejected with feedback" to "needs review". Can
someone from involved people clarify what's the current status of this

patch?

Looks like I'm a failure as a neuromancer, since revival didn't happen. I'm
marking it as returned with feedback.

#5Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Dave Cramer (#4)
Re: Reviving the "Stopping logical replication protocol" patch from Vladimir Gordichuk

On Sat, Dec 1, 2018 at 12:17 AM Dave Cramer <davecramer@gmail.com> wrote:

Why is this being closed? I did not see the first email looking for clarification.

Well, mostly due total absence of response and broken mind reading crystal ball.

I can certainly rebase it.

Yes, please do. I'll change the CF item status back.

#6Dave Cramer
pg@fastcrypt.com
In reply to: Dmitry Dolgov (#5)
Re: Reviving the "Stopping logical replication protocol" patch from Vladimir Gordichuk

Dmitry,

Thanks, I have done a preliminary check and it seems pretty straightforward.

I will clean it up for Monday

Thanks for your patience!

Dave Cramer

On Fri, 30 Nov 2018 at 18:22, Dmitry Dolgov <9erthalion6@gmail.com> wrote:

Show quoted text

On Sat, Dec 1, 2018 at 12:17 AM Dave Cramer <davecramer@gmail.com> wrote:

Why is this being closed? I did not see the first email looking for

clarification.

Well, mostly due total absence of response and broken mind reading crystal
ball.

I can certainly rebase it.

Yes, please do. I'll change the CF item status back.

#7Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Dave Cramer (#6)
Re: Reviving the "Stopping logical replication protocol" patch from Vladimir Gordichuk

On Sat, Dec 1, 2018 at 12:49 AM Dave Cramer <davecramer@gmail.com> wrote:

Thanks, I have done a preliminary check and it seems pretty straightforward.

I will clean it up for Monday

Great, thank you!

#8Dave Cramer
pg@fastcrypt.com
In reply to: Dmitry Dolgov (#7)
Re: Reviving the "Stopping logical replication protocol" patch from Vladimir Gordichuk

Dmitry,

Please see attached rebased patches

Dave Cramer

On Fri, 30 Nov 2018 at 18:52, Dmitry Dolgov <9erthalion6@gmail.com> wrote:

Show quoted text

On Sat, Dec 1, 2018 at 12:49 AM Dave Cramer <davecramer@gmail.com> wrote:

Thanks, I have done a preliminary check and it seems pretty

straightforward.

I will clean it up for Monday

Great, thank you!

Attachments:

0004-Add-test-for-pg_recvlogical-to-stop-replication.patchapplication/octet-stream; name=0004-Add-test-for-pg_recvlogical-to-stop-replication.patchDownload+226-1
0003-Add-ability-for-pg_recvlogical-to-stop-replication-f.patchapplication/octet-stream; name=0003-Add-ability-for-pg_recvlogical-to-stop-replication-f.patchDownload+290-201
0001-Respect-client-initiated-CopyDone-in-walsender.patchapplication/octet-stream; name=0001-Respect-client-initiated-CopyDone-in-walsender.patchDownload+30-7
0002-Client-initiated-CopyDone-during-transaction-decodin.patchapplication/octet-stream; name=0002-Client-initiated-CopyDone-during-transaction-decodin.patchDownload+65-28
#9Craig Ringer
craig@2ndquadrant.com
In reply to: Dave Cramer (#8)
Re: Reviving the "Stopping logical replication protocol" patch from Vladimir Gordichuk

On Mon, 3 Dec 2018 at 19:38, Dave Cramer <davecramer@gmail.com> wrote:

Dmitry,

Please see attached rebased patches

I'm fine with patch 0001, though I find this comment a bit hard to follow:

+ * The send_data callback must enqueue complete CopyData messages to libpq
+ * using pq_putmessage_noblock or similar, since the walsender loop may
send
+ * CopyDone then exit and return to command mode in response to a client
+ * CopyDone between calls to send_data.

I think it needs splitting up into a couple of sentences.

In patch 0002, stopping during a txn. It's important that once we skip any
action, we continue skipping. In patch 0002 I'd like it to be clearer that
we will *always* skip the rb->commit callback if rb->continue_decoding_cb()
returned false and aborted the while loop. I suggest storing the result of
(rb->continue_decoding_cb == NULL || rb->continue_decoding_cb()) in an
assignment within the while loop, and testing that result later.

e.g.

(continue_decoding = (rb->continue_decoding_cb == NULL ||
rb->continue_decoding_cb()))

and later

if (continue_decoding) {
rb->commit(rb, txn, commit_lsn);
}

I don't actually think it's necessary to re-test the continue_decoding
callback and skip commit here. If we've sent all the of the txn
except the commit, we might as well send the commit, it's tiny and might
save some hassle later.

I think a comment on the skipped commit would be good too:

/*
* If we skipped any changes due to a client CopyDone we must not send a
commit
* otherwise the client would incorrectly think it received a complete
transaction.
*/

I notice that the fast-path logic in WalSndWriteData is removed by this
patch. It isn't moved; there's no comparison of last_reply_timestamp
and wal_sender_timeout now. What's the rationale there? You want to ensure
that we reach ProcessRepliesIfAny() ? Can we cheaply test for a readable
client socket then still take the fast-path if it's not readable?

--
Craig Ringer http://www.2ndQuadrant.com/
2ndQuadrant - PostgreSQL Solutions for the Enterprise

#10Dave Cramer
pg@fastcrypt.com
In reply to: Craig Ringer (#9)
Re: Reviving the "Stopping logical replication protocol" patch from Vladimir Gordichuk

Dave Cramer

On Sun, 13 Jan 2019 at 23:19, Craig Ringer <craig@2ndquadrant.com> wrote:

On Mon, 3 Dec 2018 at 19:38, Dave Cramer <davecramer@gmail.com> wrote:

Dmitry,

Please see attached rebased patches

I'm fine with patch 0001, though I find this comment a bit hard to follow:

+ * The send_data callback must enqueue complete CopyData messages to libpq
+ * using pq_putmessage_noblock or similar, since the walsender loop may
send
+ * CopyDone then exit and return to command mode in response to a client
+ * CopyDone between calls to send_data.

I think it needs splitting up into a couple of sentences.

Fair point, remember it was originally written by a non-english speaker

In patch 0002, stopping during a txn. It's important that once we skip any
action, we continue skipping. In patch 0002 I'd like it to be clearer that
we will *always* skip the rb->commit callback if rb->continue_decoding_cb()
returned false and aborted the while loop. I suggest storing the result of
(rb->continue_decoding_cb == NULL || rb->continue_decoding_cb()) in an
assignment within the while loop, and testing that result later.

e.g.

(continue_decoding = (rb->continue_decoding_cb == NULL ||
rb->continue_decoding_cb()))

and later

if (continue_decoding) {
rb->commit(rb, txn, commit_lsn);
}

Will do

I don't actually think it's necessary to re-test the continue_decoding
callback and skip commit here. If we've sent all the of the txn
except the commit, we might as well send the commit, it's tiny and might
save some hassle later.

I think a comment on the skipped commit would be good too:

/*
* If we skipped any changes due to a client CopyDone we must not send a
commit
* otherwise the client would incorrectly think it received a complete
transaction.
*/

I notice that the fast-path logic in WalSndWriteData is removed by this
patch. It isn't moved; there's no comparison of last_reply_timestamp
and wal_sender_timeout now. What's the rationale there? You want to ensure
that we reach ProcessRepliesIfAny() ? Can we cheaply test for a readable
client socket then still take the fast-path if it's not readable?

This may have been a mistake as I am taking this over from a very old patch
that I did not originally write. I'll have a look at this

I

Show quoted text

--
Craig Ringer http://www.2ndQuadrant.com/
2ndQuadrant - PostgreSQL Solutions for the Enterprise

#11Dave Cramer
pg@fastcrypt.com
In reply to: Dave Cramer (#10)
Re: Reviving the "Stopping logical replication protocol" patch from Vladimir Gordichuk

Dave Cramer

On Tue, 15 Jan 2019 at 07:53, Dave Cramer <davecramer@gmail.com> wrote:

Dave Cramer

On Sun, 13 Jan 2019 at 23:19, Craig Ringer <craig@2ndquadrant.com> wrote:

On Mon, 3 Dec 2018 at 19:38, Dave Cramer <davecramer@gmail.com> wrote:

Dmitry,

Please see attached rebased patches

I'm fine with patch 0001, though I find this comment a bit hard to follow:

+ * The send_data callback must enqueue complete CopyData messages to
libpq
+ * using pq_putmessage_noblock or similar, since the walsender loop may
send
+ * CopyDone then exit and return to command mode in response to a client
+ * CopyDone between calls to send_data.

I think it needs splitting up into a couple of sentences.

Fair point, remember it was originally written by a non-english speaker

Thoughts on below ?

+/*
+ * Main loop of walsender process that streams the WAL over Copy messages.
+ *
+ * The send_data callback must enqueue complete CopyData messages to the
client
+ * using pq_putmessage_noblock or similar
+ * In order to preserve the protocol it is necessary to send all of the
existing
+ * messages still in the buffer as the WalSender loop may send
+ * CopyDone then exit and return to command mode in response to a client
+ * CopyDone between calls to send_data.
+ */

In patch 0002, stopping during a txn. It's important that once we skip
any action, we continue skipping. In patch 0002 I'd like it to be clearer
that we will *always* skip the rb->commit callback
if rb->continue_decoding_cb() returned false and aborted the while loop. I
suggest storing the result of (rb->continue_decoding_cb == NULL ||
rb->continue_decoding_cb()) in an assignment within the while loop, and
testing that result later.

e.g.

(continue_decoding = (rb->continue_decoding_cb == NULL ||
rb->continue_decoding_cb()))

and later

if (continue_decoding) {
rb->commit(rb, txn, commit_lsn);
}

Will do

Hmmm... I don't actually see how this is any different than what we have in
the patch now where below would the test occur?

I don't actually think it's necessary to re-test the continue_decoding

callback and skip commit here. If we've sent all the of the txn
except the commit, we might as well send the commit, it's tiny and might
save some hassle later.

I think a comment on the skipped commit would be good too:

/*
* If we skipped any changes due to a client CopyDone we must not send a
commit
* otherwise the client would incorrectly think it received a complete
transaction.
*/

I notice that the fast-path logic in WalSndWriteData is removed by this
patch. It isn't moved; there's no comparison of last_reply_timestamp
and wal_sender_timeout now. What's the rationale there? You want to ensure
that we reach ProcessRepliesIfAny() ? Can we cheaply test for a readable
client socket then still take the fast-path if it's not readable?

This may have been a mistake as I am taking this over from a very old
patch that I did not originally write. I'll have a look at this

OK, I'm trying to decipher the original intent of this patch as well as I
didn't write it.

There are some hints here
/messages/by-id/CAFgjRd1LgVbtH=9O9_xvKQjvUP7aRF-edxqwKfaNs9hMFW_4gw@mail.gmail.com

As to why the fast path logic was removed. Does it make sense to you?

Dave

Show quoted text
#12Andres Freund
andres@anarazel.de
In reply to: Dave Cramer (#8)
Re: Reviving the "Stopping logical replication protocol" patch from Vladimir Gordichuk

Hi,

On 2018-12-03 06:38:43 -0500, Dave Cramer wrote:

From 4d023cfc1fed0b5852b4da1aad6a32549b03ce26 Mon Sep 17 00:00:00 2001
From: Dave Cramer <davecramer@gmail.com>
Date: Fri, 30 Nov 2018 18:23:49 -0500
Subject: [PATCH 1/5] Respect client initiated CopyDone in walsender

---
src/backend/replication/walsender.c | 36 ++++++++++++++++++++++++++++++------
1 file changed, 30 insertions(+), 6 deletions(-)

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 46edb52..93f2648 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -770,6 +770,14 @@ logical_read_xlog_page(XLogReaderState *state, XLogRecPtr targetPagePtr, int req
sendTimeLineValidUpto = state->currTLIValidUntil;
sendTimeLineNextTLI = state->nextTLI;
+	/*
+	* If the client sent CopyDone while we were waiting,
+	* bail out so we can wind up the decoding session.
+	*/
+	if (streamingDoneSending)
+		return -1;
+
+	 /* more than one block available */
/* make sure we have enough WAL available */
flushptr = WalSndWaitForWal(targetPagePtr + reqLen);
@@ -1341,8 +1349,12 @@ WalSndWaitForWal(XLogRecPtr loc)
* It's important to do this check after the recomputation of
* RecentFlushPtr, so we can send all remaining data before shutting
* down.
-		 */
-		if (got_STOPPING)
+		 *
+		 * We'll also exit here if the client sent CopyDone because it wants
+		 * to return to command mode.
+		*/
+
+		if (got_STOPPING || streamingDoneReceiving)
break;

/*
@@ -2095,7 +2107,14 @@ WalSndCheckTimeOut(void)
}
}

-/* Main loop of walsender process that streams the WAL over Copy messages. */
+/*
+ * Main loop of walsender process that streams the WAL over Copy messages.
+ *
+ * The send_data callback must enqueue complete CopyData messages to libpq
+ * using pq_putmessage_noblock or similar, since the walsender loop may send
+ * CopyDone then exit and return to command mode in response to a client
+ * CopyDone between calls to send_data.
+ */

Wait, how is it ok to end CopyDone before all the pending data has been
sent out?

diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 23466ba..66b6e90 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -1497,7 +1497,9 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
rb->begin(rb, txn);
iterstate = ReorderBufferIterTXNInit(rb, txn);
-		while ((change = ReorderBufferIterTXNNext(rb, iterstate)) != NULL)
+		while ((change = ReorderBufferIterTXNNext(rb, iterstate)) != NULL &&
+			   (rb->continue_decoding_cb == NULL ||
+				rb->continue_decoding_cb()))
{
Relation	relation = NULL;
Oid			reloid;

@@ -1774,8 +1776,11 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
ReorderBufferIterTXNFinish(rb, iterstate);
iterstate = NULL;

-		/* call commit callback */
-		rb->commit(rb, txn, commit_lsn);
+		if (rb->continue_decoding_cb == NULL || rb->continue_decoding_cb())
+		{
+			/* call commit callback */
+			rb->commit(rb, txn, commit_lsn);
+		}

I'm doubtful it's ok to simply stop in the middle of a transaction.

@@ -1194,17 +1212,6 @@ WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid,

CHECK_FOR_INTERRUPTS();

- /* Try to flush pending output to the client */
- if (pq_flush_if_writable() != 0)
- WalSndShutdown();
-
- /* Try taking fast path unless we get too close to walsender timeout. */
- if (now < TimestampTzPlusMilliseconds(last_reply_timestamp,
- wal_sender_timeout / 2) &&
- !pq_is_send_pending())
- {
- return;
- }

As somebody else commented on the thread, I'm also doubtful this is
ok. This'll introduce significant additional blocking unless I'm missing
something?

/* If we have pending write here, go to slow path */
for (;;)
@@ -1358,7 +1365,14 @@ WalSndWaitForWal(XLogRecPtr loc)
break;

/*
-		 * We only send regular messages to the client for full decoded
+		 * If we have received CopyDone from the client, sent CopyDone
+		 * ourselves, it's time to exit streaming.
+		 */
+		if (!IsStreamingActive()) {
+			break;
+		}

Wrong formatting.

I wonder if the saner approach here isn't to support query cancellations
or something of that vein, and then handle the error.

Greetings,

Andres Freund

#13Dave Cramer
pg@fastcrypt.com
In reply to: Andres Freund (#12)
Re: Reviving the "Stopping logical replication protocol" patch from Vladimir Gordichuk

Andres,

Thanks for looking at this. FYI, I did not originally write this, rather
the original author has not replied to requests.
JDBC could use this, I assume others could as well.

That said I'm certainly open to suggestions on how to do this.

Craig, do you have any other ideas?

Dave Cramer

On Fri, 15 Feb 2019 at 22:01, Andres Freund <andres@anarazel.de> wrote:

Show quoted text

Hi,

On 2018-12-03 06:38:43 -0500, Dave Cramer wrote:

From 4d023cfc1fed0b5852b4da1aad6a32549b03ce26 Mon Sep 17 00:00:00 2001
From: Dave Cramer <davecramer@gmail.com>
Date: Fri, 30 Nov 2018 18:23:49 -0500
Subject: [PATCH 1/5] Respect client initiated CopyDone in walsender

---
src/backend/replication/walsender.c | 36

++++++++++++++++++++++++++++++------

1 file changed, 30 insertions(+), 6 deletions(-)

diff --git a/src/backend/replication/walsender.c

b/src/backend/replication/walsender.c

index 46edb52..93f2648 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -770,6 +770,14 @@ logical_read_xlog_page(XLogReaderState *state,

XLogRecPtr targetPagePtr, int req

sendTimeLineValidUpto = state->currTLIValidUntil;
sendTimeLineNextTLI = state->nextTLI;

+     /*
+     * If the client sent CopyDone while we were waiting,
+     * bail out so we can wind up the decoding session.
+     */
+     if (streamingDoneSending)
+             return -1;
+
+      /* more than one block available */
/* make sure we have enough WAL available */
flushptr = WalSndWaitForWal(targetPagePtr + reqLen);

@@ -1341,8 +1349,12 @@ WalSndWaitForWal(XLogRecPtr loc)
* It's important to do this check after the recomputation

of

* RecentFlushPtr, so we can send all remaining data

before shutting

* down.
-              */
-             if (got_STOPPING)
+              *
+              * We'll also exit here if the client sent CopyDone

because it wants

+              * to return to command mode.
+             */
+
+             if (got_STOPPING || streamingDoneReceiving)
break;

/*
@@ -2095,7 +2107,14 @@ WalSndCheckTimeOut(void)
}
}

-/* Main loop of walsender process that streams the WAL over Copy

messages. */

+/*
+ * Main loop of walsender process that streams the WAL over Copy

messages.

+ *
+ * The send_data callback must enqueue complete CopyData messages to

libpq

+ * using pq_putmessage_noblock or similar, since the walsender loop may

send

+ * CopyDone then exit and return to command mode in response to a client
+ * CopyDone between calls to send_data.
+ */

Wait, how is it ok to end CopyDone before all the pending data has been
sent out?

diff --git a/src/backend/replication/logical/reorderbuffer.c

b/src/backend/replication/logical/reorderbuffer.c

index 23466ba..66b6e90 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -1497,7 +1497,9 @@ ReorderBufferCommit(ReorderBuffer *rb,

TransactionId xid,

rb->begin(rb, txn);

iterstate = ReorderBufferIterTXNInit(rb, txn);
- while ((change = ReorderBufferIterTXNNext(rb, iterstate))

!= NULL)

+ while ((change = ReorderBufferIterTXNNext(rb, iterstate))

!= NULL &&

+                        (rb->continue_decoding_cb == NULL ||
+                             rb->continue_decoding_cb()))
{
Relation        relation = NULL;
Oid                     reloid;

@@ -1774,8 +1776,11 @@ ReorderBufferCommit(ReorderBuffer *rb,

TransactionId xid,

ReorderBufferIterTXNFinish(rb, iterstate);
iterstate = NULL;

-             /* call commit callback */
-             rb->commit(rb, txn, commit_lsn);
+             if (rb->continue_decoding_cb == NULL ||

rb->continue_decoding_cb())

+             {
+                     /* call commit callback */
+                     rb->commit(rb, txn, commit_lsn);
+             }

I'm doubtful it's ok to simply stop in the middle of a transaction.

@@ -1194,17 +1212,6 @@ WalSndWriteData(LogicalDecodingContext *ctx,

XLogRecPtr lsn, TransactionId xid,

CHECK_FOR_INTERRUPTS();

- /* Try to flush pending output to the client */
- if (pq_flush_if_writable() != 0)
- WalSndShutdown();
-
- /* Try taking fast path unless we get too close to walsender

timeout. */

- if (now < TimestampTzPlusMilliseconds(last_reply_timestamp,
-

wal_sender_timeout / 2) &&

- !pq_is_send_pending())
- {
- return;
- }

As somebody else commented on the thread, I'm also doubtful this is
ok. This'll introduce significant additional blocking unless I'm missing
something?

/* If we have pending write here, go to slow path */
for (;;)
@@ -1358,7 +1365,14 @@ WalSndWaitForWal(XLogRecPtr loc)
break;

/*
- * We only send regular messages to the client for full

decoded

+ * If we have received CopyDone from the client, sent

CopyDone

+              * ourselves, it's time to exit streaming.
+              */
+             if (!IsStreamingActive()) {
+                     break;
+             }

Wrong formatting.

I wonder if the saner approach here isn't to support query cancellations
or something of that vein, and then handle the error.

Greetings,

Andres Freund

#14David Steele
david@pgmasters.net
In reply to: Dave Cramer (#13)
Re: Re: Reviving the "Stopping logical replication protocol" patch from Vladimir Gordichuk

On 2/16/19 10:38 PM, Dave Cramer wrote:

Thanks for looking at this. FYI, I did not originally write this, rather
the original author has not replied to requests.
JDBC could use this, I assume others could as well.

That said I'm certainly open to suggestions on how to do this.

Craig, do you have any other ideas?

This patch appears to be stalled. Are there any ideas on how to proceed?

Regards,
--
-David
david@pgmasters.net

#15Thomas Munro
thomas.munro@gmail.com
In reply to: David Steele (#14)
Re: Re: Reviving the "Stopping logical replication protocol" patch from Vladimir Gordichuk

On Thu, Mar 7, 2019 at 8:19 PM David Steele <david@pgmasters.net> wrote:

On 2/16/19 10:38 PM, Dave Cramer wrote:

Thanks for looking at this. FYI, I did not originally write this, rather
the original author has not replied to requests.
JDBC could use this, I assume others could as well.

That said I'm certainly open to suggestions on how to do this.

Craig, do you have any other ideas?

This patch appears to be stalled. Are there any ideas on how to proceed?

Hi Dave (C),

It sounds like this should be withdrawn for now, and potentially
revived in some future CF if someone has time to work on it?

--
Thomas Munro
https://enterprisedb.com

#16Dave Cramer
pg@fastcrypt.com
In reply to: Thomas Munro (#15)
Re: Re: Reviving the "Stopping logical replication protocol" patch from Vladimir Gordichuk

On Mon, 8 Jul 2019 at 06:40, Thomas Munro <thomas.munro@gmail.com> wrote:

On Thu, Mar 7, 2019 at 8:19 PM David Steele <david@pgmasters.net> wrote:

On 2/16/19 10:38 PM, Dave Cramer wrote:

Thanks for looking at this. FYI, I did not originally write this,

rather

the original author has not replied to requests.
JDBC could use this, I assume others could as well.

That said I'm certainly open to suggestions on how to do this.

Craig, do you have any other ideas?

This patch appears to be stalled. Are there any ideas on how to proceed?

Hi Dave (C),

It sounds like this should be withdrawn for now, and potentially
revived in some future CF if someone has time to work on it?

Hi Thomas,

I'm fine with that decision.

Thanks,

Dave Cramer

Show quoted text