9.4 logical replication - walsender keepalive replies

Started by Steve Singer over 11 years ago, 6 messages
#1 Steve Singer
steve@ssinger.info

In 9.4 we've added the block of code below to walsender.c:

/*
 * We only send regular messages to the client for full decoded
 * transactions, but a synchronous replication and walsender shutdown
 * possibly are waiting for a later location. So we send pings
 * containing the flush location every now and then.
 */
if (MyWalSnd->flush < sentPtr && !waiting_for_ping_response)
{
    WalSndKeepalive(true);
    waiting_for_ping_response = true;
}

I am finding that my logical replication reader is spending a tremendous
amount of time sending feedback to the server because a keepalive reply
was requested. My flush pointer is smaller than sentPtr, which I see as
the normal case (the client hasn't confirmed all the WAL it has been
sent). My client queues the records it receives and only confirms a
record once it has actually processed it.

So the sequence looks something like this:

Server Sends LSN 0/1000
Server Sends LSN 0/2000
Server Sends LSN 0/3000
Client confirms LSN 0/2000

I don't see why all these keepalive replies are needed in this case.
(The timeout value is bumped way up; it's the above block that is
triggering the reply request, not something related to the timeout.)
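
To illustrate the effect described above, here is a toy model of the
check. It is not the actual walsender code: ToyWalSnd and the toy_*
functions are made up for this sketch, and it assumes the client answers
every reply request with a flush position that still lags sentPtr, so
the condition fires again on the next pass through the loop.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* stand-in for the server's 64-bit LSN type */

/* Hypothetical, simplified view of the state the quoted check looks at. */
typedef struct ToyWalSnd
{
    XLogRecPtr sentPtr;         /* last LSN sent to the client */
    XLogRecPtr flush;           /* last LSN the client reported as flushed */
    bool       waiting_for_ping_response;
    int        reply_requests;  /* keepalives sent with "reply requested" */
} ToyWalSnd;

/* One pass through the wait loop, using the original 9.4 condition. */
void
toy_wait_loop_pass(ToyWalSnd *ws)
{
    if (ws->flush < ws->sentPtr && !ws->waiting_for_ping_response)
    {
        ws->reply_requests++;   /* models WalSndKeepalive(true) */
        ws->waiting_for_ping_response = true;
    }
}

/* The client answers the reply request with its current flush position. */
void
toy_client_feedback(ToyWalSnd *ws, XLogRecPtr client_flush)
{
    ws->flush = client_flush;
    ws->waiting_for_ping_response = false;
}

/* Count reply requests over a number of passes with a fixed client position. */
int
toy_count_reply_requests(XLogRecPtr sent, XLogRecPtr client_flush, int passes)
{
    ToyWalSnd ws = { sent, 0, false, 0 };

    for (int i = 0; i < passes; i++)
    {
        toy_wait_loop_pass(&ws);
        toy_client_feedback(&ws, client_flush);
    }
    return ws.reply_requests;
}
```

With sentPtr at 0/3000 and the client confirming 0/2000,
toy_count_reply_requests(0x3000, 0x2000, 5) counts one reply request per
pass, five in total. A client that confirms 0/3000 is asked only once.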

Steve

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2 Andres Freund
andres@2ndquadrant.com
In reply to: Steve Singer (#1)
Re: 9.4 logical replication - walsender keepalive replies

Hi Steve,

On 2014-06-30 11:40:50 -0400, Steve Singer wrote:

> In 9.4 we've added the block of code below to walsender.c:
>
> /*
>  * We only send regular messages to the client for full decoded
>  * transactions, but a synchronous replication and walsender shutdown
>  * possibly are waiting for a later location. So we send pings
>  * containing the flush location every now and then.
>  */
> if (MyWalSnd->flush < sentPtr && !waiting_for_ping_response)
> {
>     WalSndKeepalive(true);
>     waiting_for_ping_response = true;
> }
>
> I am finding that my logical replication reader is spending a tremendous
> amount of time sending feedback to the server because a keepalive reply
> was requested. My flush pointer is smaller than sentPtr, which I see as
> the normal case (the client hasn't confirmed all the WAL it has been
> sent). My client queues the records it receives and only confirms a
> record once it has actually processed it.
>
> So the sequence looks something like this:
>
> Server Sends LSN 0/1000
> Server Sends LSN 0/2000
> Server Sends LSN 0/3000
> Client confirms LSN 0/2000
>
> I don't see why all these keepalive replies are needed in this case.
> (The timeout value is bumped way up; it's the above block that is
> triggering the reply request, not something related to the timeout.)

Right. I thought about this for a while, and I think we should change
two things. For one, don't request replies here. It's simply not needed,
as this isn't dealing with timeouts. For another, don't just check
->flush < sentPtr but also ->write < sentPtr. The reason we're sending
these feedback messages is to inform the 'logical standby' that there's
been WAL activity which it can't see because it doesn't correspond to
anything that's logically decoded (e.g. vacuum stuff).
Would that suit your needs?
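
The two changes proposed above can be compared side by side in a small
sketch. This is only an illustration, not walsender code: ToyFeedback
and the keepalive_needed_* functions are hypothetical names, and the
sketch only models the condition, not the switch from requesting a
reply to not requesting one.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* stand-in for the server's 64-bit LSN type */

/* Hypothetical holder for the positions the client last reported. */
typedef struct ToyFeedback
{
    XLogRecPtr write;   /* last LSN the client reported as written/received */
    XLogRecPtr flush;   /* last LSN the client reported as flushed */
} ToyFeedback;

/* Original 9.4 check: only the flush position is considered. */
bool
keepalive_needed_original(ToyFeedback fb, XLogRecPtr sentPtr, bool waiting)
{
    return fb.flush < sentPtr && !waiting;
}

/* Proposed check: also stay quiet once write has caught up with sentPtr. */
bool
keepalive_needed_proposed(ToyFeedback fb, XLogRecPtr sentPtr, bool waiting)
{
    return fb.flush < sentPtr && fb.write < sentPtr && !waiting;
}
```

For a client that has received everything up to 0/3000 but only flushed
0/2000 (write = 0x3000, flush = 0x2000, sentPtr = 0x3000), the original
check asks for a keepalive while the proposed one does not. If the write
position also lags, both checks still fire.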

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#3 Steve Singer
steve@ssinger.info
In reply to: Andres Freund (#2)
1 attachment(s)
Re: 9.4 logical replication - walsender keepalive replies

On 07/06/2014 10:11 AM, Andres Freund wrote:

> Hi Steve,
>
> Right. I thought about this for a while, and I think we should change
> two things. For one, don't request replies here. It's simply not needed,
> as this isn't dealing with timeouts. For another, don't just check
> ->flush < sentPtr but also ->write < sentPtr. The reason we're sending
> these feedback messages is to inform the 'logical standby' that there's
> been WAL activity which it can't see because it doesn't correspond to
> anything that's logically decoded (e.g. vacuum stuff).
> Would that suit your needs?
>
> Greetings,

Yes, I think that will work for me.
I tested with the attached patch, which I think does what you describe,
and it seems okay.

> Andres Freund

Attachments:

walsender_response.diff (text/x-patch)
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
new file mode 100644
index 3189793..844a5de
*** a/src/backend/replication/walsender.c
--- b/src/backend/replication/walsender.c
*************** WalSndWaitForWal(XLogRecPtr loc)
*** 1203,1211 ****
  		 * possibly are waiting for a later location. So we send pings
  		 * containing the flush location every now and then.
  		 */
! 		if (MyWalSnd->flush < sentPtr && !waiting_for_ping_response)
  		{
! 			WalSndKeepalive(true);
  			waiting_for_ping_response = true;
  		}
  
--- 1203,1213 ----
  		 * possibly are waiting for a later location. So we send pings
  		 * containing the flush location every now and then.
  		 */
! 		if (MyWalSnd->flush < sentPtr &&
! 			MyWalSnd->write < sentPtr &&
! 			!waiting_for_ping_response)
  		{
! 			WalSndKeepalive(false);
  			waiting_for_ping_response = true;
  		}
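
Since the patched condition also looks at ->write, what the client
reports as its write position starts to matter. Below is a minimal
sketch of how a client that queues records might fill in a standby
status update, assuming the message layout from the streaming
replication protocol documentation ('r', then write, flush and apply
positions, a client timestamp, and a reply-requested byte, all in
network byte order). The function names, and the choice to report the
last processed LSN as both flush and apply, are assumptions of this
sketch and not something taken from the thread.

```c
#include <stddef.h>
#include <stdint.h>

#define STATUS_UPDATE_LEN 34    /* 1 + 8 + 8 + 8 + 8 + 1 bytes */

/* Store a 64-bit value in network (big-endian) byte order. */
void
put_be64(unsigned char *dst, uint64_t v)
{
    for (int i = 7; i >= 0; i--)
    {
        dst[i] = (unsigned char) (v & 0xFF);
        v >>= 8;
    }
}

/*
 * Fill buf (at least STATUS_UPDATE_LEN bytes) with a standby status update.
 * received_lsn:  last LSN the client has received and queued
 * processed_lsn: last LSN the client has actually processed
 * now_usec:      client clock, microseconds since 2000-01-01 00:00 UTC
 * Returns the number of bytes written.
 */
size_t
build_status_update(unsigned char *buf,
                    uint64_t received_lsn,
                    uint64_t processed_lsn,
                    int64_t now_usec,
                    int reply_requested)
{
    size_t off = 0;

    buf[off++] = 'r';                           /* message type */
    put_be64(buf + off, received_lsn);          /* write position */
    off += 8;
    put_be64(buf + off, processed_lsn);         /* flush position */
    off += 8;
    put_be64(buf + off, processed_lsn);         /* apply position (assumed) */
    off += 8;
    put_be64(buf + off, (uint64_t) now_usec);   /* client timestamp */
    off += 8;
    buf[off++] = reply_requested ? 1 : 0;       /* ask server to reply? */

    return off;
}
```

In the sequence from the first message, such a client would call
build_status_update(buf, 0x3000, 0x2000, now, 0): everything up to
0/3000 has been received, but only 0/2000 is confirmed as processed.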
  
#4 Steve Singer
steve@ssinger.info
In reply to: Steve Singer (#3)
Re: 9.4 logical replication - walsender keepalive replies

On 07/14/2014 01:19 PM, Steve Singer wrote:

> On 07/06/2014 10:11 AM, Andres Freund wrote:
>
> > Hi Steve,
> >
> > Right. I thought about this for a while, and I think we should change
> > two things. For one, don't request replies here. It's simply not needed,
> > as this isn't dealing with timeouts. For another, don't just check
> > ->flush < sentPtr but also ->write < sentPtr. The reason we're sending
> > these feedback messages is to inform the 'logical standby' that there's
> > been WAL activity which it can't see because it doesn't correspond to
> > anything that's logically decoded (e.g. vacuum stuff).
> > Would that suit your needs?
> >
> > Greetings,
>
> Yes, I think that will work for me.
> I tested with the attached patch, which I think does what you describe,
> and it seems okay.

Any feedback on this? Do we want that change for 9.4, or do we want
something else?

> > Andres Freund


#5 Andres Freund
andres@2ndquadrant.com
In reply to: Steve Singer (#4)
Re: 9.4 logical replication - walsender keepalive replies

On 2014-08-11 17:22:27 -0400, Steve Singer wrote:

> On 07/14/2014 01:19 PM, Steve Singer wrote:
>
> > On 07/06/2014 10:11 AM, Andres Freund wrote:
> >
> > > Hi Steve,
> > >
> > > Right. I thought about this for a while, and I think we should change
> > > two things. For one, don't request replies here. It's simply not needed,
> > > as this isn't dealing with timeouts. For another, don't just check
> > > ->flush < sentPtr but also ->write < sentPtr. The reason we're sending
> > > these feedback messages is to inform the 'logical standby' that there's
> > > been WAL activity which it can't see because it doesn't correspond to
> > > anything that's logically decoded (e.g. vacuum stuff).
> > > Would that suit your needs?
> > >
> > > Greetings,
> >
> > Yes, I think that will work for me.
> > I tested with the attached patch, which I think does what you describe,
> > and it seems okay.
>
> Any feedback on this? Do we want that change for 9.4, or do we want
> something else?

I plan to test and apply it in the next few days. Digging myself out
from under stuff from before my holiday right now...

Greetings,

Andres Freund


#6 Andres Freund
andres@2ndquadrant.com
In reply to: Andres Freund (#5)
Re: 9.4 logical replication - walsender keepalive replies

On 2014-08-11 23:52:32 +0200, Andres Freund wrote:

> On 2014-08-11 17:22:27 -0400, Steve Singer wrote:
>
> > On 07/14/2014 01:19 PM, Steve Singer wrote:
> >
> > > On 07/06/2014 10:11 AM, Andres Freund wrote:
> > >
> > > > Hi Steve,
> > > >
> > > > Right. I thought about this for a while, and I think we should change
> > > > two things. For one, don't request replies here. It's simply not needed,
> > > > as this isn't dealing with timeouts. For another, don't just check
> > > > ->flush < sentPtr but also ->write < sentPtr. The reason we're sending
> > > > these feedback messages is to inform the 'logical standby' that there's
> > > > been WAL activity which it can't see because it doesn't correspond to
> > > > anything that's logically decoded (e.g. vacuum stuff).
> > > > Would that suit your needs?
> > > >
> > > > Greetings,
> > >
> > > Yes, I think that will work for me.
> > > I tested with the attached patch, which I think does what you describe,
> > > and it seems okay.
> >
> > Any feedback on this? Do we want that change for 9.4, or do we want
> > something else?
>
> I plan to test and apply it in the next few days. Digging myself out
> from under stuff from before my holiday right now...

Committed. Thanks and sorry for the delay.

Greetings,

Andres Freund
