hot_standby_feedback doesn't work on busy servers in 9.3+
Hello,
In 9.3+, when the primary is busy hot_standby_feedback doesn't properly
work with symptoms like a) spurious query cancels because no feedback
messages have been sent yet, b) bloat on the primary because we don't
regularly send feedback leading to stale xmins on the primary.
The relevant code is in walreceiver.c's WalReceiverMain():
len = walrcv_receive(NAPTIME_PER_CYCLE, &buf);
if (len != 0)
{
...
/* Let the master know that we received some data. */
XLogWalRcvSendReply(false, false);
/*
* If we've written some records, flush them to disk and
* let the startup process and primary server know about
* them.
*/
XLogWalRcvFlush(false);
}
else
{
...
XLogWalRcvSendReply(requestReply, requestReply);
XLogWalRcvSendHSFeedback(false);
}
So, when the connection always has data, we'll not send feedback. That's
pretty easy to demonstrate by running pgbench -jc 8 or so against the
primary.
Looking into this I also noticed that the busy path is odd, because a)
why are we sending a reply before flushing things to disk? b)
XLogWalRcvFlush() will do it's own XLogWalRcvSendReply().
To a good part that seems to have been introduced in
6f60fdd7015b032bf49273c99f80913d57eac284.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On Wed, Jan 15, 2014 at 6:31 PM, Andres Freund <andres@2ndquadrant.com> wrote:
Hello,
In 9.3+, when the primary is busy hot_standby_feedback doesn't properly
work with symptoms like a) spurious query cancels because no feedback
messages have been sent yet, b) bloat on the primary because we don't
regularly send feedback leading to stale xmins on the primary.The relevant code is in walreceiver.c's WalReceiverMain():
len = walrcv_receive(NAPTIME_PER_CYCLE, &buf);
if (len != 0)
{
...
/* Let the master know that we received some data. */
XLogWalRcvSendReply(false, false);/*
* If we've written some records, flush them to disk and
* let the startup process and primary server know about
* them.
*/
XLogWalRcvFlush(false);
}
else
{
...
XLogWalRcvSendReply(requestReply, requestReply);
XLogWalRcvSendHSFeedback(false);
}So, when the connection always has data, we'll not send feedback. That's
pretty easy to demonstrate by running pgbench -jc 8 or so against the
primary.Looking into this I also noticed that the busy path is odd, because a)
why are we sending a reply before flushing things to disk? b)
XLogWalRcvFlush() will do it's own XLogWalRcvSendReply().
I think call to reply in XLogWalRcvFlush() might not actually send
reply because of time difference of last Reply message which we
sent before flush call.
The only point that occurs to me for having such a code is that
incase flush call fails due to disk space or some other such issue,
it can atleast send the correct write position to primary.
To a good part that seems to have been introduced in
6f60fdd7015b032bf49273c99f80913d57eac284.
I don't see that above commit has introduced this behaviour, as
the code in question seems to be there without commit as well,
means that HS feedback is only sent when we don't receive data in WAL
receiver.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Hi,
On 2014-01-16 10:26:51 +0530, Amit Kapila wrote:
Looking into this I also noticed that the busy path is odd, because a)
why are we sending a reply before flushing things to disk? b)
XLogWalRcvFlush() will do it's own XLogWalRcvSendReply().I think call to reply in XLogWalRcvFlush() might not actually send
reply because of time difference of last Reply message which we
sent before flush call.
Yes, but most of the time that will lead to only outdated data to be
sent since that's the first call. Hardly a severe issue, odd nonetheless.
The only point that occurs to me for having such a code is that
incase flush call fails due to disk space or some other such issue,
it can atleast send the correct write position to primary.
Unconvinced.
That's really a separate issue though.
To a good part that seems to have been introduced in
6f60fdd7015b032bf49273c99f80913d57eac284.I don't see that above commit has introduced this behaviour, as
the code in question seems to be there without commit as well,
means that HS feedback is only sent when we don't receive data in WAL
receiver.
Part of that commit is the following hunk:
@@ -609,19 +665,25 @@ XLogWalRcvFlush(bool dying)
/* Also let the master know that we made some progress */
if (!dying)
- {
- XLogWalRcvSendReply();
- XLogWalRcvSendHSFeedback();
- }
+ XLogWalRcvSendReply(false, false);
}
}
Before that commit XLogWalRcvFlush, which is in the "busy" path, sent
hot standby feedback. After that only the idle path does so.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On Thu, Jan 16, 2014 at 2:14 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2014-01-16 10:26:51 +0530, Amit Kapila wrote:
To a good part that seems to have been introduced in
6f60fdd7015b032bf49273c99f80913d57eac284.I don't see that above commit has introduced this behaviour, as
the code in question seems to be there without commit as well,
means that HS feedback is only sent when we don't receive data in WAL
receiver.Part of that commit is the following hunk:
@@ -609,19 +665,25 @@ XLogWalRcvFlush(bool dying)
/* Also let the master know that we made some progress */ if (!dying) - { - XLogWalRcvSendReply(); - XLogWalRcvSendHSFeedback(); - } + XLogWalRcvSendReply(false, false); } }Before that commit XLogWalRcvFlush, which is in the "busy" path, sent
hot standby feedback. After that only the idle path does so.
Okay, Thanks for pointing out.
This is clearly a problem and I think that part of patch should be reverted.
I have figured out how this has happened, actually during the patch
(Improve replication connection timeouts) development, there was an idea
to merge Reply and Feedback messages during which this call got removed
but later on that doesn't appear to be good way of proceeding, so we decide
not merge the messages. While kicking out merge of messages part, it seems
to me that we forgot to add this back.
Heikki, do you remember any other reason for this problem?
I had referred below thread and the mails following it to find this out:
/messages/by-id/6C0B27F7206C9E4CA54AE035729E9C382853645A@szxeml509-mbs
Attached patch should fix this problem.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
bug_fix_missing_hs_feedback_message.patchapplication/octet-stream; name=bug_fix_missing_hs_feedback_message.patchDownload+3-0
On 01/16/2014 12:34 PM, Amit Kapila wrote:
On Thu, Jan 16, 2014 at 2:14 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2014-01-16 10:26:51 +0530, Amit Kapila wrote:
To a good part that seems to have been introduced in
6f60fdd7015b032bf49273c99f80913d57eac284.I don't see that above commit has introduced this behaviour, as
the code in question seems to be there without commit as well,
means that HS feedback is only sent when we don't receive data in WAL
receiver.Part of that commit is the following hunk:
@@ -609,19 +665,25 @@ XLogWalRcvFlush(bool dying)
/* Also let the master know that we made some progress */ if (!dying) - { - XLogWalRcvSendReply(); - XLogWalRcvSendHSFeedback(); - } + XLogWalRcvSendReply(false, false); } }Before that commit XLogWalRcvFlush, which is in the "busy" path, sent
hot standby feedback. After that only the idle path does so.Okay, Thanks for pointing out.
This is clearly a problem and I think that part of patch should be reverted.
I have figured out how this has happened, actually during the patch
(Improve replication connection timeouts) development, there was an idea
to merge Reply and Feedback messages during which this call got removed
but later on that doesn't appear to be good way of proceeding, so we decide
not merge the messages. While kicking out merge of messages part, it seems
to me that we forgot to add this back.Heikki, do you remember any other reason for this problem?
I had referred below thread and the mails following it to find this out:
/messages/by-id/6C0B27F7206C9E4CA54AE035729E9C382853645A@szxeml509-mbsAttached patch should fix this problem.
Yeah, that's probably what happened. Applied, thanks!
- Heikki
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On 01/16/2014 10:44 AM, Andres Freund wrote:
On 2014-01-16 10:26:51 +0530, Amit Kapila wrote:
Looking into this I also noticed that the busy path is odd, because a)
why are we sending a reply before flushing things to disk? b)
XLogWalRcvFlush() will do it's own XLogWalRcvSendReply().I think call to reply in XLogWalRcvFlush() might not actually send
reply because of time difference of last Reply message which we
sent before flush call.Yes, but most of the time that will lead to only outdated data to be
sent since that's the first call. Hardly a severe issue, odd nonetheless.The only point that occurs to me for having such a code is that
incase flush call fails due to disk space or some other such issue,
it can atleast send the correct write position to primary.Unconvinced.
The reply message contains a pointers for how far the WAL has been
applied, written, and flushed. There can be a significant delay between
the write and flush steps, so we send a separate reply after writing,
and after flushing. (if we didn't, the flush and write pointers sent to
the master would always be the equal).
- Heikki
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On 2014-01-16 23:24:13 +0200, Heikki Linnakangas wrote:
On 01/16/2014 10:44 AM, Andres Freund wrote:
On 2014-01-16 10:26:51 +0530, Amit Kapila wrote:
Looking into this I also noticed that the busy path is odd, because a)
why are we sending a reply before flushing things to disk? b)
XLogWalRcvFlush() will do it's own XLogWalRcvSendReply().I think call to reply in XLogWalRcvFlush() might not actually send
reply because of time difference of last Reply message which we
sent before flush call.Yes, but most of the time that will lead to only outdated data to be
sent since that's the first call. Hardly a severe issue, odd nonetheless.The only point that occurs to me for having such a code is that
incase flush call fails due to disk space or some other such issue,
it can atleast send the correct write position to primary.Unconvinced.
The reply message contains a pointers for how far the WAL has been applied,
written, and flushed. There can be a significant delay between the write and
flush steps, so we send a separate reply after writing, and after flushing.
(if we didn't, the flush and write pointers sent to the master would always
be the equal).
If we'd always send a message, I'd be convinced by that argument, but
we're only sending messages after a timeout. This means that if syncrep
is on, we will more frequently have explicitly ask the receiver to send
a confirmation since the reply before the flush will have reset the
timers causing the reply after the flush to only infrequently send
something.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On 01/16/2014 11:55 PM, Andres Freund wrote:
On 2014-01-16 23:24:13 +0200, Heikki Linnakangas wrote:
The reply message contains a pointers for how far the WAL has been applied,
written, and flushed. There can be a significant delay between the write and
flush steps, so we send a separate reply after writing, and after flushing.
(if we didn't, the flush and write pointers sent to the master would always
be the equal).If we'd always send a message, I'd be convinced by that argument, but
we're only sending messages after a timeout. This means that if syncrep
is on, we will more frequently have explicitly ask the receiver to send
a confirmation since the reply before the flush will have reset the
timers causing the reply after the flush to only infrequently send
something.
No, XLogWalRcvSendReply also sends the reply if the flush or write
pointers have advanced since last reply, even if the timeout hasn't expired.
- Heikki
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On 2014-01-17 10:50:29 +0200, Heikki Linnakangas wrote:
On 01/16/2014 11:55 PM, Andres Freund wrote:
On 2014-01-16 23:24:13 +0200, Heikki Linnakangas wrote:
The reply message contains a pointers for how far the WAL has been applied,
written, and flushed. There can be a significant delay between the write and
flush steps, so we send a separate reply after writing, and after flushing.
(if we didn't, the flush and write pointers sent to the master would always
be the equal).If we'd always send a message, I'd be convinced by that argument, but
we're only sending messages after a timeout. This means that if syncrep
is on, we will more frequently have explicitly ask the receiver to send
a confirmation since the reply before the flush will have reset the
timers causing the reply after the flush to only infrequently send
something.No, XLogWalRcvSendReply also sends the reply if the flush or write pointers
have advanced since last reply, even if the timeout hasn't expired.
Uh. Yes. I mis-remembered that we weren't doing that, but that's just
the apply value...
Sorry for the noise,
'
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs