Standby catch up state change

Started by Pavan Deolasee about 12 years ago, 9 messages

pavan.deolasee@gmail.com

about 12 years ago

Hello,

I wonder if there is an issue with the way the state changes
from WALSNDSTATE_CATCHUP to WALSNDSTATE_STREAMING. Please note that my
question is based solely on strange behavior reported by a colleague and on
my own limited code reading. The colleague is trying out replication with a
networking middleware and noticed that the master logs the debug message
about the standby catching up, but the write_location in the
pg_stat_replication view takes minutes to reflect the actual catch-up
location.

ISTM that the following code in walsender.c assumes that the standby has
caught up once the master has sent all the required WAL.

    /* Do we have any work to do? */
    Assert(sentPtr <= SendRqstPtr);
    if (SendRqstPtr <= sentPtr)
    {
        *caughtup = true;
        return;
    }

But what if the standby has not yet received all the WAL data sent by the
master? This can happen for various reasons, such as caching at the OS level
or in the network layer on the sender machine, or at any intermediate hop.

Should we not instead wait for the standby to have received all the WAL
before declaring that it has caught up? If a failure happens while the
data is still in the sender's buffer, the standby may not actually catch up
to the desired point, contrary to the LOG message displayed on the master.
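
To make the distinction concrete, here is a minimal sketch, not the
walsender.c code. The type wal_pos and the functions sender_caught_up() and
bytes_in_flight() are hypothetical names made up for illustration; only
sentPtr, SendRqstPtr and write_location come from the discussion above. It
shows that the sender-side check can be true while the position reported back
by the standby still trails what was sent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a WAL position; not PostgreSQL's XLogRecPtr. */
typedef uint64_t wal_pos;

/*
 * Sender-side view, mirroring the quoted check: true once everything that
 * was requested has been handed to the socket. It says nothing about
 * whether the bytes have arrived at the standby.
 */
static bool
sender_caught_up(wal_pos sent_ptr, wal_pos send_rqst_ptr)
{
    assert(sent_ptr <= send_rqst_ptr);
    return send_rqst_ptr <= sent_ptr;
}

/*
 * Standby-side view: bytes sent by the master but not yet reported as
 * written by the standby (assumed to come from something like
 * write_location in pg_stat_replication).
 */
static uint64_t
bytes_in_flight(wal_pos sent_ptr, wal_pos standby_write_ptr)
{
    return sent_ptr > standby_write_ptr ? sent_ptr - standby_write_ptr : 0;
}
```

With both sender positions at 1000 and a reported write position of 400,
sender_caught_up() is true while bytes_in_flight() returns 600, which is the
situation described above.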

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee

Andres Freund

andres@2ndquadrant.com

about 12 years ago

In reply to: Pavan Deolasee (#1)

Re: Standby catch up state change

On 2013-10-15 15:51:46 +0530, Pavan Deolasee wrote:

> Should we not instead wait for the standby to have received all the WAL
> before declaring that it has caught up? If a failure happens while the
> data is still in the sender's buffer, the standby may not actually catch up
> to the desired point, contrary to the LOG message displayed on the master.

I don't think that'd be a good idea - the "caughtup" logic is used to
determine whether we need to wait for further WAL to be generated
locally if we haven't got anything else to do. And we only need to do so
when we have reached the end of the WAL.

Also, we'd have to reset caughtup every time we send data (in
XLogSend()); that'd be horrible.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Pavan Deolasee

pavan.deolasee@gmail.com

about 12 years ago

In reply to: Andres Freund (#2)

Re: Standby catch up state change

On Tue, Oct 15, 2013 at 3:59 PM, Andres Freund <andres@2ndquadrant.com> wrote:

> I don't think that'd be a good idea - the "caughtup" logic is used to
> determine whether we need to wait for further WAL to be generated
> locally if we haven't got anything else to do. And we only need to do so
> when we have reached the end of the WAL.

Obviously I do not fully understand the caughtup logic, but don't you think
the log message about the standby having caught up with the master, when it
hasn't because the sender has buffered a lot of data, is wrong? Or are you
saying those are really two different things?

> Also, we'd have to reset caughtup every time we send data (in
> XLogSend()); that'd be horrible.

Sorry, I did not get that. I was only arguing that the log message about
standby having caught up with master should be delayed until standby has
actually received the WAL, not much about the actual implementation.

Thanks,
Pavan


Andres Freund

andres@2ndquadrant.com

about 12 years ago

In reply to: Pavan Deolasee (#3)

Re: Standby catch up state change

On 2013-10-15 16:12:56 +0530, Pavan Deolasee wrote:

> On Tue, Oct 15, 2013 at 3:59 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>
>> I don't think that'd be a good idea - the "caughtup" logic is used to
>> determine whether we need to wait for further WAL to be generated
>> locally if we haven't got anything else to do. And we only need to do so
>> when we have reached the end of the WAL.
>
> Obviously I do not fully understand the caughtup logic, but don't you think
> the log message about the standby having caught up with the master, when it
> hasn't because the sender has buffered a lot of data, is wrong? Or are you
> saying those are really two different things?

The message is logged when the state changes because the state is
important for the behaviour of replication (e.g. that node becomes
eligible for sync rep). I don't think delaying the message is a good
idea.

Greetings,

Andres Freund


Pavan Deolasee

pavan.deolasee@gmail.com

about 12 years ago

In reply to: Andres Freund (#4)

Re: Standby catch up state change

On Tue, Oct 15, 2013 at 4:16 PM, Andres Freund <andres@2ndquadrant.com> wrote:

> I don't think delaying the message is a good
> idea.

Comment in walsender.c says:

    /*
     * If we're in catchup state, move to streaming. This is an
     * important state change for users to know about, since before
     * this point data loss might occur if the primary dies and we
     * need to failover to the standby.
     */

IOW it claims no data loss will occur after this point. But if the WAL is
cached on the master side, isn't this a false claim, i.e. data loss can
still occur even after the master outputs the log message and changes the
state to streaming? Or am I still getting it wrong?

Thanks,
Pavan

Andres Freund

andres@2ndquadrant.com

about 12 years ago

In reply to: Pavan Deolasee (#5)

Re: Standby catch up state change

On 2013-10-15 16:29:47 +0530, Pavan Deolasee wrote:

> On Tue, Oct 15, 2013 at 4:16 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>
>> I don't think delaying the message is a good
>> idea.
>
> Comment in walsender.c says:
>
>     /*
>      * If we're in catchup state, move to streaming. This is an
>      * important state change for users to know about, since before
>      * this point data loss might occur if the primary dies and we
>      * need to failover to the standby.
>      */
>
> IOW it claims no data loss will occur after this point. But if the WAL is
> cached on the master side, isn't this a false claim, i.e. data loss can
> still occur even after the master outputs the log message and changes the
> state to streaming? Or am I still getting it wrong?

I think you're over-interpreting it. We don't actually rely on the data
being confirmed received anywhere. And the message doesn't say anything
about everything safely being written out.
So, if you want to adjust that comment, go for it, but I am pretty
firmly convinced that this isn't worth changing the logic.

Note that the ready_to_stop logic *does* make sure everything's flushed.

Greetings,

Andres Freund


Pavan Deolasee

pavan.deolasee@gmail.com

about 12 years ago

In reply to: Andres Freund (#6)

Re: Standby catch up state change

On Tue, Oct 15, 2013 at 4:51 PM, Andres Freund <andres@2ndquadrant.com> wrote:

> I think you're over-interpreting it.

I think you are right. Someone who understands the replication code very
well advised us to use that log message as a way to measure how much time
it takes to send all the missing WAL to a remote standby on a slow WAN
link. While it worked well for all measurements, when we use a middleware
which caches a lot of traffic on the sender side, this log message was very
counterintuitive. It took several more minutes for the standby to actually
receive all the WAL files and catch up after the message was displayed on
the master side. But then, as you said, maybe relying on the message was
not the best way to measure the time.

Thanks,
Pavan


Andres Freund

andres@2ndquadrant.com

about 12 years ago

In reply to: Pavan Deolasee (#7)

Re: Standby catch up state change

On 2013-10-16 11:03:12 +0530, Pavan Deolasee wrote:

> I think you are right. Someone who understands the replication code very
> well advised us to use that log message as a way to measure how much time
> it takes to send all the missing WAL to a remote standby on a slow WAN
> link. While it worked well for all measurements, when we use a middleware
> which caches a lot of traffic on the sender side, this log message was very
> counterintuitive. It took several more minutes for the standby to actually
> receive all the WAL files and catch up after the message was displayed on
> the master side. But then, as you said, maybe relying on the message was
> not the best way to measure the time.

Query pg_stat_replication instead; it has the flush position.
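
As a rough illustration of that advice, the sketch below computes how many
bytes a standby's flush position trails the sent position, given two LSN
strings in the hex "high/low" text form that PostgreSQL prints (for example
0/3000060). The helper names parse_lsn() and lsn_lag_bytes() are hypothetical,
and which pg_stat_replication columns to feed in (sent_location and
flush_location in releases of this era) is an assumption to check against
your server version.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Parse an LSN printed as "hi/lo" in hexadecimal, e.g. "0/3000060".
 * Returns 0 on success, -1 if the text is not in that form.
 * Hypothetical helper, not part of any PostgreSQL API.
 */
static int
parse_lsn(const char *text, uint64_t *out)
{
    unsigned int hi, lo;
    char         trailing;

    assert(text != NULL && out != NULL);

    /* Exactly two fields must match; a third means trailing garbage. */
    if (sscanf(text, "%X/%X%c", &hi, &lo, &trailing) != 2)
        return -1;

    *out = ((uint64_t) hi << 32) | lo;
    return 0;
}

/*
 * Bytes sent by the master that the standby has not yet flushed.
 * Returns 0 on success, -1 if either LSN string cannot be parsed.
 */
static int
lsn_lag_bytes(const char *sent, const char *flushed, uint64_t *lag)
{
    uint64_t s, f;

    if (parse_lsn(sent, &s) != 0 || parse_lsn(flushed, &f) != 0)
        return -1;

    *lag = s > f ? s - f : 0;
    return 0;
}
```

Polling the view and feeding the two positions into lsn_lag_bytes() until
the lag reaches zero gives a measurement based on what the standby has
actually confirmed, rather than on when the master finished sending.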

Greetings,

Andres Freund


Pavan Deolasee

pavan.deolasee@gmail.com

about 12 years ago

In reply to: Andres Freund (#8)

Re: Standby catch up state change

On 16-Oct-2013, at 3:45 pm, Andres Freund <andres@2ndquadrant.com> wrote:

> On 2013-10-16 11:03:12 +0530, Pavan Deolasee wrote:
>> I think you are right. Someone who understands the replication code very
>> well advised us to use that log message as a way to measure how much time
>> it takes to send all the missing WAL to a remote standby on a slow WAN
>> link. While it worked well for all measurements, when we use a middleware
>> which caches a lot of traffic on the sender side, this log message was very
>> counterintuitive. It took several more minutes for the standby to actually
>> receive all the WAL files and catch up after the message was displayed on
>> the master side. But then, as you said, maybe relying on the message was
>> not the best way to measure the time.
>
> Query pg_stat_replication instead; it has the flush position.

Yeah, that's what we are doing now.

Thanks,
Pavan
