pgsql: Make a hard state change from catchup to streaming mode.

Started by Simon Riggsalmost 15 years ago9 messages
#1Simon Riggs
simon@2ndQuadrant.com

Make a hard state change from catchup to streaming mode.
More useful state change for monitoring purposes, plus a
required change for synchronous replication patch.

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/bc76695c4c44e21aed94ee2bea81c4cdcedf1390

Modified Files
--------------
src/backend/replication/walsender.c | 23 +++++++++++++++++++++--
1 files changed, 21 insertions(+), 2 deletions(-)

#2Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#1)
Re: [COMMITTERS] pgsql: Make a hard state change from catchup to streaming mode.

On Fri, Feb 18, 2011 at 10:09 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

Make a hard state change from catchup to streaming mode.
More useful state change for monitoring purposes, plus a
required change for synchronous replication patch.

As far as I can see, this patch was not posted or discussed before
commit, and I'm not sure it's the behavior everyone wants. It has the
effect of preventing the system from ever going backwards from
"streaming" to "catchup". Is that what we want?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#3Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Robert Haas (#2)
Re: [COMMITTERS] pgsql: Make a hard state change from catchup to streaming mode.

Robert Haas <robertmhaas@gmail.com> wrote:

Simon Riggs <simon@2ndquadrant.com> wrote:

Make a hard state change from catchup to streaming mode.
More useful state change for monitoring purposes, plus a
required change for synchronous replication patch.

As far as I can see, this patch was not posted or discussed before
commit, and I'm not sure it's the behavior everyone wants. It has
the effect of preventing the system from ever going backwards from
"streaming" to "catchup". Is that what we want?

We are looking at moving to streaming replication instead of WAL
file shipping, but we often have WAN outages. These can last
minutes, hours, or even a few days. What would be the impact of
this patch on us during and after such outages?

I don't know how well such experience generalizes, but my personal
experience with replication technology is that "hard state changes"
tend to make things more "clunky" and introduce odd issues at the
state transitions. Where different message types are intermingled
without such hard state changes, I've seen more graceful behavior.

Of course, take that with a grain of salt -- I haven't read the
patch and am talking in generalities based on having written a
couple serious replication tools in the past, and having used a few
others.

-Kevin

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#2)
Re: Re: [COMMITTERS] pgsql: Make a hard state change from catchup to streaming mode.

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Feb 18, 2011 at 10:09 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

Make a hard state change from catchup to streaming mode.
More useful state change for monitoring purposes, plus a
required change for synchronous replication patch.

As far as I can see, this patch was not posted or discussed before
commit, and I'm not sure it's the behavior everyone wants. It has the
effect of preventing the system from ever going backwards from
"streaming" to "catchup". Is that what we want?

That seems like a very bad idea from here. Being able to go back to
catchup after loss of the streaming connection is essential for
robustness. If we now have to restart the slave for that to happen,
it's not an improvement.

regards, tom lane

#5Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#4)
Re: Re: [COMMITTERS] pgsql: Make a hard state change from catchup to streaming mode.

On Fri, 2011-02-18 at 10:41 -0500, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Feb 18, 2011 at 10:09 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

Make a hard state change from catchup to streaming mode.
More useful state change for monitoring purposes, plus a
required change for synchronous replication patch.

As far as I can see, this patch was not posted or discussed before
commit, and I'm not sure it's the behavior everyone wants. It has the
effect of preventing the system from ever going backwards from
"streaming" to "catchup". Is that what we want?

That seems like a very bad idea from here. Being able to go back to
catchup after loss of the streaming connection is essential for
robustness. If we now have to restart the slave for that to happen,
it's not an improvement.

If we lose the connection then we start another walsender process, so
that is a non-issue.

We need a hard state change at one particular point to avoid sync rep
requestors from hanging for hours when the standby is connected but a
long way from being caught up.

This was a part of my sync rep patch that it was easier to break out and
commit early. There has never been any comment or objection to this
concept and the same feature has existed in my patches for months.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

#6Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#4)
Re: Re: [COMMITTERS] pgsql: Make a hard state change from catchup to streaming mode.

On Fri, Feb 18, 2011 at 10:41 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Feb 18, 2011 at 10:09 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

Make a hard state change from catchup to streaming mode.
More useful state change for monitoring purposes, plus a
required change for synchronous replication patch.

As far as I can see, this patch was not posted or discussed before
commit, and I'm not sure it's the behavior everyone wants.  It has the
effect of preventing the system from ever going backwards from
"streaming" to "catchup".  Is that what we want?

That seems like a very bad idea from here.  Being able to go back to
catchup after loss of the streaming connection is essential for
robustness.  If we now have to restart the slave for that to happen,
it's not an improvement.

No, that's not the case where it matters. The state would get reset
on reconnect. The problem is when, say, the master server is
generating WAL at a rate which exceeds the network bandwidth of the
link between the master and the standby. The previous coding will
make us flip back into the catchup state when that happens.

Actually, the old code is awfully sensitive, and knowing that you are
not caught up is not really enough information: you need to know how
far behind you are, and how long you've been behind for. I'm guessing
that Simon intended this patch to deal with that problem, but it's not
the right fix.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#7Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#5)
Re: Re: [COMMITTERS] pgsql: Make a hard state change from catchup to streaming mode.

On Fri, Feb 18, 2011 at 10:56 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

We need a hard state change at one particular point to avoid sync rep
requestors from hanging for hours when the standby is connected but a
long way from being caught up.

That's a matter of opinion. I think there are a number of people here
who would say that what you need is a good way to know when you've
caught up, and that you shouldn't enable sync rep until that happens.
What you're proposing would be useful too, if it didn't break other
cases, but it does. This is precisely why it's a bad idea for us to
be trying to do this kind of engineering at the very last minute.

This was a part of my sync rep patch that it was easier to break out and
commit early. There has never been any comment or objection to this
concept and the same feature has existed in my patches for months.

You posted the latest version of your sync rep patch on January 15th,
after months of inactivity. Heikki reviewed it on the 21st, and there
it sat un-updated for three weeks. If your expectation is that any
portion of that patch to which nobody specifically objected is fair
game to commit without further discussion, I don't think that's the
way it works around here.

Post your patches and we'll review them.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#8Simon Riggs
simon@2ndQuadrant.com
In reply to: Kevin Grittner (#3)
Re: Re: [COMMITTERS] pgsql: Make a hard state change from catchup to streaming mode.

On Fri, 2011-02-18 at 09:41 -0600, Kevin Grittner wrote:

Robert Haas <robertmhaas@gmail.com> wrote:

Simon Riggs <simon@2ndquadrant.com> wrote:

Make a hard state change from catchup to streaming mode.
More useful state change for monitoring purposes, plus a
required change for synchronous replication patch.

As far as I can see, this patch was not posted or discussed before
commit, and I'm not sure it's the behavior everyone wants. It has
the effect of preventing the system from ever going backwards from
"streaming" to "catchup". Is that what we want?

We are looking at moving to streaming replication instead of WAL
file shipping, but we often have WAN outages. These can last
minutes, hours, or even a few days. What would be the impact of
this patch on us during and after such outages?

None at all. The patch introduces no behavioural changes, only a useful,
but minor re-categorisation of what is already happening so that its
easier to monitor what happens following startup of a standby.

I don't know how well such experience generalizes, but my personal
experience with replication technology is that "hard state changes"
tend to make things more "clunky" and introduce odd issues at the
state transitions. Where different message types are intermingled
without such hard state changes, I've seen more graceful behavior.

Of course, take that with a grain of salt -- I haven't read the
patch and am talking in generalities based on having written a
couple serious replication tools in the past, and having used a few
others.

I respect your experience.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

#9Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#6)
Re: Re: [COMMITTERS] pgsql: Make a hard state change from catchup to streaming mode.

On Fri, Feb 18, 2011 at 11:50 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Feb 18, 2011 at 10:41 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Feb 18, 2011 at 10:09 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

Make a hard state change from catchup to streaming mode.
More useful state change for monitoring purposes, plus a
required change for synchronous replication patch.

As far as I can see, this patch was not posted or discussed before
commit, and I'm not sure it's the behavior everyone wants.  It has the
effect of preventing the system from ever going backwards from
"streaming" to "catchup".  Is that what we want?

That seems like a very bad idea from here.  Being able to go back to
catchup after loss of the streaming connection is essential for
robustness.  If we now have to restart the slave for that to happen,
it's not an improvement.

No, that's not the case where it matters.  The state would get reset
on reconnect.  The problem is when, say, the master server is
generating WAL at a rate which exceeds the network bandwidth of the
link between the master and the standby.  The previous coding will
make us flip back into the catchup state when that happens.

Actually, the old code is awfully sensitive, and knowing that you are
not caught up is not really enough information: you need to know how
far behind you are, and how long you've been behind for.  I'm guessing
that Simon intended this patch to deal with that problem, but it's not
the right fix.

So am I the only one who thinks this needs to be reverted?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company