The way to know whether the standby has caught up with the master
Hi,
For reliable high-availability, when the master crashes, the clusterware must
know whether it can promote the standby safely without any data loss,
before actually promoting it. IOW, it must know whether the standby has
already caught up with the primary. Otherwise, failover might cause data loss.
We can know that from pg_stat_replication on the master. But the problem
is that pg_stat_replication is not available since the master is not running at
that moment. So that info should be available also on the standby.
To achieve that, I'm thinking of changing walsender so that, when the standby
has caught up with the master, it sends back a message indicating that to
the standby. I'm also thinking of adding a new function (or a view like
pg_stat_replication) on the standby that shows that info.
Thoughts?
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
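For context, the check that the clusterware can run while the master is still up amounts to comparing two WAL locations from pg_stat_replication. Below is a minimal Python sketch of that comparison, assuming the locations arrive in the usual "hi/lo" hexadecimal text form. The helper names parse_lsn and standby_caught_up are hypothetical and not part of PostgreSQL.

```python
def parse_lsn(text: str) -> int:
    """Convert a WAL location such as '0/3000060' into a comparable integer.

    Assumption: the text is two hexadecimal numbers separated by '/'.
    The result is only meant for ordering comparisons.
    """
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def standby_caught_up(sent_location: str, flush_location: str) -> bool:
    """True if the standby has flushed everything the master has sent.

    Both arguments are values as read from pg_stat_replication on the master.
    """
    return parse_lsn(flush_location) >= parse_lsn(sent_location)
```

This only works while the master is reachable, which is exactly the limitation the message above describes.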
On 25.05.2011 07:42, Fujii Masao wrote:
> For reliable high-availability, when the master crashes, the clusterware must
> know whether it can promote the standby safely without any data loss,
> before actually promoting it. IOW, it must know whether the standby has
> already caught up with the primary. Otherwise, failover might cause data loss.
> We can know that from pg_stat_replication on the master. But the problem
> is that pg_stat_replication is not available since the master is not running at
> that moment. So that info should be available also on the standby.
>
> To achieve that, I'm thinking to change walsender so that, when the standby
> has caught up with the master, it sends back the message indicating that to
> the standby. And I'm thinking to add new function (or view like
> pg_stat_replication) available on the standby, which shows that info.
By the time the standby has received that message, it might not be
caught-up anymore because new WAL might've been generated in the master
already.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Wed, May 25, 2011 at 2:16 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> On 25.05.2011 07:42, Fujii Masao wrote:
>> For reliable high-availability, when the master crashes, the clusterware must
>> know whether it can promote the standby safely without any data loss,
>> before actually promoting it. IOW, it must know whether the standby has
>> already caught up with the primary. Otherwise, failover might cause data loss.
>> We can know that from pg_stat_replication on the master. But the problem
>> is that pg_stat_replication is not available since the master is not running at
>> that moment. So that info should be available also on the standby.
>>
>> To achieve that, I'm thinking to change walsender so that, when the standby
>> has caught up with the master, it sends back the message indicating that to
>> the standby. And I'm thinking to add new function (or view like
>> pg_stat_replication) available on the standby, which shows that info.
>
> By the time the standby has received that message, it might not be caught-up
> anymore because new WAL might've been generated in the master already.
Right. But, thanks to sync rep, a transaction's commit is not visible to the
client until its WAL has been replicated to the standby. So, even if some WAL
has not been replicated to the standby, the clusterware can promote the standby
safely without any data loss (from the client's point of view), I think.
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Wed, May 25, 2011 at 12:28 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Wed, May 25, 2011 at 2:16 PM, Heikki Linnakangas wrote:
>>> To achieve that, I'm thinking to change walsender so that, when the standby
>>> has caught up with the master, it sends back the message indicating that to
>>> the standby. And I'm thinking to add new function (or view like
>>> pg_stat_replication) available on the standby, which shows that info.
>>
>> By the time the standby has received that message, it might not be caught-up
>> anymore because new WAL might've been generated in the master already.
>
> Right. But, thanks to sync rep, until such a new WAL has been replicated to
> the standby, the commit of transaction is not visible to the client. So, even if
> there are some WAL not replicated to the standby, the clusterware can promote
> the standby safely without any data loss (to the client point of view), I think.
Then you also need to tell the standby whether it is the current
sync standby.
--
Jaime Casanova www.2ndQuadrant.com
Professional PostgreSQL: Soporte y capacitación de PostgreSQL
On Wed, May 25, 2011 at 6:16 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
>> To achieve that, I'm thinking to change walsender so that, when the standby
>> has caught up with the master, it sends back the message indicating that to
>> the standby. And I'm thinking to add new function (or view like
>> pg_stat_replication) available on the standby, which shows that info.
>
> By the time the standby has received that message, it might not be caught-up
> anymore because new WAL might've been generated in the master already.
AFAICS, this is an exact case of the Byzantine Generals problem.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> On 25.05.2011 07:42, Fujii Masao wrote:
>> To achieve that, I'm thinking to change walsender so that, when the standby
>> has caught up with the master, it sends back the message indicating that to
>> the standby. And I'm thinking to add new function (or view like
>> pg_stat_replication) available on the standby, which shows that info.
>
> By the time the standby has received that message, it might not be
> caught-up anymore because new WAL might've been generated in the master
> already.
Even assuming that you believe this is a useful capability, there is no
need to change walsender. It *already* sends the current end-of-WAL in
every message, which indicates precisely whether the message contains
all of the available WAL data.
regards, tom lane
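To illustrate the point about the end-of-WAL field, here is a rough Python sketch of how a receiver could tell whether a WAL data message reaches the sender's current end of WAL. It assumes the XLogData layout documented for the streaming replication protocol in current releases: a 'w' byte, the start location, the server's current end of WAL, a send timestamp, then the WAL bytes, all big-endian. The layout and byte order in the release under discussion may differ, and the function name is hypothetical.

```python
import struct

# Assumed header layout: message type, data start, server's current end of WAL,
# send timestamp. The ">" prefix means big-endian with no padding.
HEADER = struct.Struct(">cQQq")


def message_reaches_wal_end(message: bytes) -> bool:
    """True if the WAL carried by this message extends to the sender's end of WAL."""
    kind, data_start, wal_end, _send_time = HEADER.unpack_from(message)
    if kind != b"w":
        raise ValueError("not a WAL data message")
    payload_len = len(message) - HEADER.size
    return data_start + payload_len >= wal_end
```

As the rest of the thread points out, a True result here says only that the message was complete when it was sent, not that failover is safe.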
On Wed, May 25, 2011 at 12:34:59PM +0100, Simon Riggs wrote:
> On Wed, May 25, 2011 at 6:16 AM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>>> To achieve that, I'm thinking to change walsender so that, when the standby
>>> has caught up with the master, it sends back the message indicating that to
>>> the standby. And I'm thinking to add new function (or view like
>>> pg_stat_replication) available on the standby, which shows that info.
>>
>> By the time the standby has received that message, it might not be caught-up
>> anymore because new WAL might've been generated in the master already.
>
> AFAICS, this is an exact case of the Byzantine General's problem.
Have they updated it to acknowledge that the city is now called
Istanbul?
Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
On Wed, May 25, 2011 at 3:11 PM, Jaime Casanova <jaime@2ndquadrant.com> wrote:
> On Wed, May 25, 2011 at 12:28 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Wed, May 25, 2011 at 2:16 PM, Heikki Linnakangas wrote:
>>> By the time the standby has received that message, it might not be caught-up
>>> anymore because new WAL might've been generated in the master already.
>>
>> Right. But, thanks to sync rep, until such a new WAL has been replicated to
>> the standby, the commit of transaction is not visible to the client. So, even if
>> there are some WAL not replicated to the standby, the clusterware can promote
>> the standby safely without any data loss (to the client point of view), I think.
>
> then, you also need to transmit to the standby if it is the current
> sync standby.
Yes. After further thought, we can promote the standby safely only when the
corresponding walsender meets the following conditions:
1. sync_state is "sync"
2. the standby's flush_location is greater than or equal to the smallest wait
location in the sync rep queue. This guarantees that all the committed
transactions (i.e., those whose "success" indications have been returned to
the client) have been replicated to the standby.
Once the above conditions are satisfied, failover is safe until sync_state
is flipped to "async". With this logic, walsender needs to check whether
failover is safe, and send a message according to the result.
One problem is that, when sync_state is flipped to "async", walsender might
perform replication asynchronously before the standby receives the message
indicating that failover is unsafe. In this case, if the master crashes, the
clusterware would wrongly think that failover is safe and promote the standby,
which causes data loss.
To solve this problem, walsender would need to send that message *synchronously*,
i.e., wait for the ACK of the message to arrive from the standby before actually
changing sync_state to "async".
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
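The two conditions and the synchronous demotion step described above could be modelled roughly as follows. This is only a Python sketch of the proposed logic, not walsender code: the names failover_is_safe and WalSenderModel are hypothetical, locations are plain integers, and treating an empty sync rep queue as satisfying condition 2 is an assumption the message does not state.

```python
from typing import Callable, Optional


def failover_is_safe(sync_state: str, flush_location: int,
                     smallest_wait_location: Optional[int]) -> bool:
    """Apply conditions 1 and 2 from the message above."""
    if sync_state != "sync":
        return False  # condition 1 fails
    if smallest_wait_location is None:
        # Assumption: nobody is waiting in the sync rep queue, so condition 2
        # is treated as trivially satisfied.
        return True
    return flush_location >= smallest_wait_location  # condition 2


class WalSenderModel:
    """Toy model of the proposed ordering when leaving sync mode."""

    def __init__(self) -> None:
        self.sync_state = "sync"

    def demote_to_async(self, send_unsafe_and_wait_for_ack: Callable[[], bool]) -> None:
        # Tell the standby that failover is no longer safe and wait for its ACK
        # while still in sync mode; only then flip the state.
        if not send_unsafe_and_wait_for_ack():
            raise RuntimeError("no ACK from standby; staying in sync mode")
        self.sync_state = "async"
```

The point of the ordering in demote_to_async is that the standby never holds a stale "safe" flag while the master is already replicating asynchronously.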
On Wed, May 25, 2011 at 11:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> On 25.05.2011 07:42, Fujii Masao wrote:
>>> To achieve that, I'm thinking to change walsender so that, when the standby
>>> has caught up with the master, it sends back the message indicating that to
>>> the standby. And I'm thinking to add new function (or view like
>>> pg_stat_replication) available on the standby, which shows that info.
>>
>> By the time the standby has received that message, it might not be
>> caught-up anymore because new WAL might've been generated in the master
>> already.
>
> Even assuming that you believe this is a useful capability, there is no
> need to change walsender. It *already* sends the current-end-of-WAL in
> every message, which indicates precisely whether the message contains
> all of available WAL data.
That's not enough to determine whether failover is safe. Even if the
standby's flush location is equal to the master's current end location, new
WAL might have already been generated, and the "success" indication of
the corresponding transaction might have been returned to the client (this
is possible only in async mode). So in addition to the master's current
end location, the standby must know its sync mode, which walsender would
need to send.
Another problem is that, even when we can safely promote the standby, the
standby's flush location isn't always equal to the master's current end
location. Imagine the case where there is some unsent WAL on the master
and the corresponding transactions are waiting for replication. In this case,
obviously those locations are not the same. But in sync replication, we can
guarantee that all the committed (from the client's view) transactions have
been replicated to the standby, so failover is safe.
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
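The two counterexamples in the message above can be made concrete with a small Python model. Everything here is illustrative: the Snapshot fields, the rule names, and the numeric locations are invented for the example, and last_acked_commit stands for the highest commit location whose "success" has reached a client, which a real standby cannot observe directly.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    sync_mode: str          # "sync" or "async"
    standby_flush: int      # standby's flush location
    master_end_seen: int    # master's end of WAL as last reported to the standby
    last_acked_commit: int  # highest commit location already reported to a client


def naive_rule(s: Snapshot) -> bool:
    """Failover judged safe when the flush location equals the reported end of WAL."""
    return s.standby_flush == s.master_end_seen


def actually_safe(s: Snapshot) -> bool:
    """Ground truth in this model: no acknowledged commit is missing on the standby."""
    return s.standby_flush >= s.last_acked_commit


# Async mode: locations matched when the message was sent, but the master has
# since acknowledged a commit at a later location.
async_case = Snapshot("async", standby_flush=500, master_end_seen=500,
                      last_acked_commit=600)

# Sync mode: the master holds unsent WAL, but its transactions are still
# waiting for replication, so no client has seen them commit.
sync_case = Snapshot("sync", standby_flush=500, master_end_seen=700,
                     last_acked_commit=500)
```

In this model the naive rule accepts async_case although it loses a commit, and rejects sync_case although it is safe, which is why the sync mode has to be known on the standby.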