time-delayed standbys
While I was out at the MySQL conference last week, I heard that one of
the forthcoming MySQL features is "time-delayed replication":
http://forge.mysql.com/worklog/task.php?id=344
That is, a standby configured such that replay lags a prescribed
amount of time behind the master.
This seemed easy to implement, so I did. Patch (for 9.2, obviously) attached.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
time-delayed-standby.patch (application/octet-stream), +95 −2
On Wed, Apr 20, 2011 at 3:47 AM, Robert Haas <robertmhaas@gmail.com> wrote:
While I was out at the MySQL conference last week, I heard that one of
the forthcoming MySQL features is "time-delayed replication":
Incidentally, this is a popular Oracle feature. It's a poor man's
"flashback" and similar to how some filesystems automatically create
regular snapshots of every home directory so you can get back stuff at
some arbitrary point in the past.
I haven't read the patch but are you delaying delivering the log or
delaying replaying it? I think you actually want the latter so in case
of a real failure you can choose between replaying the last 5 minutes
and recovering everything or intentionally dropping that history if
the failure was caused by an application problem.
--
greg
Hello
yesterday I was asked about this feature
+1
Regards
Pavel
2011/4/20 Robert Haas <robertmhaas@gmail.com>:
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Apr 20, 2011 at 12:24 PM, Greg Stark <gsstark@mit.edu> wrote:
I haven't read the patch but are you delaying delivering the log or
delaying replaying it? I think you actually want the latter so in case
of a real failure you can choose between replaying the last 5 minutes
and recovering everything or intentionally dropping that history if
the failure was caused by an application problem.
Ok, so I read the patch. That wasn't so hard.
You did implement it the way I described so great.
However you did it by comparing the server timestamp and the slave
timestamp. I think we should avoid that. Synchronizing clocks is a
surprisingly hard problem and it would really be great if the whole
system kept working if the clocks were out of sync.
I don't think that's impossible, though it's a bit inconvenient. We
would need to keep track of the most recent timestamp received from
the master, and pause recovery when replay reaches a record that is
less than five minutes older than that latest timestamp. This means we're
comparing only master timestamps to each other. It does mean if the
log transfer lags then recovery will lag unnecessarily. And it means
the process of receiving logs is complicated by having to keep track
of this timestamp.
--
greg
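The gating scheme Greg describes above can be sketched roughly as follows. This is a hypothetical illustration, not code from the patch; the names (`note_master_timestamp`, `ok_to_replay`) and the microsecond `TimestampTz` convention are assumed for the example:

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch: compare only master-generated timestamps, never the slave's
 * clock.  TimestampTz is microseconds, as in PostgreSQL. */
typedef int64_t TimestampTz;

static TimestampTz latest_master_ts;    /* newest timestamp received from master */

/* Called on the receiving side whenever a new record arrives. */
static void
note_master_timestamp(TimestampTz record_ts)
{
    if (record_ts > latest_master_ts)
        latest_master_ts = record_ts;
}

/* Replay is allowed only once the record is at least delay_usec older
 * than the newest master timestamp we have seen.  If log transfer lags,
 * replay lags too, exactly as noted above. */
static bool
ok_to_replay(TimestampTz record_ts, TimestampTz delay_usec)
{
    return latest_master_ts - record_ts >= delay_usec;
}
```

The key property is that both sides of the comparison come from the master, so clock skew between the machines never enters into it.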
Greg Stark <gsstark@mit.edu> writes:
On Wed, Apr 20, 2011 at 3:47 AM, Robert Haas <robertmhaas@gmail.com> wrote:
While I was out at the MySQL conference last week, I heard that one of
the forthcoming MySQL features is "time-delayed replication":

Incidentally, this is a popular Oracle feature. It's a poor man's
"flashback" and similar to how some filesystems automatically create
regular snapshots of every home directory so you can get back stuff at
some arbitrary point in the past.
Yup.
One of the big bosses where I work went asking for this a couple years
ago. We're multi-platform; Oracle, MySQL, EnterpriseDB, Vertica.
They put a 6-hour delay on the critical Oracle boxes. Nothing was done
for MySQL or Vertica, since there was no feature support.
My C-foo being rusty, I elected to add more hacks to our home-spun
Python version of pg_standby rather than adding the feature to
pg_standby itself.
Been running with delayed WAL apply ever since. Shipping happens
immediately on log creation at the master, and we simply wait for the files to
age the configured time before application.
In a few cases, we have 2 or more standbys off the same prod master.
One in real-time mode and the others lagged.
Thanks all!
--
Jerry Sievers
Postgres DBA/Development Consulting
e: gsievers19@comcast.net
p: 305.321.1144
On Wed, Apr 20, 2011 at 7:42 AM, Greg Stark <gsstark@mit.edu> wrote:
On Wed, Apr 20, 2011 at 12:24 PM, Greg Stark <gsstark@mit.edu> wrote:
I haven't read the patch but are you delaying delivering the log or
delaying replaying it? I think you actually want the latter so in case
of a real failure you can choose between replaying the last 5 minutes
and recovering everything or intentionally dropping that history if
the failure was caused by an application problem.

Ok, so I read the patch. That wasn't so hard.
You did implement it the way I described so great.
However you did it by comparing the server timestamp and the slave
timestamp. I think we should avoid that. Synchronizing clocks is a
surprisingly hard problem and it would really be great if the whole
system kept working if the clocks were out of sync.

I don't think that's impossible, though it's a bit inconvenient. We
would need to keep track of the most recent timestamp received from
the master and pause recovery if we reach recovery on a record that's
less than five minutes older than that record. This means we're
comparing only master timestamps to each other. It does mean if the
log transfer lags then recovery will lag unnecessarily. And it means
the process of receiving logs is complicated by having to keep track
of this timestamp.
How would we keep track of the most recent timestamp received from the
master without replaying the WAL records?
What happens if the connection to the master is lost?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Apr 20, 2011 at 2:19 PM, Robert Haas <robertmhaas@gmail.com> wrote:
How would we keep track of the most recent timestamp received from the
master without replaying the WAL records?
Well as we receive them we would have to peek at them to see the time.
Or we could have the master send its time to the slave, as I believe
Tom already proposed in the past but we didn't choose to
do? I haven't really been following all the twists and turns of
streaming replication and sync replication. I read the e-mails but
fell behind a few times so I'm not sure what the end result was
sometimes.
What happens if the connection to the master is lost?
We're writing these records to disk no? I don't understand the
problem. It doesn't even matter if the records get lost and have to be
retrieved again, as long as we gate the recovery of the records based
on a time we know the master has seen sometime.
Oh, I think I see what you're getting at. We would never replay the
last five minutes if there's no connection. We definitely want to make
sure the admin is aware of the lag and can disable this feature before
bringing up the slave.
--
greg
Greg Stark <gsstark@mit.edu> writes:
On Wed, Apr 20, 2011 at 2:19 PM, Robert Haas <robertmhaas@gmail.com> wrote:
How would we keep track of the most recent timestamp received from the
master without replaying the WAL records?
Well as we receive them we would have to peek at them to see the time.
Or we could have the master send its time to the slave as I believe
Tom has already proposed in the past but I believe didn't choose to
do?
My idea of how to manage it would be to have walreceiver explicitly
track the clock difference from the master, which it can do since
walsender puts its current time into every message header. You can use
the slave's clock for comparisons so long as you add the appropriate
offset. (The theory here is that the skew won't change very fast ...)
regards, tom lane
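Tom's suggestion could look roughly like this. This is a hypothetical sketch, not an implementation: the names are invented, and the exponential smoothing is an assumption added here to address the concern (raised downthread) that per-message network lag would make a raw offset estimate bounce around:

```c
#include <stdint.h>

/* Sketch: walreceiver tracks the clock difference from the master using
 * the timestamp in each message header; replay can then compare record
 * timestamps against the slave clock plus that offset. */
typedef int64_t TimestampTz;

static TimestampTz clock_offset;    /* estimated master_time - slave_time */
static int have_offset;

/* Update the skew estimate from one message header. */
static void
update_clock_offset(TimestampTz msg_master_ts, TimestampTz slave_now)
{
    TimestampTz sample = msg_master_ts - slave_now;

    if (!have_offset)
    {
        clock_offset = sample;
        have_offset = 1;
    }
    else
        clock_offset = (7 * clock_offset + sample) / 8;  /* exponential smoothing */
}

/* Convert the slave's clock into (approximate) master time. */
static TimestampTz
estimated_master_now(TimestampTz slave_now)
{
    return slave_now + clock_offset;
}
```

Note that each sample includes network and processing latency, so the estimate is biased by the one-way delay; smoothing only damps the jitter, it doesn't remove that bias.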
On Wed, Apr 20, 2011 at 3:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
My idea of how to manage it would be to have walreceiver explicitly
track the clock difference from the master, which it can do since
walsender puts its current time into every message header. You can use
the slave's clock for comparisons so long as you add the appropriate
offset. (The theory here is that the skew won't change very fast ...)
Ah, so we did put the master's clock in every message? Then this
should be simple, no? Just compare the master's timestamp from the
record to the last master's clock seen in the messages. That sounds
equivalent but a lot safer than trying to keep a conversion between
them.
--
greg
Greg Stark <gsstark@mit.edu> writes:
Ah, so we did put the master's clock in every message?
Yes, we did.
Then this
should be simple, no? Just compare the master's timestamp from the
record to the last master's clock seen in the messages. That sounds
equivalent but a lot safer than trying to keep a conversion between
them.
Well, the question is what happens after you stop receiving master
messages. If you don't make use of the slave's clock somehow,
application of WAL will stop dead in the water, which seems unlikely
to be what's wanted.
regards, tom lane
On Wed, Apr 20, 2011 at 10:19 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Greg Stark <gsstark@mit.edu> writes:
On Wed, Apr 20, 2011 at 2:19 PM, Robert Haas <robertmhaas@gmail.com> wrote:
How would we keep track of the most recent timestamp received from the
master without replaying the WAL records?

Well as we receive them we would have to peek at them to see the time.
Or we could have the master send its time to the slave as I believe
Tom has already proposed in the past but I believe didn't choose to
do?

My idea of how to manage it would be to have walreceiver explicitly
track the clock difference from the master, which it can do since
walsender puts its current time into every message header. You can use
the slave's clock for comparisons so long as you add the appropriate
offset. (The theory here is that the skew won't change very fast ...)
I am a bit concerned about the reliability of this approach. If there
is some network lag, or some lag in processing from the master, we
could easily get the idea that there is time skew between the machines
when there really isn't. And our perception of the time skew could
easily bounce around from message to message, as the lag varies. I
think it would be tremendously ironic if the two machines were
actually synchronized to the microsecond, but by trying to be clever
about it we managed to make the lag-time accurate only to within
several seconds.
It looks like the MySQL version of this feature is spec'd to have some
sort of time skew compensation built into it, but I don't see anything
that explains how it actually works.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
I am a bit concerned about the reliability of this approach. If there
is some network lag, or some lag in processing from the master, we
could easily get the idea that there is time skew between the machines
when there really isn't. And our perception of the time skew could
easily bounce around from message to message, as the lag varies. I
think it would be tremendously ironic if the two machines were
actually synchronized to the microsecond, but by trying to be clever
about it we managed to make the lag-time accurate only to within
several seconds.
Well, if walreceiver concludes that there is no more than a few seconds'
difference between the clocks, it'd probably be OK to take the master
timestamps at face value. The problem comes when the skew gets large
(compared to the configured time delay, I guess).
regards, tom lane
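Tom's policy above could be sketched as a simple predicate. The threshold here is invented purely for illustration (the thread only says "a few seconds" relative to the configured delay):

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch: trust master timestamps at face value when the measured skew
 * is small compared to the configured delay; otherwise apply the
 * tracked offset.  The 5% cutoff is an assumption, not from the thread. */
typedef int64_t TimestampTz;

static bool
trust_master_timestamps(TimestampTz measured_skew, TimestampTz delay_usec)
{
    if (measured_skew < 0)
        measured_skew = -measured_skew;
    return measured_skew * 20 < delay_usec;   /* skew under 5% of the delay */
}
```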
On Wed, Apr 20, 2011 at 3:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Greg Stark <gsstark@mit.edu> writes:
Ah, so we did put the master's clock in every message?
Yes, we did.
And by "we" I mean "you".... I realize I'm tossing in comments from
the peanut gallery to you and especially Robert who worked on this
stuff a lot already.
Then this
should be simple, no? Just compare the master's timestamp from the
record to the last master's clock seen in the messages. That sounds
equivalent but a lot safer than trying to keep a conversion between
them.

Well, the question is what happens after you stop receiving master
messages. If you don't make use of the slave's clock somehow,
application of WAL will stop dead in the water, which seems unlikely
to be what's wanted.
I'm not convinced that's so bad. But even so the logic could be:
wait until (master.last_time_seen > this_record.master_timestamp + n minutes ||
            gettimeofday() > this_record.local_time_when_received + n minutes)
--
greg
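Greg's combined condition above can be written out as a small C predicate. This is an illustrative sketch only; all parameter names are invented:

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch: replay a record once the delay has elapsed by the master's
 * clock, or, as a fallback when no master messages are arriving, by the
 * slave's clock measured from when the record was received locally. */
typedef int64_t TimestampTz;

static bool
delay_elapsed(TimestampTz record_master_ts,      /* timestamp in the WAL record */
              TimestampTz last_master_time_seen, /* newest master clock value seen */
              TimestampTz record_local_recv_time,/* slave clock when record arrived */
              TimestampTz local_now,             /* slave clock now */
              TimestampTz delay_usec)
{
    return last_master_time_seen > record_master_ts + delay_usec
        || local_now > record_local_recv_time + delay_usec;
}
```

The second disjunct is what keeps WAL application from stopping dead when the connection to the master is lost, which is the scenario Tom raises above.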
On Wed, Apr 20, 2011 at 11:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
I am a bit concerned about the reliability of this approach. If there
is some network lag, or some lag in processing from the master, we
could easily get the idea that there is time skew between the machines
when there really isn't. And our perception of the time skew could
easily bounce around from message to message, as the lag varies. I
think it would be tremendously ironic of the two machines were
actually synchronized to the microsecond, but by trying to be clever
about it we managed to make the lag-time accurate only to within
several seconds.

Well, if walreceiver concludes that there is no more than a few seconds'
difference between the clocks, it'd probably be OK to take the master
timestamps at face value. The problem comes when the skew gets large
(compared to the configured time delay, I guess).
I suppose. Any bound on how much lag there can be before we start
applying the skew correction is going to be fairly arbitrary.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Apr 19, 2011 at 9:47 PM, Robert Haas <robertmhaas@gmail.com> wrote:
That is, a standby configured such that replay lags a prescribed
amount of time behind the master.

This seemed easy to implement, so I did. Patch (for 9.2, obviously) attached.
This crashes when stopping recovery to a target (I tried with a named
restore point and with a point in time) after executing
pg_xlog_replay_resume(). Here is the backtrace. I will try to check
later, but I wanted to report it before...
#0 0xb7777537 in raise () from /lib/libc.so.6
#1 0xb777a922 in abort () from /lib/libc.so.6
#2 0x08393a19 in errfinish (dummy=0) at elog.c:513
#3 0x083944ba in elog_finish (elevel=22, fmt=0x83d5221 "wal receiver
still active") at elog.c:1156
#4 0x080f04cb in StartupXLOG () at xlog.c:6691
#5 0x080f2825 in StartupProcessMain () at xlog.c:10050
#6 0x0811468f in AuxiliaryProcessMain (argc=2, argv=0xbfa326a8) at
bootstrap.c:417
#7 0x0827c2ea in StartChildProcess (type=StartupProcess) at postmaster.c:4488
#8 0x08280b85 in PostmasterMain (argc=3, argv=0xa4c17e8) at postmaster.c:1106
#9 0x0821730f in main (argc=3, argv=0xa4c17e8) at main.c:199
--
Jaime Casanova www.2ndQuadrant.com
Professional PostgreSQL: PostgreSQL support and training
On Sat, Apr 23, 2011 at 9:46 PM, Jaime Casanova <jaime@2ndquadrant.com> wrote:
On Tue, Apr 19, 2011 at 9:47 PM, Robert Haas <robertmhaas@gmail.com> wrote:
That is, a standby configured such that replay lags a prescribed
amount of time behind the master.

This seemed easy to implement, so I did. Patch (for 9.2, obviously) attached.

This crashes when stopping recovery to a target (I tried with a named
restore point and with a point in time) after executing
pg_xlog_replay_resume(). Here is the backtrace. I will try to check
later, but I wanted to report it before...

#0 0xb7777537 in raise () from /lib/libc.so.6
#1 0xb777a922 in abort () from /lib/libc.so.6
#2 0x08393a19 in errfinish (dummy=0) at elog.c:513
#3 0x083944ba in elog_finish (elevel=22, fmt=0x83d5221 "wal receiver
still active") at elog.c:1156
#4 0x080f04cb in StartupXLOG () at xlog.c:6691
#5 0x080f2825 in StartupProcessMain () at xlog.c:10050
#6 0x0811468f in AuxiliaryProcessMain (argc=2, argv=0xbfa326a8) at
bootstrap.c:417
#7 0x0827c2ea in StartChildProcess (type=StartupProcess) at postmaster.c:4488
#8 0x08280b85 in PostmasterMain (argc=3, argv=0xa4c17e8) at postmaster.c:1106
#9 0x0821730f in main (argc=3, argv=0xa4c17e8) at main.c:199
Sorry for the slow response on this - I was on vacation for a week and
my schedule got a big hole in it.
I was able to reproduce something very like this in unpatched master,
just by letting recovery pause at a named restore point, and then
resuming it.
LOG: recovery stopping at restore point "stop", time 2011-05-07
09:28:01.652958-04
LOG: recovery has paused
HINT: Execute pg_xlog_replay_resume() to continue.
(at this point I did pg_xlog_replay_resume())
LOG: redo done at 0/5000020
PANIC: wal receiver still active
LOG: startup process (PID 38762) was terminated by signal 6: Abort trap
LOG: terminating any other active server processes
I'm thinking that this code is wrong:
    if (recoveryPauseAtTarget && standbyState == STANDBY_SNAPSHOT_READY)
    {
        SetRecoveryPause(true);
        recoveryPausesHere();
    }
    reachedStopPoint = true;    /* see below */
    recoveryContinue = false;
I think that recoveryContinue = false assignment should not happen if
we decide to pause. That is, we should say if (recoveryPauseAtTarget
&& standbyState == STANDBY_SNAPSHOT_READY) { same as now } else
recoveryContinue = false.
I haven't tested that, though.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sat, May 7, 2011 at 10:48 PM, Robert Haas <robertmhaas@gmail.com> wrote:
I was able to reproduce something very like this in unpatched master,
just by letting recovery pause at a named restore point, and then
resuming it.
I was able to reproduce the same problem even in 9.0. When the standby
reaches the recovery target, it always tries to end the recovery even
though walreceiver is still running, which causes the problem. This seems
to be an oversight in streaming replication. I should have considered how
the standby should work when recovery_target is specified.
What about the attached patch? It stops walreceiver instead of
emitting a PANIC there, but only if we've reached the recovery target.
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Attachments:
recovery_target_v1.patch (application/octet-stream), +15 −15
On 07.05.2011 16:48, Robert Haas wrote:
I was able to reproduce something very like this in unpatched master,
just by letting recovery pause at a named restore point, and then
resuming it.

LOG: recovery stopping at restore point "stop", time 2011-05-07
09:28:01.652958-04
LOG: recovery has paused
HINT: Execute pg_xlog_replay_resume() to continue.
(at this point I did pg_xlog_replay_resume())
LOG: redo done at 0/5000020
PANIC: wal receiver still active
LOG: startup process (PID 38762) was terminated by signal 6: Abort trap
LOG: terminating any other active server processes

I'm thinking that this code is wrong:

    if (recoveryPauseAtTarget && standbyState == STANDBY_SNAPSHOT_READY)
    {
        SetRecoveryPause(true);
        recoveryPausesHere();
    }
    reachedStopPoint = true;    /* see below */
    recoveryContinue = false;

I think that recoveryContinue = false assignment should not happen if
we decide to pause. That is, we should say if (recoveryPauseAtTarget
&& standbyState == STANDBY_SNAPSHOT_READY) { same as now } else
recoveryContinue = false.
No, recovery stops at that point whether or not you pause. Resuming
after stopping at the recovery target doesn't mean that you resume
recovery, it means that you resume to end recovery and start up the
server (see the 2nd to last paragraph at
http://www.postgresql.org/docs/9.1/static/recovery-target-settings.html). It
would probably be more useful to allow a new stopping target to be set
and continue recovery, but the current pause/resume functions don't
allow that.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Wed, May 11, 2011 at 6:50 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
I think we can just always call ShutdownWalRcv(). It should be gone if the
server was promoted while streaming, but that's just an implementation
detail of what the promotion code does. There's no hard reason why it
shouldn't be running at that point anymore, as long as we kill it before
going any further.
Okay. But I'd like to add the following assertion check just before
the ShutdownWalRcv() call you added, in order to detect the kind of bug
we found this time, i.e., a bug that causes an unexpected end
of recovery. Thoughts?
Assert(reachedStopPoint || !WalRcvInProgress())
Committed a patch to do that.
Thanks. Should we backport it to 9.0? 9.0 has the same problem.
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On 11.05.2011 14:16, Fujii Masao wrote:
On Wed, May 11, 2011 at 6:50 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

I think we can just always call ShutdownWalRcv(). It should be gone if the
server was promoted while streaming, but that's just an implementation
detail of what the promotion code does. There's no hard reason why it
shouldn't be running at that point anymore, as long as we kill it before
going any further.

Okay. But I'd like to add the following assertion check just before
ShutdownWalRcv() which you added, in order to detect such a bug
that we found this time, i.e., the bug which causes unexpected end
of recovery. Thoughts?

Assert(reachedStopPoint || !WalRcvInProgress())
There's no unexpected end of recovery here. The recovery ends when we
reach the target, as it should. It was the assumption that WAL receiver
can't be running at that point anymore that was wrong.
That assertion would work, AFAICS, but I don't think it's something we
need to assert. There isn't any harm done if WAL receiver is still
running, as long as we shut it down at that point.
Committed a patch to do that.
Thanks. Should we backport it to 9.0? 9.0 has the same problem.
Ah, thanks, I missed that. Cherry-picked to 9.0 now as well.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com