Hot Standby on git
Just a note to say that Hot Standby patch is now on git repository
git://git.postgresql.org/git/users/simon/postgres
Branch name: hot_standby
The complete contents of that repository are BSD licenced contributions
to the PostgreSQL project.
Any further changes to that will be by agreement here on hackers. From
now, I will be submitting each individual change as patch-on-patch to
allow people to see and discuss them and to confirm them as open source
contributions. I request anybody else interested to do the same to allow
us to work together. All contributions welcome.
My record of agreed changes is here
http://wiki.postgresql.org/wiki/Hot_Standby#Remaining_Work_Items
You'll notice that I've already completed 8 changes (10 commits); those
are all fairly minor changes, so submitted here as a combined patch.
There are 9 pending changes, so far, none of which appear to be major
obstacles to resolve. Many thanks to Heikki for a thorough review which
has identified nearly all of those change requests.
I estimate that making the remaining changes noted on the Wiki and fully
testing them will take at least 2 weeks. Gabriele Bartolini is assisting
in this area, though neither of us are able to work full time on this.
We still have ample time to complete the project in this release.
Many thanks to Magnus and Aidan for helping me resolve my git-wrestling
contest and apologies for the delay while that bout happened.
--
Simon Riggs www.2ndQuadrant.com
Attachments:
hs.53c1eda..13a0bd6.patchtext/x-patch; charset=UTF-8; name=hs.53c1eda..13a0bd6.patchDownload+132-131
On Sat, Sep 26, 2009 at 5:49 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
Just a note to say that Hot Standby patch is now on git repository
git://git.postgresql.org/git/users/simon/postgres
Branch name: hot_standby
Awesome! Thanks for taking the time to get this set up.
The complete contents of that repository are BSD licenced contributions
to the PostgreSQL project.
Excellent...
Any further changes to that will be by agreement here on hackers. From
now, I will be submitting each individual change as patch-on-patch to
allow people to see and discuss them and to confirm them as open source
contributions.
Sounds good.
I request anybody else interested to do the same to allow
us to work together. All contributions welcome.My record of agreed changes is here
http://wiki.postgresql.org/wiki/Hot_Standby#Remaining_Work_ItemsYou'll notice that I've already completed 8 changes (10 commits); those
are all fairly minor changes, so submitted here as a combined patch.
There are 9 pending changes, so far, none of which appear to be major
obstacles to resolve. Many thanks to Heikki for a thorough review which
has identified nearly all of those change requests.I estimate that making the remaining changes noted on the Wiki and fully
testing them will take at least 2 weeks. Gabriele Bartolini is assisting
in this area, though neither of us are able to work full time on this.
We still have ample time to complete the project in this release.
Sounds like you are making rapid progress. If you think there's
something useful I could do, let me know and I'll take a look.
...Robert
On Sat, 2009-09-26 at 09:29 -0400, Robert Haas wrote:
I estimate that making the remaining changes noted on the Wiki and
fully
testing them will take at least 2 weeks. Gabriele Bartolini is assisting
in this area, though neither of us are able to work full time on this.
We still have ample time to complete the project in this release.Sounds like you are making rapid progress.
Only because Heikki has put so much effort into review and because most
of the remaining issues are coding related, rather than theoretical.
We moving along and are in no way threatened by time.
If you think there's
something useful I could do, let me know and I'll take a look.
I feel like I need a better way of unit testing new code. Some of the
code in the patch is to handle corner cases, so recreating them is
fairly hard. It is a nagging feeling that I am missing some knowledge
here and would welcome some insight, or research, into better ways of
doing general case unit testing.
--
Simon Riggs www.2ndQuadrant.com
On 09/26/2009 10:04 AM, Simon Riggs wrote:
If you think there's
something useful I could do, let me know and I'll take a look.I feel like I need a better way of unit testing new code. Some of the
code in the patch is to handle corner cases, so recreating them is
fairly hard. It is a nagging feeling that I am missing some knowledge
here and would welcome some insight, or research, into better ways of
doing general case unit testing.
You might try and steal ideas from EasyMock / PowerMock - but not sure
how well the ideas map to C.
Generally it means allowing the functions to be called from a "mock"
environment, where subroutine calls that might be called are stubbed out
to return sample data that would simulate your scenario. Object oriented
languages that require every object to provide an interface where most
object methods can be overridden are more ideal for performing this sort
of test.
I rarely ever see this sort of stuff in FOSS projects, and never that I
can remember in FOSS C projects. It's not easy, though.
I assume you are doing it through code changing right now. Commenting
out lines, replacing them with others, etc?
Cheers,
mark
--
Mark Mielke<mark@mielke.cc>
On Sat, Sep 26, 2009 at 10:45:17AM -0400, Mark Mielke wrote:
On 09/26/2009 10:04 AM, Simon Riggs wrote:
If you think there's
something useful I could do, let me know and I'll take a look.I feel like I need a better way of unit testing new code. Some of the
code in the patch is to handle corner cases, so recreating them is
fairly hard. It is a nagging feeling that I am missing some knowledge
here and would welcome some insight, or research, into better ways of
doing general case unit testing.You might try and steal ideas from EasyMock / PowerMock - but not
sure how well the ideas map to C.Generally it means allowing the functions to be called from a "mock"
environment, where subroutine calls that might be called are stubbed
out to return sample data that would simulate your scenario. Object
oriented languages that require every object to provide an interface
where most object methods can be overridden are more ideal for
performing this sort of test.I rarely ever see this sort of stuff in FOSS projects, and never
that I can remember in FOSS C projects. It's not easy, though.I assume you are doing it through code changing right now.
Commenting out lines, replacing them with others, etc?Cheers,
mark--
Mark Mielke<mark@mielke.cc>
There are a variety of projects dedicated to creating C unit test
frameworks. I don't have a lot of experience with them, but I have heard
good things about check and cunit. Here's a link I found with a longer
list of frameworks. http://www.opensourcetesting.org/unit_c.php
--
--Dan
I feel like I need a better way of unit testing new code. Some of the
code in the patch is to handle corner cases, so recreating them is
fairly hard. It is a nagging feeling that I am missing some knowledge
here and would welcome some insight, or research, into better ways of
doing general case unit testing.
There's always pgtap. Whenever we find a new corner case, we add it to
the development test suite.
--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com
On 09/26/2009 02:28 PM, Dan Colish wrote:
There are a variety of projects dedicated to creating C unit test
frameworks. I don't have a lot of experience with them, but I have heard
good things about check and cunit. Here's a link I found with a longer
list of frameworks. http://www.opensourcetesting.org/unit_c.php
Looking at check and cunit - I don't see what sort of mock function
facility they would provide? One part of unit testing is arranging for
functions to be called, tested, and results reported on. This can take
you a certain amount of the way. "Pure" functions, for example, that
always generate the same output for the same input parameters, are
perfect for this situation. Perhaps test how a qsort() or bsearch()
method works under various scenarios?
Most real life code gets a little more complicated. For example, what if
we want to simulate a network failure or "out of disk space" condition?
What if we want to test out what happens when the Y2038 date is reached?
This requires either complex test case setup that is difficult to run
reproducibly, or another approach - "mock". It means doing things like
overriding the write() method, and making it return successful N times,
and then failing on the (N+1)th time with ENOSPC. It means overriding
the gettimeofday() method to return a time in the future. A major
benefit of this sort of testing is that it should not require source
changes in order to perform the test. This sort of stuff is a LOT easier
to do in OO languages. I see it done in Java a lot. I can't remember
ever having seen it done in C. I think it's just too hard compared to
the value obtained from the effort.
In your list above, it does show a few attempts - CMock sticks out as a
for example. It looks more complicated, though. It takes a .h file and
generates stubs for you to fill in? That could be difficult to manage
for a large project with thousands or many times more unit tests. OO is
easier because you can override *only* particular methods, and you can
safely call the super method that it overrides to provide the underlying
behaviour in the success cases.
Cheers,
mark
--
Mark Mielke<mark@mielke.cc>
On Sep 26, 2009, at 12:33 PM, Josh Berkus wrote:
There's always pgtap. Whenever we find a new corner case, we add it
to
the development test suite.
Also, for C TAP, there's [libtap](http://jc.ngo.org.uk/trac-bin/trac.cgi/wiki/LibTap
). You can then use `prove` which you likely already have on your
system, to harness the test output.
Best,
David
Mark Mielke escribi�:
Most real life code gets a little more complicated. For example,
what if we want to simulate a network failure or "out of disk space"
condition? What if we want to test out what happens when the Y2038
date is reached? This requires either complex test case setup that
is difficult to run reproducibly, or another approach - "mock". It
means doing things like overriding the write() method, and making it
return successful N times, and then failing on the (N+1)th time with
ENOSPC.
I remember a kernel simulator based on User-Mode Linux called UMLsim,
with which you could stuff like this. It's dead at this point though,
and I don't know if there's any replacement.
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Per Simon's request, for the benefit of the archive, here's all the
changes I've done on the patch since he posted the initial version for
review for this commitfest as incremental patches. This is extracted
from my git repository at
git://git.postgresql.org/git/users/heikki/postgres.git.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Attachments:
hs-riggs-branch-20090927.tar.gzapplication/x-gzip; name=hs-riggs-branch-20090927.tar.gzDownload
Heikki Linnakangas wrote:
Per Simon's request, for the benefit of the archive, here's all the
changes I've done on the patch since he posted the initial version for
review for this commitfest as incremental patches. This is extracted
from my git repository at
git://git.postgresql.org/git/users/heikki/postgres.git.
Further fixes extracted from above repository attached..
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Attachments:
Looking at the changes to StartupMultiXact, you're changing the locking
so that both MultiXactOffsetControlLock and MultiXactMemberControlLock
are acquire first before changing anything. Why? Looking at the other
functions in that file, all others that access both files are happy to
acquire one lock at a time, like StartupMultiXact does without the patch.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Looking at the changes to StartupMultiXact, you're changing the locking
so that both MultiXactOffsetControlLock and MultiXactMemberControlLock
are acquired first before changing anything. Why? Looking at the other
functions in that file, all others that access both files are happy to
acquire one lock at a time, like StartupMultiXact does without the patch.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Regarding this in InitStandbyDelayTimers:
+ /*
+ * If replication delay is enormously huge, just treat that as
+ * zero and work up from there. This prevents us from acting
+ * foolishly when replaying old log files.
+ */
+ if (*currentDelay_ms < 0)
+ *currentDelay_ms = 0;
+
So we're treating restoring from an old backup the same as an up-to-date
standby server. If you're restoring from say a month old base backup
with WAL archive up to present day, and have max_standby_delay set to
say 5 seconds, the server will wait for that 5 seconds on each
conflicting query before killing it. Until it reaches the point in the
archive where the delay is less than INT_MAX/1000 seconds old: at that
point it switches into "oh my goodness, we've fallen badly behind, let's
try to catch up ASAP and kill any queries that get into the way" mode.
That's pretty surprising behavior, and not documented either. I propose
we simply remove the above check (fixing the rest of the code so that
you don't hit integer overflows), and always respect max_standby_delay.
BTW, I wonder if should warn or something if we find that the timestamps
in the archive are in the future? IOW, if either the master's or the
standby's clock is not set correctly.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
+ /*
+ * If our initial RunningXactData had an overflowed snapshot then we
+ * knew we were missing some subxids from our snapshot. We can use
+ * this data as an initial snapshot, but we cannot yet mark it valid.
+ * We know that the missing subxids are equal to or earlier than
+ * LatestRunningXid. After we initialise we continue to apply changes
+ * during recovery, so once the oldestRunningXid is later than the
+ * initLatestRunningXid we can now prove that we no longer have
+ * missing information and can mark the snapshot as valid.
+ */
+ if (initRunningXactData && !recoverySnapshotValid)
+ {
+ if (TransactionIdPrecedes(initLatestRunningXid,
xlrec->oldestRunningXid)
+ {
+ recoverySnapshotValid = true;
+ elog(trace_recovery(DEBUG2),
+ "running xact data now proven complete");
+ elog(trace_recovery(DEBUG2),
+ "recovery snapshots are now enabled");
+ }
+ return;
+ }
+
When GetRunningXactData() calculates latestRunningXid in the master,
which is stored in initLatestRunningXid in the standby, it only looks at
xids and subxids present in the procarray. It doesn't take into account
overflowed subxids. I think we could declare a recovery snapshot "proven
complete" too early because of that.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Wed, 2009-09-30 at 18:45 +0300, Heikki Linnakangas wrote:
Regarding this in InitStandbyDelayTimers: + /* + * If replication delay is enormously huge, just treat that as + * zero and work up from there. This prevents us from acting + * foolishly when replaying old log files. + */ + if (*currentDelay_ms < 0) + *currentDelay_ms = 0; +So we're treating restoring from an old backup the same as an up-to-date
standby server. If you're restoring from say a month old base backup
with WAL archive up to present day, and have max_standby_delay set to
say 5 seconds, the server will wait for that 5 seconds on each
conflicting query before killing it. Until it reaches the point in the
archive where the delay is less than INT_MAX/1000 seconds old: at that
point it switches into "oh my goodness, we've fallen badly behind, let's
try to catch up ASAP and kill any queries that get into the way" mode.
That's pretty surprising behavior, and not documented either. I propose
we simply remove the above check (fixing the rest of the code so that
you don't hit integer overflows), and always respect max_standby_delay.
Agreed.
I will docuemnt the recommendation to set max_standby_delay = 0 if
performing an archive recovery (and explain why).
BTW, I wonder if should warn or something if we find that the timestamps
in the archive are in the future? IOW, if either the master's or the
standby's clock is not set correctly.
Something similar was just spotted by a client. You can set a
recovery_target_timestamp that is before the pg_stop_recovery()
timestamp and it doesn't complain. Will fix.
Not sure if I like the sound of a system moaning at me about the clock
settings. Perhaps just once when it starts, when we read control file.
--
Simon Riggs www.2ndQuadrant.com
On Thu, 2009-10-01 at 14:29 +0300, Heikki Linnakangas wrote:
+ /* + * If our initial RunningXactData had an overflowed snapshot then we + * knew we were missing some subxids from our snapshot. We can use + * this data as an initial snapshot, but we cannot yet mark it valid. + * We know that the missing subxids are equal to or earlier than + * LatestRunningXid. After we initialise we continue to apply changes + * during recovery, so once the oldestRunningXid is later than the + * initLatestRunningXid we can now prove that we no longer have + * missing information and can mark the snapshot as valid. + */ + if (initRunningXactData && !recoverySnapshotValid) + { + if (TransactionIdPrecedes(initLatestRunningXid, xlrec->oldestRunningXid) + { + recoverySnapshotValid = true; + elog(trace_recovery(DEBUG2), + "running xact data now proven complete"); + elog(trace_recovery(DEBUG2), + "recovery snapshots are now enabled"); + } + return; + } +When GetRunningXactData() calculates latestRunningXid in the master,
which is stored in initLatestRunningXid in the standby, it only looks at
xids and subxids present in the procarray. It doesn't take into account
overflowed subxids. I think we could declare a recovery snapshot "proven
complete" too early because of that.
Hmm, yes. ISTM that I'm still calculating latestRunningXid the old way
while assuming it is calculated the new way. The new way is just to grab
nextXid since we have XidGenLock and do TransactionIdRetreat() on it.
--
Simon Riggs www.2ndQuadrant.com
On Wed, 2009-09-30 at 09:33 +0300, Heikki Linnakangas wrote:
Looking at the changes to StartupMultiXact, you're changing the locking
so that both MultiXactOffsetControlLock and MultiXactMemberControlLock
are acquire first before changing anything. Why? Looking at the other
functions in that file, all others that access both files are happy to
acquire one lock at a time, like StartupMultiXact does without the patch.
I think those changes are just paranoia from early versions of patch.
We now know for certain that MultiXact isn't used at the time
StartupMultiXact() is called. The same isn't true for StartupClog() and
StartupSubtrans() which need to cope with concurrent callers.
Will remove changes and document that nothing touching it when it runs.
--
Simon Riggs www.2ndQuadrant.com
Simon Riggs wrote:
On Wed, 2009-09-30 at 09:33 +0300, Heikki Linnakangas wrote:
Looking at the changes to StartupMultiXact, you're changing the locking
so that both MultiXactOffsetControlLock and MultiXactMemberControlLock
are acquire first before changing anything. Why? Looking at the other
functions in that file, all others that access both files are happy to
acquire one lock at a time, like StartupMultiXact does without the patch.I think those changes are just paranoia from early versions of patch.
We now know for certain that MultiXact isn't used at the time
StartupMultiXact() is called. The same isn't true for StartupClog() and
StartupSubtrans() which need to cope with concurrent callers.Will remove changes and document that nothing touching it when it runs.
Thanks, I reverted that in my working version already. Comment patch
welcome if you feel it's needed.
Attached is a new batch of changes I've been doing since last batch.
These are again extracted from my git repository. It includes some of
the add-on patches from your repository, the rest I believe have I had
already did myself earlier, or are not necessary anymore for other reasons.
Could you look into these two TODO items you listed on the wiki page:
- Correct SET default_transaction_read_only and SET
transaction_read_only (Heikki 21/9 Hackers)
- Shutdown checkpoints must not clear locks for prepared transactions
(Heikki 23/9)
And if you could please review the changes I've been doing, just to make
sure I haven't inadvertently introduced new bugs. That has happened
before, as you've rightfully reminded me :-).
There's also the issue that you can't go into hot standby mode after a
shutdown checkpoint. I think that really should be fixed, it's just
weird from a usability point of view if it doesn't work.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Attachments:
Simon Riggs wrote:
Hmm, yes. ISTM that I'm still calculating latestRunningXid the old way
while assuming it is calculated the new way. The new way is just to grab
nextXid since we have XidGenLock and do TransactionIdRetreat() on it.
Ok, good, that's what I thought too. I'll fix that.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com