pgsql: Use a latch to make startup process wake up and replay
Log Message:
-----------
Use a latch to make startup process wake up and replay immediately when
new WAL arrives via streaming replication. This reduces the latency, and
also allows us to use a longer polling interval, which is good for energy
efficiency.
We still need to poll to check for the appearance of a trigger file, but
the interval is now 5 seconds (instead of 100ms), like when waiting for
a new WAL segment to appear in WAL archive.
Modified Files:
--------------
pgsql/src/backend/access/transam:
xlog.c (r1.434 -> r1.435)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/access/transam/xlog.c?r1=1.434&r2=1.435)
pgsql/src/backend/replication:
walreceiver.c (r1.16 -> r1.17)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/replication/walreceiver.c?r1=1.16&r2=1.17)
pgsql/src/include/access:
xlog.h (r1.116 -> r1.117)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/include/access/xlog.h?r1=1.116&r2=1.117)
On Wed, Sep 15, 2010 at 7:35 PM, Heikki Linnakangas
<heikki@postgresql.org> wrote:
Log Message:
-----------
Use a latch to make startup process wake up and replay immediately when
new WAL arrives via streaming replication. This reduces the latency, and
also allows us to use a longer polling interval, which is good for energy
efficiency.We still need to poll to check for the appearance of a trigger file, but
the interval is now 5 seconds (instead of 100ms), like when waiting for
a new WAL segment to appear in WAL archive.
Good work!
+ /*
+ * Take ownership of the wakup latch if we're going to sleep during
+ * recovery.
+ */
I found a typo: s/wakup/wakeup/
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On 15 September 2010 11:35, Heikki Linnakangas <heikki@postgresql.org> wrote:
Log Message:
-----------
Use a latch to make startup process wake up and replay immediately when
new WAL arrives via streaming replication. This reduces the latency, and
also allows us to use a longer polling interval, which is good for energy
efficiency.We still need to poll to check for the appearance of a trigger file, but
the interval is now 5 seconds (instead of 100ms), like when waiting for
a new WAL segment to appear in WAL archive.Modified Files:
--------------
pgsql/src/backend/access/transam:
xlog.c (r1.434 -> r1.435)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/access/transam/xlog.c?r1=1.434&r2=1.435)
pgsql/src/backend/replication:
walreceiver.c (r1.16 -> r1.17)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/replication/walreceiver.c?r1=1.16&r2=1.17)
pgsql/src/include/access:
xlog.h (r1.116 -> r1.117)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/include/access/xlog.h?r1=1.116&r2=1.117)--
+ * We don't need the latch anymore. It's not strictly necessary to disown
+ * it, but let's do it for the sake of tidyness.
+ */
s/tidyness/tidiness/
--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935
heikki@postgresql.org (Heikki Linnakangas) writes:
Log Message:
-----------
Use a latch to make startup process wake up and replay immediately when
new WAL arrives via streaming replication. This reduces the latency, and
also allows us to use a longer polling interval, which is good for energy
efficiency.
We still need to poll to check for the appearance of a trigger file, but
the interval is now 5 seconds (instead of 100ms), like when waiting for
a new WAL segment to appear in WAL archive.
This is just speculation at this point, because I haven't taken time
to think through the details, but couldn't we improve on that still
further?
There are always going to be some conditions that we have to poll for,
in particular death of the postmaster (since Unix unaccountably fails
to provide a SIGPARNT signal condition :-(). However postmaster death
isn't really something that needs an instant response IMO. I would like
to get the wakeup-and-poll interval for our background processes down to
a minute or so; so far as postmaster death goes that doesn't seem like
an unacceptable response time.
So I'm wondering if we couldn't eliminate the five-second sleep
requirement here too. It's problematic anyhow, since somebody looking
for energy efficiency will still feel it's too short, while somebody
concerned about fast failover will feel it's too long. Could the
standby triggering protocol be modified so that it involves sending a
signal, not just creating a file? (One issue is that it's not clear
what that'd translate to on Windows.)
regards, tom lane
On 15/09/10 14:14, Fujii Masao wrote:
+ /* + * Take ownership of the wakup latch if we're going to sleep during + * recovery. + */I found a typo: s/wakup/wakeup/
On 15/09/10 14:48, Thom Brown wrote:
+ * We don't need the latch anymore. It's not strictly necessary to disown + * it, but let's do it for the sake of tidyness. + */s/tidyness/tidiness/
Fixed both of these. Thank you.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On 15/09/10 16:55, Tom Lane wrote:
So I'm wondering if we couldn't eliminate the five-second sleep
requirement here too. It's problematic anyhow, since somebody looking
for energy efficiency will still feel it's too short, while somebody
concerned about fast failover will feel it's too long.
Yep.
Could the
standby triggering protocol be modified so that it involves sending a
signal, not just creating a file?
Seems reasonable, at least if we still provide an option for more
frequent polling and no need to send signal.
(One issue is that it's not clear what that'd translate to on Windows.)
pg_ctl failover ? At the moment, the location of the trigger file is
configurable, but if we accept a constant location like
"$PGDATA/failover" pg_ctl could do the whole thing, create the file and
send signal. pg_ctl on Window already knows how to send the "signal" via
the named pipe signal emulation.
Fujii-san suggested that we might have a user-defined function for
triggering failover as well. That's also handy, but it's not a
replacement because it only works in hot standby mode.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Wed, 2010-09-15 at 20:14 +0900, Fujii Masao wrote:
On Wed, Sep 15, 2010 at 7:35 PM, Heikki Linnakangas
<heikki@postgresql.org> wrote:Log Message:
-----------
Use a latch to make startup process wake up and replay immediately when
new WAL arrives via streaming replication. This reduces the latency, and
also allows us to use a longer polling interval, which is good for energy
efficiency.We still need to poll to check for the appearance of a trigger file, but
the interval is now 5 seconds (instead of 100ms), like when waiting for
a new WAL segment to appear in WAL archive.Good work!
No, not good work.
You both know very well that I'm working on this area also and these
commits are not agreed... yet. They might not be contended but they are
very likely to break my patch, again.
Please desist while we resolve which are the good ideas and which are
not. We won't know that if you keep breaking other people's patches in a
stream of commits that prevent anybody completing other options.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Training and Services
On Wed, Sep 15, 2010 at 03:35:30PM +0100, Simon Riggs wrote:
On Wed, 2010-09-15 at 20:14 +0900, Fujii Masao wrote:
On Wed, Sep 15, 2010 at 7:35 PM, Heikki Linnakangas
<heikki@postgresql.org> wrote:Log Message:
-----------
Use a latch to make startup process wake up and replay immediately when
new WAL arrives via streaming replication. This reduces the latency, and
also allows us to use a longer polling interval, which is good for energy
efficiency.We still need to poll to check for the appearance of a trigger file, but
the interval is now 5 seconds (instead of 100ms), like when waiting for
a new WAL segment to appear in WAL archive.Good work!
No, not good work.
You both know very well that I'm working on this area also and these
commits are not agreed... yet. They might not be contended but they are
very likely to break my patch, again.Please desist while we resolve which are the good ideas and which are
not. We won't know that if you keep breaking other people's patches in a
stream of commits that prevent anybody completing other options.
Simon,
No matter how many times you try, you are not going to get a license
to stop all work on anything you might chance to think about. It is
quite simply never going to happen, so you need to back off.
Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
On Wed, 2010-09-15 at 07:59 -0700, David Fetter wrote:
On Wed, Sep 15, 2010 at 03:35:30PM +0100, Simon Riggs wrote:
Please desist while we resolve which are the good ideas and which are
not. We won't know that if you keep breaking other people's patches in a
stream of commits that prevent anybody completing other options.
No matter how many times you try, you are not going to get a license
to stop all work on anything you might chance to think about. It is
quite simply never going to happen, so you need to back off.
I agree that asking people to stop work is not OK. However, I haven't
asked for development work to stop, only that commits into that area
stop until proper debate has taken place. Those might be minor commits,
but they might not. Had I made those commits, they would have been
called premature by others also.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Training and Services
Simon Riggs <simon@2ndQuadrant.com> writes:
On Wed, 2010-09-15 at 20:14 +0900, Fujii Masao wrote:
Good work!
No, not good work.
You both know very well that I'm working on this area also and these
commits are not agreed... yet. They might not be contended but they are
very likely to break my patch, again.
Simon, you can't just command the tide to stop coming in while you do
something that no one else can see. This patch is perfectly reasonable,
not at all invasive, and a clear improvement over the way the code was
before. If you can improve on it further, fine, but we're not going to
declare the code off-limits while waiting for an unspecified patch with
no firm delivery date.
regards, tom lane
On Wed, Sep 15, 2010 at 11:24 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
I agree that asking people to stop work is not OK. However, I haven't
asked for development work to stop, only that commits into that area
stop until proper debate has taken place. Those might be minor commits,
but they might not. Had I made those commits, they would have been
called premature by others also.
I do not believe that Heikki has done anything inappropriate. We've
spent weeks discussing the latch facility and its various
applications.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
On Wed, 2010-09-15 at 11:25 -0400, Tom Lane wrote:
... an unspecified patch with no firm delivery date.
I'm happy to post my current work, if it's considered helpful. The sole
intent of that work is to help the community understand the benefits of
the proposals I have made, so perhaps this patch does serve that
purpose.
The attached patch compiles, but I wouldn't bother trying to run it yet.
I'm still wading through the latch rewrite. It probably doesn't apply
cleanly to head anymore either, hence discussion.
I wouldn't normally waste people's time by posting a non-working patch,
the majority of the code is in about the right place of execution. There
aren't any unclear aspects in the design, so its worth looking at.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Training and Services
Attachments:
sync_rep.v3.patchtext/x-patch; charset=UTF-8; name=sync_rep.v3.patchDownload+1268-74
On Wed, 2010-09-15 at 12:45 -0400, Robert Haas wrote:
On Wed, Sep 15, 2010 at 11:24 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
I agree that asking people to stop work is not OK. However, I haven't
asked for development work to stop, only that commits into that area
stop until proper debate has taken place. Those might be minor commits,
but they might not. Had I made those commits, they would have been
called premature by others also.I do not believe that Heikki has done anything inappropriate. We've
spent weeks discussing the latch facility and its various
applications.
Sounds reasonable, but my comments were about this commit, not the one
that happened on Saturday. This patch was posted about 32 hours ago, and
the commit need not have taken place yet. If I had posted such a patch
and committed it knowing other work is happening in that area we both
know that you would have objected.
It's not actually a major issue, but at some point I have to ask for no
more commits, so Fujii and I can finish our patches, compare and
contrast, so the best ideas can get into Postgres.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Training and Services
On Wed, Sep 15, 2010 at 1:30 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
On Wed, 2010-09-15 at 12:45 -0400, Robert Haas wrote:
On Wed, Sep 15, 2010 at 11:24 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
I agree that asking people to stop work is not OK. However, I haven't
asked for development work to stop, only that commits into that area
stop until proper debate has taken place. Those might be minor commits,
but they might not. Had I made those commits, they would have been
called premature by others also.I do not believe that Heikki has done anything inappropriate. We've
spent weeks discussing the latch facility and its various
applications.Sounds reasonable, but my comments were about this commit, not the one
that happened on Saturday. This patch was posted about 32 hours ago, and
the commit need not have taken place yet. If I had posted such a patch
and committed it knowing other work is happening in that area we both
know that you would have objected.
I've often felt that we ought to have a bit more delay between when
committers post patches and when they commit them. I was told 24
hours and I've seen cases where people haven't even waited that long.
On the other hand, if we get to strict about it, it can easily get to
the point where it just gets in the way of progress, and certainly
some patches are far more controversial than others. So I don't know
what the best thing to do is. Still, I have to admit that I feel
fairly positive about the direction we're going with this particular
patch. Clearing away these peripheral issues should make it easier
for us to have a rational discussion about the core issues around how
this is going to be configured and actually work at the protocol
level.
It's not actually a major issue, but at some point I have to ask for no
more commits, so Fujii and I can finish our patches, compare and
contrast, so the best ideas can get into Postgres.
I don't think anyone is prepared to agree to that. I think that
everyone is prepared to accept a limited amount of further delay in
pressing forward with the main part of sync rep, but I expect that no
one will be willing to freeze out incremental improvements in the
meantime, even if it does induce a certain amount of rebasing. It's
also worth noting that Fujii Masao's patch has been around for months,
and yours isn't finished yet. That's not to say that we don't want to
consider your ideas, because we do: and you've had more than your
share of good ones. At the same time, it would be unfair and
unreasonable to expect work on a patch that is done, and has been done
for some time, to wait on one that isn't.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
On 15/09/10 20:58, Robert Haas wrote:
On Wed, Sep 15, 2010 at 1:30 PM, Simon Riggs<simon@2ndquadrant.com> wrote:
On Wed, 2010-09-15 at 12:45 -0400, Robert Haas wrote:
On Wed, Sep 15, 2010 at 11:24 AM, Simon Riggs<simon@2ndquadrant.com> wrote:
I agree that asking people to stop work is not OK. However, I haven't
asked for development work to stop, only that commits into that area
stop until proper debate has taken place. Those might be minor commits,
but they might not. Had I made those commits, they would have been
called premature by others also.I do not believe that Heikki has done anything inappropriate. We've
spent weeks discussing the latch facility and its various
applications.Sounds reasonable, but my comments were about this commit, not the one
that happened on Saturday. This patch was posted about 32 hours ago, and
the commit need not have taken place yet. If I had posted such a patch
and committed it knowing other work is happening in that area we both
know that you would have objected.I've often felt that we ought to have a bit more delay between when
committers post patches and when they commit them. I was told 24
hours and I've seen cases where people haven't even waited that long.
On the other hand, if we get to strict about it, it can easily get to
the point where it just gets in the way of progress, and certainly
some patches are far more controversial than others. So I don't know
what the best thing to do is.
With anything non-trivial, I try to "sleep on it" before committing.
More with complicated patches, but it's really up to your own comfort
level with the patch, and whether you think anyone might have different
opinions on it. I don't mind quick commits if it's something that has
been discussed in the past and the committer thinks it's
non-controversial. There's always the option of complaining afterwards.
If it comes to that, though, it wasn't really ripe for committing yet.
(That doesn't apply to gripes about typos or something like that,
because that happens to me way too often ;-) )
Still, I have to admit that I feel
fairly positive about the direction we're going with this particular
patch. Clearing away these peripheral issues should make it easier
for us to have a rational discussion about the core issues around how
this is going to be configured and actually work at the protocol
level.
Yeah, I don't think anyone has any qualms about the substance of these
patches.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Wed, 2010-09-15 at 13:58 -0400, Robert Haas wrote:
It's not actually a major issue, but at some point I have to ask for
no
more commits, so Fujii and I can finish our patches, compare and
contrast, so the best ideas can get into Postgres.I don't think anyone is prepared to agree to that. I think that
everyone is prepared to accept a limited amount of further delay in
pressing forward with the main part of sync rep, but I expect that no
one will be willing to freeze out incremental improvements in the
meantime, even if it does induce a certain amount of rebasing.
It's
also worth noting that Fujii Masao's patch has been around for months,
and yours isn't finished yet. That's not to say that we don't want to
consider your ideas, because we do: and you've had more than your
share of good ones. At the same time, it would be unfair and
unreasonable to expect work on a patch that is done, and has been done
for some time, to wait on one that isn't.
I understand your viewpoint there. I'm sure we all agree sync rep is a
very important feature that must get into the next release.
The only reason my patch exists is because debate around my ideas was
ruled out on various grounds. One of those was it would take so long to
develop we shouldn't risk not getting sync rep in this release. I am
amenable to such arguments (and I make the same one on MERGE, btw, where
I am getting seriously worried) but the reality is that there is
actually very little code here and we can definitely do this, whatever
ideas we pick. I've shown this by providing an almost working version in
about 4 days work. Will finishing it help?
We definitely have the time, so the question is, what are the best
ideas? We must discuss the ideas properly, not just plough forwards
claiming time pressure when it isn't actually an issue at all. We *need*
to put the tools down and talk in detail about the best way forwards.
Before, I had no patch. Now mine "isn't finished". At what point will my
ideas be reviewed without instant dismissal? If we accept your seniority
argument, then "never" because even if I finish it you'll say "Fujii was
there first".
If who mentioned it first was important, then I'd say I've been
discussing this for literally years (late 2006) and have regularly
explained the benefits of the master-side approach I've outlined on list
every time this has come up (every few months). I have also explained
the implementation details many times as well an I'm happy to say that
latches are pretty much exactly what I described earlier. (I called them
LSN queues, similar to lwlocks, IIRC). But thats not the whole deal.
If we simply wanted a patch that was "done" we would have gone with
Zoltan's wouldn't we, based on the seniority argument you use above?
Zoltan's patch didn't perform well at all. Fujii's performs much better.
However, my proposed approach offers even better performance, so
whatever argument you use to include Fujii's also applies to mine
doesn't it? But that's silly and divisive, its not about who's patch
"wins" is it?
Do we have to benchmark multiple patches to prove which is best? If
that's the criteria I'll finish my patch and demonstrate that.
But it doesn't make sense to start committing pieces of Fujii's patch,
so that I can't ever keep up and as a result "Simon never finished his
patch, but it sounded good".
Next steps should be: tools down, discuss what to do. Then go forwards.
We have time, so lets discuss all of the ideas on the table not just
some of them.
For me this is not about the number or names of parameters, its about
master-side control of sync rep and having very good performance.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Training and Services
On Wed, Sep 15, 2010 at 3:18 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
Will finishing it help?
Yes, I expect that to help a lot.
Before, I had no patch. Now mine "isn't finished". At what point will my
ideas be reviewed without instant dismissal? If we accept your seniority
argument, then "never" because even if I finish it you'll say "Fujii was
there first".
I said very clearly in my previous email that "I think that everyone
is prepared to accept a limited amount of further delay in pressing
forward with the main part of sync rep". In other words, I think
everyone is willing to consider your ideas provided that they are
submitted in a form which everyone can understand and think through
sometime soon. I am not, nor do I think anyone is, saying that we
don't wish to consider your ideas. I'm actually really pleased that
you are only a day or two from having a working patch. It can be much
easier to conceptualize a patch than to find the time to finish it
(unfortunately, this problem has overtaken me rather badly in the last
few weeks, which is why I have no new patches in this CommitFest) and
if you can finish it up and get it out in front of everyone I expect
that to be a good thing for this feature and our community.
Do we have to benchmark multiple patches to prove which is best? If
that's the criteria I'll finish my patch and demonstrate that.
I was thinking about that earlier today. I think it's definitely
possible that we'll need to do some benchmarking, although I expect
that people will want to read the code first.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
On Wed, Sep 15, 2010 at 11:14 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
(One issue is that it's not clear what that'd translate to on Windows.)
pg_ctl failover ? At the moment, the location of the trigger file is
configurable, but if we accept a constant location like "$PGDATA/failover"
pg_ctl could do the whole thing, create the file and send signal. pg_ctl on
Window already knows how to send the "signal" via the named pipe signal
emulation.
Right.
Fujii-san suggested that we might have a user-defined function for
triggering failover as well.
The attached patch introduces such a user-defined function. This is
useful especially when clusterware like pgpool-II is located in remote
server since it can trigger failover without using something like ssh.
That's also handy, but it's not a replacement
because it only works in hot standby mode.
Yep.
And we should increase the sleep time in walsender's poll loop (i.e.,
increase the default value of wal_sender_delay) too? Currently it's
very small, 200ms.
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Attachments:
pg_trigger_failover_v1.patchapplication/octet-stream; name=pg_trigger_failover_v1.patchDownload+106-29
On Wed, Sep 15, 2010 at 10:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
So I'm wondering if we couldn't eliminate the five-second sleep
requirement here too.
That would make the shutdown time longer since startup process currently
cannot respond to SIGTERM and SIGHUP immediately. To avoid this, I think
that we should change the signal handlers of startup process so that they
call WakeupRecovery.
The attached patch makes StartupProcSigHupHandler and StartupProcShutdownHandler
call WakeupRecovery.
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Attachments:
signal_handler_wakeup_recovery_v1.patchapplication/octet-stream; name=signal_handler_wakeup_recovery_v1.patchDownload+2-0
On Thu, Sep 16, 2010 at 1:23 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
Fujii-san suggested that we might have a user-defined function for
triggering failover as well.The attached patch introduces such a user-defined function. This is
useful especially when clusterware like pgpool-II is located in remote
server since it can trigger failover without using something like ssh.
I forgot to check if the caller of that function has superuser permission.
Here is the updated version.
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center