Additional options for Sync Replication
Proposed changes: Make synchronous_replication into an enum, so we
can now also say synchronous_replication = recv, flush or apply as
well as on or off.
synchronous_replication = on is the same as "flush"
Benefit: Allows 2 additional wait modes for sync rep: wait for
receive and wait for apply.
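[Editorial note: a minimal sketch of the proposed enum mapping, with illustrative names rather than the actual GUC machinery; "on" is kept as a backwards-compatible synonym for "flush".]

```c
#include <string.h>

/* Illustrative sketch only: the proposed synchronous_replication enum.
 * Names are assumptions for illustration, not PostgreSQL's real symbols. */
typedef enum { SYNC_REP_OFF, SYNC_REP_RECV, SYNC_REP_FLUSH, SYNC_REP_APPLY } SyncRepMode;

static SyncRepMode
parse_sync_rep(const char *val)
{
    if (strcmp(val, "recv") == 0)
        return SYNC_REP_RECV;
    if (strcmp(val, "apply") == 0)
        return SYNC_REP_APPLY;
    if (strcmp(val, "flush") == 0 || strcmp(val, "on") == 0)
        return SYNC_REP_FLUSH;   /* "on" is the same as "flush" */
    return SYNC_REP_OFF;         /* real GUC code would reject unknown values */
}
```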
Rationale:
I was hoping to fine tune/tweak Sync Rep after feedback during beta,
but my understanding of current consensus is that that will be too
late to make user visible changes. So I'm proposing this change now,
before Beta, rather than during Beta.
Since we now have replies from standby arriving when we receive data,
not just fsync, we may as well do something useful with them in this
release.
So I reintroduce changes made in the original patch submission that
were split out for piece-by-piece commit. This was also part of the
originally agreed functionality for Sync Rep. So I don't expect any
procedural difficulties in accepting this patch. Heikki was right to
request removal of that code while we got things correct. Adding it
back now is much simpler.
The internal changes are minor, simply changing things so we have 3
queues rather than 1. The user backend path is identical in length,
the sync standby path very slightly longer. Replies from standby
continue to be sent every 100ms until all flushed WAL has been
applied, then it falls back to the wal_receiver_status_interval, again
a minor change.
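[Editorial note: the three-queue idea can be sketched as follows; this is a simplified model with hypothetical names, not the patch's actual code. Each committing backend joins the queue matching its requested wait mode, and the walsender releases a queue when the corresponding standby position advances.]

```c
/* Simplified model of per-mode wait queues (names are illustrative). */
typedef enum { WAIT_RECV = 0, WAIT_FLUSH, WAIT_APPLY, NUM_WAIT_QUEUES } WaitMode;

typedef struct { long lsn; } StandbyProgress;   /* LSN simplified to a long */

/* A backend waiting in `mode` for `commit_lsn` can be woken once the
 * standby's reported position for that mode has reached its commit LSN. */
static int
can_release(WaitMode mode, long commit_lsn,
            const StandbyProgress progress[NUM_WAIT_QUEUES])
{
    return progress[mode].lsn >= commit_lsn;
}
```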
The parameter is an enum since multiple people, such as Josh, asked for
a very simple interface ("on" or "off"), while others requested multiple
flavours. This gives us both.
I expect "recv" mode to be even more useful in next release, with
WALWriter active during recovery.
Patch feedback please.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
syncrep_queues.v1.cpatch (application/octet-stream, +266 -184)
On Sun, Mar 27, 2011 at 5:45 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
I was hoping to fine tune/tweak Sync Rep after feedback during beta,
but my understanding of current consensus is that that will be too
late to make user visible changes. So I'm proposing this change now,
before Beta, rather than during Beta.
This is completely inappropriate. The deadline for new feature
patches has long since passed. It was January 15th. The discussion
you are referring to had to do with fixing the behavior of features we
already have, not adding new ones.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sun, Mar 27, 2011 at 10:45 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
I was hoping to fine tune/tweak Sync Rep after feedback during beta,
but my understanding of current consensus is that that will be too
late to make user visible changes. So I'm proposing this change now,
before Beta, rather than during Beta.
For what it's worth I think this is a simplification. We have:
1) Development when new features are added
2) Feature freeze - when those features are tweaked and fixed based on
our own testing but no new features added
3) Beta - when features are tweaked and fixed in response to user
suggestions but no new features added
4) Release - when only bugs are fixed
So the question becomes, what is a new feature versus a behaviour
change to an existing feature or a bug-fix. These are subjective
questions with a lot of room for ambiguity. Is this a new feature? Is
this existing code broken or unusable in some way that we don't want
to release and have to support?
We're not in a vacuum here and we all see Tom reworking major portions
of the collation patch while you're being told you can't squeak this
relatively small feature in. But the collation behaviour is something
we'll have to deal with and support for years and will have trouble
changing in subsequent versions without compatibility issues. This
behaviour is clearly something that can be added onto in subsequent
versions and the existing feature set is usable as it is.
Also, for what it's worth I prefer thinking of
synchronous_commit/synchronous_replication as one big multi-way
variable:
synchronous_commit = memory | disk | replica-memory | replica-disk |
replica-visible
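[Editorial note: the multi-way variable proposed above forms a total order of durability levels, which a single enum can capture; values and names here are illustrative assumptions, not an agreed design.]

```c
/* Sketch of the multi-way synchronous_commit idea: durability levels
 * in increasing order of guarantee (names are illustrative). */
typedef enum {
    DUR_MEMORY,          /* commit once WAL is in local memory */
    DUR_DISK,            /* local fsync (today's synchronous_commit=on) */
    DUR_REPLICA_MEMORY,  /* standby has received the WAL */
    DUR_REPLICA_DISK,    /* standby has fsynced the WAL */
    DUR_REPLICA_VISIBLE  /* standby has applied the WAL */
} Durability;

/* Because the levels are totally ordered, a stronger achieved level
 * satisfies any weaker requirement. */
static int
satisfies(Durability achieved, Durability required)
{
    return achieved >= required;
}
```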
--
greg
On Sun, Mar 27, 2011 at 11:27 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Sun, Mar 27, 2011 at 5:45 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
I was hoping to fine tune/tweak Sync Rep after feedback during beta,
but my understanding of current consensus is that that will be too
late to make user visible changes. So I'm proposing this change now,
before Beta, rather than during Beta.
This is completely inappropriate. The deadline for new feature
patches has long since passed. It was January 15th. The discussion
you are referring to had to do with fixing the behavior of features we
already have, not adding new ones.
You skipped the bit that said this code was part of the original patch
submission, so this code met your deadline. That was stated clearly in
the next paragraph of my email.
It's also a minor change and one that others had agreed we want.
You have no basis on which to prevent this.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Sun, Mar 27, 2011 at 11:55 PM, Greg Stark <gsstark@mit.edu> wrote:
Also, for what it's worth I prefer thinking of
synchronous_commit/synchronous_replication as one big multi-way
variable:
synchronous_commit = memory | disk | replica-memory | replica-disk |
replica-visible
That's close enough to my thinking for me to say yes to.
I'd prefer to call it "commit_durability" though.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Mar 28, 2011 at 10:52 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
You have no basis on which to prevent this.
It's also already on the Open Items list, put there by you.
Why is this even a discussion point?
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 28.03.2011 13:14, Simon Riggs wrote:
On Mon, Mar 28, 2011 at 10:52 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
You have no basis on which to prevent this.
It's also already on the Open Items list, put there by you.
Why is this even a discussion point?
FWIW, I agree this is an additional feature that we shouldn't be messing
with at this point in the release cycle.
The 'apply' mode would be quite interesting, it would make it easier to
build load-balancing clusters. But the patch isn't up to the task on
that yet - the 'apply' status report is only sent after
wal_receiver_status_interval fills up, so you get long delays.
PS. You're missing some breaks in the switch statement in
SyncRepWaitForLSN(), effectively making the patch do nothing.
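[Editorial note: the bug Heikki points out is ordinary C switch fall-through; without break statements every case falls into the next, so the mode dispatch never does what was intended. A minimal illustration with hypothetical names, not the actual SyncRepWaitForLSN() code:]

```c
/* Illustrative mode dispatch: each case needs a break, or control
 * falls through into the following case. */
typedef enum { MODE_RECV, MODE_FLUSH, MODE_APPLY } Mode;

static int
queue_for_mode(Mode mode)
{
    int queue = -1;
    switch (mode)
    {
        case MODE_RECV:
            queue = 0;
            break;      /* without this, RECV would fall through to FLUSH */
        case MODE_FLUSH:
            queue = 1;
            break;      /* without this, FLUSH would fall through to APPLY */
        case MODE_APPLY:
            queue = 2;
            break;
    }
    return queue;
}
```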
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Mon, Mar 28, 2011 at 6:14 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On Mon, Mar 28, 2011 at 10:52 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
You have no basis on which to prevent this.
It's also already on the Open Items list, put there by you.
Huh? There is an open item about whether we should merge
synchronous_commit and synchronous_replication into a single GUC,
which might be a better interface since it would typically give the
user one less thing to have to fiddle with, but there is certainly no
open item for adding additional sync rep modes. Merging those two
GUCs is a reasonable thing to consider even at this late date, because
once we cut beta - and certainly once we cut final - we'll be stuck
supporting whatever interface we release for 5+ years. There is no
similar risk for this patch - the only risk of not committing this
feature now is that we won't have this feature in 9.1. But if we
accept that as valid justification for committing it, then everything
is fair game, and that is not going to lead to a timely release.
Why is this even a discussion point?
Adding more new code is likely to add more new bugs, and we're trying
not to do that right now, so we can make a release.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Mar 28, 2011 at 12:05 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
On 28.03.2011 13:14, Simon Riggs wrote:
On Mon, Mar 28, 2011 at 10:52 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
You have no basis on which to prevent this.
It's also already on the Open Items list, put there by you.
Why is this even a discussion point?
FWIW, I agree this is an additional feature that we shouldn't be messing
with at this point in the release cycle.
There is an open item about what the UI is for sync commit/sync rep,
which is the subject of this patch.
The 'apply' mode would be quite interesting, it would make it easier to
build load-balancing clusters. But the patch isn't up to the task on that
yet - the 'apply' status report is only sent after
wal_receiver_status_interval fills up, so you get long delays.
Yes it's up to the task, you misread it. It will continue sending
replies while apply <> flush and then it will fall back to the
behaviour you mention.
PS. you're missing some break's in the switch-statement in
SyncRepWaitForLSN(), effectively making the patch do nothing.
OK, thanks.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Mar 28, 2011 at 10:59 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On Sun, Mar 27, 2011 at 11:55 PM, Greg Stark <gsstark@mit.edu> wrote:
Also, for what it's worth I prefer thinking of
synchronous_commit/synchronous_replication as one big multi-way
variable:
synchronous_commit = memory | disk | replica-memory | replica-disk |
replica-visible
That's close enough to my thinking for me to say yes to.
Patch to implement this new UI attached, exactly as you suggest above.
Words can be changed easily without changing the music.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
syncrep_ui.v2.cpatch (application/octet-stream, +277 -247)
On 28.03.2011 15:34, Simon Riggs wrote:
On Mon, Mar 28, 2011 at 12:05 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
On 28.03.2011 13:14, Simon Riggs wrote:
On Mon, Mar 28, 2011 at 10:52 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
You have no basis on which to prevent this.
It's also already on the Open Items list, put there by you.
Why is this even a discussion point?
FWIW, I agree this is an additional feature that we shouldn't be messing
with at this point in the release cycle.
There is an open item about what the UI is for sync commit/sync rep,
which is the subject of this patch.
plus new functionality. For the UI part, you just need to change GUCs.
The 'apply' mode would be quite interesting, it would make it easier to
build load-balancing clusters. But the patch isn't up to the task on that
yet - the 'apply' status report is only sent after
wal_receiver_status_interval fills up, so you get long delays.
Yes it's up to the task, you misread it. It will continue sending
replies while apply <> flush and then it will fall back to the
behaviour you mention.
There's nothing to wake up the walreceiver after applying a commit record.
Oh, you're relying on the periodic wakeups specified by
NAPTIME_PER_CYCLE (100ms). That's still on average a 50ms delay to every
commit. We should try to eliminate these polling loops, not make them
more important. In fact, we should raise NAPTIME_PER_CYCLE to, say,
1000ms, to reduce spurious wakeups on an idle system.
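[Editorial note: the 50ms figure follows from the polling interval arithmetic; with a fixed wakeup cycle, an event lands uniformly at random within the cycle, so the added latency averages half the interval. A trivial sketch:]

```c
/* Expected extra latency from a fixed polling interval: an event
 * occurring at a uniformly random point in the cycle waits, on
 * average, half the interval until the next wakeup. */
static double
avg_poll_delay_ms(double naptime_ms)
{
    return naptime_ms / 2.0;
}
```

With NAPTIME_PER_CYCLE at 100ms this gives a 50ms average delay; raising it to 1000ms would push that to 500ms, which is why relying on the polling loop for commit acknowledgement conflicts with also wanting fewer spurious wakeups.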
Am I reading the patch correctly that if the standby hasn't applied all
WAL yet, you send a reply message at every wakeup, whether or not any
progress has been made since last time? So if you have a
long-running-transaction in the standby, for example, conflicting with
WAL recovery, the standby will keep sending a status report to the
master every 100ms.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Mon, Mar 28, 2011 at 1:14 PM, Robert Haas <robertmhaas@gmail.com> wrote:
but there is certainly no
open item for adding additional sync rep modes.
In your opinion.
We will have to live with the UI for a long time, yes. I'm trying to
get it right, whether that includes adding an obvious/easy omission or
renaming things to make better sense.
Your other changes make this sensible, and feedback I received after a
recent presentation tells me that people will expect it to work with
the two additional options.
I would prefer further feedback before solidifying this design, but if
we must solidify it now, then I prefer to do that with all 5 options.
As originally submitted for this commit fest.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Mar 28, 2011 at 8:54 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On Mon, Mar 28, 2011 at 1:14 PM, Robert Haas <robertmhaas@gmail.com> wrote:
but there is certainly no
open item for adding additional sync rep modes.
In your opinion.
Well, as you just pointed out yourself a few minutes ago, I did write
the open item in question... I fancy I knew what I meant.
I would prefer further feedback before solidifying this design, but if
we must solidify it now, then I prefer to do that with all 5 options.
As originally submitted for this commit fest.
Even if it were true that these options were in the patch submitted
for this CommitFest, that wouldn't be a reason to commit them now,
because the CommitFest is over and has been for several weeks. But it
turns out they weren't.
http://archives.postgresql.org/message-id/1295127631.3282.100.camel@ebony
+/*
+ * Queuing code is written to allow later extension to multiple
+ * queues. Currently, we use just one queue (==FSYNC).
+ *
+ * XXX We later expect to have RECV, FSYNC and APPLY modes.
+ */
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 28.03.2011 15:54, Simon Riggs wrote:
On Mon, Mar 28, 2011 at 1:14 PM, Robert Haas <robertmhaas@gmail.com> wrote:
but there is certainly no
open item for adding additional sync rep modes.
In your opinion.
Huh? First you say that Robert added an open item about this to the
list; he says that the open item wasn't about additional sync rep modes,
and you say "in your opinion".
We will have to live with the UI for a long time, yes. I'm trying to
get it right, whether that includes adding an obvious/easy omission or
renaming things to make better sense.
Your other changes make this sensible, and feedback I received after a
recent presentation tells me that people will expect it to work with
the two additional options.
I would prefer further feedback before solidifying this design, but if
we must solidify it now, then I prefer to do that with all 5 options.
As originally submitted for this commit fest.
It's too late to be doing this. The patch isn't ready to be committed,
and there's high potential to introduce new bugs or usability issues.
And regarding the UI, I'm not totally convinced that a four-state GUC
set in the master is the way to go. It would feel at least as logical to
control this in the standby.
I don't really want to get into that discussion, though. My point is
that if we wanted to still sneak this in, then we would have to have
those discussions. -1. Let's fix the existing issues, and release.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Mon, Mar 28, 2011 at 1:51 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
The 'apply' mode would be quite interesting, it would make it easier to
build load-balancing clusters. But the patch isn't up to the task on that
yet - the 'apply' status report is only sent after
wal_receiver_status_interval fills up, so you get long delays.
Yes it's up to the task, you misread it. It will continue sending
replies while apply <> flush and then it will fall back to the
behaviour you mention.
There's nothing to wake up the walreceiver after applying a commit record.
Oh, you're relying on the periodic wakeups specified by NAPTIME_PER_CYCLE
(100ms). That's still on average a 50ms delay to every commit.
Rubbish. "Every commit", no way. How could you think that?
That delay affects only the last commit of a sequence of commits after
which the server goes quiet. And that only applies to people that have
specifically chosen to wait for the apply, which as you say means they
may have fairly long waits anyway.
We should try
to eliminate these polling loops, not make them more important. In fact, we
should raise NAPTIME_PER_CYCLE to, say, 1000ms, to reduce spurious wakeups
on an idle system.
I'm OK with changing the polling loops. Is there a big problem there?
I think it would take you about an hour to rewrite that, if you wanted
to do so. But it's not a necessary step to do that, especially when
we're discussing whether Latches are actually as portable as we think.
That sounds like more risk for a slight gain in performance.
Am I reading the patch correctly that if the standby hasn't applied all WAL
yet, you send a reply message at every wakeup, whether or not any progress
has been made since last time? So if you have a long-running-transaction in
the standby, for example, conflicting with WAL recovery, the standby will
keep sending a status report to the master every 100ms.
Sure.
First, is that a problem? Why? The WalReceiver isn't busy doing
anything else at that point, by definition. The WalSender isn't doing
anything then either, by definition. Both are used to higher send
rates.
Second, if that really is a problem, sounds like a simple "if" test
added to reply code.
Third, the conflict issues are much reduced as well in this release.
For this exact purpose.
So I don't see any blockers in what you say.
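[Editorial note: the "simple if test" mentioned above might look like the following; field and function names are assumptions for illustration, not the patch's code. The standby remembers the last positions it reported and skips the reply when nothing has advanced.]

```c
/* Sketch: suppress duplicate status reports when no position has moved. */
typedef struct { long write_lsn, flush_lsn, apply_lsn; } ReplyState;

static int
should_send_reply(ReplyState *last,
                  long write_lsn, long flush_lsn, long apply_lsn)
{
    if (write_lsn == last->write_lsn &&
        flush_lsn == last->flush_lsn &&
        apply_lsn == last->apply_lsn)
        return 0;               /* no progress since the last report */
    last->write_lsn = write_lsn;
    last->flush_lsn = flush_lsn;
    last->apply_lsn = apply_lsn;
    return 1;
}
```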
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Mar 28, 2011 at 2:05 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
It would feel at least as logical to control this in the standby.
Now you are being ridiculous. You've spoken strongly against this at
every single step of this journey.
We're well past the stage of putting anything in that could do that,
as well you know.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 28.03.2011 16:11, Simon Riggs wrote:
On Mon, Mar 28, 2011 at 2:05 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
It would feel at least as logical to control this in the standby.
Now you are being ridiculous. You've spoken strongly against this at
every single step of this journey.
I was thinking specifically about flush vs. write (vs. apply,
maybe) here. It would make sense to set that in the standby. You might
even want to set it differently on different standbys.
What I was strongly against is the action at a distance, where setting a
GUC in a standby suddenly makes the master wait for acks from that
server. That's dangerous, but I don't see such danger in setting the
level of synchronicity in the standby, once you've decided that it's a
synchronous standby.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Mon, Mar 28, 2011 at 10:11 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
On 28.03.2011 16:11, Simon Riggs wrote:
On Mon, Mar 28, 2011 at 2:05 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
It would feel at least as logical to control this in the standby.
Now you are being ridiculous. You've spoken strongly against this at
every single step of this journey.
I was thinking specifically about flush vs. write (vs. apply, maybe)
here. It would make sense to set that in the standby. You might even want to
set it differently on different standbys.
What I was strongly against is the action at a distance, where setting a GUC
in a standby suddenly makes the master wait for acks from that server.
That's dangerous, but I don't see such danger in setting the level of
synchronicity in the standby, once you've decided that it's a synchronous
standby.
It might not be dangerous, but the standby currently sends write,
flush, and apply positions back separately, so the master must decide
which of those to pay attention to, unless we rework the whole design.
I actually think the current design is quite nice and am in no hurry
to rejigger that particular part of it. However, I do think that we
may need or want to rejigger the timing of replies for performance.
It might be, for example, that when waiting for the "write", the
master should somehow indicate to the standby the earliest write-LSN
that could release waiters, so that a standby can send back a reply
the instant it hits that LSN, without waiting to finish reading
everything that's buffered up as it does now. For apply, I agree with
your feeling that the startup process needs to wake walreceiver via a
latch when the apply position advances, but here again it might be
beneficial to send the first interesting LSN from master to standby to
cut down unnecessary wakeups. We also need to consider the issue
raised elsewhere - that a naive implementation of this could allow the
commit to become visible on the standby before it becomes visible on
the master. That would be expensive to prevent, because you'd need an
additional set of master-standby interlocks, but I think at least one
person was arguing that it was necessary for correctness - my memory
of the details is fuzzy at the moment.
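[Editorial note: the earliest-interesting-LSN hint described above can be sketched as follows; names are illustrative. The master computes the minimum LSN any waiter needs and sends it to the standby, which can then reply the moment its position passes that point instead of waiting for a timer.]

```c
/* Master side: the smallest commit LSN among pending waiters is the
 * earliest point at which a standby reply could release anyone. */
static long
earliest_wanted_lsn(const long waiters[], int nwaiters)
{
    long min = -1;
    for (int i = 0; i < nwaiters; i++)
        if (min < 0 || waiters[i] < min)
            min = waiters[i];
    return min;     /* -1 means no backend is waiting */
}

/* Standby side: reply immediately once the hinted LSN is reached;
 * otherwise fall back to the periodic status report. */
static int
reply_now(long standby_lsn, long hinted_lsn)
{
    return hinted_lsn >= 0 && standby_lsn >= hinted_lsn;
}
```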
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Mar 28, 2011 at 3:11 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
On 28.03.2011 16:11, Simon Riggs wrote:
On Mon, Mar 28, 2011 at 2:05 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
It would feel at least as logical to control this in the standby.
Now you are being ridiculous. You've spoken strongly against this at
every single step of this journey.
I was thinking specifically about flush vs. write (vs. apply, maybe)
here. It would make sense to set that in the standby. You might even want to
set it differently on different standbys.
What I was strongly against is the action at a distance, where setting a GUC
in a standby suddenly makes the master wait for acks from that server.
That's dangerous, but I don't see such danger in setting the level of
synchronicity in the standby, once you've decided that it's a synchronous
standby.
The action at a distance thought still applies, since you would wait
more or less time depending upon how this parameter was set on the
standby.
I can't see how this situation differs. Your own argument still applies.
I would point out that I argued against you, but was persuaded towards
your approach. The code won't easily allow what you suggest. There
were multiple approaches to implementation, but there aren't anymore.
So you can't say the UI is unclear; it's very clear how we implement
things from here - the way this patch implements it - on the master.
This is a simple patch, containing functionality already discussed and
agreed and for which code was submitted in this CF.
There's no reason to block this, only for us to decide the final
naming of parameter/s and options.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Robert Haas <robertmhaas@gmail.com> wrote:
We also need to consider the issue raised elsewhere - that a naive
implementation of this could allow the commit to become visible on
the standby before it becomes visible on the master. That would
be expensive to prevent, because you'd need an additional set of
master-standby interlocks, but I think at least one person was
arguing that it was necessary for correctness - my memory of the
details is fuzzy at the moment.
I remember expressing concern about that; I don't know if anyone
else did. After some discussion, I was persuaded that the use cases
where it would matter are narrow enough that documentation of the
issue should be enough.
-Kevin