Synchronization levels in SR

Started by Fujii Masao — 145 messages — pgsql-hackers
#1 Fujii Masao
masao.fujii@gmail.com

Hi,

I'm now designing the "synchronous" replication feature based on
SR for 9.1, while discussing it in another thread:
http://archives.postgresql.org/pgsql-hackers/2010-04/msg01516.php

In the first design phase, I'd like to clarify which synch levels
should be supported in 9.1 and how they should be specified by users.

Log-shipping replication can provide several synch levels, as follows.

The transaction commit on the master
#1 doesn't wait for replication (already supported in 9.0)
#2 waits for WAL to be received by the standby
#3 waits for WAL to be received and flushed by the standby
#4 waits for WAL to be received, flushed and replayed by
the standby
..etc?

Which should we include in 9.1? I'd like to add #2 and #3.
They are enough for high-availability use case (i.e., to
prevent failover from losing any transactions committed).
AFAIR, MySQL semi-synchronous replication supports #2 level.

#4 is useful in some cases, but might often cause the
transaction commit on the master to get stuck, since a read-only
query can easily block recovery through a lock conflict. So
#4 seems not worth working on until that HS problem
has been addressed. Thoughts?

Second, we need to discuss how to specify the synch
level. There are three approaches:

* Per standby
Since the purpose, location and H/W resources often differ
from one standby to another, specifying the level per standby
(i.e., setting it in recovery.conf) is a straightforward
approach, I think. For example, we can choose #3 for a
high-availability standby near the master, and #1 (async)
for a remote disaster-recovery standby.

* Per transaction
Define a PGC_USERSET option specifying the level, and set
it on the master according to the purpose of each
transaction. In this approach, for example, we can choose
#4 for a transaction that should be visible on the
standby as soon as a "success" of the commit has been
returned to the client. We can also choose #1 for
time-critical but not mission-critical transactions.

* Mix
Allow users to specify the level per standby and per
transaction at the same time, and then derive the effective
level from both by some algorithm.
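
To make the first two approaches concrete, here are small sketches. The parameter names (replication_mode, synchronous_replication_level) are invented for illustration only; neither exists in 9.0, and the eventual syntax is exactly what is under discussion here.

Per standby, in each standby's recovery.conf:

```
# high-availability standby near the master: level #3
replication_mode = 'recv_flush'    # hypothetical parameter

# remote disaster-recovery standby: level #1
replication_mode = 'async'         # hypothetical parameter
```

Per transaction, as a PGC_USERSET GUC set on the master:

```
SET synchronous_replication_level = 'replay';  -- hypothetical GUC, level #4
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;  -- would block until the standby has replayed the commit record
```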

Which should we adopt for 9.1? I'd like to implement the
"per-standby" approach first, since it's simple and seems
to cover more use cases. Thoughts?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#2 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Fujii Masao (#1)
Re: Synchronization levels in SR

On 24/05/10 16:20, Fujii Masao wrote:

The log-shipping replication has some synch levels as follows.

The transaction commit on the master
#1 doesn't wait for replication (already supported in 9.0)
#2 waits for WAL to be received by the standby
#3 waits for WAL to be received and flushed by the standby
#4 waits for WAL to be received, flushed and replayed by
the standby
..etc?

Which should we include in 9.1? I'd like to add #2 and #3.
They are enough for high-availability use case (i.e., to
prevent failover from losing any transactions committed).
AFAIR, MySQL semi-synchronous replication supports #2 level.

#4 is useful for some cases, but might often make the
transaction commit on the master get stuck since read-only
query can easily block recovery by the lock conflict. So
#4 seems not to be worth working on until that HS problem
has been addressed. Thoughts?

I see a lot of value in #4; it makes it possible to distribute read-only
load to the standby using something like pgbouncer, completely
transparently to the application. In the lesser modes, the application
can see slightly stale results.

But whatever we can easily implement, really. Pick the one you think is
the easiest and start with that, but keep the other modes in mind in the
design and in the user interface so that you don't paint yourself into a
corner.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#3 Josh Berkus
josh@agliodbs.com
In reply to: Fujii Masao (#1)
Re: Synchronization levels in SR

#4 is useful for some cases, but might often make the
transaction commit on the master get stuck since read-only
query can easily block recovery by the lock conflict. So
#4 seems not to be worth working on until that HS problem
has been addressed. Thought?

I agree that #4 should be done last, but it will be needed, not
least by your employer ;-) . I don't see any obvious way to make #4
compatible with any significant query load on the slave, but in general
I'd think that users of #4 are far more concerned with 0% data loss than
they are with getting the slave to run read queries.

Second, we need to discuss about how to specify the synch
level. There are three approaches:

* Per standby

* Per transaction

Ach, I'm torn. I can see strong use cases for both of the above.
Really, I think:

* Mix
Allow users to specify the level per standby and
transaction at the same time, and then calculate the real
level from them by using some algorithm.

What we should do is specify it per-standby, and then have a USERSET GUC
on the master which specifies which transactions will be synched, and
those will be synched only on the slaves which are set up to support
synch. That is, if you have:

Master
Slave #1: synch
Slave #2: not synch
Slave #3: not synch

And you have:
Session #1: synch
Session #2: not synch

Session #1 will be synched on Slave #1 before commit. Nothing will be
synched on Slaves 2 and 3, and session #2 will not wait for synch on any
slave.

I think this model delivers the maximum HA flexibility to users while
still making intuitive logical sense.
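
As a sketch of how the two settings would combine (illustrative Python only, not PostgreSQL code): a committing session waits exactly for the standbys configured as synch, and only when the session itself asked for synch.

```python
def standbys_to_wait_for(standby_is_synch, session_wants_synch):
    """Which standbys must acknowledge before this session's COMMIT returns?

    standby_is_synch: dict mapping standby name -> True if configured synch.
    session_wants_synch: the hypothetical per-session USERSET GUC.
    """
    if not session_wants_synch:
        return []  # async session: wait for nobody
    return [name for name, synch in standby_is_synch.items() if synch]

# The scenario above: Slave #1 synch, Slaves #2 and #3 not.
slaves = {"slave1": True, "slave2": False, "slave3": False}
```

So Session #1 (synch) waits only on slave1, and Session #2 (not synch) waits on nobody, matching the example.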

Which should we adopt for 9.1? I'd like to implement the
"per-standby" approach at first since it's simple and seems
to cover more use cases. Thought?

If people agree that the above is our roadmap, implementing
"per-standby" first makes sense, and then we can implement "per-session"
GUC later.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

#4 Fujii Masao
masao.fujii@gmail.com
In reply to: Heikki Linnakangas (#2)
Re: Synchronization levels in SR

On Tue, May 25, 2010 at 1:18 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

I see a lot of value in #4; it makes it possible to distribute read-only
load to the standby using something like pgbouncer, completely transparently
to the application.

Agreed.

In the lesser modes, the application can see slightly
stale results.

Yes

BTW, even if we had #4, we would need to be careful that
we might see uncommitted results on the standby. That is,
a transaction commit might become visible on the standby before
the master returns its "success" to the client. I think that we
will never get completely transaction-consistent results
from the standby until we have implemented the "snapshot cloning"
feature.
http://wiki.postgresql.org/wiki/ClusterFeatures#Export_snapshots_to_other_sessions

But whatever we can easily implement, really. Pick one that you think is the
easiest and start with that, but keep the other modes in mind in the design
and in the user interface so that you don't paint yourself into a corner.

Yep, the design and implementation of #2 and #3 should be
easily extensible to #4. I'll keep that in mind.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#5 Fujii Masao
masao.fujii@gmail.com
In reply to: Josh Berkus (#3)
Re: Synchronization levels in SR

On Tue, May 25, 2010 at 10:29 AM, Josh Berkus <josh@agliodbs.com> wrote:

I agree that #4 should be done last, but it will be needed, not in the
least by your employer ;-) .  I don't see any obvious way to make #4
compatible with any significant query load on the slave, but in general
I'd think that users of #4 are far more concerned with 0% data loss than
they are with getting the slave to run read queries.

Since #2 and #3 are enough for 0% data loss, I think that such users
would be more concerned about which results are visible on the standby.
No?

What we should do is specify it per-standby, and then have a USERSET GUC
on the master which specifies which transactions will be synched, and
those will be synched only on the slaves which are set up to support
synch.  That is, if you have:

Master
Slave #1: synch
Slave #2: not synch
Slave #3: not synch

And you have:
Session #1: synch
Session #2: not synch

Session #1 will be synched on Slave #1 before commit.  Nothing will be
synched on Slaves 2 and 3, and session #2 will not wait for synch on any
slave.

I think this model delivers the maximum HA flexibility to users while
still making intuitive logical sense.

This makes sense.

Since such a boolean GUC flag is relatively easy and simple to implement
compared with "per-transaction" levels (which have four valid values: #1,
#2, #3 and #4), I'll do that.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#6 Fujii Masao
masao.fujii@gmail.com
In reply to: Fujii Masao (#1)
Re: Synchronization levels in SR

On Mon, May 24, 2010 at 10:20 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

At the first design phase, I'd like to clarify which synch levels
should be supported in 9.1 and how it should be specified by users.

There is another question about synch level:

When should the master wait for replication?

In my current design, the backend waits for replication only at
the end of the transaction commit. Is this enough? Are there
other waiting points?

For example, should a smart or fast shutdown on the master wait
for the shutdown checkpoint record to be replicated to the standby
(btw, in 9.0, shutdown waits for the checkpoint record to be *sent*)?
Should pg_switch_xlog() wait for all of the original WAL file to
be replicated?

I'm not sure whether these two "wait-for-replication" points have use
cases, so I'm inclined to think they are not worth implementing, but..

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#7 Simon Riggs
simon@2ndQuadrant.com
In reply to: Fujii Masao (#5)
Re: Synchronization levels in SR

On Tue, 2010-05-25 at 12:40 +0900, Fujii Masao wrote:

On Tue, May 25, 2010 at 10:29 AM, Josh Berkus <josh@agliodbs.com> wrote:

I agree that #4 should be done last, but it will be needed, not in the
least by your employer ;-) . I don't see any obvious way to make #4
compatible with any significant query load on the slave, but in general
I'd think that users of #4 are far more concerned with 0% data loss than
they are with getting the slave to run read queries.

Since #2 and #3 are enough for 0% data loss, I think that such users
would be more concerned about what results are visible in the standby.
No?

Please add #4 also. You can do that easily at the same time as #2 and
#3, and it will leave me free to fix the perceived conflict problems.

--
Simon Riggs www.2ndQuadrant.com

#8 Simon Riggs
simon@2ndQuadrant.com
In reply to: Fujii Masao (#1)
Re: Synchronization levels in SR

On Mon, 2010-05-24 at 22:20 +0900, Fujii Masao wrote:

Second, we need to discuss about how to specify the synch
level. There are three approaches:

* Per standby
Since the purpose, location and H/W resource often differ
from one standby to another, specifying level per standby
(i.e., we set the level in recovery.conf) is a
straightforward approach, I think. For example, we can
choose #3 for high-availability standby near the master,
and choose #1 (async) for the disaster recovery standby
remote.

* Per transaction
Define the PGC_USERSET option specifying the level and
specify it on the master in response to the purpose of
transaction. In this approach, for example, we can choose
#4 for the transaction which should be visible on the
standby as soon as a "success" of the commit has been
returned to a client. We can also choose #1 for
time-critical but not mission-critical transaction.

* Mix
Allow users to specify the level per standby and
transaction at the same time, and then calculate the real
level from them by using some algorithm.

Which should we adopt for 9.1? I'd like to implement the
"per-standby" approach at first since it's simple and seems
to cover more use cases. Thought?

-1

Synchronous replication implies that a commit should wait. This wait is
experienced by the transaction, not by other parts of the system. If we
define robustness at the standby level then robustness depends upon
unseen administrators, as well as the current up/down state of standbys.
This is action-at-a-distance in its worst form.

Imagine having 2 standbys, 1 synch, 1 async. If the synch server goes
down, performance will improve and robustness will have been lost. What
good would that be?

Imagine a standby connected over a long distance. DBA brings up standby
in synch mode accidentally and the primary server hits massive
performance problems without any way of directly controlling this.

The worst aspect of standby-level controls is that nobody ever knows how
safe a transaction is. There is no definition or test for us to check
exactly how safe any particular transaction is. Also, the lack of safety
occurs at the time when you least want it - when one of your servers is
already down.

So I call "per-standby" settings simple, and broken in multiple ways.

Putting the control in the hands of the transaction owner (i.e. on the
master) is exactly where the control should be. I personally like the
idea of that being a USERSET, though could live with system wide
settings if need be. But the control must be on the *master* not on the
standbys.

The best parameter we can specify is the number of servers that we wish
to wait for confirmation from. That is a definition that easily manages
the complexity of having various servers up/down at any one time. It
also survives misconfiguration more easily, as well as providing a
workaround if replicating across a bursty network where we can't
guarantee response times, even if the typical response time is good.

(We've discussed this many times before over a period of years, and I'm
not really sure why we have to re-discuss it repeatedly just because
people disagree. You don't mention the earlier discussions; I'm not sure
why. If we want to follow the community process, then all previous
discussions need to be taken into account, unless things have changed -
which they haven't: same topic, same people, AFAICS.)

--
Simon Riggs www.2ndQuadrant.com

#9 Simon Riggs
simon@2ndQuadrant.com
In reply to: Josh Berkus (#3)
Re: Synchronization levels in SR

On Mon, 2010-05-24 at 18:29 -0700, Josh Berkus wrote:

If people agree that the above is our roadmap, implementing
"per-standby" first makes sense, and then we can implement "per-session"
GUC later.

IMHO "per-standby" sounds simple, but is dangerously simplistic, as
explained in another part of the thread.

We need to think clearly about failure modes and how they will be
handled. Failure modes and edge cases completely govern the design here.
"All running smoothly" isn't a major concern and so it appears that the
user interface can be done various ways.

--
Simon Riggs www.2ndQuadrant.com

#10 Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#8)
Re: Synchronization levels in SR

On Tue, May 25, 2010 at 12:28 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

Synchronous replication implies that a commit should wait. This wait is
experienced by the transaction, not by other parts of the system. If we
define robustness at the standby level then robustness depends upon
unseen administrators, as well as the current up/down state of standbys.
This is action-at-a-distance in its worst form.

Maybe, but I can't help thinking people are going to want some form of
this. The case where someone wants to do sync rep to the machine in
the next rack over and async rep to a server at a remote site seems
too important to ignore.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#11 Joshua D. Drake
jd@commandprompt.com
In reply to: Robert Haas (#10)
Re: Synchronization levels in SR

On Tue, 2010-05-25 at 12:40 -0400, Robert Haas wrote:

On Tue, May 25, 2010 at 12:28 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

Synchronous replication implies that a commit should wait. This wait is
experienced by the transaction, not by other parts of the system. If we
define robustness at the standby level then robustness depends upon
unseen administrators, as well as the current up/down state of standbys.
This is action-at-a-distance in its worst form.

Maybe, but I can't help thinking people are going to want some form of
this. The case where someone wants to do sync rep to the machine in
the next rack over and async rep to a server at a remote site seems
too important to ignore.

Uhh yeah, that is pretty much the standard use case. The "next rack" is
only 50% of the equation. The other half is the disaster-recovery rack
over 100Mb (or even 10Mb) halfway across the country. It is
common, very common.

Joshua D. Drake

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering

#12 MMK
bomuvi@yahoo.com
In reply to: Simon Riggs (#8)
Confused about the buffer pool size

Hello All:
In the code (costsize.c), I see that effective_cache_size is set to
DEFAULT_EFFECTIVE_CACHE_SIZE. This is defined as follows in cost.h:

#define DEFAULT_EFFECTIVE_CACHE_SIZE 16384

But when I say "show shared_buffers" in psql I get:

 shared_buffers
----------------
 28MB

In the postgresql.conf file, the following lines appear:

shared_buffers = 28MB    # min 128kB  (change requires restart)
#temp_buffers = 8MB      # min 800kB

So I am assuming that the buffer pool size is 28MB = 28 * 128 = 3584
8K pages. So should effective_cache_size be set to 3584 rather than 16384?
Thanks,
MMK.

#13 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Robert Haas (#10)
Re: Synchronization levels in SR

Robert Haas <robertmhaas@gmail.com> wrote:

Simon Riggs <simon@2ndquadrant.com> wrote:

If we define robustness at the standby level then robustness
depends upon unseen administrators, as well as the current
up/down state of standbys. This is action-at-a-distance in its
worst form.

Maybe, but I can't help thinking people are going to want some
form of this. The case where someone wants to do sync rep to the
machine in the next rack over and async rep to a server at a
remote site seems too important to ignore.

I think there may be a terminology issue here -- I took "configure
by standby" to mean that *at the master* you would specify rules for
each standby. I think Simon took it to mean that each standby would
define the rules for replication to it. Maybe this issue can
resolve gracefully with a bit of clarification?

-Kevin

#14 Simon Riggs
simon@2ndQuadrant.com
In reply to: Robert Haas (#10)
Re: Synchronization levels in SR

On Tue, 2010-05-25 at 12:40 -0400, Robert Haas wrote:

On Tue, May 25, 2010 at 12:28 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

Synchronous replication implies that a commit should wait. This wait is
experienced by the transaction, not by other parts of the system. If we
define robustness at the standby level then robustness depends upon
unseen administrators, as well as the current up/down state of standbys.
This is action-at-a-distance in its worst form.

Maybe, but I can't help thinking people are going to want some form of
this.
The case where someone wants to do sync rep to the machine in
the next rack over and async rep to a server at a remote site seems
too important to ignore.

The use case of "machine in the next rack over and async rep to a server
at a remote site" *is* important, but you give no explanation as to why
that implies "per-standby" is the solution to it.

If you read the rest of my email, you'll see that I have explained the
problems "per-standby" settings would cause.

Please don't be so quick to claim it is me ignoring anything.

--
Simon Riggs www.2ndQuadrant.com

#15 Alastair Turner
bell@ctrlf5.co.za
In reply to: Simon Riggs (#8)
Re: Synchronization levels in SR

On Tue, May 25, 2010 at 6:28 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
.......

The best parameter we can specify is the number of servers that we wish
to wait for confirmation from. That is a definition that easily manages
the complexity of having various servers up/down at any one time. It
also survives misconfiguration more easily, as well as providing a
workaround if replicating across a bursty network where we can't
guarantee response times, even if the typical response time is good.

This may be an incredibly naive question, but what happens to the
transaction on the master if the number of confirmations is not
received? Is this intended to create a situation where the master
effectively becomes unavailable for write operations when its
synchronous slaves are unavailable?

Alastair "Bell" Turner

^F5

#16 Simon Riggs
simon@2ndQuadrant.com
In reply to: Kevin Grittner (#13)
Re: Synchronization levels in SR

On Tue, 2010-05-25 at 11:52 -0500, Kevin Grittner wrote:

Robert Haas <robertmhaas@gmail.com> wrote:

Simon Riggs <simon@2ndquadrant.com> wrote:

If we define robustness at the standby level then robustness
depends upon unseen administrators, as well as the current
up/down state of standbys. This is action-at-a-distance in its
worst form.

Maybe, but I can't help thinking people are going to want some
form of this. The case where someone wants to do sync rep to the
machine in the next rack over and async rep to a server at a
remote site seems too important to ignore.

I think there may be a terminology issue here -- I took "configure
by standby" to mean that *at the master* you would specify rules for
each standby. I think Simon took it to mean that each standby would
define the rules for replication to it. Maybe this issue can
resolve gracefully with a bit of clarification?

The use case of "machine in the next rack over and async rep to a server
at a remote site" would require the settings

server.nextrack = synch
server.remotesite = async

which leaves open the question of what happens when "nextrack" is down.

In many cases, to give adequate performance in that situation, people add
an additional server, so the config becomes

server.nextrack1 = synch
server.nextrack2 = synch
server.remotesite = async

We then want to specify for performance reasons that we can get a reply
from either nextrack1 or nextrack2, so it all still works safely and
quickly if one of them is down. How can we express that rule concisely?
With some difficulty.

My suggestion is simply to have a single parameter (name unimportant)

number_of_synch_servers_we_wait_for = N

which is much easier to understand because it is phrased in terms of the
guarantee given to the transaction, not in terms of what the admin
thinks the situation is.
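
The quorum semantics can be sketched like so (illustrative Python; the parameter name from the message is kept): the commit can be acknowledged once any N standbys have confirmed, so one nearby standby being down costs neither safety nor availability as long as N others still respond.

```python
def commit_may_return(acked_standbys, number_of_synch_servers_we_wait_for):
    """Quorum rule: any N confirmations are enough, regardless of which
    standbys they came from."""
    return len(acked_standbys) >= number_of_synch_servers_we_wait_for

# N = 1 with two nearby standbys: either reply releases the commit.
commit_may_return({"nextrack2"}, 1)   # True even though nextrack1 is down
```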

--
Simon Riggs www.2ndQuadrant.com

#17 Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#16)
Re: Synchronization levels in SR

On Tue, May 25, 2010 at 1:10 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

On Tue, 2010-05-25 at 11:52 -0500, Kevin Grittner wrote:

Robert Haas <robertmhaas@gmail.com> wrote:

Simon Riggs <simon@2ndquadrant.com> wrote:

If we define robustness at the standby level then robustness
depends upon unseen administrators, as well as the current
up/down state of standbys.  This is action-at-a-distance in its
worst form.

Maybe, but I can't help thinking people are going to want some
form of this.  The case where someone wants to do sync rep to the
machine in the next rack over and async rep to a server at a
remote site seems too important to ignore.

I think there may be a terminology issue here -- I took "configure
by standby" to mean that *at the master* you would specify rules for
each standby.  I think Simon took it to mean that each standby would
define the rules for replication to it.  Maybe this issue can
resolve gracefully with a bit of clarification?

The use case of "machine in the next rack over and async rep to a server
at a remote site" would require the settings

server.nextrack = synch
server.remotesite = async

which leaves open the question of what happens when "nextrack" is down.

In many cases, to give adequate performance in that situation people add
an additional server, so the config becomes

server.nextrack1 = synch
server.nextrack2 = synch
server.remotesite = async

We then want to specify for performance reasons that we can get a reply
from either nextrack1 or nextrack2, so it all still works safely and
quickly if one of them is down. How can we express that rule concisely?
With some difficulty.

Perhaps the difficulty here is that those still look like per-server
settings to me. Just maybe with a different set of semantics.

My suggestion is simply to have a single parameter (name unimportant)

number_of_synch_servers_we_wait_for = N

which is much easier to understand because it is phrased in terms of the
guarantee given to the transaction, not in terms of what the admin
thinks is the situation.

So I agree that we need to talk about whether or not we want to do
this. I'll give my opinion. I am not sure how useful this really is.
Consider a master with two standbys. The master commits a
transaction and waits for one of the two standbys, then acknowledges
the commit back to the user. Then the master crashes. Now what?
It's not immediately obvious which standby we should bring online as
the primary, and if we guess wrong we could lose transactions thought
to be committed. This is probably a solvable problem, with enough
work: we can write a script to check the last LSN received by each of
the two standbys and promote whichever one is further along.
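
The check described here — promote whichever standby is further along — comes down to comparing the last WAL location each standby received. A simplified sketch (LSNs in the usual 'X/Y' hex form; function names invented):

```python
def parse_lsn(lsn):
    """Turn an 'X/Y' hexadecimal WAL location into a comparable integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def promotion_candidate(last_received):
    """Pick the standby that has received the most WAL."""
    return max(last_received, key=lambda name: parse_lsn(last_received[name]))

# standby2 is 0xF8 bytes further along, so it is the safer one to promote.
promotion_candidate({"standby1": "0/3000060", "standby2": "0/3000158"})
```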

But... what happens if the master and one standby BOTH crash
simultaneously? There's no way of knowing (until we get at least one
of them back up) whether it's safe to promote the other standby.

I like the idea of a "quorum commit" type feature where we promise the
user that things are committed when "enough" servers have acknowledged
the commit. But I think most people are not going to want that
configuration unless we also provide some really good management tools
that we don't have today.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#18 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: MMK (#12)
Re: Confused about the buffer pool size

On 25/05/10 19:49, MMK wrote:

Hello All:
In the code (costsize.c), I see that effective_cache_size is set to
DEFAULT_EFFECTIVE_CACHE_SIZE. This is defined as follows in cost.h:

#define DEFAULT_EFFECTIVE_CACHE_SIZE 16384

But when I say "show shared_buffers" in psql I get:

 shared_buffers
----------------
 28MB

In the postgresql.conf file, the following lines appear:

shared_buffers = 28MB    # min 128kB  (change requires restart)
#temp_buffers = 8MB      # min 800kB

So I am assuming that the buffer pool size is 28MB = 28 * 128 = 3584
8K pages. So should effective_cache_size be set to 3584 rather than 16384?

No. Please see the manual for what effective_cache_size means:

http://www.postgresql.org/docs/8.4/interactive/runtime-config-query.html#GUC-EFFECTIVE-CACHE-SIZE
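
(For what it's worth, the page arithmetic in the question is correct — it's the premise that's off: effective_cache_size is only a planner estimate of the total cache available to a query, OS cache included; it allocates nothing and need not match shared_buffers.)

```python
# The question's unit conversion, spelled out: 28MB expressed in 8KB pages.
shared_buffers_kb = 28 * 1024   # 28MB in KB
page_kb = 8
pages = shared_buffers_kb // page_kb
assert pages == 3584            # matches the 28 * 128 in the question
```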

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#19 Simon Riggs
simon@2ndQuadrant.com
In reply to: Robert Haas (#17)
Re: Synchronization levels in SR

On Tue, 2010-05-25 at 13:31 -0400, Robert Haas wrote:

On Tue, May 25, 2010 at 1:10 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

On Tue, 2010-05-25 at 11:52 -0500, Kevin Grittner wrote:

Robert Haas <robertmhaas@gmail.com> wrote:

Simon Riggs <simon@2ndquadrant.com> wrote:

If we define robustness at the standby level then robustness
depends upon unseen administrators, as well as the current
up/down state of standbys. This is action-at-a-distance in its
worst form.

Maybe, but I can't help thinking people are going to want some
form of this. The case where someone wants to do sync rep to the
machine in the next rack over and async rep to a server at a
remote site seems too important to ignore.

I think there may be a terminology issue here -- I took "configure
by standby" to mean that *at the master* you would specify rules for
each standby. I think Simon took it to mean that each standby would
define the rules for replication to it. Maybe this issue can
resolve gracefully with a bit of clarification?

The use case of "machine in the next rack over and async rep to a server
at a remote site" would require the settings

server.nextrack = synch
server.remotesite = async

which leaves open the question of what happens when "nextrack" is down.

In many cases, to give adequate performance in that situation people add
an additional server, so the config becomes

server.nextrack1 = synch
server.nextrack2 = synch
server.remotesite = async

We then want to specify for performance reasons that we can get a reply
from either nextrack1 or nextrack2, so it all still works safely and
quickly if one of them is down. How can we express that rule concisely?
With some difficulty.

Perhaps the difficulty here is that those still look like per-server
settings to me. Just maybe with a different set of semantics.

(Those are the per-server settings.)

--
Simon Riggs www.2ndQuadrant.com

#20 Simon Riggs
simon@2ndQuadrant.com
In reply to: Alastair Turner (#15)
Re: Synchronization levels in SR

On Tue, 2010-05-25 at 19:08 +0200, Alastair Turner wrote:

On Tue, May 25, 2010 at 6:28 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
.......

The best parameter we can specify is the number of servers that we wish
to wait for confirmation from. That is a definition that easily manages
the complexity of having various servers up/down at any one time. It
also survives misconfiguration more easily, as well as providing a
workaround if replicating across a bursty network where we can't
guarantee response times, even if the typical response time is good.

This may be an incredibly naive question, but what happens to the
transaction on the master if the number of confirmations is not
received? Is this intended to create a situation where the master
effectively becomes unavailable for write operations when its
synchronous slaves are unavailable?

How we handle degraded mode is important, yes. Whatever parameters we
choose, the problem will remain the same.

Should we just ignore degraded mode and respond as if nothing bad had
happened? Most people would say not.

If we specify server1 = synch and server2 = async, we then also need to
specify what happens if server1 is down. People might often want to
specify: if (server1 == down) then server2 = synch.
So now we have 3 configuration settings, one of them quite complex.

It's much easier to say you want to wait for N servers to respond, but
don't care which they are. One parameter, simple and flexible.

In both cases, we have to figure out what to do if we can't get either
server to respond. In replication there is no such thing as "server
down", just "server didn't reply within time X". So we need to define
timeouts.

So whatever we do, we need additional parameters to specify timeouts
(including wait-forever as an option) and action-on-timeout: commit or
rollback.
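
Those last two knobs can be sketched together (illustrative Python; names and return values invented): the commit waits for the quorum until a timeout, then either acknowledges anyway in degraded mode or aborts.

```python
import time

def wait_for_quorum(current_acks, quorum, timeout_s, action_on_timeout="commit"):
    """Block until `quorum` standbys have acked, or `timeout_s` elapses.

    current_acks: callable returning the number of acks received so far.
    action_on_timeout: "commit" (acknowledge in degraded mode) or "rollback".
    A timeout_s of float("inf") gives the wait-forever behaviour.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if current_acks() >= quorum:
            return "committed"
        time.sleep(0.01)  # in a real server this would be a latch wait
    return "committed-degraded" if action_on_timeout == "commit" else "aborted"
```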

--
Simon Riggs www.2ndQuadrant.com

#21 Simon Riggs
simon@2ndQuadrant.com
In reply to: Robert Haas (#17)
#22 Yeb Havinga
yebhavinga@gmail.com
In reply to: Simon Riggs (#20)
#23 Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Simon Riggs (#20)
#24 MMK
bomuvi@yahoo.com
In reply to: Heikki Linnakangas (#18)
#25 Simon Riggs
simon@2ndQuadrant.com
In reply to: Yeb Havinga (#22)
#26 Josh Berkus
josh@agliodbs.com
In reply to: MMK (#24)
#27 Florian Pflug
fgp@phlo.org
In reply to: Simon Riggs (#25)
#28 Fujii Masao
masao.fujii@gmail.com
In reply to: Simon Riggs (#16)
#29 Robert Haas
robertmhaas@gmail.com
In reply to: Fujii Masao (#28)
#30 Fujii Masao
masao.fujii@gmail.com
In reply to: Simon Riggs (#7)
#31 Simon Riggs
simon@2ndQuadrant.com
In reply to: Robert Haas (#29)
#32 Simon Riggs
simon@2ndQuadrant.com
In reply to: Fujii Masao (#30)
#33 Simon Riggs
simon@2ndQuadrant.com
In reply to: Fujii Masao (#28)
#34 Fujii Masao
masao.fujii@gmail.com
In reply to: Simon Riggs (#33)
#35 Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#31)
#36 Alastair Turner
bell@ctrlf5.co.za
In reply to: Robert Haas (#35)
#37 Simon Riggs
simon@2ndQuadrant.com
In reply to: Fujii Masao (#34)
#38 Simon Riggs
simon@2ndQuadrant.com
In reply to: Fujii Masao (#34)
#39 Simon Riggs
simon@2ndQuadrant.com
In reply to: Robert Haas (#35)
#40 Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#39)
#41 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Robert Haas (#40)
#42 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Heikki Linnakangas (#41)
#43 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Kevin Grittner (#42)
#44 Simon Riggs
simon@2ndQuadrant.com
In reply to: Robert Haas (#40)
#45 Simon Riggs
simon@2ndQuadrant.com
In reply to: Kevin Grittner (#42)
#46 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Heikki Linnakangas (#43)
#47 Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#41)
#48 Robert Haas
robertmhaas@gmail.com
In reply to: Heikki Linnakangas (#43)
#49 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Kevin Grittner (#46)
#50 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Heikki Linnakangas (#49)
#51 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Simon Riggs (#47)
#52 Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#44)
#53 Simon Riggs
simon@2ndQuadrant.com
In reply to: Robert Haas (#52)
#54 Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#53)
#55 Joshua D. Drake
jd@commandprompt.com
In reply to: Robert Haas (#54)
#56 Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Simon Riggs (#47)
#57 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Dimitri Fontaine (#56)
#58Jan Wieck
JanWieck@Yahoo.com
In reply to: Heikki Linnakangas (#41)
#59Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Dimitri Fontaine (#56)
#60Simon Riggs
simon@2ndQuadrant.com
In reply to: Jan Wieck (#58)
#61Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#57)
#62Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Simon Riggs (#61)
#63Fujii Masao
masao.fujii@gmail.com
In reply to: Simon Riggs (#37)
#64Simon Riggs
simon@2ndQuadrant.com
In reply to: Fujii Masao (#63)
#65Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#62)
#66Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Simon Riggs (#65)
#67Fujii Masao
masao.fujii@gmail.com
In reply to: Simon Riggs (#38)
#68Fujii Masao
masao.fujii@gmail.com
In reply to: Simon Riggs (#64)
#69Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Heikki Linnakangas (#59)
#70Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#66)
#71Simon Riggs
simon@2ndQuadrant.com
In reply to: Fujii Masao (#68)
#72Simon Riggs
simon@2ndQuadrant.com
In reply to: Fujii Masao (#67)
#73Simon Riggs
simon@2ndQuadrant.com
In reply to: Fujii Masao (#67)
#74Fujii Masao
masao.fujii@gmail.com
In reply to: Simon Riggs (#71)
#75Simon Riggs
simon@2ndQuadrant.com
In reply to: Fujii Masao (#74)
#76Fujii Masao
masao.fujii@gmail.com
In reply to: Simon Riggs (#73)
#77Fujii Masao
masao.fujii@gmail.com
In reply to: Simon Riggs (#75)
#78Robert Haas
robertmhaas@gmail.com
In reply to: Fujii Masao (#67)
#79Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Simon Riggs (#73)
#80Simon Riggs
simon@2ndQuadrant.com
In reply to: Fujii Masao (#77)
#81Simon Riggs
simon@2ndQuadrant.com
In reply to: Fujii Masao (#76)
#82Fujii Masao
masao.fujii@gmail.com
In reply to: Robert Haas (#78)
#83Fujii Masao
masao.fujii@gmail.com
In reply to: Simon Riggs (#80)
#84Fujii Masao
masao.fujii@gmail.com
In reply to: Simon Riggs (#81)
#85Robert Haas
robertmhaas@gmail.com
In reply to: Fujii Masao (#82)
#86Fujii Masao
masao.fujii@gmail.com
In reply to: Robert Haas (#85)
#87Robert Haas
robertmhaas@gmail.com
In reply to: Fujii Masao (#86)
#88Bruce Momjian
bruce@momjian.us
In reply to: Simon Riggs (#37)
#89Bruce Momjian
bruce@momjian.us
In reply to: Heikki Linnakangas (#59)
#90Robert Haas
robertmhaas@gmail.com
In reply to: Bruce Momjian (#89)
#91Greg Smith
gsmith@gregsmith.com
In reply to: Heikki Linnakangas (#59)
#92Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Greg Smith (#91)
#93Simon Riggs
simon@2ndQuadrant.com
In reply to: Greg Smith (#91)
#94Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#92)
#95Greg Smith
gsmith@gregsmith.com
In reply to: Tom Lane (#94)
#96Jan Wieck
JanWieck@Yahoo.com
In reply to: Bruce Momjian (#89)
#97Robert Haas
robertmhaas@gmail.com
In reply to: Jan Wieck (#96)
#98David Fetter
david@fetter.org
In reply to: Robert Haas (#97)
#99Jan Wieck
JanWieck@Yahoo.com
In reply to: Robert Haas (#97)
#100Robert Haas
robertmhaas@gmail.com
In reply to: Jan Wieck (#99)
#101Jan Wieck
JanWieck@Yahoo.com
In reply to: Robert Haas (#100)
In reply to: Dimitri Fontaine (#79)
#103Fujii Masao
masao.fujii@gmail.com
In reply to: Boszormenyi Zoltan (#102)
In reply to: Fujii Masao (#103)
#105Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Boszormenyi Zoltan (#102)
In reply to: Dimitri Fontaine (#105)
#107Simon Riggs
simon@2ndQuadrant.com
In reply to: Boszormenyi Zoltan (#106)
#108Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#107)
In reply to: Simon Riggs (#107)
#110Simon Riggs
simon@2ndQuadrant.com
In reply to: Boszormenyi Zoltan (#109)
#111Bruce Momjian
bruce@momjian.us
In reply to: Simon Riggs (#110)
#112Simon Riggs
simon@2ndQuadrant.com
In reply to: Bruce Momjian (#111)
#113Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#108)
#114Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#110)
#115Markus Wanner
markus@bluegap.ch
In reply to: Robert Haas (#78)
#116Robert Haas
robertmhaas@gmail.com
In reply to: Markus Wanner (#115)
#117Markus Wanner
markus@bluegap.ch
In reply to: Robert Haas (#116)
#118Robert Haas
robertmhaas@gmail.com
In reply to: Markus Wanner (#117)
#119Markus Wanner
markus@bluegap.ch
In reply to: Robert Haas (#118)
#120Ron Mayer
rm_pg@cheapcomplexdevices.com
In reply to: Markus Wanner (#117)
#121Simon Riggs
simon@2ndQuadrant.com
In reply to: Markus Wanner (#119)
#122Tom Lane
tgl@sss.pgh.pa.us
In reply to: Markus Wanner (#119)
#123Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#122)
#124Markus Wanner
markus@bluegap.ch
In reply to: Ron Mayer (#120)
#125Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#123)
#126Markus Wanner
markus@bluegap.ch
In reply to: Tom Lane (#122)
#127Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#121)
#128Markus Wanner
markus@bluegap.ch
In reply to: Robert Haas (#127)
#129Markus Wanner
markus@bluegap.ch
In reply to: Markus Wanner (#126)
#130marcin mank
marcin.mank@gmail.com
In reply to: Tom Lane (#122)
#131Robert Haas
robertmhaas@gmail.com
In reply to: marcin mank (#130)
#132Fujii Masao
masao.fujii@gmail.com
In reply to: Simon Riggs (#110)
In reply to: Fujii Masao (#132)
#134Fujii Masao
masao.fujii@gmail.com
In reply to: Boszormenyi Zoltan (#133)
In reply to: Fujii Masao (#134)
#136Robert Haas
robertmhaas@gmail.com
In reply to: Boszormenyi Zoltan (#135)
#137Fujii Masao
masao.fujii@gmail.com
In reply to: Robert Haas (#136)
#138Markus Wanner
markus@bluegap.ch
In reply to: Boszormenyi Zoltan (#133)
#139Simon Riggs
simon@2ndQuadrant.com
In reply to: Boszormenyi Zoltan (#133)
#140Robert Haas
robertmhaas@gmail.com
In reply to: Fujii Masao (#137)
#141Fujii Masao
masao.fujii@gmail.com
In reply to: Robert Haas (#140)
#142Robert Haas
robertmhaas@gmail.com
In reply to: Fujii Masao (#141)
#143Simon Riggs
simon@2ndQuadrant.com
In reply to: Robert Haas (#142)
#144David Fetter
david@fetter.org
In reply to: Simon Riggs (#143)
#145Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#143)