Synchronous replication

Started by Fujii Masao · 59 messages · pgsql-hackers
#1 Fujii Masao
masao.fujii@gmail.com

Hi,

The attached patch provides core of synchronous replication feature
based on streaming replication. I added this patch into CF 2010-07.

The code is also available in my git repository:
git://git.postgresql.org/git/users/fujii/postgres.git
branch: synchrep

Synchronization levels
----------------------
The patch adds a replication_mode parameter to recovery.conf, which
specifies the replication mode, i.e., how long transaction commit on
the master server waits for replication before the command returns a
"success" indication to the client. Valid modes are:

1. async
doesn't make transaction commit wait for replication, i.e.,
asynchronous replication. This mode is already supported in 9.0.

2. recv
makes transaction commit wait until the standby has received WAL
records.

3. fsync
makes transaction commit wait until the standby has received and
flushed WAL records to disk.

4. replay
makes transaction commit wait until the standby has replayed WAL
records after receiving and flushing them to disk.

You can choose the synchronization level per standby.
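On the standby side this might look as follows; replication_mode is the parameter this patch adds, while the connection values are purely illustrative:

```ini
# recovery.conf on a standby (sketch; host/port values are made up)
standby_mode = 'on'
primary_conninfo = 'host=master.example.com port=5432'
replication_mode = 'replay'   # one of: async | recv | fsync | replay
```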

Quorum commit
-------------
In previous discussions about synchronous replication, some people
wanted a quorum commit feature. This feature is also included in
Zoltan's synchronous replication patch, so I decided to implement it.

The patch provides a quorum parameter in postgresql.conf, which
specifies how many standby servers transaction commit will wait for
WAL records to be replicated to, before the command returns a
"success" indication to the client. The default value is zero, which
never makes transaction commit wait for replication, regardless of
replication_mode. Also, transaction commit never waits for replication
to an asynchronous standby (i.e., one with replication_mode set to
async), regardless of this parameter. If quorum is greater than the
number of synchronous standbys, transaction commit returns "success"
once the ACK has arrived from all synchronous standbys.

Currently the quorum parameter is defined as PGC_USERSET, so you can
have some transactions replicate synchronously and others asynchronously.
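Under the scheme described here, the master-side setting and a per-session override might look like this (quorum is the patch's parameter; the value and the session usage are illustrative):

```ini
# postgresql.conf on the master (sketch)
quorum = 1      # wait for the ACK from one synchronous standby;
                # 0 (the default) never waits, i.e. async commit
```

Because quorum is PGC_USERSET, a session could run `SET quorum = 0;` before committing a low-value transaction without waiting, while other sessions keep the server-wide setting.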

Protocol
--------
I extended the handshake message "START_REPLICATION" so that it
includes the replication_mode read from recovery.conf. If 'async' is
passed, the master knows that it doesn't need to wait for the ACK
from the standby.

I added an XLogRecPtr message, which is used to send the ACK indicating
completion of replication from walreceiver to walsender. If
replication_mode = 'async', this message is never sent. The XLogRecPtr
message always contains the current receive location if the mode is
'recv', the current flush location if the mode is 'fsync', and the
current replay location if the mode is 'replay'.

Then, if the location in the ACK is greater than or equal to the
location of the COMMIT record, the transaction breaks out of the
wait loop and returns "success" to the client.
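The wait-loop test described above can be sketched as follows. This is an illustrative reconstruction, not code from the patch; XLogRecPtr here mirrors the two-part (log id, offset) WAL location PostgreSQL used at the time, and the function names are invented:

```c
/* Sketch of the master-side commit wait test: the backend compares the
 * standby's ACKed location against the LSN of its COMMIT record. */
typedef struct XLogRecPtr
{
    unsigned int xlogid;   /* log file number (high half) */
    unsigned int xrecoff;  /* byte offset within the file (low half) */
} XLogRecPtr;

/* true if a <= b, i.e. location a has been reached once b is reached */
static int
lsn_le(XLogRecPtr a, XLogRecPtr b)
{
    return a.xlogid < b.xlogid ||
           (a.xlogid == b.xlogid && a.xrecoff <= b.xrecoff);
}

/* The backend breaks out of its wait loop once this returns true:
 * the ACKed location has caught up with the COMMIT record. */
int
commit_can_return(XLogRecPtr commit_lsn, XLogRecPtr ack_lsn)
{
    return lsn_le(commit_lsn, ack_lsn);
}
```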

TODO
----
The patch has no features for improving the performance of synchronous
replication. I admit that currently the performance overhead on the
master is terrible. We need to address the following TODO items in a
subsequent CF.

* Change the poll loop in the walsender
* Change the poll loop in the backend
* Change the poll loop in the startup process
* Change the poll loop in the walreceiver
* Perform the WAL write and replication concurrently
* Send WAL from not only disk but also WAL buffers

For the case where a network outage happens or the standby fails, we
should expose the maximum time to wait for replication as a parameter.
Furthermore, you might want to specify the reaction to the timeout. These
are also not in the patch, so we need to address them in a subsequent
CF, too.

In synchronous replication, it's important to check whether the standby
is in sync with the master. But such a monitoring feature is also not
in the patch. That's a TODO.

It would be difficult to commit the whole synchronous replication
feature at one time. I'm planning to develop it in stages.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments:

synch_rep_0714.patch (application/octet-stream) +795 -299
#2 Robert Haas
robertmhaas@gmail.com
In reply to: Fujii Masao (#1)
Re: Synchronous replication

On Wed, Jul 14, 2010 at 2:50 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

The patch has no features for improving the performance of synchronous
replication. I admit that currently the performance overhead on the
master is terrible. We need to address the following TODO items in a
subsequent CF.

* Change the poll loop in the walsender
* Change the poll loop in the backend
* Change the poll loop in the startup process
* Change the poll loop in the walreceiver
* Perform the WAL write and replication concurrently
* Send WAL from not only disk but also WAL buffers

I have a feeling that if we don't have a design for these last two
before we start committing things, we're possibly going to regret it
later.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#3 Fujii Masao
masao.fujii@gmail.com
In reply to: Robert Haas (#2)
Re: Synchronous replication

On Thu, Jul 15, 2010 at 12:16 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Jul 14, 2010 at 2:50 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

The patch has no features for improving the performance of synchronous
replication. I admit that currently the performance overhead on the
master is terrible. We need to address the following TODO items in a
subsequent CF.

* Change the poll loop in the walsender
* Change the poll loop in the backend
* Change the poll loop in the startup process
* Change the poll loop in the walreceiver
* Perform the WAL write and replication concurrently
* Send WAL from not only disk but also WAL buffers

I have a feeling that if we don't have a design for these last two
before we start committing things, we're possibly going to regret it
later.

Yeah, I'll give it a try.

The problem is that the standby could apply WAL that has not yet been
fsync'd on the master. So if we allow walsender to send non-fsync'd WAL,
we should make walsender also send the current fsync location and prevent
the standby from applying WAL newer than the fsync location.

A new message type for sending the fsync location would be required in
the Streaming Replication Protocol. But sometimes it might be combined
with the XLogData message.

After the master crashes and walreceiver is terminated, currently the
standby attempts to replay the WAL in pg_xlog and the archive.
Since WAL in the archive is guaranteed to have already been fsync'd by
the master, it's not a problem for the standby to apply that WAL. OTOH,
WAL records in the pg_xlog directory might not exist on the crashed
master. So we should always prevent the standby from applying any WAL
in pg_xlog unless walreceiver is in progress. That is, if there is no
WAL available in the archive, the standby ignores pg_xlog and starts a
walreceiver process to request WAL streaming.

This idea is a little inefficient because already-sent WAL might
be sent again when the master is restarted. But since it ensures
that the standby never applies WAL that hasn't been fsync'd on the
master, it's quite safe.

What about this idea?

This idea doesn't conflict with the patch I submitted for CF 2010-07.
So please feel free to review the patch :) But if you think that the
patch is not reviewable until that idea has been implemented, I'll
try to implement it ASAP.

PS. I probably cannot reply to mail until July 21. Sorry.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#4 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Fujii Masao (#3)
Re: Synchronous replication

On 16/07/10 10:40, Fujii Masao wrote:

So we should always prevent the standby from applying any WAL in pg_xlog
unless walreceiver is in progress. That is, if there is no WAL available
in the archive, the standby ignores pg_xlog and starts a walreceiver
process to request WAL streaming.

That completely defeats the purpose of storing streamed WAL in pg_xlog
in the first place. The reason it's written and fsync'd to pg_xlog is
that if the standby subsequently crashes, you can use the WAL from
pg_xlog to reapply the WAL up to minRecoveryPoint. Otherwise you can't
start up the standby anymore.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#5 Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Heikki Linnakangas (#4)
Re: Synchronous replication

On 16 Jul 2010, at 12:43, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:

On 16/07/10 10:40, Fujii Masao wrote:

So we should always prevent the standby from applying any WAL in pg_xlog
unless walreceiver is in progress. That is, if there is no WAL available
in the archive, the standby ignores pg_xlog and starts a walreceiver
process to request WAL streaming.

That completely defeats the purpose of storing streamed WAL in pg_xlog in the first place. The reason it's written and fsync'd to pg_xlog is that if the standby subsequently crashes, you can use the WAL from pg_xlog to reapply the WAL up to minRecoveryPoint. Otherwise you can't start up the standby anymore.

I guess we know for sure that this point has been fsync()ed on the Master, or that we could arrange it so that we know that?

#6 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Dimitri Fontaine (#5)
Re: Synchronous replication

On 16/07/10 20:26, Dimitri Fontaine wrote:

On 16 Jul 2010, at 12:43, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:

On 16/07/10 10:40, Fujii Masao wrote:

So we should always prevent the standby from applying any WAL in pg_xlog
unless walreceiver is in progress. That is, if there is no WAL available
in the archive, the standby ignores pg_xlog and starts a walreceiver
process to request WAL streaming.

That completely defeats the purpose of storing streamed WAL in pg_xlog in the first place. The reason it's written and fsync'd to pg_xlog is that if the standby subsequently crashes, you can use the WAL from pg_xlog to reapply the WAL up to minRecoveryPoint. Otherwise you can't start up the standby anymore.

I guess we know for sure that this point has been fsync()ed on the Master, or that we could arrange it so that we know that?

At the moment we only stream WAL that's already been fsync()ed on the
master, so we don't have this problem, but Fujii is proposing to change
that.

I think that's a premature optimization, and we should not try to change
that. There is no evidence from the field (granted, streaming replication
is a new feature) or from performance tests that it is a problem in
practice, or that sending WAL earlier would help. Let's concentrate on
the bare minimum required to make synchronous replication work.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#7 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Fujii Masao (#1)
Re: Synchronous replication

On 14/07/10 09:50, Fujii Masao wrote:

TODO
----
The patch has no features for improving the performance of synchronous
replication. I admit that currently the performance overhead on the
master is terrible. We need to address the following TODO items in a
subsequent CF.

* Change the poll loop in the walsender
* Change the poll loop in the backend
* Change the poll loop in the startup process
* Change the poll loop in the walreceiver

I was actually hoping to see a patch for these things first, before any
of the synchronous replication stuff. Eliminating the polling loops is
important; latency will be laughable otherwise, and it will help the
synchronous case too.

* Perform the WAL write and replication concurrently
* Send WAL from not only disk but also WAL buffers

IMHO these are premature optimizations that we should not spend any
effort on now. Maybe later, if ever.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#8 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Fujii Masao (#1)
Re: Synchronous replication

On 14/07/10 09:50, Fujii Masao wrote:

Quorum commit
-------------
In previous discussions about synchronous replication, some people
wanted a quorum commit feature. This feature is also included in
Zoltan's synchronous replication patch, so I decided to implement it.

The patch provides a quorum parameter in postgresql.conf, which
specifies how many standby servers transaction commit will wait for
WAL records to be replicated to, before the command returns a
"success" indication to the client. The default value is zero, which
never makes transaction commit wait for replication, regardless of
replication_mode. Also, transaction commit never waits for replication
to an asynchronous standby (i.e., one with replication_mode set to
async), regardless of this parameter. If quorum is greater than the
number of synchronous standbys, transaction commit returns "success"
once the ACK has arrived from all synchronous standbys.

There should be a way to specify "wait for *all* connected standby
servers to acknowledge"

Protocol
--------
I extended the handshake message "START_REPLICATION" so that it
includes replication_mode read from recovery.conf. If 'async' is
passed, the master thinks that it doesn't need to wait for the ACK
from the standby.

Please use self-explanatory names for the modes in START_REPLICATION
command, instead of just an integer.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#9 Fujii Masao
masao.fujii@gmail.com
In reply to: Heikki Linnakangas (#4)
Re: Synchronous replication

On Fri, Jul 16, 2010 at 7:43 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

On 16/07/10 10:40, Fujii Masao wrote:

So we should always prevent the standby from applying any WAL in pg_xlog
unless walreceiver is in progress. That is, if there is no WAL available
in the archive, the standby ignores pg_xlog and starts a walreceiver
process to request WAL streaming.

That completely defeats the purpose of storing streamed WAL in pg_xlog in
the first place. The reason it's written and fsync'd to pg_xlog is that if
the standby subsequently crashes, you can use the WAL from pg_xlog to
reapply the WAL up to minRecoveryPoint. Otherwise you can't start up the
standby anymore.

But the standby can start up by reading the missing WAL files from the
master, no?

On second thought, minRecoveryPoint is guaranteed to be older than
the fsync location on the master if we prevent the standby from
applying WAL beyond the fsync location. So we can safely apply the
WAL files in pg_xlog up to minRecoveryPoint.

Consequently, we should always prevent the standby from applying any
WAL in pg_xlog newer than minRecoveryPoint unless walreceiver is in
progress. Thoughts?
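The rule proposed here amounts to a simple gate on replay from pg_xlog. A minimal sketch, with invented names and the WAL location flattened to a single 64-bit value for brevity (the real code compares two-part locations):

```c
#include <stdint.h>

/* Hypothetical check: may the standby apply a WAL record read from
 * pg_xlog?  While walreceiver is in progress the stream is trusted;
 * otherwise only records up to minRecoveryPoint are known to have
 * been fsync'd on the master. */
int
may_apply_from_pg_xlog(uint64_t record_lsn,
                       uint64_t min_recovery_point,
                       int walreceiver_in_progress)
{
    if (walreceiver_in_progress)
        return 1;
    return record_lsn <= min_recovery_point;
}
```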

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#10 Fujii Masao
masao.fujii@gmail.com
In reply to: Heikki Linnakangas (#7)
Re: Synchronous replication

On Sat, Jul 17, 2010 at 3:25 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

On 14/07/10 09:50, Fujii Masao wrote:

TODO
----
The patch has no features for improving the performance of synchronous
replication. I admit that currently the performance overhead on the
master is terrible. We need to address the following TODO items in a
subsequent CF.

* Change the poll loop in the walsender
* Change the poll loop in the backend
* Change the poll loop in the startup process
* Change the poll loop in the walreceiver

I was actually hoping to see a patch for these things first, before any of
the synchronous replication stuff. Eliminating the polling loops is
important, latency will be laughable otherwise, and it will help the
synchronous case too.

First, note that the poll loops in the backend and walreceiver don't
exist without the synchronous replication stuff.

Yeah, I'll start with changing the poll loop in the walsender. I'm
thinking that we should make the backend signal the walsender to send the
outstanding WAL immediately, as the synchronous replication patch I
submitted last year did. I use a signal here because the walsender
needs to wait for the request from the backend and the ACK message from
the standby *concurrently* in synchronous replication. If we used a
semaphore instead of a signal, the walsender would not be able to
respond to the ACK immediately, which also degrades performance.

The problem with this idea is that a signal can be sent per transaction
commit. I'm not sure whether this frequent signaling really harms
replication performance. BTW, when I benchmarked the previous synchronous
replication patch based on this idea, AFAIR the result showed no impact
from the signaling. But... thoughts? Do you have a better idea?

* Perform the WAL write and replication concurrently
* Send WAL from not only disk but also WAL buffers

IMHO these are premature optimizations that we should not spend any effort
on now. Maybe later, if ever.

Yep!

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#11 Fujii Masao
masao.fujii@gmail.com
In reply to: Heikki Linnakangas (#8)
Re: Synchronous replication

On Sun, Jul 18, 2010 at 3:14 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

On 14/07/10 09:50, Fujii Masao wrote:

Quorum commit
-------------
In previous discussions about synchronous replication, some people
wanted a quorum commit feature. This feature is also included in
Zoltan's synchronous replication patch, so I decided to implement it.

The patch provides a quorum parameter in postgresql.conf, which
specifies how many standby servers transaction commit will wait for
WAL records to be replicated to, before the command returns a
"success" indication to the client. The default value is zero, which
never makes transaction commit wait for replication, regardless of
replication_mode. Also, transaction commit never waits for replication
to an asynchronous standby (i.e., one with replication_mode set to
async), regardless of this parameter. If quorum is greater than the
number of synchronous standbys, transaction commit returns "success"
once the ACK has arrived from all synchronous standbys.

There should be a way to specify "wait for *all* connected standby servers
to acknowledge"

Agreed. I'll allow -1 as a valid value of the quorum parameter, which
means that transaction commit waits for all connected standbys.

Protocol
--------
I extended the handshake message "START_REPLICATION" so that it
includes replication_mode read from recovery.conf. If 'async' is
passed, the master thinks that it doesn't need to wait for the ACK
from the standby.

Please use self-explanatory names for the modes in START_REPLICATION
command, instead of just an integer.

Agreed. What about changing the START_REPLICATION message to:

START_REPLICATION XXX/XXX SYNC_LEVEL { async | recv | fsync | replay }

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#12 Aidan Van Dyk
aidan@highrise.ca
In reply to: Fujii Masao (#11)
Re: Synchronous replication

* Fujii Masao <masao.fujii@gmail.com> [100721 03:49]:

The patch provides a quorum parameter in postgresql.conf, which
specifies how many standby servers transaction commit will wait for
WAL records to be replicated to, before the command returns a
"success" indication to the client. The default value is zero, which
never makes transaction commit wait for replication, regardless of
replication_mode. Also, transaction commit never waits for replication
to an asynchronous standby (i.e., one with replication_mode set to
async), regardless of this parameter. If quorum is greater than the
number of synchronous standbys, transaction commit returns "success"
once the ACK has arrived from all synchronous standbys.

There should be a way to specify "wait for *all* connected standby servers
to acknowledge"

Agreed. I'll allow -1 as the valid value of the quorum parameter, which
means that transaction commit waits for all connected standbys.

Hm... so if my one synchronous standby is operating normally, and quorum
is set to 1, I'll get what I want (commit waits until it's safely on both
servers). But what happens if my standby goes bad? Suddenly the quorum
setting is ignored (because it's > number of connected standby
servers?). Is there a way for me to not allow any commits if the quorum
number of standbys is *not* available? Yes, I want my db to
"halt" in that situation, and yes, alarm bells will be ringing...

In reality, I'm likely to run 2 synchronous slaves, with a quorum of 1.
So 1 slave can fail and I can still have 2 going. But if that 2nd slave
ever failed while the other was down, I definitely don't want the master
to forge on ahead!

Of course, this won't be for everyone, just as the current "just
connected standbys" behavior isn't for everyone either...

a.

--
Aidan Van Dyk Create like a god,
aidan@highrise.ca command like a king,
http://www.highrise.ca/ work like a slave.

#13 Fujii Masao
masao.fujii@gmail.com
In reply to: Aidan Van Dyk (#12)
Re: Synchronous replication

On Wed, Jul 21, 2010 at 9:52 PM, Aidan Van Dyk <aidan@highrise.ca> wrote:

* Fujii Masao <masao.fujii@gmail.com> [100721 03:49]:

The patch provides a quorum parameter in postgresql.conf, which
specifies how many standby servers transaction commit will wait for
WAL records to be replicated to, before the command returns a
"success" indication to the client. The default value is zero, which
never makes transaction commit wait for replication, regardless of
replication_mode. Also, transaction commit never waits for replication
to an asynchronous standby (i.e., one with replication_mode set to
async), regardless of this parameter. If quorum is greater than the
number of synchronous standbys, transaction commit returns "success"
once the ACK has arrived from all synchronous standbys.

There should be a way to specify "wait for *all* connected standby servers
to acknowledge"

Agreed. I'll allow -1 as the valid value of the quorum parameter, which
means that transaction commit waits for all connected standbys.

Hm... so if my one synchronous standby is operating normally, and quorum
is set to 1, I'll get what I want (commit waits until it's safely on both
servers). But what happens if my standby goes bad? Suddenly the quorum
setting is ignored (because it's > number of connected standby
servers?). Is there a way for me to not allow any commits if the quorum
number of standbys is *not* available? Yes, I want my db to
"halt" in that situation, and yes, alarm bells will be ringing...

In reality, I'm likely to run 2 synchronous slaves, with a quorum of 1.
So 1 slave can fail and I can still have 2 going. But if that 2nd slave
ever failed while the other was down, I definitely don't want the master
to forge on ahead!

Of course, this won't be for everyone, just as the current "just
connected standbys" behavior isn't for everyone either...

Yeah, we need to clear up the detailed design of the quorum commit
feature, and reach consensus on it.

How should synchronous replication behave when the number of connected
standby servers is less than quorum?

1. Ignore quorum. The current patch adopts this. If the ACKs from all
connected standbys have arrived, transaction commit is successful
even if the number of standbys is less than quorum. If there is no
connected standby, transaction commit is always successful regardless
of quorum.

2. Observe quorum. Aidan wants this. Transaction commit waits until
the number of connected standbys becomes greater than or equal to
quorum.

Which is the right behavior for quorum commit? Or should we add a new
parameter specifying the behavior of quorum commit?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#14 Fujii Masao
masao.fujii@gmail.com
In reply to: Fujii Masao (#11)
Re: Synchronous replication

On Wed, Jul 21, 2010 at 4:48 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

There should be a way to specify "wait for *all* connected standby servers
to acknowledge"

Agreed. I'll allow -1 as the valid value of the quorum parameter, which
means that transaction commit waits for all connected standbys.

Done.

Please use self-explanatory names for the modes in START_REPLICATION
command, instead of just an integer.

Agreed. What about changing the START_REPLICATION message to:

   START_REPLICATION XXX/XXX SYNC_LEVEL { async | recv | fsync | replay }

Done.

I attached the updated version of the patch.
The code is also available in my git repository:
git://git.postgresql.org/git/users/fujii/postgres.git
branch: synchrep

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments:

synch_rep_0722.patch (application/octet-stream) +822 -299
#15 Yeb Havinga
yebhavinga@gmail.com
In reply to: Fujii Masao (#13)
Re: Synchronous replication

Fujii Masao wrote:

How should synchronous replication behave when the number of connected
standby servers is less than quorum?

1. Ignore quorum. The current patch adopts this. If the ACKs from all
connected standbys have arrived, transaction commit is successful
even if the number of standbys is less than quorum. If there is no
connected standby, transaction commit is always successful regardless
of quorum.

2. Observe quorum. Aidan wants this. Transaction commit waits until
the number of connected standbys becomes greater than or equal to
quorum.

Which is the right behavior for quorum commit? Or should we add a new
parameter specifying the behavior of quorum commit?

Initially I also expected the quorum to behave as described by
Aidan (option 2). Also, IMHO the name "quorum" is a bit short, like
having "maximum" but not saying a max_something.

quorum_min_sync_standbys
quorum_max_sync_standbys

The question remains: what are the sync standbys? Does it mean not-async?
Intuitively, looking at the enumeration of replication_mode, I'd think
that the sync standbys are all standbys that operate in a non-async
mode. That would be clearer with a boolean sync (or not) and, for sync
standbys, the replication_mode specified.

regards,
Yeb Havinga

#16 Fujii Masao
masao.fujii@gmail.com
In reply to: Yeb Havinga (#15)
Re: Synchronous replication

On Thu, Jul 22, 2010 at 5:37 PM, Yeb Havinga <yebhavinga@gmail.com> wrote:

Fujii Masao wrote:

How should synchronous replication behave when the number of connected
standby servers is less than quorum?

1. Ignore quorum. The current patch adopts this. If the ACKs from all
connected standbys have arrived, transaction commit is successful
even if the number of standbys is less than quorum. If there is no
connected standby, transaction commit is always successful regardless
of quorum.

2. Observe quorum. Aidan wants this. Transaction commit waits until
the number of connected standbys becomes greater than or equal to
quorum.

Which is the right behavior for quorum commit? Or should we add a new
parameter specifying the behavior of quorum commit?

Initially I also expected the quorum to behave as described by
Aidan (option 2).

OK. But some people (including me) would like to prevent the master
from halting when the standby fails, so I think that option 1 should
also be supported. So I'm inclined to add a new parameter specifying
the behavior of quorum commit when the number of synchronous standbys
becomes less than quorum.

Also, IMHO the name "quorum" is a bit short, like having
"maximum" but not saying a max_something.

quorum_min_sync_standbys
quorum_max_sync_standbys

What about quorum_standbys?

The question remains: what are the sync standbys? Does it mean not-async?

It's the standby which sets replication_mode to "recv", "fsync", or "replay".

Intuitively, looking at the enumeration of replication_mode, I'd think that
the sync standbys are all standbys that operate in a non-async mode. That
would be clearer with a boolean sync (or not) and, for sync standbys, the
replication_mode specified.

You mean that something like synchronous_replication should be added as
a recovery.conf parameter in addition to replication_mode? Since increasing
the number of similar parameters would confuse users, I don't want to do that.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#17 Yeb Havinga
yebhavinga@gmail.com
In reply to: Fujii Masao (#16)
Re: Synchronous replication

Fujii Masao wrote:

Intuitively, looking at the enumeration of replication_mode, I'd think that
the sync standbys are all standbys that operate in a non-async mode. That
would be clearer with a boolean sync (or not) and, for sync standbys, the
replication_mode specified.

You mean that something like synchronous_replication should be added as
a recovery.conf parameter in addition to replication_mode? Since increasing
the number of similar parameters would confuse users, I don't want to do that.

I think it would be confusing if there is a mismatch between the
implemented concepts and the parameters.

1 does the master wait for standby servers on commit?
2 how many acknowledgements must the master receive before it can continue?
3 is a standby server a synchronous one, i.e. does it acknowledge a commit?
4 when do standby servers acknowledge a commit?
5 does it only wait when the standbys are connected, or also when they
are not connected?
6..?

When trying to match parameter names to the concepts above:
1 - does not exist, but can be answered with quorum_standbys = 0
2 - quorum_standbys
3 - yes, if replication_mode != async (here is where I thought I had to
think too much)
4 - replication modes recv, fsync and replay, but not async
5 - Zoltan's strict_sync_replication parameter

Just an idea: what about,
for 4: acknowledge_commit = {no|recv|fsync|replay}
then 3 = yes, if acknowledge_commit != no

regards,
Yeb Havinga

#18 Fujii Masao
masao.fujii@gmail.com
In reply to: Yeb Havinga (#17)
Re: Synchronous replication

On Mon, Jul 26, 2010 at 5:27 PM, Yeb Havinga <yebhavinga@gmail.com> wrote:

Fujii Masao wrote:

Intuitively by looking at the enumeration of replication_mode I'd think
that
the sync standbys are all standby's that operate in a not async mode.
That
would be clearer with a boolean sync (or not) and for sync standbys the
replication_mode specified.

You mean that something like synchronous_replication should be added as
a recovery.conf parameter in addition to replication_mode? Since
increasing the number of similar parameters would confuse users, I
don't want to do that.

I think it would be confusing if there is a mismatch between the
implemented concepts and the parameters.

1 does the master wait for standby servers on commit?
2 how many acknowledgements must the master receive before it can continue?
3 is a standby server a synchronous one, i.e. does it acknowledge a commit?
4 when do standby servers acknowledge a commit?
5 does it only wait when the standbys are connected, or also when they
are not connected?
6..?

When trying to match parameter names to the concepts above:
1 - does not exist, but can be answered with quorum_standbys = 0
2 - quorum_standbys
3 - yes, if replication_mode != async (here is where I thought I had to
think too much)
4 - replication modes recv, fsync and replay, but not async
5 - Zoltan's strict_sync_replication parameter

Just an idea: what about,
for 4: acknowledge_commit = {no|recv|fsync|replay}
then 3 = yes, if acknowledge_commit != no

Thanks for the clarification.

I still like

replication_mode = {async|recv|fsync|replay}

rather than

synchronous_replication = {on|off}
acknowledge_commit = {no|recv|fsync|replay}

because the former is more intuitive for me and I don't want
to increase the number of parameters.

We need to hear from some users in this respect. If most want
the latter, of course, I'd love to adopt it.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#19Yeb Havinga
yebhavinga@gmail.com
In reply to: Fujii Masao (#18)
Re: Synchronous replication

Fujii Masao wrote:

I still like

replication_mode = {async|recv|fsync|replay}

rather than

synchronous_replication = {on|off}
acknowledge_commit = {no|recv|fsync|replay}

Hello Fujii,

I wasn't entirely clear. My suggestion was to have only

acknowledge_commit = {no|recv|fsync|replay}

instead of

replication_mode = {async|recv|fsync|replay}

regards,
Yeb Havinga

#20Fujii Masao
masao.fujii@gmail.com
In reply to: Yeb Havinga (#19)
Re: Synchronous replication

On Mon, Jul 26, 2010 at 6:36 PM, Yeb Havinga <yebhavinga@gmail.com> wrote:

Fujii Masao wrote:

I still like

   replication_mode = {async|recv|fsync|replay}

rather than

   synchronous_replication = {on|off}
   acknowledge_commit = {no|recv|fsync|replay}

Hello Fujii,

I wasn't entirely clear. My suggestion was to have only

  acknowledge_commit = {no|recv|fsync|replay}

instead of

  replication_mode = {async|recv|fsync|replay}

Okay, I'll change the patch accordingly.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
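To make the naming discussion concrete, here is a sketch of what the two
competing spellings would look like in the configuration files. These are
the parameter names as proposed in this thread's patch discussion, not
settings shipped in released PostgreSQL; the values shown are illustrative.

```ini
# recovery.conf on a standby -- Yeb's proposal: one parameter answering
# "when does this standby acknowledge a commit?", where 'no' means the
# standby is asynchronous:
acknowledge_commit = 'fsync'     # one of: no | recv | fsync | replay

# Fujii's original spelling of the same setting:
# replication_mode = 'fsync'    # one of: async | recv | fsync | replay

# postgresql.conf on the master -- from the patch description: how many
# synchronous standbys must acknowledge before commit returns success;
# 0 means commit never waits, regardless of the per-standby setting:
# quorum = 1
```

The advantage of the acknowledge_commit spelling is that a single
parameter answers both concept 3 (is this standby synchronous?) and
concept 4 (when does it acknowledge?) from the list above.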
