Sync Rep Design
I've mulled over the design for sync rep for awhile now, and have come
up with a feature set that includes the final detailed feedback from
Fujii Masao, Aidan Van Dyk, Josh Berkus and others.
The design also draws from MySQL concepts to make the two interfaces as
similar and as simple as possible. It should be noted that the design
presented here has many features that the MySQL design does not.
I am currently finishing up my patch to offer these features, so its
time to begin final discussions.
As an interim step, I enclose a PDF version of relevant excerpts from
the doc patch. The patch will follow on a later post in the near future.
I would like to separate discussions on "user interface" from that of
internal design, to make it easier for more people to get involved.
Please read the following and post your comments. Thank you.
--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
Attachments:
On 12/30/2010 06:26 PM, Simon Riggs wrote:
I've mulled over the design for sync rep for awhile now, and have come
up with a feature set that includes the final detailed feedback from
Fujii Masao, Aidan Van Dyk, Josh Berkus and others.The design also draws from MySQL concepts to make the two interfaces as
similar and as simple as possible. It should be noted that the design
presented here has many features that the MySQL design does not.I am currently finishing up my patch to offer these features, so its
time to begin final discussions.As an interim step, I enclose a PDF version of relevant excerpts from
the doc patch. The patch will follow on a later post in the near future.I would like to separate discussions on "user interface" from that of
internal design, to make it easier for more people to get involved.
Please read the following and post your comments. Thank you.
it would help if this would just be a simple text-only description of
the design that people can actually comment on inline. I don't think
sending technical design proposals as a pdf (which seems to be written
in doc-style as well) is a good idea to encourage discussion on -hackers :(
Stefan
On Thu, 2010-12-30 at 18:42 +0100, Stefan Kaltenbrunner wrote:
it would help if this would just be a simple text-only description of
the design that people can actually comment on inline. I don't think
sending technical design proposals as a pdf (which seems to be written
in doc-style as well) is a good idea to encourage discussion on -hackers :(
25.2.6. Synchronous Replication
Streaming replication is by default asynchronous. Transactions on the
primary server write commit records to WAL, yet do not know whether or
when a standby has received and processed those changes. So with
asynchronous replication, if the primary crashes, transactions committed
on the primary might not have been received by any standby. As a result,
failover from primary to standby could cause data loss because
transaction completions are absent, relative to the primary. The amount
of data loss is proportional to the replication delay at the time of
failover.
Synchronous replication offers the ability to guarantee that all changes
made by a transaction have been transferred to at least one remote
standby server. This is an extension to the standard level of durability
offered by a transaction commit. This is referred to as semi-synchronous
replication.
When synchronous replication is requested, the commit of a write
transaction will wait until confirmation that the commit record has been
transferred successfully to at least one standby server. Waiting for
confirmation increases the user's confidence that the changes will not
be lost in the event of server crashes but it also necessarily increases
the response time for the requesting transaction. The minimum wait time
is the roundtrip time from primary to standby.
Read only transactions and transaction rollbacks need not wait for
replies from standby servers. Subtransaction commits do not wait for
responses from standby servers, only final top-level commits. Long
running actions such as data loading or index building do not wait until
the very final commit message.
25.2.6.1. Basic Configuration
Synchronous replication must be enabled on both the primary and at least
one standby server. If synchronous replication is disabled on the
master, or enabled on the primary but not enabled on any slaves, the
primary will use asynchronous replication by default.
We use a single parameter to enable synchronous replication, set in
postgresql.conf on both primary and standby servers:
synchronous_replication = off (default) | on
On the primary, synchronous_replication can be set for particular users
or databases, or dynamically by applications programs.
If more than one standby server specifies synchronous_replication, then
whichever standby replies first will release waiting commits.
Turning this setting off for a standby allows the administrator to
exclude certain standby servers from releasing waiting transactions.
This is useful if not all standby servers are designated as potential
future primary servers. On the standby, this parameter only takes effect
at server start.
25.2.6.2. Planning for Performance
Synchronous replication usually requires carefully planned and placed
standby servers to ensure applications perform acceptably. Waiting
doesn't utilise system resources, but transaction locks continue to be
held until the transfer is confirmed. As a result, incautious use of
synchronous replication will reduce performance for database
applications because of increased response times and higher contention.
PostgreSQL allows the application developer to specify the durability
level required via replication. This can be specified for the system
overall, though it can also be specified for specific users or
connections, or even individual transactions.
For example, an application workload might consist of: 10% of changes
are important customer details, while 90% of changes are less important
data that the business can more easily survive if it is lost, such as
chat messages between users.
With synchronous replication options specified at the application level
(on the master) we can offer sync rep for the most important changes,
without slowing down the bulk of the total workload. Application level
options are an important and practical tool for allowing the benefits of
synchronous replication for high performance applications. This feature
is unique to PostgreSQL.
25.2.6.3. Planning for High Availability
The easiest and safest method of gaining High Availability using
synchronous replication is to configure at least two standby servers. To
understand why, we need to examine what can happen when you lose all
standby servers.
Commits made when synchronous_replication is set will wait until at
least one standby responds. The response may never occur if the last, or
only, standby should crash or the network drops. What should we do in
that situation?
Sitting and waiting will typically cause operational problems because it
is an effective outage of the primary server. Allowing the primary
server to continue processing in the absence of a standby puts those
latest data changes at risk. How we handle this situation is controlled
by allow_standalone_primary. The default setting is on, allowing
processing to continue, though there is no recommended setting. Choosing
the best setting for allow_standalone_primary is a difficult decision
and best left to those with combined business responsibility for both
data and applications. The difficulty of this choice is the reason why
we recommend that you reduce the possibility of this situation occurring
by using multiple standby servers.
When the primary is started with allow_standalone_primary enabled, the
primary will not allow connections until a standby connects that also
has synchronous_replication enabled. This is a convenience to ensure
that we don't allow connections before write transactions will return
successfully.
When allow_standalone_primary is set, a user will stop waiting once the
replication_timeout has been reached for their specific session. Users
are not waiting for a specific standby to reply, they are waiting for a
reply from any standby, so the unavailability of any one standby is not
significant to a user. It is possible for user sessions to hit timeout
even though standbys are communicating normally. In that case, the
setting of replication_timeout is probably too low.
The standby sends regular status messages to the primary. If no status
messages have been received for replication_timeout the primary server
will assume the connection is dead and terminate it. This happens
whatever the setting of allow_standalone_primary.
If primary crashes while commits are waiting for acknowledgement, those
transactions will be marked fully committed if the primary database
recovers, no matter how allow_standalone_primary is set. There is no way
to be certain that all standbys have received all outstanding WAL data
at time of the crash of the primary. Some transactions may not show as
committed on the standby, even though they show as committed on the
primary. The guarantee we offer is that the application will not receive
explicit acknowledgement of the successful commit of a transaction until
the WAL data is known to be safely received by the standby. Hence this
mechanism is technically "semi synchronous" rather than "fully
synchronous" replication. Note that replication still not be fully
synchronous even if we wait for all standby servers, though this would
reduce availability, as described previously.
If you need to re-create a standby server while transactions are
waiting, make sure that the commands to run pg_start_backup() and
pg_stop_backup() are run in a session with synchronous_replication =
off, otherwise those requests will wait forever for the standby to
appear.
18.5.5. Synchronous Replication
These settings control the behavior of the built-in synchronous
replication feature. These parameters would be set on the primary server
that is to send replication data to one or more standby servers.
synchronous_replication (boolean)
Specifies whether transaction commit will wait for WAL records
to be replicated before the command returns a "success"
indication to the client. The default setting is off. When on,
there will be a delay while the client waits for confirmation of
successful replication. That delay will increase depending upon
the physical distance and network activity between primary and
standby. The commit wait will last until the first reply from
any standby. Multiple standby servers allow increased
availability and possibly increase performance as well.
The parameter must be set on both primary and standby.
On the primary, this parameter can be changed at any time; the
behavior for any one transaction is determined by the setting in
effect when it commits. It is therefore possible, and useful, to
have some transactions replicate synchronously and others
asynchronously. For example, to make a single multistatement
transaction commit asynchronously when the default is
synchronous replication, issue SET LOCAL synchronous_replication
TO OFF within the transaction.
On the standby, the parameter value is taken only at server
start.
synchronous_replication_timeout (boolean)
If the client has synchronous_replication set, and
allow_standalone_primary is also set, then the commit will wait
for up to synchronous_replication_timeout milliseconds before it
returns a "success", or will wait forever if
synchronous_replication_timeout is set to -1.
If a standby server does not reply for
synchronous_replication_timeout the primary will terminate the
replication connection.
allow_standalone_primary (boolean)
If allow_standalone_primary is not set, then the server will not
allow connections until a standby connects that has
synchronous_replication enabled.
allow_standalone_primary also affects the behaviour when the
synchronous_replication_timeout is reached.
25.5.2. Handling query conflicts
….
Remedial possibilities exist if the number of standby-query
cancellations is found to be unacceptable. Typically the best option is
to enable hot_standby_feedback. This prevents VACUUM from removing
recently-dead rows and so cleanup conflicts do not occur. If you do
this, you should note that this will delay cleanup of dead rows on the
primary, which may result in undesirable table bloat. However, the
cleanup situation will be no worse than if the standby queries were
running directly on the primary server. You are still getting the
benefit of off-loading execution onto the standby and the query may
complete faster than it would have done on the primary server.
max_standby_archive_delay must be kept large in this case, because
delayed WAL files might already contain entries that conflict with the
desired standby queries.
…
18.5.6. Standby Servers
These settings control the behavior of a standby server that is to
receive replication data.
hot_standby (boolean)
Specifies whether or not you can connect and run queries during
recovery, as described in Section 25.5. The default value is
off. This parameter can only be set at server start. It only has
effect during archive recovery or in standby mode.
hot_standby_feedback (boolean)
Specifies whether or not a hot standby will send feedback to the
primary about queries currently executing on the standby. This
parameter can be used to eliminate query cancels caused by
cleanup records, though it can cause database bloat on the
primary for some workloads. The default value is off. This
parameter can only be set at server start. It only has effect if
hot_standby is enabled.
….
--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
Attachments:
sync_rep_docs.v6.patchtext/x-patch; charset=UTF-8; name=sync_rep_docs.v6.patchDownload+339-21
On Thu, Dec 30, 2010 at 2:04 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
On Thu, 2010-12-30 at 18:42 +0100, Stefan Kaltenbrunner wrote:
it would help if this would just be a simple text-only description of
the design that people can actually comment on inline. I don't think
sending technical design proposals as a pdf (which seems to be written
in doc-style as well) is a good idea to encourage discussion on -hackers:(
25.2.6. Synchronous Replication
Streaming replication is by default asynchronous. Transactions on the
primary server write commit records to WAL, yet do not know whether or
when a standby has received and processed those changes. So with
asynchronous replication, if the primary crashes, transactions committed
on the primary might not have been received by any standby. As a result,
failover from primary to standby could cause data loss because
transaction completions are absent, relative to the primary. The amount
of data loss is proportional to the replication delay at the time of
failover.Synchronous replication offers the ability to guarantee that all changes
made by a transaction have been transferred to at least one remote
standby server. This is an extension to the standard level of durability
offered by a transaction commit. This is referred to as semi-synchronous
replication.When synchronous replication is requested, the commit of a write
transaction will wait until confirmation that the commit record has been
transferred successfully to at least one standby server. Waiting for
confirmation increases the user's confidence that the changes will not
be lost in the event of server crashes but it also necessarily increases
the response time for the requesting transaction. The minimum wait time
is the roundtrip time from primary to standby.Read only transactions and transaction rollbacks need not wait for
replies from standby servers. Subtransaction commits do not wait for
responses from standby servers, only final top-level commits. Long
running actions such as data loading or index building do not wait until
the very final commit message.25.2.6.1. Basic Configuration
Synchronous replication must be enabled on both the primary and at least
one standby server. If synchronous replication is disabled on the
master, or enabled on the primary but not enabled on any slaves, the
primary will use asynchronous replication by default.We use a single parameter to enable synchronous replication, set in
postgresql.conf on both primary and standby servers:synchronous_replication = off (default) | on
On the primary, synchronous_replication can be set for particular users
or databases, or dynamically by applications programs.
This seems like a potential issue, where I start a server with this off, and
then I start turning it on for specific transactions; it isn't exactly clear
what happens, since there may or may not be a running synchronous rep slave
available. (I love the idea though)
If more than one standby server specifies synchronous_replication, then
whichever standby replies first will release waiting commits.
I don't want you to think I am setting an expectation, but I'm curious about
the possibility of requiring more than 1 server to reply?
Turning this setting off for a standby allows the administrator to
exclude certain standby servers from releasing waiting transactions.
This is useful if not all standby servers are designated as potential
future primary servers. On the standby, this parameter only takes effect
at server start.25.2.6.2. Planning for Performance
Synchronous replication usually requires carefully planned and placed
standby servers to ensure applications perform acceptably. Waiting
doesn't utilise system resources, but transaction locks continue to be
held until the transfer is confirmed. As a result, incautious use of
synchronous replication will reduce performance for database
applications because of increased response times and higher contention.PostgreSQL allows the application developer to specify the durability
level required via replication. This can be specified for the system
overall, though it can also be specified for specific users or
connections, or even individual transactions.For example, an application workload might consist of: 10% of changes
are important customer details, while 90% of changes are less important
data that the business can more easily survive if it is lost, such as
chat messages between users.With synchronous replication options specified at the application level
(on the master) we can offer sync rep for the most important changes,
without slowing down the bulk of the total workload. Application level
options are an important and practical tool for allowing the benefits of
synchronous replication for high performance applications. This feature
is unique to PostgreSQL.25.2.6.3. Planning for High Availability
The easiest and safest method of gaining High Availability using
synchronous replication is to configure at least two standby servers. To
understand why, we need to examine what can happen when you lose all
standby servers.Commits made when synchronous_replication is set will wait until at
least one standby responds. The response may never occur if the last, or
only, standby should crash or the network drops. What should we do in
that situation?Sitting and waiting will typically cause operational problems because it
is an effective outage of the primary server. Allowing the primary
server to continue processing in the absence of a standby puts those
latest data changes at risk. How we handle this situation is controlled
by allow_standalone_primary. The default setting is on, allowing
processing to continue, though there is no recommended setting. Choosing
the best setting for allow_standalone_primary is a difficult decision
and best left to those with combined business responsibility for both
data and applications. The difficulty of this choice is the reason why
we recommend that you reduce the possibility of this situation occurring
by using multiple standby servers.When the primary is started with allow_standalone_primary enabled, the
primary will not allow connections until a standby connects that also
has synchronous_replication enabled. This is a convenience to ensure
that we don't allow connections before write transactions will return
successfully.
I think you mean "disabled" at the start here, right?
When allow_standalone_primary is set, a user will stop waiting once the
replication_timeout has been reached for their specific session. Users
are not waiting for a specific standby to reply, they are waiting for a
reply from any standby, so the unavailability of any one standby is not
significant to a user. It is possible for user sessions to hit timeout
even though standbys are communicating normally. In that case, the
setting of replication_timeout is probably too low.
will a notice or warning be thrown in these cases? I'm thinking something
like the checkpoint timeout warning, but could be something else; it just
seems to me you need some way to know you're timing out.
The standby sends regular status messages to the primary. If no status
messages have been received for replication_timeout the primary server
will assume the connection is dead and terminate it. This happens
whatever the setting of allow_standalone_primary.
Does the standby attempt to reconnect in these scenarios?
If primary crashes while commits are waiting for acknowledgement, those
transactions will be marked fully committed if the primary database
recovers, no matter how allow_standalone_primary is set.
This seems backwards; if you are waiting for acknowledgement, wouldn't the
normal assumption be that the transactions *didnt* make it to any standby,
and should be rolled back ?
There is no way
to be certain that all standbys have received all outstanding WAL data
at time of the crash of the primary. Some transactions may not show as
committed on the standby, even though they show as committed on the
primary. The guarantee we offer is that the application will not receive
explicit acknowledgement of the successful commit of a transaction until
the WAL data is known to be safely received by the standby. Hence this
mechanism is technically "semi synchronous" rather than "fully
synchronous" replication. Note that replication still not be fully
synchronous even if we wait for all standby servers, though this would
reduce availability, as described previously.
I think we ought to have an example of the best configuration for "cannot
afford to lose any data" scenarios, where we would prefer an overall service
interruption over the chance of having the primary / secondary out of
synch.
If you need to re-create a standby server while transactions are
waiting, make sure that the commands to run pg_start_backup() and
pg_stop_backup() are run in a session with synchronous_replication =
off, otherwise those requests will wait forever for the standby to
appear.18.5.5. Synchronous Replication
These settings control the behavior of the built-in synchronous
replication feature. These parameters would be set on the primary server
that is to send replication data to one or more standby servers.synchronous_replication (boolean)
Specifies whether transaction commit will wait for WAL records
to be replicated before the command returns a "success"
indication to the client. The default setting is off. When on,
there will be a delay while the client waits for confirmation of
successful replication. That delay will increase depending upon
the physical distance and network activity between primary and
standby. The commit wait will last until the first reply from
any standby. Multiple standby servers allow increased
availability and possibly increase performance as well.
The parameter must be set on both primary and standby.
On the primary, this parameter can be changed at any time; the
behavior for any one transaction is determined by the setting in
effect when it commits. It is therefore possible, and useful, to
have some transactions replicate synchronously and others
asynchronously. For example, to make a single multistatement
transaction commit asynchronously when the default is
synchronous replication, issue SET LOCAL synchronous_replication
TO OFF within the transaction.
On the standby, the parameter value is taken only at server
start.
synchronous_replication_timeout (boolean)
If the client has synchronous_replication set, and
allow_standalone_primary is also set, then the commit will wait
for up to synchronous_replication_timeout milliseconds before it
returns a "success", or will wait forever if
synchronous_replication_timeout is set to -1.
If a standby server does not reply for
synchronous_replication_timeout the primary will terminate the
replication connection.
allow_standalone_primary (boolean)
If allow_standalone_primary is not set, then the server will not
allow connections until a standby connects that has
synchronous_replication enabled.
allow_standalone_primary also affects the behaviour when the
synchronous_replication_timeout is reached.
somewhat concerned that we seem to need to use double negatives to describe
whats going on here. it makes me think we ought to rename this to
require_synchronous_standby or similar.
25.5.2. Handling query conflicts
….Remedial possibilities exist if the number of standby-query
cancellations is found to be unacceptable. Typically the best option is
to enable hot_standby_feedback. This prevents VACUUM from removing
recently-dead rows and so cleanup conflicts do not occur. If you do
this, you should note that this will delay cleanup of dead rows on the
primary, which may result in undesirable table bloat. However, the
cleanup situation will be no worse than if the standby queries were
running directly on the primary server. You are still getting the
benefit of off-loading execution onto the standby and the query may
complete faster than it would have done on the primary server.
max_standby_archive_delay must be kept large in this case, because
delayed WAL files might already contain entries that conflict with the
desired standby queries.…
18.5.6. Standby Servers
These settings control the behavior of a standby server that is to
receive replication data.hot_standby (boolean)
Specifies whether or not you can connect and run queries during
recovery, as described in Section 25.5. The default value is
off. This parameter can only be set at server start. It only has
effect during archive recovery or in standby mode.
hot_standby_feedback (boolean)
Specifies whether or not a hot standby will send feedback to the
primary about queries currently executing on the standby. This
parameter can be used to eliminate query cancels caused by
cleanup records, though it can cause database bloat on the
primary for some workloads. The default value is off. This
parameter can only be set at server start. It only has effect if
hot_standby is enabled.
i was expecting this section to mention the synchronous_replication (bool)
somewhere, to control if the standby will participate synchronously or
asynch; granted it's the same config as listed in 18.5.5 right? Just that
the heading of that section specifically targets the primary.
HTH, looks pretty good at first glance.
Robert Treat
http://www.xzilla.net
Most of your doc uses the terms "primary" and "standby", but a few
instances of "master" and "slave" have slipped in. I think it's better
to stick to consistent terminology.
On Thu, Dec 30, 2010 at 21:04, Simon Riggs <simon@2ndquadrant.com> wrote:
With synchronous replication options specified at the application level
(on the master) we can offer sync rep for the most important changes,
without slowing down the bulk of the total workload. Application level
options are an important and practical tool for allowing the benefits of
synchronous replication for high performance applications. This feature
is unique to PostgreSQL.
I think a comment about the "head-of-line blocking" nature of
streaming repliaction is in order. If you execute massive writes in
async mode and then run a transaction in sync mode, its commit will be
delayed until all the async transactions before it have been applied
on the slave.
synchronous_replication_timeout (boolean)
Doesn't look like a boolean to me :)
Regards,
Marti
On Thu, Dec 30, 2010 at 3:07 PM, Robert Treat <rob@xzilla.net> wrote:
If primary crashes while commits are waiting for acknowledgement, those
transactions will be marked fully committed if the primary database
recovers, no matter how allow_standalone_primary is set.This seems backwards; if you are waiting for acknowledgement, wouldn't the
normal assumption be that the transactions *didnt* make it to any standby,
and should be rolled back ?
This is the standard 2-phase commit problem. The primary server *has*
committed it, it's fsync has returned, and the only thing keeping it
from returning the commit to the client is that it's waiting on a
synchronous "ack" from a slave.
You've got 2 options:
1) initiate fsync on the slave first
- In this case, the slave is farther ahead than the primary, and if
primary fails, you're *forced* to have a failover. The standby is
head of the primary, so the primary recovering can cause divergence.
And you'll likely have to do a base-backup style sync to get a new
primary/standby setup.
2) initiate fsync on the primary first
- In this case, the slave is always slightly behind. If if your
primary falls over, you don't give commit messages to the clients, but
if it recovers, it might have committed data, and slaves will still be
able to catch up.
The thing is that currently, even without replication, #2 can happen.
If your db falls over before it gets the commit packet stuffed out the
network, you're in the same boat. The data might be committed, even
though you didn't get the commit packet, and when your DB recovers,
it's got the committed data that you never "knew" was committed.
a.
--
Aidan Van Dyk Create like a god,
aidan@highrise.ca command like a king,
http://www.highrise.ca/ work like a slave.
On Thu, 2010-12-30 at 15:07 -0500, Robert Treat wrote:
If more than one standby server specifies synchronous_replication,
then
whichever standby replies first will release waiting commits.
I don't want you to think I am setting an expectation, but I'm curious
about the possibility of requiring more than 1 server to reply?
I was initially interested in this myself, but after a long discussion
on "quorum commit" it was decided to go with "first past post".
That is easier to manage, requires one less parameter, performs better
and doesn't really add that much additional confidence.
It was also discussed that we would have a plugin API, but I'm less sure
about that now. Perhaps we can add that option in the future, but its
not high on my list of things for this release.
--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
On Thu, 2010-12-30 at 15:07 -0500, Robert Treat wrote:
We use a single parameter to enable synchronous replication, set in
postgresql.conf on both primary and standby servers:synchronous_replication = off (default) | on
On the primary, synchronous_replication can be set for particular
users
or databases, or dynamically by applications programs.
This seems like a potential issue, where I start a server with this
off, and then I start turning it on for specific transactions; it
isn't exactly clear what happens, since there may or may not be a
running synchronous rep slave available. (I love the idea though)
Not really an issue. Even if there was a standby there a moment ago, the
standby can go away at any time. So we must cope gracefully with what
happens if you do this. By default, the parameters specify that in the
case you mention we will just use async replication (no wait!).
Options exist to change that, since some people want to wait until the
sysadmin adds a standby.
--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
On 12/30/2010 08:04 PM, Simon Riggs wrote:
On Thu, 2010-12-30 at 18:42 +0100, Stefan Kaltenbrunner wrote:
it would help if this would just be a simple text-only description of
the design that people can actually comment on inline. I don't think
sending technical design proposals as a pdf (which seems to be written
in doc-style as well) is a good idea to encourage discussion on -hackers :(25.2.6. Synchronous Replication
Streaming replication is by default asynchronous. Transactions on the
primary server write commit records to WAL, yet do not know whether or
when a standby has received and processed those changes. So with
asynchronous replication, if the primary crashes, transactions committed
on the primary might not have been received by any standby. As a result,
failover from primary to standby could cause data loss because
transaction completions are absent, relative to the primary. The amount
of data loss is proportional to the replication delay at the time of
failover.Synchronous replication offers the ability to guarantee that all changes
made by a transaction have been transferred to at least one remote
standby server. This is an extension to the standard level of durability
offered by a transaction commit. This is referred to as semi-synchronous
replication.When synchronous replication is requested, the commit of a write
transaction will wait until confirmation that the commit record has been
transferred successfully to at least one standby server. Waiting for
confirmation increases the user's confidence that the changes will not
be lost in the event of server crashes but it also necessarily increases
the response time for the requesting transaction. The minimum wait time
is the roundtrip time from primary to standby.
hmm this is one of the main problems I see with the proposed "master is
sometimes aware of the standby"(as in the feedback mode) concept this
proposal has. If it waits for only one of the standbys there is some
issue with the terminology. As a DBA I would expect the master to only
return if ALL of the "sync replication" declared nodes replied ok.
Read only transactions and transaction rollbacks need not wait for
replies from standby servers. Subtransaction commits do not wait for
responses from standby servers, only final top-level commits. Long
running actions such as data loading or index building do not wait until
the very final commit message.25.2.6.1. Basic Configuration
Synchronous replication must be enabled on both the primary and at least
one standby server. If synchronous replication is disabled on the
master, or enabled on the primary but not enabled on any slaves, the
primary will use asynchronous replication by default.We use a single parameter to enable synchronous replication, set in
postgresql.conf on both primary and standby servers:
this reads as if you can only set it there
synchronous_replication = off (default) | on
On the primary, synchronous_replication can be set for particular users
or databases, or dynamically by applications programs.
this says otherwise
If more than one standby server specifies synchronous_replication, then
whichever standby replies first will release waiting commits.
see above for why I think this violates the configuration promise - if I
say "this is a sync standby" I better expect it to be...
Turning this setting off for a standby allows the administrator to
exclude certain standby servers from releasing waiting transactions.
This is useful if not all standby servers are designated as potential
future primary servers. On the standby, this parameter only takes effect
at server start.25.2.6.2. Planning for Performance
Synchronous replication usually requires carefully planned and placed
standby servers to ensure applications perform acceptably. Waiting
doesn't utilise system resources, but transaction locks continue to be
held until the transfer is confirmed. As a result, incautious use of
synchronous replication will reduce performance for database
applications because of increased response times and higher contention.PostgreSQL allows the application developer to specify the durability
level required via replication. This can be specified for the system
overall, though it can also be specified for specific users or
connections, or even individual transactions.For example, an application workload might consist of: 10% of changes
are important customer details, while 90% of changes are less important
data that the business can more easily survive if it is lost, such as
chat messages between users.With synchronous replication options specified at the application level
(on the master) we can offer sync rep for the most important changes,
without slowing down the bulk of the total workload. Application level
options are an important and practical tool for allowing the benefits of
synchronous replication for high performance applications. This feature
is unique to PostgreSQL.
that seems to be a bit too much marketing for a reference level document
25.2.6.3. Planning for High Availability
The easiest and safest method of gaining High Availability using
synchronous replication is to configure at least two standby servers. To
understand why, we need to examine what can happen when you lose all
standby servers.Commits made when synchronous_replication is set will wait until at
least one standby responds. The response may never occur if the last, or
only, standby should crash or the network drops. What should we do in
that situation?Sitting and waiting will typically cause operational problems because it
is an effective outage of the primary server. Allowing the primary
server to continue processing in the absence of a standby puts those
latest data changes at risk. How we handle this situation is controlled
by allow_standalone_primary. The default setting is on, allowing
processing to continue, though there is no recommended setting. Choosing
the best setting for allow_standalone_primary is a difficult decision
and best left to those with combined business responsibility for both
data and applications. The difficulty of this choice is the reason why
we recommend that you reduce the possibility of this situation occurring
by using multiple standby servers.
if there is no recommended setting what will be the default?
[...]
25.5.2. Handling query conflicts
….Remedial possibilities exist if the number of standby-query
cancellations is found to be unacceptable. Typically the best option is
to enable hot_standby_feedback. This prevents VACUUM from removing
recently-dead rows and so cleanup conflicts do not occur. If you do
this, you should note that this will delay cleanup of dead rows on the
primary, which may result in undesirable table bloat. However, the
cleanup situation will be no worse than if the standby queries were
running directly on the primary server. You are still getting the
benefit of off-loading execution onto the standby and the query may
complete faster than it would have done on the primary server.
max_standby_archive_delay must be kept large in this case, because
delayed WAL files might already contain entries that conflict with the
desired standby queries.…
18.5.6. Standby Servers
These settings control the behavior of a standby server that is to
receive replication data.hot_standby (boolean)
Specifies whether or not you can connect and run queries during
recovery, as described in Section 25.5. The default value is
off. This parameter can only be set at server start. It only has
effect during archive recovery or in standby mode.
hot_standby_feedback (boolean)
Specifies whether or not a hot standby will send feedback to the
primary about queries currently executing on the standby. This
parameter can be used to eliminate query cancels caused by
cleanup records, though it can cause database bloat on the
primary for some workloads. The default value is off. This
parameter can only be set at server start. It only has effect if
hot_standby is enabled.
so if this is enabled - suddenly the master becomes (kinda) aware of the
specifics of a given standby - but what happens when one of the standby
is offline for a while how does the master know that?
What I'm really missing with that proposal is how people expect that
solution to be managed - given there is only sometimes a feedback
channel into the master you can't do the monitoring.
Even if you could (which we really need!) there is nothing in the
proposal yet that will help to determine on what the most recent standby
(in the case of more >1 sync standby) might be.
It also does not address the more general (not sync rep specific)
problem of how to deal with max_keep_segments which is a wart and I was
hoping we could get rid of in 9.1 - but it would require a real standby
registration or at least standby management possibility on the master
not a halfway done one - so do we really need hot_standby_feedback as
part of the inital sync-rep patch?
Stefan
On Thu, Dec 30, 2010 at 3:36 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
On Thu, 2010-12-30 at 15:07 -0500, Robert Treat wrote:
If more than one standby server specifies synchronous_replication,
then
whichever standby replies first will release waiting commits.
I don't want you to think I am setting an expectation, but I'm curious
about the possibility of requiring more than 1 server to reply?I was initially interested in this myself, but after a long discussion
on "quorum commit" it was decided to go with "first past post".That is easier to manage, requires one less parameter, performs better
and doesn't really add that much additional confidence.
Yes, I think with a single master, you are probably right (been
dealing with more than my fair share of multi-master based nosql
solutions lately)
Still, one thing that has me concerned is that in the case of two
slaves, you don't know which one is the more up-to-date one if you
need to failover. It'd be nice if you could just guarantee they both
are, but in lieu of that, I guess whatever decision tree is being
used, it needs to look at current xlog location of any potential
failover targets.
It was also discussed that we would have a plugin API, but I'm less sure
about that now. Perhaps we can add that option in the future, but its
not high on my list of things for this release.
Agreed.
Robert Treat
http://www.xzilla.net
On Thu, 2010-12-30 at 15:07 -0500, Robert Treat wrote:
When allow_standalone_primary is set, a user will stop waiting once
the
replication_timeout has been reached for their specific session.
Users
are not waiting for a specific standby to reply, they are waiting
for a
reply from any standby, so the unavailability of any one standby is
not
significant to a user. It is possible for user sessions to hit
timeout
even though standbys are communicating normally. In that case, the
setting of replication_timeout is probably too low.will a notice or warning be thrown in these cases? I'm thinking
something
like the checkpoint timeout warning, but could be something else; it
just
seems to me you need some way to know you're timing out.
We can do that, yes.
The standby sends regular status messages to the primary. If no
status
messages have been received for replication_timeout the primary
server
will assume the connection is dead and terminate it. This happens
whatever the setting of allow_standalone_primary.Does the standby attempt to reconnect in these scenarios?
Yes it would, but the reason why we terminated the connection was it
wasn't talking any more, so it is probably dead.
If primary crashes while commits are waiting for acknowledgement,
those
transactions will be marked fully committed if the primary database
recovers, no matter how allow_standalone_primary is set.This seems backwards; if you are waiting for acknowledgement, wouldn't
the
normal assumption be that the transactions *didnt* make it to any
standby,
and should be rolled back ?
Well, we can't roll it back. We have already written the commit record
to WAL.
There is no way
to be certain that all standbys have received all outstanding WALdata
at time of the crash of the primary. Some transactions may not show
as
committed on the standby, even though they show as committed on the
primary. The guarantee we offer is that the application will notreceive
explicit acknowledgement of the successful commit of a transaction
until
the WAL data is known to be safely received by the standby. Hence
this
mechanism is technically "semi synchronous" rather than "fully
synchronous" replication. Note that replication still not be fully
synchronous even if we wait for all standby servers, though thiswould
reduce availability, as described previously.
I think we ought to have an example of the best configuration for
"cannot
afford to lose any data" scenarios, where we would prefer an overall
service
interruption over the chance of having the primary / secondary out of
synch.
I say "use two or more standbys" more than once...
somewhat concerned that we seem to need to use double negatives to
describe
whats going on here. it makes me think we ought to rename this to
require_synchronous_standby or similar.
Don't see why we can't use double negatives. ;-)
The parameter is named directly from Fujii Masao's suggestion.
18.5.6. Standby Servers
These settings control the behavior of a standby server that is to
receive replication data.
...
i was expecting this section to mention the synchronous_replication
(bool)
somewhere, to control if the standby will participate synchronously or
asynch; granted it's the same config as listed in 18.5.5 right? Just
that
the heading of that section specifically targets the primary.
OK, good idea.
HTH, looks pretty good at first glance.
Thanks.
--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
On Thu, 2010-12-30 at 15:51 -0500, Robert Treat wrote:
Still, one thing that has me concerned is that in the case of two
slaves, you don't know which one is the more up-to-date one if you
need to failover. It'd be nice if you could just guarantee they both
are...
Regrettably, nobody can know that, without checking.
--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
On 12/30/2010 10:01 PM, Simon Riggs wrote:
On Thu, 2010-12-30 at 15:51 -0500, Robert Treat wrote:
Still, one thing that has me concerned is that in the case of two
slaves, you don't know which one is the more up-to-date one if you
need to failover. It'd be nice if you could just guarantee they both
are...Regrettably, nobody can know that, without checking.
how exactly would you check? - this seems like something that needs to
be done from the SQL and the CLI level and also very well documented
(which I cannot see in your proposal).
Stefan
On Thu, 2010-12-30 at 22:11 +0200, Marti Raudsepp wrote:
I think a comment about the "head-of-line blocking" nature of
streaming repliaction is in order. If you execute massive writes in
async mode and then run a transaction in sync mode, its commit will be
delayed until all the async transactions before it have been applied
on the slave.
Not really sure I understand what you want me to add there. The case you
mention is identical whether we use the word "async" or "sync" where you
mention "in async mode".
Replication doesn't wait until a sync commit is requested, it is
continuously active.
Sync rep's only addition are the reply messages.
--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
On Thu, 2010-12-30 at 21:42 +0100, Stefan Kaltenbrunner wrote:
Synchronous replication offers the ability to guarantee that all changes
made by a transaction have been transferred to at least one remote
standby server. This is an extension to the standard level of durability
offered by a transaction commit. This is referred to as semi-synchronous
replication.When synchronous replication is requested, the commit of a write
transaction will wait until confirmation that the commit record has been
transferred successfully to at least one standby server. Waiting for
confirmation increases the user's confidence that the changes will not
be lost in the event of server crashes but it also necessarily increases
the response time for the requesting transaction. The minimum wait time
is the roundtrip time from primary to standby.hmm this is one of the main problems I see with the proposed "master is
sometimes aware of the standby"(as in the feedback mode) concept this
proposal has. If it waits for only one of the standbys there is some
issue with the terminology. As a DBA I would expect the master to only
return if ALL of the "sync replication" declared nodes replied ok.
Well, as a DBA, I expect it to work with just one. That's how MySQL and
Oracle work at least. If ALL standbys reply, it takes longer, makes the
code harder, how do you determine what "all" is robustly etc.. Plus its
been discussed already.
What I'm really missing with that proposal is how people expect that
solution to be managed -
What aspect do you wish to monitor? I'm happy to consider your
suggestions.
given there is only sometimes a feedback
channel into the master you can't do the monitoring.
Not sure what you mean. Please explain more.
Even if you could (which we really need!) there is nothing in the
proposal yet that will help to determine on what the most recent standby
(in the case of more >1 sync standby) might be.
Functions to determine that already exist.
- but it would require a real standby
registration or at least standby management possibility on the master
not a halfway done one - so do we really need hot_standby_feedback as
part of the inital sync-rep patch?
It is a Hot Standby feature, but so tightly integrated with this code
that it isn't possible for me to submit as two separate patches.
--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
On Thu, Dec 30, 2010 at 2:04 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
We use a single parameter to enable synchronous replication, set in
postgresql.conf on both primary and standby servers:synchronous_replication = off (default) | on
On the primary, synchronous_replication can be set for particular users
or databases, or dynamically by applications programs.If more than one standby server specifies synchronous_replication, then
whichever standby replies first will release waiting commits.Turning this setting off for a standby allows the administrator to
exclude certain standby servers from releasing waiting transactions.
This is useful if not all standby servers are designated as potential
future primary servers. On the standby, this parameter only takes effect
at server start.
I think it's a bad idea to use the same parameter to mean different
things on the master and standby. You proposed this kind of double
meaning for the hot_standby parameter (possibly back when it was
called standby_connections, or something like that) and we (rightly, I
think) did not adopt that, instead ending up with wal_level to control
the master's behavior and hot_standby to control the slave's behavior.
synchronous_replication (boolean)
Specifies whether transaction commit will wait for WAL records
to be replicated before the command returns a "success"
indication to the client.
The word "replicated" here could be taken to mean different things,
most obviously:
- slave has received the WAL
- slave has fsync'd the WAL
- slave has applied the WAL
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, 2010-12-30 at 22:08 +0100, Stefan Kaltenbrunner wrote:
On 12/30/2010 10:01 PM, Simon Riggs wrote:
On Thu, 2010-12-30 at 15:51 -0500, Robert Treat wrote:
Still, one thing that has me concerned is that in the case of two
slaves, you don't know which one is the more up-to-date one if you
need to failover. It'd be nice if you could just guarantee they both
are...Regrettably, nobody can know that, without checking.
how exactly would you check? - this seems like something that needs to
be done from the SQL and the CLI level and also very well documented
(which I cannot see in your proposal).
This is a proposal for sync rep, not multi-node failover. I'm definitely
not going to widen the scope of this project.
Functions already exist to check the thing you're asking.
--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
On Thu, Dec 30, 2010 at 3:42 PM, Stefan Kaltenbrunner
<stefan@kaltenbrunner.cc> wrote:
synchronous replication for high performance applications. This feature
is unique to PostgreSQL.that seems to be a bit too much marketing for a reference level document
+1.
It also does not address the more general (not sync rep specific) problem of
how to deal with max_keep_segments which is a wart and I was hoping we could
get rid of in 9.1 - but it would require a real standby registration or at
least standby management possibility on the master not a halfway done one -
so do we really need hot_standby_feedback as part of the inital sync-rep
patch?
And this is really the key point on which previous discussions of sync
rep stalled. Simon is clearly of the opinion that any system where
the slaves have an individual identities (aka "standby registration")
is a bad idea, but the only justification he's offered for that
position is the assertion that it doesn't allow any added
functionality. As you point out, and as has been pointed out before,
this is not true, but unless Simon has changed his position since the
last time we discussed this, he will not only refuse to include any
kind of standby identifier in any of his proposals, but will also
argue against including any such code even if it is written by someone
else. I don't understand why, but that's how it is.
Synchronous replication would probably be done and committed by now if
it weren't for this issue.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Dec 30, 2010, at 3:27 PM, Robert Haas wrote:
synchronous_replication (boolean)
Specifies whether transaction commit will wait for WAL records
to be replicated before the command returns a "success"
indication to the client.The word "replicated" here could be taken to mean different things,
most obviously:- slave has received the WAL
- slave has fsync'd the WAL
- slave has applied the WAL
I think that comment is valid for the entire set of docs, actually. The document goes out of its way to avoid simple phrases like "replicated", but doesn't spell out exactly what is happening, ie:
"Synchronous replication offers the ability to guarantee that all changes
made by a transaction have been transferred to at least one remote
standby server. This is an extension to the standard level of durability
offered by a transaction commit. This is referred to as semi-synchronous
replication."
Reading that, I'm left with the sense that this isn't a simple matter of "Oh, the data has been replicated to the slave before commit returns", but nothing does a good job of clearly explaining what the distinction is and what it means. This section:
"The guarantee we offer is that the application will not receive
explicit acknowledgement of the successful commit of a transaction until
the WAL data is known to be safely received by the standby. Hence this
mechanism is technically "semi synchronous" rather than "fully
synchronous" replication."
does provide some enlightenment, but it's at the end of the section. I think it would be best if there was a section right at the beginning that talked about the data quality issue of sync replication and how we're avoiding it with our semi-sync solution.
--
Jim C. Nasby, Database Architect jim@nasby.net
512.569.9461 (cell) http://jim.nasby.net
On Thu, 2010-12-30 at 16:27 -0500, Robert Haas wrote:
I think it's a bad idea to use the same parameter to mean different
things on the master and standby.
Obviously if you phrase it like that, nobody would disagree. I would say
I have used the same parameter on both sides in a balanced way to
simplify the configuration, which had been an important factor in the
debate.
"You need to set parameter X on both primary and standby" seems simple
and clear. It certainly works OK for MySQL.
It's no bother to change, whichever way we decide and I'm happy to do
so.
My previous patch had two parameters:
primary: synchronous_replication = ...
standby: synchronous_replication_service = on | off
Which do people prefer?
--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services