pgsql: Efficient transaction-controlled synchronous replication.
Efficient transaction-controlled synchronous replication.
If a standby is broadcasting reply messages and we have named
one or more standbys in synchronous_standby_names then allow
users who set synchronous_replication to wait for commit, which
then provides strict data integrity guarantees. Design avoids
sending and receiving transaction state information so minimises
bookkeeping overheads. We synchronize with the highest priority
standby that is connected and ready to synchronize. Other standbys
can be defined to take over in case of standby failure.
This version has very strict behaviour; more relaxed options
may be added at a later date.
Simon Riggs and Fujii Masao, with reviews by Yeb Havinga, Jaime
Casanova, Heikki Linnakangas and Robert Haas, plus the assistance
of many other design reviewers.
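
[Editorial sketch] The selection rule described above ("the highest priority standby that is connected and ready to synchronize") can be illustrated with a small self-contained model. This is only a sketch under stated assumptions: StandbyModel and pick_sync_standby are hypothetical names that do not appear in the commit, and the sketch assumes that priority follows the order in which names are listed in synchronous_standby_names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one standby; not a PostgreSQL struct. */
typedef struct
{
    const char *name;       /* name the standby identifies itself with */
    bool        connected;  /* currently connected to the primary */
    bool        ready;      /* sending reply messages and ready to sync */
} StandbyModel;

/*
 * Return the index (into standbys[]) of the standby to synchronize with,
 * or -1 if no named standby is both connected and ready.
 *
 * Assumption: names[] is ordered from highest to lowest priority.
 */
int
pick_sync_standby(const char *const *names, size_t nnames,
                  const StandbyModel *standbys, size_t nstandbys)
{
    for (size_t i = 0; i < nnames; i++)
    {
        for (size_t j = 0; j < nstandbys; j++)
        {
            if (standbys[j].connected && standbys[j].ready &&
                strcmp(standbys[j].name, names[i]) == 0)
                return (int) j;
        }
    }
    return -1;
}
```

In this model, a lower-priority standby is chosen only while every higher-priority one is disconnected or not yet ready, which is the takeover behaviour the commit message describes.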
Branch
------
master
Details
-------
http://git.postgresql.org/pg/commitdiff/a8a8a3e0965201df88bdfdff08f50e5c06c552b7
Modified Files
--------------
doc/src/sgml/config.sgml                      |   86 +++++++++++
doc/src/sgml/high-availability.sgml           |  203 +++++++++++++++++++++++++
doc/src/sgml/monitoring.sgml                  |    7 +-
src/backend/access/transam/twophase.c         |   25 +++
src/backend/access/transam/xact.c             |   11 ++-
src/backend/catalog/system_views.sql          |    4 +-
src/backend/postmaster/autovacuum.c           |    7 +
src/backend/postmaster/postmaster.c           |    3 +
src/backend/replication/Makefile              |    2 +-
src/backend/replication/walreceiver.c         |    9 +-
src/backend/replication/walsender.c           |   65 +++++++-
src/backend/storage/ipc/shmqueue.c            |   21 +++-
src/backend/storage/lmgr/proc.c               |   12 ++
src/backend/utils/misc/guc.c                  |   19 +++
src/backend/utils/misc/postgresql.conf.sample |   11 ++-
src/include/catalog/pg_proc.h                 |    2 +-
src/include/replication/walsender.h           |   22 +++
src/include/storage/lwlock.h                  |    1 +
src/include/storage/proc.h                    |   14 ++
src/include/storage/shmem.h                   |    3 +
src/test/regress/expected/rules.out           |    2 +-
21 files changed, 507 insertions(+), 22 deletions(-)
On 03/06/2011 05:51 PM, Simon Riggs wrote:
Efficient transaction-controlled synchronous replication.
I'm glad this is in, but I thought we agreed NOT to call it "synchronous
replication".
cheers
andrew
Simon Riggs <simon@2ndQuadrant.com> writes:
Efficient transaction-controlled synchronous replication.
This patch broke the build. Kindly fix or revert at once.
regards, tom lane
On Sun, 2011-03-06 at 18:09 -0500, Andrew Dunstan wrote:
On 03/06/2011 05:51 PM, Simon Riggs wrote:
Efficient transaction-controlled synchronous replication.
I'm glad this is in, but I thought we agreed NOT to call it "synchronous
replication".
The discussion on the thread was that it's not sync rep unless we have
the strictest guarantees. We have the strictest guarantees, so it
qualifies as sync rep.
Relaxations are possible and, to some people, desirable.
Perhaps there is a more marketable term, and if so, we can rebrand. It
wouldn't be the first time things got renamed in beta.
--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
On Sun, Mar 6, 2011 at 6:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Simon Riggs <simon@2ndQuadrant.com> writes:
Efficient transaction-controlled synchronous replication.
This patch broke the build. Kindly fix or revert at once.
Seems Simon forgot to include src/include/replication/syncrep.h on the commit
--
Jaime Casanova www.2ndQuadrant.com
Professional PostgreSQL: Soporte y capacitación de PostgreSQL
On Sun, Mar 6, 2011 at 6:36 PM, Jaime Casanova <jaime@2ndquadrant.com> wrote:
On Sun, Mar 6, 2011 at 6:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Simon Riggs <simon@2ndQuadrant.com> writes:
Efficient transaction-controlled synchronous replication.
This patch broke the build. Kindly fix or revert at once.
Seems Simon forgot to include src/include/replication/syncrep.h on the commit
It doesn't have src/backend/replication/syncrep.c either
--
Jaime Casanova www.2ndQuadrant.com
Professional PostgreSQL: Soporte y capacitación de PostgreSQL
On Sun, 2011-03-06 at 18:28 -0500, Tom Lane wrote:
Simon Riggs <simon@2ndQuadrant.com> writes:
Efficient transaction-controlled synchronous replication.
This patch broke the build. Kindly fix or revert at once.
I think that's fixed it now. I was in the middle of doing that when your
last commit hit, so I had to rewind and try again.
--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
On 07.03.2011 01:28, Simon Riggs wrote:
On Sun, 2011-03-06 at 18:09 -0500, Andrew Dunstan wrote:
On 03/06/2011 05:51 PM, Simon Riggs wrote:
Efficient transaction-controlled synchronous replication.
I'm glad this is in, but I thought we agreed NOT to call it "synchronous
replication".
The discussion on the thread was that it's not sync rep unless we have
the strictest guarantees. We have the strictest guarantees, so it
qualifies as sync rep.
What do you mean by "strictest guarantees"?
I don't see allow_synchronous_standby setting in the committed patch. I
presume you didn't make allow_synchronous_standby=off the default
behavior. Also, the documentation that describes this as two-safe
replication and claims that "the only possibility that data can be lost
is if both the primary and the standby suffer crashes at the same time"
needs big fat caveats to clarify that this doesn't actually achieve
those guarantees.
Please change the name.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Mon, 2011-03-07 at 09:29 +0200, Heikki Linnakangas wrote:
I presume you didn't make allow_synchronous_standby=off the default
behavior.
You presume incorrectly.
--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
On 07.03.2011 09:48, Simon Riggs wrote:
On Mon, 2011-03-07 at 09:29 +0200, Heikki Linnakangas wrote:
I presume you didn't make allow_synchronous_standby=off the default
behavior.
Sorry, s/allow_synchronous_standby/allow_standalone_master
You presume incorrectly.
Ok, ok then. Thank you! Looks like I need to git pull and get myself
up-to-speed with these latest developments :-).
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Mon, Mar 7, 2011 at 7:51 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
Efficient transaction-controlled synchronous replication.
If a standby is broadcasting reply messages and we have named
one or more standbys in synchronous_standby_names then allow
users who set synchronous_replication to wait for commit, which
then provides strict data integrity guarantees. Design avoids
sending and receiving transaction state information so minimises
bookkeeping overheads. We synchronize with the highest priority
standby that is connected and ready to synchronize. Other standbys
can be defined to take over in case of standby failure.
This version has very strict behaviour; more relaxed options
may be added at a later date.
Pretty cool! I very much appreciate your efforts and contributions.
And I found one bug ;) You seem to have wrongly removed the check
of max_wal_senders in SyncRepWaitForLSN. This can make the
backend wait for replication even if max_wal_senders = 0. I could reproduce
this problem on my machine. The attached patch fixes it.
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Attachments:
syncrep_check_max_wal_senders_v1.patch (application/octet-stream, +13 -12)
On Mon, Mar 7, 2011 at 5:27 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
On Mon, Mar 7, 2011 at 7:51 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
Efficient transaction-controlled synchronous replication.
If a standby is broadcasting reply messages and we have named
one or more standbys in synchronous_standby_names then allow
users who set synchronous_replication to wait for commit, which
then provides strict data integrity guarantees. Design avoids
sending and receiving transaction state information so minimises
bookkeeping overheads. We synchronize with the highest priority
standby that is connected and ready to synchronize. Other standbys
can be defined to take over in case of standby failure.
This version has very strict behaviour; more relaxed options
may be added at a later date.
Pretty cool! I very much appreciate your efforts and contributions.
And I found one bug ;) You seem to have wrongly removed the check
of max_wal_senders in SyncRepWaitForLSN. This can make the
backend wait for replication even if max_wal_senders = 0. I could reproduce
this problem on my machine. The attached patch fixes it.
if (strlen(SyncRepStandbyNames) > 0 && max_wal_senders == 0)
    ereport(ERROR,
            (errmsg("Synchronous replication requires WAL streaming (max_wal_senders > 0)")));
Shouldn't the above check also be required after pg_ctl reload, since
synchronous_standby_names can be changed by SIGHUP?
Or how about just removing that? If the patch I submitted is
committed, empty synchronous_standby_names and max_wal_senders = 0
settings is no longer unsafe.
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Mon, 2011-03-07 at 17:27 +0900, Fujii Masao wrote:
On Mon, Mar 7, 2011 at 7:51 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
And I found one bug ;) You seem to have wrongly removed the check
of max_wal_senders in SyncRepWaitForLSN. This can make the
backend wait for replication even if max_wal_senders = 0. I could reproduce
this problem on my machine. The attached patch fixes it.
There may be a bug, but that's not the fix.
I spotted that issue myself in testing. I put in a protection to stop
setting synchronous_standby_names if max_wal_senders is zero, with error
message.
Are you saying the committed version doesn't trigger that ERROR?
--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
On Mon, Mar 7, 2011 at 6:20 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
On Mon, 2011-03-07 at 17:27 +0900, Fujii Masao wrote:
On Mon, Mar 7, 2011 at 7:51 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
And I found one bug ;) You seem to have wrongly removed the check
of max_wal_senders in SyncRepWaitForLSN. This can make the
backend wait for replication even if max_wal_senders = 0. I could reproduce
this problem on my machine. The attached patch fixes it.
There may be a bug, but that's not the fix.
I spotted that issue myself in testing. I put in a protection to stop
setting synchronous_standby_names if max_wal_senders is zero, with error
message.
Are you saying the committed version doesn't trigger that ERROR?
I changed synchronous_standby_names after startup and reloaded the
configuration file. So I didn't encounter such an error message.
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Mon, 2011-03-07 at 17:44 +0900, Fujii Masao wrote:
Shouldn't the above check also be required after pg_ctl reload, since
synchronous_standby_names can be changed by SIGHUP?
Or how about just removing that? If the patch I submitted is
committed, empty synchronous_standby_names and max_wal_senders = 0
settings is no longer unsafe.
Ah, on reload. I plugged the gap only at startup.
I'll fix it by changing assign_synchronous_standby_names(), not by changing
lots of other parts of the code and adding a runtime check to each COMMIT.
--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
On Mon, Mar 7, 2011 at 6:30 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
On Mon, 2011-03-07 at 17:44 +0900, Fujii Masao wrote:
Shouldn't the above check also be required after pg_ctl reload, since
synchronous_standby_names can be changed by SIGHUP?
Or how about just removing that? If the patch I submitted is
committed, empty synchronous_standby_names and max_wal_senders = 0
settings is no longer unsafe.
Ah, on reload. I plugged the gap only at startup.
I'll fix it by changing assign_synchronous_standby_names(), not by changing
lots of other parts of the code and adding a runtime check to each COMMIT.
I don't think that checking a local variable on each COMMIT wastes
many cycles. Anyway, reloading the configuration file should not
cause the server to end unexpectedly. IOW, the GUC assign hook should
use GUC_complaint_elevel instead of FATAL in ereport. The attached
patch fixes that, and includes two typo fixes.
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Attachments:
use_guc_complaint_elevel_v1.patch (application/octet-stream, +6 -6)
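
[Editorial sketch] The point argued above, that a bad setting picked up on reload should be reported and rejected rather than bring the server down, can be modelled in a few lines. This is a simplified illustration, not the attached patch: Severity, SettingSource, and check_standby_names are hypothetical names, and treating a bad setting at startup as fatal is an assumption of the sketch. The real GUC_complaint_elevel in PostgreSQL is not reproduced here.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical severities for this sketch; not PostgreSQL elevel values. */
typedef enum { LEVEL_OK, LEVEL_LOG, LEVEL_FATAL } Severity;

/* Hypothetical origin of the new setting. */
typedef enum { SOURCE_STARTUP, SOURCE_RELOAD } SettingSource;

/*
 * Decide how to react to a synchronous_standby_names value.
 *
 * The combination "standby names set, max_wal_senders = 0" is invalid.
 * On reload it is only logged (the caller would keep the old value);
 * at startup this sketch assumes it is fatal.
 */
Severity
check_standby_names(const char *standby_names, int max_wal_senders,
                    SettingSource source)
{
    bool wants_sync = standby_names != NULL && strlen(standby_names) > 0;

    if (!wants_sync || max_wal_senders > 0)
        return LEVEL_OK;

    return (source == SOURCE_RELOAD) ? LEVEL_LOG : LEVEL_FATAL;
}
```

The design choice under discussion is where the check lives: validating once at assignment keeps the COMMIT path free of extra checks, provided the assignment-time check also runs on reload and picks a severity that a running server can survive.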
On Mon, Mar 7, 2011 at 5:27 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
On Mon, Mar 7, 2011 at 7:51 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
Efficient transaction-controlled synchronous replication.
If a standby is broadcasting reply messages and we have named
one or more standbys in synchronous_standby_names then allow
users who set synchronous_replication to wait for commit, which
then provides strict data integrity guarantees. Design avoids
sending and receiving transaction state information so minimises
bookkeeping overheads. We synchronize with the highest priority
standby that is connected and ready to synchronize. Other standbys
can be defined to take over in case of standby failure.
This version has very strict behaviour; more relaxed options
may be added at a later date.
Pretty cool! I very much appreciate your efforts and contributions.
Here are some more comments:
if ((wrote_xlog && XactSyncCommit) || forceSyncCommit || nrels > 0 ||
SyncRepRequested())
Whenever synchronous_replication is TRUE, we disable synchronous_commit.
But, before disabling that, we should check also max_wal_senders and
synchronous_standby_names? Otherwise, synchronous_commit can
be disabled unexpectedly even in non replication case.
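
[Editorial sketch] The stricter condition suggested above can be written out as a small model. The field and function names below are hypothetical stand-ins for the variables in the quoted condition, and the extra checks on max_wal_senders and synchronous_standby_names are the reviewer's proposal as understood here, not necessarily what was committed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the state consulted at commit time. */
typedef struct
{
    bool        wrote_xlog;          /* transaction wrote WAL */
    bool        sync_commit;         /* synchronous_commit setting */
    bool        force_sync_commit;   /* commit must be synchronous anyway */
    int         nrels;               /* number of relations to delete */
    bool        sync_rep_requested;  /* synchronous_replication setting */
    int         max_wal_senders;
    const char *standby_names;       /* synchronous_standby_names */
} CommitModel;

/* Sync rep only matters if a standby could actually be attached. */
bool
sync_rep_effective(const CommitModel *c)
{
    return c->sync_rep_requested &&
           c->max_wal_senders > 0 &&
           c->standby_names != NULL &&
           c->standby_names[0] != '\0';
}

/* True if the commit record must be flushed before returning. */
bool
must_flush_at_commit(const CommitModel *c)
{
    return (c->wrote_xlog && c->sync_commit) ||
           c->force_sync_commit ||
           c->nrels > 0 ||
           sync_rep_effective(c);
}
```

With this guard, a server that has synchronous_replication on but no WAL senders or no named standbys still honours synchronous_commit = off.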
- /* Let the master know that we received some data. */
- XLogWalRcvSendReply();
- XLogWalRcvSendHSFeedback();
This change completely eliminates the difference between write_location
and flush_location in pg_stat_replication. If this change is reasonable, we
should remove write_location from pg_stat_replication, since it becomes useless.
If not, this change should be reverted. I'm not sure whether monitoring
the difference between write and flush locations is useful. But I guess that
someone thought so, and that is why the code was added.
+ /*
+ * Current location of the head of the queue. All waiters should have
+ * a waitLSN that follows this value, or they are currently being woken
+ * to remove themselves from the queue. Protected by SyncRepLock.
+ */
+ XLogRecPtr lsn;
The comment ", or they are currently being woken to remove themselves
from the queue" is no longer required because the proc is now removed
by the walsender.
I found some typos. The attached patch fixes them.
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Attachments:
sync_rep_typo_fix_v1.patch (application/octet-stream, +6 -7)
On 03/07/2011 02:29 AM, Heikki Linnakangas wrote:
On 07.03.2011 01:28, Simon Riggs wrote:
On Sun, 2011-03-06 at 18:09 -0500, Andrew Dunstan wrote:
On 03/06/2011 05:51 PM, Simon Riggs wrote:
Efficient transaction-controlled synchronous replication.
I'm glad this is in, but I thought we agreed NOT to call it
"synchronous
replication".
The discussion on the thread was that it's not sync rep unless we have
the strictest guarantees. We have the strictest guarantees, so it
qualifies as sync rep.
What do you mean by "strictest guarantees"?
I don't see allow_synchronous_standby setting in the committed patch.
I presume you didn't make allow_synchronous_standby=off the default
behavior. Also, the documentation that describes this as two-safe
replication and claims that "the only possibility that data can be
lost is if both the primary and the standby suffer crashes at the same
time" needs big fat caveats to clarify that this doesn't actually
achieve those guarantees.
Please change the name.
Previously, Simon said:
Truly "synchronous" requires two-phase commit, which this never was.
So I too am confused about how it's now become "truly synchronous". Are
we saying this gives the same or better guarantees than a 2PC setup?
cheers
andrew
On 07.03.2011 15:30, Andrew Dunstan wrote:
Previously, Simon said:
Truly "synchronous" requires two-phase commit, which this never was.
So I too am confused about how it's now become "truly synchronous". Are
we saying this gives the same or better guarantees than a 2PC setup?
The guarantee we have now with synchronous_replication=on is that when
the server acknowledges a commit to the client (ie. when COMMIT command
returns), the transaction is safely flushed to disk on the master and at
least one synchronous standby server.
What you don't get is a guarantee on what happens to transactions that
were not acknowledged to the client. For example, if you pull the power
plug, the transaction that was just being committed might be committed
on the master, but not yet on the standby.
For me, that's enough to call it "synchronous replication". It provides
a useful guarantee to the client. But you could argue for an even
stricter definition, requiring atomicity so that if a transaction is not
successfully replicated for any reason, including crash, it is rolled
back in the master too. That would require 2PC.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
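
[Editorial sketch] The guarantee described above can be stated as a predicate on WAL positions. This is a simplified model under stated assumptions: Lsn is a plain 64-bit integer standing in for a WAL location, and both function names are hypothetical. It does not reproduce the actual wait logic in the committed code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified WAL position for this sketch (hypothetical type). */
typedef uint64_t Lsn;

/*
 * COMMIT may return to the client only once the commit record is
 * flushed on the master and on the synchronous standby.
 */
bool
commit_can_be_acknowledged(Lsn commit_lsn, Lsn master_flush_lsn,
                           Lsn standby_flush_lsn)
{
    return master_flush_lsn >= commit_lsn &&
           standby_flush_lsn >= commit_lsn;
}

/*
 * The window with no guarantee: the commit is durable on the master
 * but not yet on the standby. A crash here can leave the transaction
 * committed on the master even though the client never saw an
 * acknowledgement.
 */
bool
commit_in_doubt(Lsn commit_lsn, Lsn master_flush_lsn,
                Lsn standby_flush_lsn)
{
    return master_flush_lsn >= commit_lsn &&
           standby_flush_lsn < commit_lsn;
}
```

The second predicate is the case that a stricter, atomic definition would have to rule out, which is why the message above says that would require 2PC.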
On 03/07/2011 09:02 AM, Heikki Linnakangas wrote:
On 07.03.2011 15:30, Andrew Dunstan wrote:
Previously, Simon said:
Truly "synchronous" requires two-phase commit, which this never was.
So I too am confused about how it's now become "truly synchronous". Are
we saying this gives the same or better guarantees than a 2PC setup?
The guarantee we have now with synchronous_replication=on is that when
the server acknowledges a commit to the client (ie. when COMMIT
command returns), the transaction is safely flushed to disk on the
master and at least one synchronous standby server.
What you don't get is a guarantee on what happens to transactions that
were not acknowledged to the client. For example, if you pull the
power plug, the transaction that was just being committed might be
committed on the master, but not yet on the standby.
For me, that's enough to call it "synchronous replication". It
provides a useful guarantee to the client. But you could argue for an
even stricter definition, requiring atomicity so that if a transaction
is not successfully replicated for any reason, including crash, it is
rolled back in the master too. That would require 2PC.
My worry is that the stricter definition is what many people will
expect, without reading the fine print.
cheers
andrew