Standalone synchronous master

Started by Alexander Björnhagen, about 14 years ago, 26 messages
#1 Alexander Björnhagen
alex.bjornhagen@gmail.com
1 attachment(s)

Hi all,

I’m new here so maybe someone else already has this in the works ?

Anyway, proposed change/patch :

Add a new parameter :

synchronous_standalone_master = on | off

To control whether a master configured with synchronous_commit = on is
allowed to stop waiting for standby WAL sync when all synchronous
standby WAL senders are disconnected.

Current behavior is that the master waits indefinitely until a
synchronous standby becomes available or until synchronous_commit is
disabled manually. This would still be the default, so
synchronous_standalone_master defaults to off.
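
For illustration, a minimal postgresql.conf sketch for a primary running
with this patch applied might look like this (the standby name "tx0113"
is simply the one used in the example log further down):

synchronous_commit = on                  # commits wait for the sync standby
synchronous_standby_names = 'tx0113'     # standby(s) considered synchronous
synchronous_standalone_master = on       # proposed: keep committing when no
                                         # sync standby is connected

Since the parameter is PGC_SIGHUP in the patch, it can also be switched
at runtime with a config reload.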

Previously discussed here :

http://archives.postgresql.org/pgsql-hackers/2010-10/msg01009.php

I’m attaching a working patch against master/HEAD and I hope the
spirit of Christmas will make you look kindly on my attempt :) or
something ...

It works fine and I added some extra logging so that it would be
possible to follow more easily from an admin's point of view.

It looks like this when starting the primary server with
synchronous_standalone_master = on :

$ ./postgres
LOG: database system was shut down at 2011-12-25 20:27:13 CET
<-- No standby is connected at startup
LOG: not waiting for standby synchronization
LOG: autovacuum launcher started
LOG: database system is ready to accept connections
<-- First sync standby connects here so switch to sync mode
LOG: standby "tx0113" is now the synchronous standby with priority 1
LOG: waiting for standby synchronization
<-- standby wal receiver on the standby is killed (SIGKILL)
LOG: unexpected EOF on standby connection
LOG: not waiting for standby synchronization
<-- restart standby so that it connects again
LOG: standby "tx0113" is now the synchronous standby with priority 1
LOG: waiting for standby synchronization
<-- standby wal receiver is first stopped (SIGSTOP) to make sure
we have outstanding waits in the primary, then killed (SIGKILL)
LOG: could not receive data from client: Connection reset by peer
LOG: unexpected EOF on standby connection
LOG: not waiting for standby synchronization
<-- client now finally receives commit ACK that was hanging due
to the SIGSTOP:ed wal receiver on the standby node

And so on ... any comments are welcome :)

Thanks and cheers,

/A

Attachments:

sync-standalone-v1.patch.txt (text/plain; charset=US-ASCII)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 0cc3296..6367dcc 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2182,6 +2182,24 @@ SET ENABLE_SEQSCAN TO OFF;
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-synchronous-standalone-master" xreflabel="synchronous-standalone-master">
+      <term><varname>synchronous_standalone_master</varname> (<type>boolean</type>)</term>
+      <indexterm>
+       <primary><varname>synchronous_standalone_master</> configuration parameter</primary>
+      </indexterm>
+      <listitem>
+       <para>
+       Specifies how the master behaves when <xref linkend="guc-synchronous-commit">
+	is set to <literal>on</> and <xref linkend="guc-synchronous-standby-names"> is configured but no
+	 appropriate standby servers are currently connected. If enabled, the master will
+	 continue processing transactions alone. If disabled, all the transactions on the
+	 master are blocked until a synchronous standby has appeared.
+
+	The default is disabled.
+       </para>
+      </listitem>
+     </varlistentry>
+
      </variablelist>
     </sect2>
 
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index e9ae1e8..706af88 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -353,6 +353,8 @@ CheckpointerMain(void)
 
 	/* Do this once before starting the loop, then just at SIGHUP time. */
 	SyncRepUpdateSyncStandbysDefined();
+	SyncRepUpdateSyncStandaloneAllowed();
+	SyncRepCheckIfStandaloneMaster();
 
 	/*
 	 * Loop forever
@@ -382,6 +384,7 @@ CheckpointerMain(void)
 			ProcessConfigFile(PGC_SIGHUP);
 			/* update global shmem state for sync rep */
 			SyncRepUpdateSyncStandbysDefined();
+			SyncRepUpdateSyncStandaloneAllowed();
 		}
 		if (checkpoint_requested)
 		{
@@ -658,6 +661,7 @@ CheckpointWriteDelay(int flags, double progress)
 			ProcessConfigFile(PGC_SIGHUP);
 			/* update global shmem state for sync rep */
 			SyncRepUpdateSyncStandbysDefined();
+			SyncRepUpdateSyncStandaloneAllowed();
 		}
 
 		AbsorbFsyncRequests();
diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index 95de6c7..fd3e782 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -59,6 +59,8 @@
 /* User-settable parameters for sync rep */
 char	   *SyncRepStandbyNames;
 
+bool       SyncRepStandaloneMasterAllowed;
+
 #define SyncStandbysDefined() \
 	(SyncRepStandbyNames != NULL && SyncRepStandbyNames[0] != '\0')
 
@@ -126,6 +128,20 @@ SyncRepWaitForLSN(XLogRecPtr XactCommitLSN)
 		return;
 	}
 
+
+	/*
+	 * Fast exit also if no synchronous standby servers are presently connected
+	 * and if the primary server has been configured to continue on without them.
+	 */
+	if ( SyncRepStandaloneMasterAllowed )
+	{
+		if ( ! SyncRepCheckIfStandaloneMaster() )
+		{
+			LWLockRelease(SyncRepLock);
+			return;
+		}
+	}
+
 	/*
 	 * Set our waitLSN so WALSender will know when to wake us, and add
 	 * ourselves to the queue.
@@ -326,6 +344,63 @@ SyncRepCleanupAtProcExit(void)
 }
 
 /*
+ * Check if the master should switch to standalone mode and stop trying
+ *  to wait for standby synchronization because there are no standby servers currently
+ * connected. If there are servers connected, then switch back and start waiting for them.
+ * Must hold SyncRepLock.
+ */
+bool SyncRepCheckIfStandaloneMaster()
+{
+	bool standby_connected = false;
+	int i = 0;
+
+	if (!SyncRepRequested() || !SyncStandbysDefined())
+		return false;
+
+	if ( ! WalSndCtl->sync_standalone_allowed )
+	        return false;
+
+	for (i = 0; i < max_wal_senders && ! standby_connected; i++)
+	{
+		volatile WalSnd *walsnd = &WalSndCtl->walsnds[i];
+		if ( walsnd->pid != 0 && walsnd->sync_standby_priority )
+		{
+			standby_connected = true;
+
+			if ( WalSndCtl->sync_standalone_master )
+			{
+				ereport(LOG,
+					(errmsg("waiting for standby synchronization"),
+					 errhidestmt(true)));
+
+				WalSndCtl->sync_standalone_master = false;
+			}
+		}
+	}
+
+	if ( ! standby_connected )
+	{
+		if ( ! WalSndCtl->sync_standalone_master )
+		{
+			ereport(LOG,
+				(errmsg("not waiting for standby synchronization"),
+				 errhidestmt(true)));
+
+			WalSndCtl->sync_standalone_master = true;
+
+			/*
+			 * We just switched mode and do not want to wait for standby sync anymore.
+			 * Wake others who may be waiting at this point
+			 */
+			SyncRepWakeQueue(true);
+		}
+	}
+
+	return standby_connected;
+
+}
+
+/*
  * ===========================================================
  * Synchronous Replication functions for wal sender processes
  * ===========================================================
@@ -603,6 +678,25 @@ SyncRepUpdateSyncStandbysDefined(void)
 	}
 }
 
+
+void
+SyncRepUpdateSyncStandaloneAllowed(void)
+{
+	bool value = SyncRepStandaloneMasterAllowed;
+
+	if ( SyncRepStandaloneMasterAllowed != WalSndCtl->sync_standalone_allowed )
+	{
+		LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
+
+		if ( SyncRepStandaloneMasterAllowed )
+			SyncRepWakeQueue(true);
+
+		WalSndCtl->sync_standalone_allowed = SyncRepStandaloneMasterAllowed;
+
+		LWLockRelease(SyncRepLock);
+	}
+}
+
 #ifdef USE_ASSERT_CHECKING
 static bool
 SyncRepQueueIsOrderedByLSN(void)
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index ea86520..ddfaa09 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -932,11 +932,23 @@ InitWalSnd(void)
 		}
 	}
 	if (MyWalSnd == NULL)
+	{
 		ereport(FATAL,
 				(errcode(ERRCODE_TOO_MANY_CONNECTIONS),
 				 errmsg("number of requested standby connections "
 						"exceeds max_wal_senders (currently %d)",
 						max_wal_senders)));
+	}
+	else
+	{
+		/*
+		 * A standby just connected, check if the master should
+		 * switch from standalone to synchronous mode.
+		 */
+		LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
+		SyncRepCheckIfStandaloneMaster();
+		LWLockRelease(SyncRepLock);
+	}
 
 	/* Arrange to clean up at walsender exit */
 	on_shmem_exit(WalSndKill, 0);
@@ -955,6 +967,13 @@ WalSndKill(int code, Datum arg)
 	MyWalSnd->pid = 0;
 	DisownLatch(&MyWalSnd->latch);
 
+	/*
+	 * Check if this was the last standby
+	 */
+	LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
+	SyncRepCheckIfStandaloneMaster();
+	LWLockRelease(SyncRepLock);
+
 	/* WalSnd struct isn't mine anymore */
 	MyWalSnd = NULL;
 }
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index da7b6d4..f26ee7a 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1375,6 +1375,16 @@ static struct config_bool ConfigureNamesBool[] =
 	},
 
 	{
+		{"synchronous_standalone_master", PGC_SIGHUP, REPLICATION_MASTER,
+			gettext_noop("Specifies whether we allow the master to process transactions alone when there is no connected standby."),
+			NULL
+		},
+		&SyncRepStandaloneMasterAllowed,
+		false,
+		NULL, NULL, NULL
+	},
+
+	{
 		{"hot_standby", PGC_POSTMASTER, REPLICATION_STANDBY,
 			gettext_noop("Allows connections and queries during recovery."),
 			NULL
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 315db46..812bdf0 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -215,6 +215,11 @@
 				# from standby(s); '*' = all
 #vacuum_defer_cleanup_age = 0	# number of xacts by which cleanup is delayed
 
+#synchronous_standalone_master = off	# Whether the master can continue processing
+					# commits when no sync standbys are connected
+					# or if it has to wait until one connects.
+
+
 # - Standby Servers -
 
 # These settings are ignored on a master server
diff --git a/src/include/replication/syncrep.h b/src/include/replication/syncrep.h
index 65b725f..0699e73 100644
--- a/src/include/replication/syncrep.h
+++ b/src/include/replication/syncrep.h
@@ -23,6 +23,8 @@
 /* user-settable parameters for synchronous replication */
 extern char *SyncRepStandbyNames;
 
+extern bool SyncRepStandaloneMasterAllowed;
+
 /* called by user backend */
 extern void SyncRepWaitForLSN(XLogRecPtr XactCommitLSN);
 
@@ -35,6 +37,9 @@ extern void SyncRepReleaseWaiters(void);
 
 /* called by wal writer */
 extern void SyncRepUpdateSyncStandbysDefined(void);
+extern void SyncRepUpdateSyncStandaloneAllowed(void);
+
+extern bool SyncRepCheckIfStandaloneMaster(void);
 
 /* called by various procs */
 extern int	SyncRepWakeQueue(bool all);
diff --git a/src/include/replication/walsender_private.h b/src/include/replication/walsender_private.h
index be7a341..d1dc606 100644
--- a/src/include/replication/walsender_private.h
+++ b/src/include/replication/walsender_private.h
@@ -85,6 +85,18 @@ typedef struct
 	 */
 	bool		sync_standbys_defined;
 
+	/*
+	 * Whether the synchronous master is allowed to switch to standalone mode
+	 * when there are not standby servers connected.
+	 */
+	bool            sync_standalone_allowed;
+
+	/*
+	 * Whether the synchronous master is currently running in standalone mode
+	 * because there are no WAL senders connected.
+	 */
+	bool            sync_standalone_master;
+
 	WalSnd		walsnds[1];		/* VARIABLE LENGTH ARRAY */
 } WalSndCtlData;
 
#2 Fujii Masao
masao.fujii@gmail.com
In reply to: Alexander Björnhagen (#1)
Re: Standalone synchronous master

On Mon, Dec 26, 2011 at 5:08 AM, Alexander Björnhagen
<alex.bjornhagen@gmail.com> wrote:

I’m new here so maybe someone else already has this in the works ?

No, as far as I know.

And so on ... any comments are welcome :)

Basically I like this whole idea, but I'd like to know why you
think this functionality is required?

When is the replication mode switched from "standalone" to "sync"?
Does that happen as soon as a
sync standby appears, or once it has caught up with the master? The former
might block the
transactions for a long time until the standby has caught up with the
master even though
synchronous_standalone_master is enabled and a user wants to avoid
such a downtime.

When standalone master is enabled, you might lose some committed
transactions at failover
as follows:

1. While synchronous replication is running normally, replication
connection is closed because of
network outage.
2. The master works standalone because of
synchronous_standalone_master=on and some
new transactions are committed though their WAL records are not
replicated to the standby.
3. The master crashes for some reason, the clusterware detects it and
triggers a failover.
4. The standby which doesn't have recent committed transactions
becomes the master at a failover...

Is this scenario acceptable?

To avoid such a loss of transactions, I'm thinking of introducing a new
GUC parameter specifying the shell command that is executed when the
replication mode is switched from "sync" to "standalone".
If we set it to something like a STONITH command, we can forcibly shut
down the standby before the master resumes transactions, and avoid a
failover to the obsolete standby when the master crashes.
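
Just to illustrate the idea, such a parameter might look roughly like
this (both the name and the script below are purely hypothetical, they
are not part of the proposed patch):

# hypothetical GUC, executed when the master switches to standalone mode
synchronous_standalone_command = '/path/to/fence_standby.sh'

where the command would STONITH the standby before the master resumes
committing, so the out-of-date standby cannot later be promoted.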

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#3 Alexander Björnhagen
alex.bjornhagen@gmail.com
In reply to: Fujii Masao (#2)
1 attachment(s)
Re: Standalone synchronous master

Hello, and thank you for your feedback, I appreciate it.

Updated patch : sync-standalone-v2.patch

I am sorry to be spamming the list but I just cleaned it up a little
bit, wrote better comments and tried to move most of the logic into
syncrep.c since that's where it belongs anyway and also fixed a small
bug where standalone mode was disabled/enabled at runtime via SIGHUP.

Basically I like this whole idea, but I'd like to know why you think this functionality is required?

How should a synchronous master handle the situation where all
standbys have failed ?

Well, I think this is one of those cases where you could argue either
way. Someone caring more about high availability of the system will
want to let the master continue and just raise an alert to the
operators. Someone looking for an absolute guarantee of data
replication will say otherwise.

I don’t like introducing config variables just for the fun of it, but
I think in this case there is no right and wrong.

Oracle Data Guard replication has three different configurable modes
called “performance/availability/protection” which for Postgres
correspond exactly with “async/sync+standalone/sync”.

When is the replication mode switched from "standalone" to "sync"?

Good question. Currently that happens when a standby server has
connected and also been deemed suitable for synchronous commit by the
master ( meaning that its name matches the config variable
synchronous_standby_names ). So in a setup with both synchronous and
asynchronous standbys, the master only considers the synchronous ones
when deciding on standalone mode. The asynchronous standbys are
“useless” to a synchronous master anyway.

The former might block the transactions for a long time until the standby has caught up with the master even though synchronous_standalone_master is enabled and a user wants to avoid such a downtime.

If we are talking about a network “glitch”, then the standby would take
a few seconds/minutes to catch up (not hours!) which is acceptable if
you ask me.

If we are talking about, say, a node failure, I suppose the workaround
even with the current code is to bring up the new standby first as
asynchronous and then simply switch it to synchronous by editing
synchronous_standby_names on the master. Voila ! :)

So in effect this is a non-issue since there is a possible workaround, agree ?
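
As a sketch of that workaround (using the standby name "tx0113" from the
example log in the first mail), on the master:

# postgresql.conf while the new standby catches up asynchronously
synchronous_standby_names = ''

# once it has caught up, make it synchronous again
synchronous_standby_names = 'tx0113'

with a "pg_ctl reload" (or SELECT pg_reload_conf();) after each edit.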

1. While synchronous replication is running normally, replication
connection is closed because of
network outage.
2. The master works standalone because of
synchronous_standalone_master=on and some
new transactions are committed though their WAL records are not
replicated to the standby.
3. The master crashes for some reason, the clusterware detects it and
triggers a failover.
4. The standby which doesn't have recent committed transactions

becomes the master at a failover...

Is this scenario acceptable?

So you have two separate failures in less time than an admin would
have time to react and manually bring up a new standby.

I’d argue that your system is not designed to be redundant enough if
that kind of scenario worries you. But the point where it all goes
wrong is where the ”clusterware” decides to fail over automatically.
In that kind of setup synchronous_standalone_master most likely has to
be off, but again if the “clusterware” is not smart enough to make the
right decision then it should not act at all. Better to just log
critical alerts, send SMS to people etc.

Makes sense ? :)

Cheers,

/A

Attachments:

sync-standalone-v2.patch (application/octet-stream)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 0cc3296..e89d8aa 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2182,6 +2182,24 @@ SET ENABLE_SEQSCAN TO OFF;
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-synchronous-standalone-master" xreflabel="synchronous-standalone-master">
+      <term><varname>synchronous_standalone_master</varname> (<type>boolean</type>)</term>
+      <indexterm>
+       <primary><varname>synchronous_standalone_master</> configuration parameter</primary>
+      </indexterm>
+      <listitem>
+       <para>
+       Specifies how the master behaves when <xref linkend="guc-synchronous-commit">
+	is set to <literal>on</> and <xref linkend="guc-synchronous-standby-names"> is configured but no
+	 appropriate standby servers are currently connected. If enabled, the master will
+	 continue processing transactions alone. If disabled, all the transactions on the
+	 master are blocked until a synchronous standby has appeared.
+
+	The default is disabled.
+       </para>
+      </listitem>
+     </varlistentry>
+
      </variablelist>
     </sect2>
 
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index e9ae1e8..b2ee202 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -353,6 +353,7 @@ CheckpointerMain(void)
 
 	/* Do this once before starting the loop, then just at SIGHUP time. */
 	SyncRepUpdateSyncStandbysDefined();
+	SyncRepUpdateSyncStandaloneAllowed();
 
 	/*
 	 * Loop forever
@@ -382,6 +383,7 @@ CheckpointerMain(void)
 			ProcessConfigFile(PGC_SIGHUP);
 			/* update global shmem state for sync rep */
 			SyncRepUpdateSyncStandbysDefined();
+			SyncRepUpdateSyncStandaloneAllowed();
 		}
 		if (checkpoint_requested)
 		{
@@ -658,6 +660,7 @@ CheckpointWriteDelay(int flags, double progress)
 			ProcessConfigFile(PGC_SIGHUP);
 			/* update global shmem state for sync rep */
 			SyncRepUpdateSyncStandbysDefined();
+			SyncRepUpdateSyncStandaloneAllowed();
 		}
 
 		AbsorbFsyncRequests();
diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index 95de6c7..393068c 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -59,6 +59,8 @@
 /* User-settable parameters for sync rep */
 char	   *SyncRepStandbyNames;
 
+bool       SyncRepStandaloneMasterAllowed;
+
 #define SyncStandbysDefined() \
 	(SyncRepStandbyNames != NULL && SyncRepStandbyNames[0] != '\0')
 
@@ -126,6 +128,17 @@ SyncRepWaitForLSN(XLogRecPtr XactCommitLSN)
 		return;
 	}
 
+
+	/*
+	 * Fast exit also if running in standalone mode
+	 * because there are no synchronous standbys connected
+	 */
+	if ( WalSndCtl->sync_standalone_master )
+	{
+		LWLockRelease(SyncRepLock);
+		return;
+	}
+
 	/*
 	 * Set our waitLSN so WALSender will know when to wake us, and add
 	 * ourselves to the queue.
@@ -326,6 +339,63 @@ SyncRepCleanupAtProcExit(void)
 }
 
 /*
+ * Check if the master should switch to standalone mode and stop trying
+ *  to wait for standby synchronization because there are no standby servers currently
+ * connected. If there are servers connected, then switch back and start waiting for them.
+ * This function is called on connect/disconnect of standby WAL senders. Must hold SyncRepLock.
+ */
+bool SyncRepCheckIfStandaloneMaster()
+{
+	bool standby_connected = false;
+	int i = 0;
+
+	if (!SyncRepRequested() || !SyncStandbysDefined())
+		return false;
+
+	if ( ! WalSndCtl->sync_standalone_allowed )
+	{
+		WalSndCtl->sync_standalone_master = false;
+	        return false;
+	}
+
+	for (i = 0; i < max_wal_senders && ! standby_connected; i++)
+	{
+		volatile WalSnd *walsnd = &WalSndCtl->walsnds[i];
+		if ( walsnd->pid != 0 && walsnd->sync_standby_priority )
+		{
+			standby_connected = true;
+			if ( WalSndCtl->sync_standalone_master )
+			{
+				ereport(LOG,
+					(errmsg("waiting for standby synchronization"),
+					 errhidestmt(true)));
+
+				WalSndCtl->sync_standalone_master = false;
+			}
+		}
+	}
+
+	if ( ! standby_connected )
+	{
+		if ( ! WalSndCtl->sync_standalone_master )
+		{
+			ereport(LOG,
+				(errmsg("not waiting for standby synchronization"),
+				 errhidestmt(true)));
+
+			WalSndCtl->sync_standalone_master = true;
+
+			/*
+			 * We just switched to standalone mode so wake up anyone that is waiting
+			 */
+			SyncRepWakeQueue(true);
+		}
+	}
+
+	return standby_connected;
+}
+
+/*
  * ===========================================================
  * Synchronous Replication functions for wal sender processes
  * ===========================================================
@@ -349,6 +419,7 @@ SyncRepInitConfig(void)
 	{
 		LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
 		MyWalSnd->sync_standby_priority = priority;
+		SyncRepCheckIfStandaloneMaster();
 		LWLockRelease(SyncRepLock);
 		ereport(DEBUG1,
 			(errmsg("standby \"%s\" now has synchronous standby priority %u",
@@ -581,7 +652,6 @@ SyncRepUpdateSyncStandbysDefined(void)
 	if (sync_standbys_defined != WalSndCtl->sync_standbys_defined)
 	{
 		LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
-
 		/*
 		 * If synchronous_standby_names has been reset to empty, it's futile
 		 * for backends to continue to waiting.  Since the user no longer
@@ -598,6 +668,25 @@ SyncRepUpdateSyncStandbysDefined(void)
 		 * the queue (and never wake up).  This prevents that.
 		 */
 		WalSndCtl->sync_standbys_defined = sync_standbys_defined;
+		LWLockRelease(SyncRepLock);
+	}
+}
+
+/*
+ * The background writer calls this as needed to update the shared
+ * sync_standalone_allowed flag. If the flag is enabled, then also check if
+ * any synchronous standby servers are connected in order to switch mode from sync
+ * replication to standalone mode.
+ */
+void
+SyncRepUpdateSyncStandaloneAllowed(void)
+{
+	if ( SyncRepStandaloneMasterAllowed != WalSndCtl->sync_standalone_allowed )
+	{
+		LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
+
+		WalSndCtl->sync_standalone_allowed = SyncRepStandaloneMasterAllowed;
+		SyncRepCheckIfStandaloneMaster();
 
 		LWLockRelease(SyncRepLock);
 	}
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index ea86520..1da44e0 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -955,6 +955,13 @@ WalSndKill(int code, Datum arg)
 	MyWalSnd->pid = 0;
 	DisownLatch(&MyWalSnd->latch);
 
+	/*
+	 * Check if this was the last standby
+	 */
+	LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
+	SyncRepCheckIfStandaloneMaster();
+	LWLockRelease(SyncRepLock);
+
 	/* WalSnd struct isn't mine anymore */
 	MyWalSnd = NULL;
 }
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index da7b6d4..f26ee7a 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1375,6 +1375,16 @@ static struct config_bool ConfigureNamesBool[] =
 	},
 
 	{
+		{"synchronous_standalone_master", PGC_SIGHUP, REPLICATION_MASTER,
+			gettext_noop("Specifies whether we allow the master to process transactions alone when there is no connected standby."),
+			NULL
+		},
+		&SyncRepStandaloneMasterAllowed,
+		false,
+		NULL, NULL, NULL
+	},
+
+	{
 		{"hot_standby", PGC_POSTMASTER, REPLICATION_STANDBY,
 			gettext_noop("Allows connections and queries during recovery."),
 			NULL
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 315db46..812bdf0 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -215,6 +215,11 @@
 				# from standby(s); '*' = all
 #vacuum_defer_cleanup_age = 0	# number of xacts by which cleanup is delayed
 
+#synchronous_standalone_master = off	# Whether the master can continue processing
+					# commits when no sync standbys are connected
+					# or if it has to wait until one connects.
+
+
 # - Standby Servers -
 
 # These settings are ignored on a master server
diff --git a/src/include/replication/syncrep.h b/src/include/replication/syncrep.h
index 65b725f..0699e73 100644
--- a/src/include/replication/syncrep.h
+++ b/src/include/replication/syncrep.h
@@ -23,6 +23,8 @@
 /* user-settable parameters for synchronous replication */
 extern char *SyncRepStandbyNames;
 
+extern bool SyncRepStandaloneMasterAllowed;
+
 /* called by user backend */
 extern void SyncRepWaitForLSN(XLogRecPtr XactCommitLSN);
 
@@ -35,6 +37,9 @@ extern void SyncRepReleaseWaiters(void);
 
 /* called by wal writer */
 extern void SyncRepUpdateSyncStandbysDefined(void);
+extern void SyncRepUpdateSyncStandaloneAllowed(void);
+
+extern bool SyncRepCheckIfStandaloneMaster(void);
 
 /* called by various procs */
 extern int	SyncRepWakeQueue(bool all);
diff --git a/src/include/replication/walsender_private.h b/src/include/replication/walsender_private.h
index be7a341..954c79f 100644
--- a/src/include/replication/walsender_private.h
+++ b/src/include/replication/walsender_private.h
@@ -85,6 +85,17 @@ typedef struct
 	 */
 	bool		sync_standbys_defined;
 
+	/*
+	 * Is the synchronous master allowed to switch to standalone mode when no
+	 * synchronous standby servers are connected ? Protected by SyncRepLock.
+	 */
+	bool            sync_standalone_allowed;
+
+	/*
+	 * Is the synchronous master currently running in standalone mode ? Protected by SyncRepLock.
+	 */
+	bool            sync_standalone_master;
+
 	WalSnd		walsnds[1];		/* VARIABLE LENGTH ARRAY */
 } WalSndCtlData;
 
#4 Magnus Hagander
magnus@hagander.net
In reply to: Alexander Björnhagen (#3)
Re: Standalone synchronous master

On Mon, Dec 26, 2011 at 13:51, Alexander Björnhagen
<alex.bjornhagen@gmail.com> wrote:

Hello, and thank you for your feedback, I appreciate it.

Updated patch : sync-standalone-v2.patch

I am sorry to be spamming the list but I just cleaned it up a little
bit, wrote better comments and tried to move most of the logic into
syncrep.c since that's where it belongs anyway and also fixed a small
bug where standalone mode was disabled/enabled at runtime via SIGHUP.

It's not spam when it's an updated patch ;)

Basically I like this whole idea, but I'd like to know why you think this functionality is required?

How should a synchronous master handle the situation where all
standbys have failed ?

Well, I think this is one of those cases where you could argue either
way. Someone caring more about high availability of the system will
want to let the master continue and just raise an alert to the
operators. Someone looking for an absolute guarantee of data
replication will say otherwise.

If you don't care about the absolute guarantee of data, why not just
use async replication? It's still going to replicate the data over to
the client as quickly as it can - which in the end is the same level
of guarantee that you get with this switch set, isn't it?

When is the replication mode switched from "standalone" to "sync"?

Good question. Currently that happens when a standby server has
connected and also been deemed suitable for synchronous commit by the
master ( meaning that its name matches the config variable
synchronous_standby_names ). So in a setup with both synchronous and
asynchronous standbys, the master only considers the synchronous ones
when deciding on standalone mode. The asynchronous standbys are
“useless” to a synchronous master anyway.

But wouldn't an async standby still be a lot better than no standby at
all (standalone)?

The former might block the transactions for a long time until the standby has caught up with the master even though synchronous_standalone_master is enabled and a user wants to avoid such a downtime.

If we are talking about a network “glitch”, then the standby would take
a few seconds/minutes to catch up (not hours!) which is acceptable if
you ask me.

So it's not Ok to block the master when the standby goes away, but it
is ok to block it when it comes back and catches up? The goes away
might be the same amount of time - or even shorter, depending on
exactly how the network works..

1. While synchronous replication is running normally, replication
connection is closed because of
   network outage.
2. The master works standalone because of
synchronous_standalone_master=on and some
   new transactions are committed though their WAL records are not
replicated to the standby.
3. The master crashes for some reason, the clusterware detects it and
triggers a failover.
4. The standby which doesn't have recent committed transactions

becomes the master at a failover...

Is this scenario acceptable?

So you have two separate failures in less time than an admin would
have time to react and manually bring up a new standby.

Given that one is a network failure, and one is a node failure, I
don't see that being strange at all. For example, a HA network
environment might cause a short glitch when it's failing over to a
redundant node - enough to bring down the replication connection and
require it to reconnect (during which the master would be ahead of the
slave).

In fact, both might well be network failures - one just making the
master completely inaccessible, and thus triggering the need for a
failover.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

#5 Alexander Björnhagen
alex.bjornhagen@gmail.com
In reply to: Magnus Hagander (#4)
Re: Standalone synchronous master

Interesting discussion!

Basically I like this whole idea, but I'd like to know why you think this functionality is required?

How should a synchronous master handle the situation where all
standbys have failed ?

Well, I think this is one of those cases where you could argue either
way. Someone caring more about high availability of the system will
want to let the master continue and just raise an alert to the
operators. Someone looking for an absolute guarantee of data
replication will say otherwise.

If you don't care about the absolute guarantee of data, why not just
use async replication? It's still going to replicate the data over to
the client as quickly as it can - which in the end is the same level
of guarantee that you get with this switch set, isn't it?

This setup does still guarantee that if the master fails, then you can
still fail over to the standby without any possible data loss because
all data is synchronously replicated.

I want to replicate data with synchronous guarantee to a disaster site
*when possible*. If there is any chance that commits can be
replicated, then I’d like to wait for that.

If however the disaster node/site/link just plain fails and
replication goes down for an *indefinite* amount of time, then I want
the primary node to continue operating, raise an alert and deal with
that. Rather than have the whole system grind to a halt just because a
standby node failed.

It’s not so much that I don’t “care” about replication guarantee, then
I’d just use asynchronous and be done with it. My point is that it is
not always black and white and for some system setups you have to
balance a few things against each other.

If we were just talking about network glitches then I would be fine
with the current behavior because I do not believe they are
long-lasting anyway and they are also *quantifiable* which is a huge
bonus.

My primary focus is system availability but I also care about all that
other stuff too.

I want to have the cake and eat it at the same time as we say in Sweden ;)

When is the replication mode switched from "standalone" to "sync"?

Good question. Currently that happens when a standby server has
connected and also been deemed suitable for synchronous commit by the
master ( meaning that its name matches the config variable
synchronous_standby_names ). So in a setup with both synchronous and
asynchronous standbys, the master only considers the synchronous ones
when deciding on standalone mode. The asynchronous standbys are
“useless” to a synchronous master anyway.

But wouldn't an async standby still be a lot better than no standby at
all (standalone)?

As soon as the standby comes back online, I want to wait for it to sync.

The former might block the transactions for a long time until the standby has caught up with the master even though synchronous_standalone_master is enabled and a user wants to avoid such a downtime.

If we are talking about a network “glitch”, then the standby would take
a few seconds/minutes to catch up (not hours!) which is acceptable if
you ask me.

So it's not Ok to block the master when the standby goes away, but it
is ok to block it when it comes back and catches up? The goes away
might be the same amount of time - or even shorter, depending on
exactly how the network works..

To be honest I don’t have a very strong opinion here, we could go
either way, I just wanted to keep this patch as small as possible to
begin with. But again network glitches aren’t my primary concern in a
HA system because the amount of data that the standby lags behind is
possible to estimate and plan for.

Typically switch convergence takes in the order of 15-30 seconds and I
can thus typically assume that the restarted standby can recover that
gap in less than a minute. So once upon a blue moon when something
like that happens, commits would take up to say 1 minute longer. No
big deal IMHO.

1. While synchronous replication is running normally, replication
connection is closed because of
network outage.
2. The master works standalone because of
synchronous_standalone_master=on and some
new transactions are committed though their WAL records are not
replicated to the standby.
3. The master crashes for some reason, the clusterware detects it and
triggers a failover.
4. The standby which doesn't have recent committed transactions
becomes the master at a failover...

Is this scenario acceptable?

So you have two separate failures in less time than an admin would
have time to react and manually bring up a new standby.

Given that one is a network failure, and one is a node failure, I
don't see that being strange at all. For example, a HA network
environment might cause a short glitch when it's failing over to a
redundant node - enough to bring down the replication connection and
require it to reconnect (during which the master would be ahead of the
slave).

In fact, both might well be network failures - one just making the
master completely inaccessible, and thus triggering the need for a
failover.

You still have two failures on a two-node system.

If we are talking about a setup with only two nodes (which I am), then
I think it’s fair to limit the discussion to one failure (whatever
that might be! node,switch,disk,site,intra-site link, power, etc ...).

And in that case, there are only really three likely scenarios :
1) The master fails
2) The standby fails
3) Both fail (due to shared network gear, power, etc)

Yes there might be a need to failover and Yes the standby could
possibly have lagged behind the master but with my sync+standalone
mode, you reduce the risk of that compared to just async mode.

So decrease the risk of data loss (case 1), increase system
availability/uptime (case 2).

That is actually a pretty good description of my goal here :)

Cheers,

/A

#6 Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Magnus Hagander (#4)
Re: Standalone synchronous master

Magnus Hagander <magnus@hagander.net> writes:

If you don't care about the absolute guarantee of data, why not just
use async replication? It's still going to replicate the data over to
the client as quickly as it can - which in the end is the same level
of guarantee that you get with this switch set, isn't it?

Isn't that equivalent to setting synchronous_standby_names to '' and
reloading the server?

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

#7 Magnus Hagander
magnus@hagander.net
In reply to: Alexander Björnhagen (#5)
Re: Standalone synchronous master

On Mon, Dec 26, 2011 at 15:59, Alexander Björnhagen
<alex.bjornhagen@gmail.com> wrote:

Basically I like this whole idea, but I'd like to know why you think this functionality is required?

How should a synchronous master handle the situation where all
standbys have failed ?

Well, I think this is one of those cases where you could argue either
way. Someone caring more about high availability of the system will
want to let the master continue and just raise an alert to the
operators. Someone looking for an absolute guarantee of data
replication will say otherwise.

If you don't care about the absolute guarantee of data, why not just
use async replication? It's still going to replicate the data over to
the client as quickly as it can - which in the end is the same level
of guarantee that you get with this switch set, isn't it?

This setup does still guarantee that if the master fails, then you can
still fail over to the standby without any possible data loss because
all data is synchronously replicated.

Only if you didn't have a network hitch, or if your slave was down.

Which basically means it doesn't *guarantee* it.

I want to replicate data with synchronous guarantee to a disaster site
*when possible*. If there is any chance that commits can be
replicated, then I’d like to wait for that.

There's always a chance, it's just about how long you're willing to wait ;)

Another thought could be to have something like a "sync_wait_timeout",
saying "i'm willing to wait <n> seconds for the syncrep to be caught
up. If nobody is cauth up within that time,then I can back down to
async mode/"standalone" mode". That way, data availaibility wouldn't
be affected by short-time network glitches.
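
A sketch of what that might look like (the parameter name and semantics
here are hypothetical, this is not part of the patch):

# hypothetical: wait at most 30 seconds for a sync standby to confirm,
# then fall back to standalone/async behaviour
sync_wait_timeout = 30s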

If however the disaster node/site/link just plain fails and
replication goes down for an *indefinite* amount of time, then I want
the primary node to continue operating, raise an alert and deal with
that. Rather than have the whole system grind to a halt just because a
standby node failed.

If the standby node failed and can be determined to actually be failed
(by say a cluster manager), you can always have your cluster software
(or DBA, of course) turn it off by editing the config setting and
reloading. Doing it that way you can actually *verify* that the site
is gone for an indefinite amount of time.

It’s not so much that I don’t “care” about replication guarantee, then
I’d just use asynchronous and be done with it. My point is that it is
not always black and white and for some system setups you have to
balance a few things against each other.

Agreed in principle :-)

If we were just talking about network glitches then I would be fine
with the current behavior because I do not believe they are
long-lasting anyway and they are also *quantifiable* which is a huge
bonus.

But the proposed switches don't actually make it possible to
differentiate between these "non-long-lasting" issues and long-lasting
ones, does it? We might want an interface that actually does...

My primary focus is system availability but I also care about all that
other stuff too.

I want to have the cake and eat it at the same time as we say in Sweden ;)

Of course - we all do :D

When is the replication mode switched from "standalone" to "sync"?

Good question. Currently that happens when a standby server has
connected and also been deemed suitable for synchronous commit by the
master ( meaning that its name matches the config variable
synchronous_standby_names ). So in a setup with both synchronous and
asynchronous standbys, the master only considers the synchronous ones
when deciding on standalone mode. The asynchronous standbys are
“useless” to a synchronous master anyway.

But wouldn't an async standby still be a lot better than no standby at
all (standalone)?

As soon as the standby comes back online, I want to wait for it to sync.

I guess I just find this very inconsistent. You're willing to wait,
but only sometimes. You're not willing to wait when it goes down, but
you are willing to wait when it comes back. I don't see why this
should be different, and I don't see how you can reliably
differentiate between these two.

The former might block the transactions for a long time until the standby has caught up with the master even though synchronous_standalone_master is enabled and a user wants to avoid such a downtime.

If we are talking about a network “glitch”, then the standby would take
a few seconds/minutes to catch up (not hours!) which is acceptable if
you ask me.

So it's not Ok to block the master when the standby goes away, but it
is ok to block it when it comes back and catches up? The goes away
might be the same amount of time - or even shorter, depending on
exactly how the network works..

To be honest I don’t have a very strong opinion here, we could go
either way, I just wanted to keep this patch as small as possible to
begin with. But again network glitches aren’t my primary concern in a
HA system because the amount of data that the standby lags behind is
possible to estimate and plan for.

Typically switch convergence takes in the order of 15-30 seconds and I
can thus typically assume that the restarted standby can recover that
gap in less than a minute. So once upon a blue moon when something
like that happens, commits would take up to say 1 minute longer. No
big deal IMHO.

What about the slave rebooting, for example? That'll usually be pretty
quick too - so you'd be ok waiting for that. But your patch doesn't
let you wait for that - it will switch to standalone mode right away?
But if it takes 30 seconds to reboot, and then 30 seconds to catch up,
you are *not* willing to wait for the first 30 seconds, but you *are*
willing to wait for the second? Just seems strange to me, I guess...

1. While synchronous replication is running normally, replication
connection is closed because of
   network outage.
2. The master works standalone because of
synchronous_standalone_master=on and some
   new transactions are committed though their WAL records are not
replicated to the standby.
3. The master crashes for some reason, the clusterware detects it and
triggers a failover.
4. The standby which doesn't have recent committed transactions
becomes the master at a failover...

Is this scenario acceptable?

So you have two separate failures in less time than an admin would
have time to react and manually bring up a new standby.

Given that one is a network failure, and one is a node failure, I
don't see that being strange at all. For example, a HA network
environment might cause a short glitch when it's failing over to a
redundant node - enough to bring down the replication connection and
require it to reconnect (during which the master would be ahead of the
slave).

In fact, both might well be network failures - one just making the
master completely inaccessible, and thus triggering the need for a
failover.

You still have two failures on a two-node system.

Yes - but only one (or zero) of them is actually to any of the nodes :-)

If we are talking about a setup with only two nodes (which I am), then
I think it’s fair to limit the discussion to one failure (whatever
that might be! node,switch,disk,site,intra-site link, power, etc ...).

And in that case, there are only really three likely scenarios :
1)      The master fails
2)      The standby fails
3)      Both fail (due to shared network gear, power, etc)

Yes there might be a need to failover and Yes the standby could
possibly have lagged behind the master but with my sync+standalone
mode, you reduce the risk of that compared to just async mode.

So decrease the risk of data loss (case 1), increase system
availability/uptime (case 2).

That is actually a pretty good description of my goal here :)

Cheers,

/A

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

#8 Guillaume Lelarge
guillaume@lelarge.info
In reply to: Magnus Hagander (#7)
Re: Standalone synchronous master

On Mon, 2011-12-26 at 16:23 +0100, Magnus Hagander wrote:

On Mon, Dec 26, 2011 at 15:59, Alexander Björnhagen
<alex.bjornhagen@gmail.com> wrote:

Basically I like this whole idea, but I'd like to know why you think this functionality is required?

How should a synchronous master handle the situation where all
standbys have failed ?

Well, I think this is one of those cases where you could argue either
way. Someone caring more about high availability of the system will
want to let the master continue and just raise an alert to the
operators. Someone looking for an absolute guarantee of data
replication will say otherwise.

If you don't care about the absolute guarantee of data, why not just
use async replication? It's still going to replicate the data over to
the client as quickly as it can - which in the end is the same level
of guarantee that you get with this switch set, isn't it?

This setup does still guarantee that if the master fails, then you can
still fail over to the standby without any possible data loss because
all data is synchronously replicated.

Only if you didn't have a network hitch, or if your slave was down.

Which basically means it doesn't *guarantee* it.

It doesn't guarantee it, but it increases the master availability.
That's the kind of customization some users would like to have. Though I
find it weird to introduce another GUC there. Why not add a new enum
value to synchronous_commit, such as local_only_if_slaves_unavailable
(yeah, the enum value is completely stupid, but you get my point).
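
As a sketch of that alternative, again with a placeholder value name:

# existing values: on, local, off
synchronous_commit = on
# hypothetical additional value, instead of a separate GUC:
#synchronous_commit = local_only_if_slaves_unavailable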

--
Guillaume
http://blog.guillaume.lelarge.info
http://www.dalibo.com
PostgreSQL Sessions #3: http://www.postgresql-sessions.org

#9 Alexander Björnhagen
alex.bjornhagen@gmail.com
In reply to: Magnus Hagander (#7)
Re: Standalone synchronous master

Hmm,

I suppose this conversation would lend itself better to a whiteboard
or maybe over a few beers instead of via e-mail ...

Basically I like this whole idea, but I'd like to know why you think this functionality is required?

How should a synchronous master handle the situation where all
standbys have failed ?

Well, I think this is one of those cases where you could argue either
way. Someone caring more about high availability of the system will
want to let the master continue and just raise an alert to the
operators. Someone looking for an absolute guarantee of data
replication will say otherwise.

If you don't care about the absolute guarantee of data, why not just
use async replication? It's still going to replicate the data over to
the client as quickly as it can - which in the end is the same level
of guarantee that you get with this switch set, isn't it?

This setup does still guarantee that if the master fails, then you can
still fail over to the standby without any possible data loss because
all data is synchronously replicated.

Only if you didn't have a network hitch, or if your slave was down.

Which basically means it doesn't *guarantee* it.

True. In my two-node system, I’m willing to take that risk when my
only standby has failed.

Most likely (compared to any other scenario), we can re-gain
redundancy before another failure occurs.

Say each one of your nodes can fail once a year. Most people have a much
better track record than that with their production machines/network/etc
but just as an example. Then on any given day there is a 0.27% chance
that a given node will fail (1/365*100 = 0.27), right ?

Then the probability of both failing on the same day is (0.27%)^2 =
0.00075% or about 1 in 133,000. And given that it would take only a
few hours tops to restore redundancy, it is even less of a chance than
that because you would not be exposed for the entire day.

So, to be a bit blunt about it and I hope I don’t come off as rude
here, this dual-failure or creeping-doom type scenario on a two-node
system is probably not relevant but more of an academic question.

I want to replicate data with synchronous guarantee to a disaster site
*when possible*. If there is any chance that commits can be
replicated, then I’d like to wait for that.

There's always a chance, it's just about how long you're willing to wait ;)

Yes, exactly. When I can estimate it I’m willing to wait.

Another thought could be to have something like a "sync_wait_timeout",
saying "i'm willing to wait <n> seconds for the syncrep to be caught
up. If nobody is cauth up within that time,then I can back down to
async mode/"standalone" mode". That way, data availaibility wouldn't
be affected by short-time network glitches.

This was also mentioned in the previous thread I linked to,
“replication_timeout“ :

http://archives.postgresql.org/pgsql-hackers/2010-10/msg01009.php

In a HA environment you have redundant networking and bonded
interfaces on each node. The only “glitch” would really be if a switch
failed over and that’s a pretty big “if” right there.

If however the disaster node/site/link just plain fails and
replication goes down for an *indefinite* amount of time, then I want
the primary node to continue operating, raise an alert and deal with
that. Rather than have the whole system grind to a halt just because a
standby node failed.

If the standby node failed and can be determined to actually be failed
(by say a cluster manager), you can always have your cluster software
(or DBA, of course) turn it off by editing the config setting and
reloading. Doing it that way you can actually *verify* that the site
is gone for an indefinite amount of time.

The system might as well do this for me when the standby gets
disconnected instead of halting the master.

If we were just talking about network glitches then I would be fine
with the current behavior because I do not believe they are
long-lasting anyway and they are also *quantifiable* which is a huge
bonus.

But the proposed switches don't actually make it possible to
differentiate between these "non-long-lasting" issues and long-lasting
ones, does it? We might want an interface that actually does...

“replication_timeout”, where the primary disconnects the WAL sender
after a timeout, together with “synchronous_standalone_master”, which
tells the primary it can continue anyway when that happens, allows
exactly that. This would then be the first part towards that, but I
wanted to start out small and I personally think it is sufficient to
draw the line at a TCP disconnect of the standby.
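
As a rough sketch of how the two settings could combine
(replication_timeout is the existing parameter that drops unresponsive
WAL sender connections, synchronous_standalone_master is the one
proposed here):

replication_timeout = 60s             # drop a dead/unresponsive standby's
                                      # WAL sender after 60 seconds
synchronous_standalone_master = on    # proposed: then continue standalone
                                      # instead of blocking commits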

When is the replication mode switched from "standalone" to "sync"?

Good question. Currently that happens when a standby server has
connected and also been deemed suitable for synchronous commit by the
master ( meaning that its name matches the config variable
synchronous_standby_names ). So in a setup with both synchronous and
asynchronous standbys, the master only considers the synchronous ones
when deciding on standalone mode. The asynchronous standbys are
“useless” to a synchronous master anyway.

But wouldn't an async standby still be a lot better than no standby at
all (standalone)?

As soon as the standby comes back online, I want to wait for it to sync.

I guess I just find this very inconsistent. You're willing to wait,
but only sometimes. You're not willing to wait when it goes down, but
you are willing to wait when it comes back. I don't see why this
should be different, and I don't see how you can reliably
differentiate between these two.

When the wait is quantifiable, I want to wait (like a connected
standby that is in the process of catching up). When it is not (like
when the remote node disappeared and the master has no way of knowing
for how long), I do not want to wait.

In both cases I want to send off alerts, get people involved and fix
the problem causing this, it is not something that should happen
often.

The former might block the transactions for a long time until the standby has caught up with the master even though synchronous_standalone_master is enabled and a user wants to avoid such a downtime.

If we are talking about a network “glitch”, then the standby would take
a few seconds/minutes to catch up (not hours!) which is acceptable if
you ask me.

So it's not Ok to block the master when the standby goes away, but it
is ok to block it when it comes back and catches up? The goes away
might be the same amount of time - or even shorter, depending on
exactly how the network works..

To be honest I don’t have a very strong opinion here, we could go
either way, I just wanted to keep this patch as small as possible to
begin with. But again network glitches aren’t my primary concern in a
HA system because the amount of data that the standby lags behind is
possible to estimate and plan for.

Typically switch convergence takes in the order of 15-30 seconds and I
can thus typically assume that the restarted standby can recover that
gap in less than a minute. So once upon a blue moon when something
like that happens, commits would take up to say 1 minute longer. No
big deal IMHO.

What about the slave rebooting, for example? That'll usually be pretty
quick too - so you'd be ok waiting for that. But your patch doesn't
let you wait for that - it will switch to standalone mode right away?
But if it takes 30 seconds to reboot, and then 30 seconds to catch up,
you are *not* willing to wait for the first 30 seconds, but you *are*
willing to wait for the second? Just seems strange to me, I guess...

That’s exactly right. While the standby is booting, the master has no
way of knowing what is going on with that standby so then I don’t want
to wait.

When the standby has managed to boot, connect and start syncing up
the data that it was lagging behind, then I do want to wait because I
know that it will not take too long before it has caught up.

1. While synchronous replication is running normally, replication
connection is closed because of
network outage.
2. The master works standalone because of
synchronous_standalone_master=on and some
new transactions are committed though their WAL records are not
replicated to the standby.
3. The master crashes for some reason, the clusterware detects it and
triggers a failover.
4. The standby which doesn't have recent committed transactions
becomes the master at a failover...

Is this scenario acceptable?

So you have two separate failures in less time than an admin would
have time to react and manually bring up a new standby.

Given that one is a network failure, and one is a node failure, I
don't see that being strange at all. For example, a HA network
environment might cause a short glitch when it's failing over to a
redundant node - enough to bring down the replication connection and
require it to reconnect (during which the master would be ahead of the
slave).

In fact, both might well be network failures - one just making the
master completely inaccessible, and thus triggering the need for a
failover.

You still have two failures on a two-node system.

Yes - but only one (or zero) of them is actually to any of the nodes :-)

It doesn’t matter from the viewpoint of our primary and standby
servers. If the link to the standby fails so that it is unreachable
from the master, then the master may consider that node as failed. It
does not matter that the component which failed was not part of that
physical machine, it still rendered it useless because it is no longer
reachable.

So in the previous example where one network link fails and then one
node fails, I see that as two separate failures. If it is possible to
take out both primary and standby servers with only one component
failing (shared network/power/etc), then the system is not designed
right, because there is a single point of failure and no software in
the world will ever save you from that.

That’s why I tried to limit ourselves to the simple use-case where
either the standby or the primary node fails. If both fail then all
bets are off, you’re going to have a very bad day at the office
anyway.

If we are talking about a setup with only two nodes (which I am), then
I think it’s fair to limit the discussion to one failure (whatever
that might be: node, switch, disk, site, intra-site link, power, etc ...).

And in that case, there are only really three likely scenarios :
1) The master fails
2) The standby fails
3) Both fail (due to shared network gear, power, etc)

Yes, there might be a need to fail over, and yes, the standby could
possibly have lagged behind the master, but with my sync+standalone
mode you reduce the risk of that compared to just async mode.

So decrease the risk of data loss (case 1), increase system
availability/uptime (case 2).

That is actually a pretty good description of my goal here :)

Cheers,

/A

#10Alexander Björnhagen
alex.bjornhagen@gmail.com
In reply to: Guillaume Lelarge (#8)
Re: Standalone synchronous master

On Mon, Dec 26, 2011 at 5:18 PM, Guillaume Lelarge
<guillaume@lelarge.info> wrote:

On Mon, 2011-12-26 at 16:23 +0100, Magnus Hagander wrote:

On Mon, Dec 26, 2011 at 15:59, Alexander Björnhagen
<alex.bjornhagen@gmail.com> wrote:

Basically I like this whole idea, but I'd like to know why you think this functionality is required?

How should a synchronous master handle the situation where all
standbys have failed ?

Well, I think this is one of those cases where you could argue either
way. Someone caring more about high availability of the system will
want to let the master continue and just raise an alert to the
operators. Someone looking for an absolute guarantee of data
replication will say otherwise.

If you don't care about the absolute guarantee of data, why not just
use async replication? It's still going to replicate the data over to
the client as quickly as it can - which in the end is the same level
of guarantee that you get with this switch set, isn't it?

This setup does still guarantee that if the master fails, then you can
still fail over to the standby without any possible data loss because
all data is synchronously replicated.

Only if you didn't have a network hitch, or if your slave was down.

Which basically means it doesn't *guarantee* it.

It doesn't guarantee it, but it increases the master availability.

Yes exactly.

That's the kind of customization some users would like to have. Though I
find it weird to introduce another GUC there. Why not add a new enum
value to synchronous_commit, such as local_only_if_slaves_unavailable
(yeah, the enum value is completely stupid, but you get my point).

You are right, an enum makes much more sense, and the patch would be
much smaller as well, so I’ll rework that bit.

/A

#11Magnus Hagander
magnus@hagander.net
In reply to: Alexander Björnhagen (#9)
Re: Standalone synchronous master

On Mon, Dec 26, 2011 at 18:01, Alexander Björnhagen
<alex.bjornhagen@gmail.com> wrote:

Hmm,

I suppose this conversation would lend itself better to a whiteboard
or maybe over a few beers instead of via e-mail ...

mmm. beer... :-)

Well, I think this is one of those cases where you could argue either
way. Someone caring more about high availability of the system will
want to let the master continue and just raise an alert to the
operators. Someone looking for an absolute guarantee of data
replication will say otherwise.

If you don't care about the absolute guarantee of data, why not just
use async replication? It's still going to replicate the data over to
the client as quickly as it can - which in the end is the same level
of guarantee that you get with this switch set, isn't it?

This setup does still guarantee that if the master fails, then you can
still fail over to the standby without any possible data loss because
all data is synchronously replicated.

Only if you didn't have a network hitch, or if your slave was down.

Which basically means it doesn't *guarantee* it.

True. In my two-node system, I’m willing to take that risk when my
only standby has failed.

Most likely (compared to any other scenario), we can re-gain
redundancy before another failure occurs.

Say each one of your nodes can fail once a year. Most people have a much
better track record than that with their production machines/network/etc,
but just as an example. Then on any given day there is a 0,27% chance
that a given node will fail (1/365*100=0,27), right ?

Then the probability of both failing on the same day is (0,27%)^2 ≈
0,00075 %, or about 1 in 133000. And given that it would take only a
few hours tops to restore redundancy, it is even less of a chance than
that because you would not be exposed for the entire day.

That is assuming the failures are actually independent. In my
experience, they're usually not.

But that's diverging into math, which really isn't my strong side here :D

So, to be a bit blunt about it, and I hope I don’t come off as rude
here, this dual-failure or creeping-doom type scenario on a two-node
system is probably not relevant but more of an academic question.

Given how many times I've seen it, I'm going to respectfully disagree
with that ;)

That said, I agree it's not necessarily reasonable to try to defend
against that in a two node cluster. You can always make it three-node
if you need to do that. I'm worried that the interface seems a bit
fragile and that it's hard to "be sure". Predictable interfaces are
good.. :-)

I want to replicate data with synchronous guarantee to a disaster site
*when possible*. If there is any chance that commits can be
replicated, then I’d like to wait for that.

There's always a chance, it's just about how long you're willing to wait ;)

Yes, exactly. When I can estimate it I’m willing to wait.

Another thought could be to have something like a "sync_wait_timeout",
saying "I'm willing to wait <n> seconds for the syncrep to be caught
up. If nobody is caught up within that time, then I can back down to
async mode/"standalone" mode". That way, data availability wouldn't
be affected by short-time network glitches.

This was also mentioned in the previous thread I linked to,
“replication_timeout” :

http://archives.postgresql.org/pgsql-hackers/2010-10/msg01009.php

Hmm. That link was gone from the thread when I read it - I missed it
completely. Sorry about that.

So reading that thread, it really only takes care of one of the cases
- the replication_timeout only fires if the slave "goes dead". It
could be useful if a similar timeout would apply if I did a "pg_ctl
restart" on the slave - making the master wait <n> seconds before
going into standalone mode. The way I read the proposal now, the
master would immediately go into standalone mode if the standby
actually *closes* the connection instead of timing it out?

In a HA environment you have redundant networking and bonded
interfaces on each node. The only “glitch” would really be if a switch
failed over and that’s a pretty big “if” right there.

Switches fail a lot. And there are a lot more things in between that
can fail. I don't think it's such a big if - network issues are by far
the most common cases of a HA environment failing I've seen lately.

If however the disaster node/site/link just plain fails and
replication goes down for an *indefinite* amount of time, then I want
the primary node to continue operating, raise an alert and deal with
that. Rather than have the whole system grind to a halt just because a
standby node failed.

If the standby node failed and can be determined to actually be failed
(by say a cluster manager), you can always have your cluster software
(or DBA, of course) turn it off by editing the config setting and
reloading. Doing it that way you can actually *verify* that the site
is gone for an indefinite amount of time.

The system might as well do this for me when the standby gets
disconnected instead of halting the master.

I guess two ways of seeing it - the flip of that coin is "the system
can already do this for you"...

If we were just talking about network glitches then I would be fine
with the current behavior because I do not believe they are
long-lasting anyway and they are also *quantifiable* which is a huge
bonus.

But the proposed switch doesn't actually make it possible to
differentiate between these "non-long-lasting" issues and long-lasting
ones, does it? We might want an interface that actually does...

“replication_timeout”, where the primary disconnects the WAL sender
after a timeout, together with “synchronous_standalone_master”, which
tells the primary it can continue anyway when that happens, allows
exactly that. This would then be the first part towards that, but I
wanted to start out small and I personally think it is sufficient to
draw the line at TCP disconnect of the standby.

Maybe it is. It still seems fragile to me.

But wouldn't an async standby still be a lot better than no standby at
all (standalone)?

As soon as the standby comes back online, I want to wait for it to sync.

I guess I just find this very inconsistent. You're willing to wait,
but only sometimes. You're not willing to wait when it goes down, but
you are willing to wait when it comes back. I don't see why this
should be different, and I don't see how you can reliably
differentiate between these two.

When the wait is quantifiable, I want to wait (like a connected
standby that is in the process of catching up). When it is not (like
when the remote node disappeared and the master has no way of knowing
for how long), I do not want to wait.
In both cases I want to send off alerts, get people involved and fix
the problem causing this; it is not something that should happen
often.

Of course.

What about the slave rebooting, for example? That'll usually be pretty
quick too - so you'd be ok waiting for that. But your patch doesn't
let you wait for that - it will switch to standalone mode right away?
But if it takes 30 seconds to reboot, and then 30 seconds to catch up,
you are *not* willing to wait for the first 30 seconds, but you *are*
willing to wait for the second? Just seems strange to me, I guess...

That’s exactly right. While the standby is booting, the master has no
way of knowing what is going on with that standby so then I don’t want
to wait.

When the standby has managed to boot, connect and started to sync up
the data that it was lagging behind, then I do want to wait because I
know that it will not take too long before it has caught up.

Yeah, that does make sense, when you look at it like that.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

#12Alexander Björnhagen
alex.bjornhagen@gmail.com
In reply to: Magnus Hagander (#11)
1 attachment(s)
Re: Standalone synchronous master

Okay,

Here’s version 3 then, which piggy-backs on the existing flag :

synchronous_commit = on | off | local | fallback

Where “fallback” now means “fall back from sync replication when no
(suitable) standbys are connected”.

This was done on input from Guillaume Lelarge.
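
Just to illustrate, a minimal postgresql.conf on the primary might then
look something like this (the standby name is of course only an example) :

  synchronous_standby_names = 'standby1'  # example name of the sync standby
  synchronous_commit = fallback           # wait for the standby while one is
                                          # connected, temporarily fall back
                                          # to local-only while none are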

That said, I agree it's not necessarily reasonable to try to defend
against that in a two node cluster.

That’s what I’ve been trying to say all along but I didn’t give enough
context before so I understand we took a turn there.

You can always walk up to any setup and say “hey, if you nuke that
site from orbit and crash that other thing, and ...” ;) I’m just
kidding of course but you get the point. Nothing is absolute.

And so we get back to the three likelihoods in our two-node setup :

1.The master fails
- Okay, promote the standby

2.The standby fails
- Okay, the system still works but you no longer have data
redundancy. Deal with it.

3.Both fail, together or one after the other.

I’ve stated that 1 and 2 together cover way more than 99.9% of what’s
expected in my setup on any given day.

But 3. is what we’ve been talking about ... And well in that case
there is no reason to just go ahead and promote a standby because,
granted, it could be lagging behind if the master decided to switch to
standalone mode just before going down itself.

As long as you do not prematurely or rather instinctively promote the
standby when it has *possibly* lagged behind, you’re good and there is
no risk of data loss. The data might be sitting on a crashed or
otherwise unavailable master, but it’s not lost. Promoting the standby
however is basically saying “forget the master and its data, continue
from where the standby is currently at”.

Now granted this is operationally harder/more complicated than just
synchronous replication where you can always, in any case, just
promote the standby after a master failure, knowing that all data is
guaranteed to be replicated.

I'm worried that the interface seems a bit
fragile and that it's hard to "be sure".

With this setup, you can’t promote the standby without first checking
if the replication link was disconnected prior to the master failure.

For me, the benefits outweigh this one drawback because I get more
standby replication guarantee than async replication and more master
availability than sync replication in the most plausible outcomes.

Cheers,

/A

Attachments:

sync-standalone-v3.patchapplication/octet-stream; name=sync-standalone-v3.patchDownload
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 0cc3296..60bebee 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1560,7 +1560,8 @@ SET ENABLE_SEQSCAN TO OFF;
         Specifies whether transaction commit will wait for WAL records
         to be written to disk before the command returns a <quote>success</>
         indication to the client.  Valid values are <literal>on</>,
-        <literal>local</>, and <literal>off</>.  The default, and safe, value
+        <literal>local</>, <literal>fallback</> and <literal>off</>.
+         The default, and safe, value
         is <literal>on</>.  When <literal>off</>, there can be a delay between
         when success is reported to the client and when the transaction is
         really guaranteed to be safe against a server crash.  (The maximum
@@ -1574,6 +1575,10 @@ SET ENABLE_SEQSCAN TO OFF;
         can be a useful alternative when performance is more important than
         exact certainty about the durability of a transaction.  For more
         discussion see <xref linkend="wal-async-commit">.
+        If set to <literal>fallback</>, the master will act as if it was set to
+         <literal>on</> except in the special case where all suitable synchronous
+         standbys are currently disconnected, in which case it will temporarily fall
+         back to <literal>local</> mode until suitable standbys are connected.
        </para>
        <para>
         If <xref linkend="guc-synchronous-standby-names"> is set, this
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index e9ae1e8..7c39bad 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -352,7 +352,7 @@ CheckpointerMain(void)
 		ThisTimeLineID = GetRecoveryTargetTLI();
 
 	/* Do this once before starting the loop, then just at SIGHUP time. */
-	SyncRepUpdateSyncStandbysDefined();
+	SyncRepUpdateConfig();
 
 	/*
 	 * Loop forever
@@ -381,7 +381,7 @@ CheckpointerMain(void)
 			got_SIGHUP = false;
 			ProcessConfigFile(PGC_SIGHUP);
 			/* update global shmem state for sync rep */
-			SyncRepUpdateSyncStandbysDefined();
+			SyncRepUpdateConfig();
 		}
 		if (checkpoint_requested)
 		{
@@ -657,7 +657,7 @@ CheckpointWriteDelay(int flags, double progress)
 			got_SIGHUP = false;
 			ProcessConfigFile(PGC_SIGHUP);
 			/* update global shmem state for sync rep */
-			SyncRepUpdateSyncStandbysDefined();
+			SyncRepUpdateConfig();
 		}
 
 		AbsorbFsyncRequests();
diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index 95de6c7..559c89c 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -126,6 +126,17 @@ SyncRepWaitForLSN(XLogRecPtr XactCommitLSN)
 		return;
 	}
 
+
+	/*
+	 * Fast exit also if running in standalone mode
+	 * because there are no synchronous standbys connected
+	 */
+	if ( WalSndCtl->sync_standalone_master )
+	{
+		LWLockRelease(SyncRepLock);
+		return;
+	}
+
 	/*
 	 * Set our waitLSN so WALSender will know when to wake us, and add
 	 * ourselves to the queue.
@@ -326,6 +337,58 @@ SyncRepCleanupAtProcExit(void)
 }
 
 /*
+ * Check if the master should switch to standalone mode and stop trying
+ *  to wait for standby synchronization because there are no standby servers currently
+ * connected. If there are servers connected, then switch back and start waiting for them.
+ * This function is called on connect/disconnect of standby WAL senders. Must hold SyncRepLock.
+ */
+void SyncRepCheckIfStandaloneMaster()
+{
+	bool sync_standby_connected = false;
+	int i = 0;
+
+	if (!SyncRepRequested() || !SyncStandbysDefined() || ! WalSndCtl->sync_standalone_allowed)
+	{
+		WalSndCtl->sync_standalone_master = false;
+		return;
+	}
+
+	for (i = 0; i < max_wal_senders && ! sync_standby_connected; i++)
+	{
+		volatile WalSnd *walsnd = &WalSndCtl->walsnds[i];
+		if ( walsnd->pid != 0 && walsnd->sync_standby_priority )
+		{
+			sync_standby_connected = true;
+			if ( WalSndCtl->sync_standalone_master )
+			{
+				ereport(LOG,
+					(errmsg("waiting for standby synchronization"),
+					 errhidestmt(true)));
+
+				WalSndCtl->sync_standalone_master = false;
+			}
+		}
+	}
+
+	if ( ! sync_standby_connected )
+	{
+		if ( ! WalSndCtl->sync_standalone_master )
+		{
+			ereport(LOG,
+				(errmsg("not waiting for standby synchronization"),
+				 errhidestmt(true)));
+
+			WalSndCtl->sync_standalone_master = true;
+
+			/*
+			 * We just switched to standalone mode so wake up anyone that is waiting
+			 */
+			SyncRepWakeQueue(true);
+		}
+	}
+}
+
+/*
  * ===========================================================
  * Synchronous Replication functions for wal sender processes
  * ===========================================================
@@ -345,10 +408,11 @@ SyncRepInitConfig(void)
 	 * for handling replies from standby.
 	 */
 	priority = SyncRepGetStandbyPriority();
-	if (MyWalSnd->sync_standby_priority != priority)
+	if (priority)
 	{
 		LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
 		MyWalSnd->sync_standby_priority = priority;
+		SyncRepCheckIfStandaloneMaster();
 		LWLockRelease(SyncRepLock);
 		ereport(DEBUG1,
 			(errmsg("standby \"%s\" now has synchronous standby priority %u",
@@ -567,6 +631,17 @@ SyncRepWakeQueue(bool all)
 }
 
 /*
+ * The background writer calls this as needed to update the shared WalSndCtl
+ * with new sync-related config.
+ */
+void
+SyncRepUpdateConfig(void)
+{
+	SyncRepUpdateSyncStandbysDefined();
+	SyncRepUpdateSyncStandaloneAllowed();
+}
+
+/*
  * The background writer calls this as needed to update the shared
  * sync_standbys_defined flag, so that backends don't remain permanently wedged
  * if synchronous_standby_names is unset.  It's safe to check the current value
@@ -598,6 +673,28 @@ SyncRepUpdateSyncStandbysDefined(void)
 		 * the queue (and never wake up).  This prevents that.
 		 */
 		WalSndCtl->sync_standbys_defined = sync_standbys_defined;
+		LWLockRelease(SyncRepLock);
+	}
+
+
+}
+
+/*
+ * The background writer calls this as needed to update the shared
+ * sync_standalone_allowed flag. If the flag is enabled, then also check if
+ * any synchronous standby servers are connected in order to switch mode from sync
+ * replication to standalone mode.
+ */
+void
+SyncRepUpdateSyncStandaloneAllowed(void)
+{
+	bool		SyncRepStandaloneMasterAllowed = SyncRepRequested() && synchronous_commit == SYNCHRONOUS_COMMIT_FALLBACK;
+	if ( SyncRepStandaloneMasterAllowed != WalSndCtl->sync_standalone_allowed )
+	{
+		LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
+
+		WalSndCtl->sync_standalone_allowed = SyncRepStandaloneMasterAllowed;
+		SyncRepCheckIfStandaloneMaster();
 
 		LWLockRelease(SyncRepLock);
 	}
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index ea86520..1da44e0 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -955,6 +955,13 @@ WalSndKill(int code, Datum arg)
 	MyWalSnd->pid = 0;
 	DisownLatch(&MyWalSnd->latch);
 
+	/*
+	 * Check if this was the last standby
+	 */
+	LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
+	SyncRepCheckIfStandaloneMaster();
+	LWLockRelease(SyncRepLock);
+
 	/* WalSnd struct isn't mine anymore */
 	MyWalSnd = NULL;
 }
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index da7b6d4..f2d8f96 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -370,11 +370,12 @@ static const struct config_enum_entry constraint_exclusion_options[] = {
 };
 
 /*
- * Although only "on", "off", and "local" are documented, we
+ * Although only "on", "off", "fallback" and "local" are documented, we
  * accept all the likely variants of "on" and "off".
  */
 static const struct config_enum_entry synchronous_commit_options[] = {
 	{"local", SYNCHRONOUS_COMMIT_LOCAL_FLUSH, false},
+	{"fallback", SYNCHRONOUS_COMMIT_FALLBACK, false},
 	{"on", SYNCHRONOUS_COMMIT_ON, false},
 	{"off", SYNCHRONOUS_COMMIT_OFF, false},
 	{"true", SYNCHRONOUS_COMMIT_ON, true},
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 315db46..83bc120 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -158,7 +158,7 @@
 #wal_level = minimal			# minimal, archive, or hot_standby
 					# (change requires restart)
 #fsync = on				# turns forced synchronization on or off
-#synchronous_commit = on		# synchronization level; on, off, or local
+#synchronous_commit = on		# synchronization level; on, off, fallback or local
 #wal_sync_method = fsync		# the default is the first option
 					# supported by the operating system:
 					#   open_datasync
@@ -215,6 +215,7 @@
 				# from standby(s); '*' = all
 #vacuum_defer_cleanup_age = 0	# number of xacts by which cleanup is delayed
 
+
 # - Standby Servers -
 
 # These settings are ignored on a master server
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index aaa6204..6e8f7dc 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -55,6 +55,7 @@ typedef enum
 {
 	SYNCHRONOUS_COMMIT_OFF,		/* asynchronous commit */
 	SYNCHRONOUS_COMMIT_LOCAL_FLUSH,		/* wait for local flush only */
+	SYNCHRONOUS_COMMIT_FALLBACK,	/* wait for local and remote flush, if connected standbys */
 	SYNCHRONOUS_COMMIT_REMOTE_FLUSH		/* wait for local and remote flush */
 }	SyncCommitLevel;
 
diff --git a/src/include/replication/syncrep.h b/src/include/replication/syncrep.h
index 65b725f..66a2c71 100644
--- a/src/include/replication/syncrep.h
+++ b/src/include/replication/syncrep.h
@@ -34,7 +34,9 @@ extern void SyncRepInitConfig(void);
 extern void SyncRepReleaseWaiters(void);
 
 /* called by wal writer */
-extern void SyncRepUpdateSyncStandbysDefined(void);
+extern void SyncRepUpdateConfig(void);
+
+extern void SyncRepCheckIfStandaloneMaster(void);
 
 /* called by various procs */
 extern int	SyncRepWakeQueue(bool all);
diff --git a/src/include/replication/walsender_private.h b/src/include/replication/walsender_private.h
index be7a341..954c79f 100644
--- a/src/include/replication/walsender_private.h
+++ b/src/include/replication/walsender_private.h
@@ -85,6 +85,17 @@ typedef struct
 	 */
 	bool		sync_standbys_defined;
 
+	/*
+	 * Is the synchronous master allowed to switch to standalone mode when no
+	 * synchronous standby servers are connected ? Protected by SyncRepLock.
+	 */
+	bool            sync_standalone_allowed;
+
+	/*
+	 * Is the synchronous master currently running in standalone mode ? Protected by SyncRepLock.
+	 */
+	bool            sync_standalone_master;
+
 	WalSnd		walsnds[1];		/* VARIABLE LENGTH ARRAY */
 } WalSndCtlData;
 
#13Robert Haas
robertmhaas@gmail.com
In reply to: Alexander Björnhagen (#12)
Re: Standalone synchronous master

On Tue, Dec 27, 2011 at 6:39 AM, Alexander Björnhagen
<alex.bjornhagen@gmail.com> wrote:

And so we get back to the three likelihoods in our two-node setup :

1.The master fails
 - Okay, promote the standby

2.The standby fails
 - Okay, the system still works but you no longer have data
redundancy. Deal with it.

3.Both fail, together or one after the other.

It seems to me that if you are happy with #2, you don't really need to
enable sync rep in the first place.

At any rate, even without multiple component failures, this
configuration makes it pretty easy to lose durability (which is the
only point of having sync rep in the first place). Suppose the NIC
card on the master is the failing component. If it happens to drop
the TCP connection to the clients just before it drops the connection
to the standby, the standby will have all the transactions, and you
can fail over just fine. If it happens to drop the TCP connection to
the standby just before it drops the connection to the clients, the standby
will not have all the transactions, and failover will lose some
transactions - and presumably you enabled this feature in the first
place precisely to prevent that sort of occurrence.

I do think that it might be useful to have this if there's a
configurable timeout involved - that way, people could say, well, I'm
OK with maybe losing transactions if the standby has been gone for X
seconds. But if the only possible behavior is equivalent to a
zero-second timeout I don't think it's too useful. It's basically
just going to lead people to believe that their data is more secure
than it really is, which IMHO is not helpful.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#14Aidan Van Dyk
aidan@highrise.ca
In reply to: Robert Haas (#13)
Re: Standalone synchronous master

On Tue, Jan 3, 2012 at 9:22 PM, Robert Haas <robertmhaas@gmail.com> wrote:

It seems to me that if you are happy with #2, you don't really need to
enable sync rep in the first place.

At any rate, even without multiple component failures, this
configuration makes it pretty easy to lose durability (which is the
only point of having sync rep in the first place).  Suppose the NIC
card on the master is the failing component.  If it happens to drop
the TCP connection to the clients just before it drops the connection
to the standby, the standby will have all the transactions, and you
can fail over just fine.  If it happens to drop the TCP connection to
the just before it drops the connection to the clients, the standby
will not have all the transactions, and failover will lose some
transactions - and presumably you enabled this feature in the first
place precisely to prevent that sort of occurrence.

I do think that it might be useful to have this if there's a
configurable timeout involved - that way, people could say, well, I'm
OK with maybe losing transactions if the standby has been gone for X
seconds.  But if the only possible behavior is equivalent to a
zero-second timeout I don't think it's too useful.  It's basically
just going to lead people to believe that their data is more secure
than it really is, which IMHO is not helpful.

So, I'm a big fan of syncrep guaranteeing its guarantees. To me,
that's the whole point. Having it "fall out of sync rep" at any point
*automatically* seems to be exactly counter to the point of sync rep.

That said, I'm also a big fan of monitoring everything as well as I could...

I'd love a "hook" script that was run if sync-rep state ever changed
(heck, I'd even like it if it just chose a new sync standby).

Even better, is there a way we could start injecting "notify" events
into the cluster on these types of changes? Especially now that
notify events can take payloads, it means I don't have to keep
constantly polling the database to see if it thinks it's connected,
etc.
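
Today the closest I can get is to keep polling something like

  SELECT application_name, state, sync_state FROM pg_stat_replication;

from an external monitoring job, which is exactly the kind of polling
I'd rather avoid.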

a.

--
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

#15Robert Haas
robertmhaas@gmail.com
In reply to: Aidan Van Dyk (#14)
Re: Standalone synchronous master

On Wed, Jan 4, 2012 at 9:28 AM, Aidan Van Dyk <aidan@highrise.ca> wrote:

I'd love a "hook" script that was run if sync-rep state ever changed
(heck, I'd even like it if it just chose a new sync standby).

That seems useful. I don't think the current code quite knows its own
state; we seem to have each walsender recompute who the boss is, and
if you query pg_stat_replication that redoes the same calculation. I
can't shake the feeling that there's a better way... which would also
facilitate this.

Even better, is there a way we could start injecting "notify" events
into the cluster on these types of changes?  Especially now that
notify events can take payloads, it means I don't have to keep
constantly polling the database to see if it thinks it's connected,
etc.

I like this idea, too.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#16Fujii Masao
masao.fujii@gmail.com
In reply to: Aidan Van Dyk (#14)
Re: Standalone synchronous master

On Wed, Jan 4, 2012 at 11:28 PM, Aidan Van Dyk <aidan@highrise.ca> wrote:

So, I'm a big fan of syncrep guaranteeing its guarantees.  To me,
that's the whole point.  Having it "fall out of sync rep" at any point
*automatically* seems to be exactly counter to the point of sync rep.

Yes, what Alexander proposed is not sync rep. It's a new replication mode.
If we adopt the proposal, we have three replication modes (async, sync,
and what Alexander proposed), like Oracle DataGuard provides. If you need
the guarantee which sync rep provides, you can choose sync as the
replication mode.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#17Alexander Björnhagen
alex.bjornhagen@gmail.com
In reply to: Fujii Masao (#16)
Re: Standalone synchronous master

At this point I feel that this new functionality might be a bit
overkill for postgres; maybe it's better to stay lean and mean rather
than add a controversial feature like this.

I also agree that a more general replication timeout variable would be
more useful to a larger audience but that would in my view add more
complexity to the replication code which is quite simple and
understandable right now ...

Anyway, my backup plan was to achieve the same thing by triggering on
the logging produced on the primary server and switching to async mode
when detecting that the standby replication link has failed (and then
back again when it is restored). In effect I would put a replication
monitor on the outside of the server instead of embedding it; something
like the sketch below.
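
Just to show the idea (paths and the exact log string to match are
placeholders and would need tuning, and the reverse direction is not
handled here) :

  #!/bin/sh
  # Watch the primary's log; when the standby connection drops, fall back
  # to async by clearing synchronous_standby_names and reloading.
  PGDATA=/var/lib/pgsql/data
  tail -F "$PGDATA/pg_log/postgresql.log" | while read line; do
      case "$line" in
          *"unexpected EOF on standby connection"*)
              sed -i "s/^synchronous_standby_names.*/synchronous_standby_names = ''/" \
                  "$PGDATA/postgresql.conf"
              pg_ctl -D "$PGDATA" reload
              ;;
      esac
  done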

/A

#18Jeff Janes
jeff.janes@gmail.com
In reply to: Alexander Björnhagen (#17)
Re: Standalone synchronous master

On Fri, Jan 13, 2012 at 2:30 AM, Alexander Björnhagen
<alex.bjornhagen@gmail.com> wrote:

At this point I feel that this new functionality might be a bit
overkill for postgres, maybe it's better to stay lean and mean rather
than add a controversial feature like this.

I don't understand why this is controversial. In the current code, if
you have a master and a single sync standby, and the master disappears
and you promote the standby, now the new master is running *without a
standby*. If you are willing to let the new master run without a
standby, why are you not willing to let the old one do so if it were
the standby which failed in the first place?

Cheers,

Jeff

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Janes (#18)
Re: Standalone synchronous master

Jeff Janes <jeff.janes@gmail.com> writes:

I don't understand why this is controversial. In the current code, if
you have a master and a single sync standby, and the master disappears
and you promote the standby, now the new master is running *without a
standby*.

If you configured it to use sync rep, it won't accept any transactions
until you give it a standby. If you configured it not to, then it's you
that has changed the replication requirements.

If you are willing to let the new master run without a
standby, why are you not willing to let the old one do so if it were
the standby which failed in the first place?

Doesn't follow.

regards, tom lane

#20Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Jeff Janes (#18)
Re: Standalone synchronous master

Jeff Janes <jeff.janes@gmail.com> wrote:

I don't understand why this is controversial.

I'm having a hard time seeing why this is considered a feature. It
seems to me what is being proposed is a mode with no higher
integrity guarantee than asynchronous replication, but latency
equivalent to synchronous replication. I can see where it's
tempting to want to think it gives something more in terms of
integrity guarantees, but when I think it through, I'm not really
seeing any actual benefit.

If this fed into something such that people got jabber messages,
emails, or telephone calls any time it switched between synchronous
and stand-alone mode, that would make it a built-in monitoring,
fail-over, and alert system -- which *would* have some value. But
in the past we've always recommended external tools for such
features.

-Kevin

#21Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Kevin Grittner (#20)
Re: Standalone synchronous master

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

I'm having a hard time seeing why this is considered a feature. It
seems to me what is being proposed is a mode with no higher
integrity guarantee than asynchronous replication, but latency
equivalent to synchronous replication. I can see where it's
tempting to want to think it gives something more in terms of
integrity guarantees, but when I think it through, I'm not really
seeing any actual benefit.

Same here, so what I think is that the new recv and write modes that
Fujii is working on could maybe be demoted from the sync variants, while
not really being async ones. Maybe “eager” or some other term.

It seems to me that would answer the OP use case and your remark here.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

#22Jeff Janes
jeff.janes@gmail.com
In reply to: Tom Lane (#19)
Re: Standalone synchronous master

On Fri, Jan 13, 2012 at 9:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Jeff Janes <jeff.janes@gmail.com> writes:

I don't understand why this is controversial.  In the current code, if
you have a master and a single sync standby, and the master disappears
and you promote the standby, now the new master is running *without a
standby*.

If you configured it to use sync rep, it won't accept any transactions
until you give it a standby.  If you configured it not to, then it's you
that has changed the replication requirements.

Sure, but isn't that a very common usage? Maybe my perceptions are
out of whack, but I commonly hear about fail-over and rarely hear
about using more than one slave so that you can fail over and still
have a positive number of slaves.

Cheers,

Jeff

#23Jeff Janes
jeff.janes@gmail.com
In reply to: Kevin Grittner (#20)
Re: Standalone synchronous master

On Fri, Jan 13, 2012 at 10:12 AM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:

Jeff Janes <jeff.janes@gmail.com> wrote:

I don't understand why this is controversial.

I'm having a hard time seeing why this is considered a feature.  It
seems to me what is being proposed is a mode with no higher
integrity guarantee than asynchronous replication, but latency
equivalent to synchronous replication.

There are never 100% guarantees. You could always have two
independent failures (the WAL disk of the master and of the slave)
nearly simultaneously.

If you look at weaker guarantees, then with asynchronous replication
you are almost guaranteed to lose transactions on a fail-over of a
busy server, and with the proposed option you are almost guaranteed
not to, as long as disconnections are rare.

As far as latency, I think there are many cases when a small latency
is pretty much equivalent to zero latency. A human on the other end
of a commit is unlikely to notice a latency of 0.1 seconds.

 I can see where it's
tempting to want to think it gives something more in terms of
integrity guarantees, but when I think it through, I'm not really
seeing any actual benefit.

I think the value of having a synchronously replicated commit is
greater than zero but less than infinite. I don't think it is
outrageous to think that that value could be approximately expressed
in seconds you are willing to wait for that replicated commit before
going ahead without it.

If this fed into something such that people got jabber message,
emails, or telephone calls any time it switched between synchronous
and stand-alone mode, that would make it a built-in monitoring,
fail-over, and alert system -- which *would* have some value.  But
in the past we've always recommended external tools for such
features.

Since synchronous_standby_names cannot be changed without bouncing the
server, we do not provide the tools for an external tool to make this
change cleanly.

Cheers,

Jeff

#24Fujii Masao
masao.fujii@gmail.com
In reply to: Jeff Janes (#23)
Re: Standalone synchronous master

On Mon, Jan 16, 2012 at 7:01 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Fri, Jan 13, 2012 at 10:12 AM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:

Jeff Janes <jeff.janes@gmail.com> wrote:

I don't understand why this is controversial.

I'm having a hard time seeing why this is considered a feature.  It
seems to me what is being proposed is a mode with no higher
integrity guarantee than asynchronous replication, but latency
equivalent to synchronous replication.

There are never 100% guarantees.  You could always have two
independent failures (the WAL disk of the master and of the slave)
nearly simultaneously.

If you look at weaker guarantees, then with asynchronous replication
you are almost guaranteed to lose transactions on a fail-over of a
busy server, and with the proposed option you are almost guaranteed
not to, as long as disconnections are rare.

Yes. The proposed mode guarantees that you don't lose transactions
when a single failure happens, but asynchronous replication doesn't. So
the proposed one has the benefit of reducing the risk of data loss to
a certain extent.

OTOH, when more than one failure happens, in the proposed mode you
may lose transactions. For example, imagine the case where the standby
crashes, the standalone master runs for a while, and then its database
gets corrupted. In this case, you would lose any transactions committed
while the standalone master was running.

So, if you want to avoid such data loss, you can use synchronous
replication mode. OTOH, if you can endure the data loss caused by a
double failure for some reason (e.g., using reliable hardware...) but
not that caused by a single failure, and want to improve availability
(i.e., prevent transactions from being blocked after a single failure
happens), the proposed one is a good option to use. I believe that some
people need this proposed replication mode.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#25Robert Haas
robertmhaas@gmail.com
In reply to: Jeff Janes (#23)
Re: Standalone synchronous master

On Sun, Jan 15, 2012 at 5:01 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

Since synchronous_standby_names cannot be changed without bouncing the
server, we do not provide the tools for an external tool to make this
change cleanly.

Yes, it can. It's PGC_SIGHUP.
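
So an external tool can already do something along the lines of

  sed -i "s/^synchronous_standby_names.*/synchronous_standby_names = ''/" \
      $PGDATA/postgresql.conf
  psql -c "SELECT pg_reload_conf()"

and any backends waiting for sync rep are released once the reload is
processed (that's what SyncRepUpdateSyncStandbysDefined is there for).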

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#26Bruce Momjian
bruce@momjian.us
In reply to: Robert Haas (#13)
Re: Standalone synchronous master

On Tue, Jan 3, 2012 at 09:22:22PM -0500, Robert Haas wrote:

On Tue, Dec 27, 2011 at 6:39 AM, Alexander Björnhagen
<alex.bjornhagen@gmail.com> wrote:

And so we get back to the three likelihoods in our two-node setup :

1.The master fails
 - Okay, promote the standby

2.The standby fails
 - Okay, the system still works but you no longer have data
redundancy. Deal with it.

3.Both fail, together or one after the other.

It seems to me that if you are happy with #2, you don't really need to
enable sync rep in the first place.

At any rate, even without multiple component failures, this
configuration makes it pretty easy to lose durability (which is the
only point of having sync rep in the first place). Suppose the NIC
card on the master is the failing component. If it happens to drop
the TCP connection to the clients just before it drops the connection
to the standby, the standby will have all the transactions, and you
can fail over just fine. If it happens to drop the TCP connection to
the standby just before it drops the connection to the clients, the standby
will not have all the transactions, and failover will lose some
transactions - and presumably you enabled this feature in the first
place precisely to prevent that sort of occurrence.

I do think that it might be useful to have this if there's a
configurable timeout involved - that way, people could say, well, I'm
OK with maybe losing transactions if the standby has been gone for X
seconds. But if the only possible behavior is equivalent to a
zero-second timeout I don't think it's too useful. It's basically
just going to lead people to believe that their data is more secure
than it really is, which IMHO is not helpful.

Added to TODO:

Add a new "eager" synchronous mode that starts out synchronous but
reverts to asynchronous after a failure timeout period

This would require some type of command to be executed to alert
administrators of this change.

http://archives.postgresql.org/pgsql-hackers/2011-12/msg01224.php

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +