Reducing power consumption on idle servers

Started by Simon Riggs · almost 4 years ago · 69 messages
#1 Simon Riggs
simon.riggs@enterprisedb.com
1 attachment(s)

Some years ago we did a pass through the various worker processes to
add hibernation as a mechanism to reduce power consumption on an idle
server. Replication never got the memo, so power reduction on an
idle standby or logical subscriber is not very effective.
The code and timing for hibernation is also different for each worker,
which is confusing.

The proposal is to improve the situation and reduce power consumption
on an idle server with a series of fairly light changes, giving a
positive green/environmental benefit.

CURRENT STATE

These servers naturally sleep for long periods when inactive:
* postmaster - 60s
* checkpointer - checkpoint_timeout
* syslogger - log_rotation_age
* pgarch - 60s
* autovac launcher - autovacuum_naptime
* walsender - wal_sender_timeout / 2
* contrib/prewarm - checkpoint_timeout or shutdown
* pgstat - except on Windows, see later

The following servers all have some kind of hibernation modes:
* bgwriter - 50 * bgwriter_delay
* walwriter - 25 * wal_writer_delay

These servers don't try to hibernate at all:
* logical worker - 1s
* logical launcher - wal_retrieve_retry_interval (undocumented)
* startup - hardcoded 5s when streaming, wal_retrieve_retry_interval
for WAL files
* wal_receiver - 100ms, currently gets woken when WAL arrives
* pgstat - 2s (on Windows only)

PROPOSED CHANGES

1. Standardize the hibernation time at 60s, using a #define
HIBERNATE_DELAY_SEC 60

2. Refactor postmaster and pgarch to use that setting, rather than hardcoded 60

3. Standardize the hibernation design pattern through a set of macros
in latch.h,
based on the coding of walwriter.c, since that was the best
example of hibernation code.
Hibernation is independent for each process.
This is explained in the header for latch.h, with an example in
src/test/modules/worker_spi/worker_spi.c
The intention here is to provide simple facilities so that authors
of bgworkers can also add hibernation code.
Summary: after 50 idle loops, hibernate for 60s each time through the loop
In all cases, switch immediately back into action when needed.

4. Change these processes to hibernate using the standard design pattern
* bgwriter - currently gets woken when a user allocates a buffer, no
change proposed
* walwriter - currently gets woken by XLogFlush, no change proposed

5. The startup process has a hardcoded 5s loop because it checks for a
trigger file that requests promotion. Hibernating would mean it
promotes more slowly, and/or restarts a failing walreceiver more
slowly, so this requires user approval; hence, add new GUCs to approve
that choice. This is a valid choice because a long-term idle server is
obviously not in current use, so waiting 60s for failover or restart
is very unlikely to cause a significant issue. The startup process is
woken by the walreceiver when WAL arrives, so if all is well, it will
swing back into action quickly when needed.

If standby_can_hibernate = true, then these processes can hibernate:
* startup
* walreceiver - hibernate, but only for wal_receiver_timeout/2, not
60s, so other existing behavior is preserved.

If subscription_can_hibernate = true, then these processes can hibernate:
* logical launcher
* logical worker - hibernate, but only for wal_receiver_timeout/2, not
60s, so other existing behavior is preserved.

Patch to do all of the above attached.

Comments please.

--
Simon Riggs http://www.EnterpriseDB.com/

Attachments:

hibernate_to_reduce_power_consumption.v1.patch (application/octet-stream)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index fc63172efd..45f021fe9a 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4815,6 +4815,35 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-standby-can-hibernate" xreflabel="standby_can_hibernate">
+      <term><varname>standby_can_hibernate</varname> (<type>boolean</type>)
+      <indexterm>
+        <primary><varname>standby_can_hibernate</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies whether standby workers will hibernate if there is no incoming
+        replication traffic and all changes have been applied. Hibernation will
+        reduce power consumption when idle, though workers will wake immediately when
+        new replication traffic arrives.
+       </para>
+       <para>
+        When hibernating, the startup process will take longer to notice that the
+        standby has been promoted, which could take up to 60s, rather than the
+        normal recheck cycle time of 5s. This will increase the failover time
+        in a high availability cluster configuration and also increase the
+        time required to restart the wal receiver should it be disconnected.
+       </para>
+       <para>
+        When hibernating, the wal receiver process will sleep for
+        wal_receiver_timeout / 2, rather than the normal cycle time of 100ms.
+        You may wish to increase the value of wal_receiver_timeout to
+        <literal>120s</literal> or more to improve hibernation.
+       </para>
+      </listitem>
+     </varlistentry>
+
      </variablelist>
     </sect2>
 
@@ -4884,6 +4913,32 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-subscription-can-hibernate" xreflabel="subscription_can_hibernate">
+      <term><varname>subscription_can_hibernate</varname> (<type>boolean</type>)
+      <indexterm>
+        <primary><varname>subscription_can_hibernate</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies whether subscription processes will hibernate if there is no incoming
+        replication traffic and all changes have been applied. Hibernation will
+        reduce power consumption when idle, though workers will wake immediately when
+        new replication traffic arrives.
+       </para>
+       <para>
+        When hibernating, the logical launcher process will take longer to notice
+        that subscription workers have disconnected before it restarts them.
+       </para>
+       <para>
+        When hibernating, the logical worker process will sleep for
+        wal_receiver_timeout / 2, rather than the normal cycle time of 100ms.
+        You may wish to increase the value of wal_receiver_timeout to
+        <literal>120s</literal> or more to improve hibernation.
+       </para>
+      </listitem>
+     </varlistentry>
+
      </variablelist>
     </sect2>
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 958220c495..bd471504b7 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -12829,6 +12829,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 			case XLOG_FROM_STREAM:
 				{
 					bool		havedata;
+					DECLARE_HIBERNATE_VARS();
 
 					/*
 					 * We should be able to move to XLOG_FROM_STREAM only in
@@ -13013,14 +13014,18 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 					}
 
 					/*
-					 * Wait for more WAL to arrive. Time out after 5 seconds
+					 * Wait for more WAL to arrive. Time out allows startup
 					 * to react to a trigger file promptly and to check if the
-					 * WAL receiver is still active.
+					 * WAL receiver is still active. Hence, hibernation must
+					 * be explicitly approved.
 					 */
+					SET_DELAY_OR_HIBERNATE(havedata || !standby_can_hibernate,
+											wal_retrieve_retry_interval);
 					(void) WaitLatch(&XLogCtl->recoveryWakeupLatch,
 									 WL_LATCH_SET | WL_TIMEOUT |
 									 WL_EXIT_ON_PM_DEATH,
-									 5000L, WAIT_EVENT_RECOVERY_WAL_STREAM);
+									 cur_timeout,
+									 WAIT_EVENT_RECOVERY_WAL_STREAM);
 					ResetLatch(&XLogCtl->recoveryWakeupLatch);
 					break;
 				}
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 681ef91b81..942cc4a07f 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -134,7 +134,6 @@ int			Log_autovacuum_min_duration = 600000;
 
 /* the minimum allowed time between two awakenings of the launcher */
 #define MIN_AUTOVAC_SLEEPTIME 100.0 /* milliseconds */
-#define MAX_AUTOVAC_SLEEPTIME 300	/* seconds */
 
 /* Flags to tell if we are in an autovacuum process */
 static bool am_autovacuum_launcher = false;
@@ -932,8 +931,8 @@ launcher_determine_sleep(bool canlaunch, bool recursing, struct timeval *nap)
 	 * infinite sleep in strange cases like the system clock going backwards a
 	 * few years.
 	 */
-	if (nap->tv_sec > MAX_AUTOVAC_SLEEPTIME)
-		nap->tv_sec = MAX_AUTOVAC_SLEEPTIME;
+	if (nap->tv_sec > HIBERNATE_DELAY_SEC)
+		nap->tv_sec = HIBERNATE_DELAY_SEC;
 }
 
 /*
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index d1f5d12eff..9566fd2b3d 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -60,12 +60,6 @@
  */
 int			BgWriterDelay = 200;
 
-/*
- * Multiplier to apply to BgWriterDelay when we decide to hibernate.
- * (Perhaps this needs to be configurable?)
- */
-#define HIBERNATE_FACTOR			50
-
 /*
  * Interval in which standby snapshots are logged into the WAL stream, in
  * milliseconds.
@@ -337,7 +331,7 @@ BackgroundWriterMain(void)
 			/* Sleep ... */
 			(void) WaitLatch(MyLatch,
 							 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
-							 BgWriterDelay * HIBERNATE_FACTOR,
+							 HIBERNATE_DELAY_MS,
 							 WAIT_EVENT_BGWRITER_HIBERNATE);
 			/* Reset the notification request in case we timed out */
 			StrategyNotifyBgWriter(-1);
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index d916ed39a8..156a4b6404 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -53,8 +53,6 @@
  * Timer definitions.
  * ----------
  */
-#define PGARCH_AUTOWAKE_INTERVAL 60 /* How often to force a poll of the
-									 * archive status directory; in seconds. */
 #define PGARCH_RESTART_INTERVAL 10	/* How often to attempt to restart a
 									 * failed archiver; in seconds. */
 
@@ -346,7 +344,7 @@ pgarch_MainLoop(void)
 
 		/*
 		 * Sleep until a signal is received, or until a poll is forced by
-		 * PGARCH_AUTOWAKE_INTERVAL having passed since last_copy_time, or
+		 * HIBERNATE_DELAY_SEC having passed since last_copy_time, or
 		 * until postmaster dies.
 		 */
 		if (!time_to_stop)		/* Don't wait during last iteration */
@@ -354,7 +352,7 @@ pgarch_MainLoop(void)
 			pg_time_t	curtime = (pg_time_t) time(NULL);
 			int			timeout;
 
-			timeout = PGARCH_AUTOWAKE_INTERVAL - (curtime - last_copy_time);
+			timeout = HIBERNATE_DELAY_SEC - (curtime - last_copy_time);
 			if (timeout > 0)
 			{
 				int			rc;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index ce90877154..6de5cb3a9f 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1636,7 +1636,7 @@ DetermineSleepTime(struct timeval *timeout)
 		}
 		else
 		{
-			timeout->tv_sec = 60;
+			timeout->tv_sec = HIBERNATE_DELAY_SEC;
 			timeout->tv_usec = 0;
 		}
 		return;
@@ -1694,15 +1694,15 @@ DetermineSleepTime(struct timeval *timeout)
 		timeout->tv_usec = microsecs;
 
 		/* Ensure we don't exceed one minute */
-		if (timeout->tv_sec > 60)
+		if (timeout->tv_sec > HIBERNATE_DELAY_SEC)
 		{
-			timeout->tv_sec = 60;
+			timeout->tv_sec = HIBERNATE_DELAY_SEC;
 			timeout->tv_usec = 0;
 		}
 	}
 	else
 	{
-		timeout->tv_sec = 60;
+		timeout->tv_sec = HIBERNATE_DELAY_SEC;
 		timeout->tv_usec = 0;
 	}
 }
diff --git a/src/backend/postmaster/walwriter.c b/src/backend/postmaster/walwriter.c
index 102fa2a089..7f0b74d0de 100644
--- a/src/backend/postmaster/walwriter.c
+++ b/src/backend/postmaster/walwriter.c
@@ -70,13 +70,7 @@
 int			WalWriterDelay = 200;
 int			WalWriterFlushAfter = 128;
 
-/*
- * Number of do-nothing loops before lengthening the delay time, and the
- * multiplier to apply to WalWriterDelay when we do decide to hibernate.
- * (Perhaps these need to be configurable?)
- */
-#define LOOPS_UNTIL_HIBERNATE		50
-#define HIBERNATE_FACTOR			25
+DECLARE_HIBERNATE_VARS();
 
 /* Prototypes for private functions */
 static void HandleWalWriterInterrupts(void);
@@ -210,7 +204,7 @@ WalWriterMain(void)
 	/*
 	 * Reset hibernation state after any error.
 	 */
-	left_till_hibernate = LOOPS_UNTIL_HIBERNATE;
+	RESET_TO_NON_HIBERNATE();
 	hibernating = false;
 	SetWalWriterSleeping(false);
 
@@ -225,8 +219,6 @@ WalWriterMain(void)
 	 */
 	for (;;)
 	{
-		long		cur_timeout;
-
 		/*
 		 * Advertise whether we might hibernate in this cycle.  We do this
 		 * before resetting the latch to ensure that any async commits will
@@ -252,24 +244,11 @@ WalWriterMain(void)
 		 * Do what we're here for; then, if XLogBackgroundFlush() found useful
 		 * work to do, reset hibernation counter.
 		 */
-		if (XLogBackgroundFlush())
-			left_till_hibernate = LOOPS_UNTIL_HIBERNATE;
-		else if (left_till_hibernate > 0)
-			left_till_hibernate--;
+		SET_DELAY_OR_HIBERNATE(XLogBackgroundFlush(), WalWriterDelay);
 
 		/* Send WAL statistics to the stats collector */
 		pgstat_send_wal(false);
 
-		/*
-		 * Sleep until we are signaled or WalWriterDelay has elapsed.  If we
-		 * haven't done anything useful for quite some time, lengthen the
-		 * sleep time so as to reduce the server's idle power consumption.
-		 */
-		if (left_till_hibernate > 0)
-			cur_timeout = WalWriterDelay;	/* in ms */
-		else
-			cur_timeout = WalWriterDelay * HIBERNATE_FACTOR;
-
 		(void) WaitLatch(MyLatch,
 						 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
 						 cur_timeout,
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 7b473903a6..d89e084352 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -48,11 +48,9 @@
 #include "utils/snapmgr.h"
 #include "utils/timeout.h"
 
-/* max sleep time between cycles (3min) */
-#define DEFAULT_NAPTIME_PER_CYCLE 180000L
-
 int			max_logical_replication_workers = 4;
 int			max_sync_workers_per_subscription = 2;
+bool		subscription_can_hibernate = false;
 
 LogicalRepWorker *MyLogicalRepWorker = NULL;
 
@@ -802,6 +800,7 @@ void
 ApplyLauncherMain(Datum main_arg)
 {
 	TimestampTz last_start_time = 0;
+	DECLARE_HIBERNATE_VARS();
 
 	ereport(DEBUG1,
 			(errmsg_internal("logical replication launcher started")));
@@ -831,11 +830,12 @@ ApplyLauncherMain(Datum main_arg)
 		MemoryContext subctx;
 		MemoryContext oldctx;
 		TimestampTz now;
-		long		wait_time = DEFAULT_NAPTIME_PER_CYCLE;
+		bool		work_done;
 
 		CHECK_FOR_INTERRUPTS();
 
 		now = GetCurrentTimestamp();
+		work_done = false;
 
 		/* Limit the start retry to once a wal_retrieve_retry_interval */
 		if (TimestampDifferenceExceeds(last_start_time, now,
@@ -866,7 +866,7 @@ ApplyLauncherMain(Datum main_arg)
 				if (w == NULL)
 				{
 					last_start_time = now;
-					wait_time = wal_retrieve_retry_interval;
+					work_done = true;
 
 					logicalrep_worker_launch(sub->dbid, sub->oid, sub->name,
 											 sub->owner, InvalidOid);
@@ -886,13 +886,16 @@ ApplyLauncherMain(Datum main_arg)
 			 * usually means crash of the worker, so we should retry in
 			 * wal_retrieve_retry_interval again.
 			 */
-			wait_time = wal_retrieve_retry_interval;
+			work_done = true;
 		}
 
+		SET_DELAY_OR_HIBERNATE(work_done || !subscription_can_hibernate,
+							   wal_retrieve_retry_interval);
+
 		/* Wait for more work. */
 		rc = WaitLatch(MyLatch,
 					   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
-					   wait_time,
+					   cur_timeout,
 					   WAIT_EVENT_LOGICAL_LAUNCHER_MAIN);
 
 		if (rc & WL_LATCH_SET)
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index d77bb32bb9..e8d91d541f 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -194,8 +194,6 @@
 #include "utils/syscache.h"
 #include "utils/timeout.h"
 
-#define NAPTIME_PER_CYCLE 1000	/* max sleep time between cycles (1s) */
-
 typedef struct FlushPosition
 {
 	dlist_node	node;
@@ -2566,6 +2564,7 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
 	bool		ping_sent = false;
 	TimeLineID	tli;
 	ErrorContextCallback errcallback;
+	DECLARE_HIBERNATE_VARS();
 
 	/*
 	 * Init the ApplyMessageContext which we clean up after each replication
@@ -2602,7 +2601,6 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
 		int			len;
 		char	   *buf = NULL;
 		bool		endofstream = false;
-		long		wait_time;
 
 		CHECK_FOR_INTERRUPTS();
 
@@ -2636,6 +2634,7 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
 					/* Reset timeout. */
 					last_recv_timestamp = GetCurrentTimestamp();
 					ping_sent = false;
+					RESET_TO_NON_HIBERNATE();
 
 					/* Ensure we are reading the data into our memory context. */
 					MemoryContextSwitchTo(ApplyMessageContext);
@@ -2724,15 +2723,15 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
 		 * no particular urgency about waking up unless we get data or a
 		 * signal.
 		 */
-		if (!dlist_is_empty(&lsn_mapping))
-			wait_time = WalWriterDelay;
-		else
-			wait_time = NAPTIME_PER_CYCLE;
+		SET_DELAY_OR_HIBERNATE_OPT(!dlist_is_empty(&lsn_mapping) || !subscription_can_hibernate,
+									WalWriterDelay,
+									wal_receiver_timeout / 2);
 
 		rc = WaitLatchOrSocket(MyLatch,
 							   WL_SOCKET_READABLE | WL_LATCH_SET |
 							   WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
-							   fd, wait_time,
+							   fd,
+							   cur_timeout,
 							   WAIT_EVENT_LOGICAL_APPLY_MAIN);
 
 		if (rc & WL_LATCH_SET)
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index b39fce8c23..d028359684 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -89,12 +89,13 @@
 int			wal_receiver_status_interval;
 int			wal_receiver_timeout;
 bool		hot_standby_feedback;
+bool		standby_can_hibernate = false;
 
 /* libpqwalreceiver connection */
 static WalReceiverConn *wrconn = NULL;
 WalReceiverFunctionsType *WalReceiverFunctions = NULL;
 
-#define NAPTIME_PER_CYCLE 100	/* max sleep time between cycles (100ms) */
+#define WALRECEIVER_WAIT_PER_CYCLE 100	/* normal sleep time between cycles (100ms) */
 
 /*
  * These variables are used similarly to openLogFile/SegNo,
@@ -184,6 +185,7 @@ WalReceiverMain(void)
 	char	   *err;
 	char	   *sender_host = NULL;
 	int			sender_port = 0;
+	DECLARE_HIBERNATE_VARS();
 
 	/*
 	 * WalRcv should be set up already (if we are a backend, we inherit this
@@ -416,6 +418,7 @@ WalReceiverMain(void)
 			/* Initialize the last recv timestamp */
 			last_recv_timestamp = GetCurrentTimestamp();
 			ping_sent = false;
+			RESET_TO_NON_HIBERNATE();
 
 			/* Loop until end-of-streaming or error */
 			for (;;)
@@ -465,6 +468,7 @@ WalReceiverMain(void)
 							ping_sent = false;
 							XLogWalRcvProcessMsg(buf[0], &buf[1], len - 1,
 												 startpointTLI);
+							RESET_TO_NON_HIBERNATE();
 						}
 						else if (len == 0)
 							break;
@@ -496,6 +500,11 @@ WalReceiverMain(void)
 				if (endofwal)
 					break;
 
+				/* If we get here, we're idle so hibernate, if allowed */
+				SET_DELAY_OR_HIBERNATE_OPT(false || !standby_can_hibernate,
+											WALRECEIVER_WAIT_PER_CYCLE,
+											wal_receiver_timeout / 2);
+
 				/*
 				 * Ideally we would reuse a WaitEventSet object repeatedly
 				 * here to avoid the overheads of WaitLatchOrSocket on epoll
@@ -512,7 +521,7 @@ WalReceiverMain(void)
 									   WL_EXIT_ON_PM_DEATH | WL_SOCKET_READABLE |
 									   WL_TIMEOUT | WL_LATCH_SET,
 									   wait_fd,
-									   NAPTIME_PER_CYCLE,
+									   cur_timeout,
 									   WAIT_EVENT_WAL_RECEIVER_MAIN);
 				if (rc & WL_LATCH_SET)
 				{
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index f505413a7f..98ddcbd9bb 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1405,6 +1405,24 @@ static struct config_bool ConfigureNamesBool[] =
 		false,
 		NULL, NULL, NULL
 	},
+	{
+		{"standby_can_hibernate", PGC_SIGHUP, REPLICATION_STANDBY,
+			gettext_noop("Allow standby workers to hibernate if idle for long periods."),
+			NULL
+		},
+		&standby_can_hibernate,
+		false,
+		NULL, NULL, NULL
+	},
+	{
+		{"subscription_can_hibernate", PGC_SIGHUP, REPLICATION_SUBSCRIBERS,
+			gettext_noop("Allow subscription procs to hibernate if idle for long periods."),
+			NULL
+		},
+		&subscription_can_hibernate,
+		false,
+		NULL, NULL, NULL
+	},
 	{
 		{"restart_after_crash", PGC_SIGHUP, ERROR_HANDLING_OPTIONS,
 			gettext_noop("Reinitialize server after backend crash."),
diff --git a/src/include/replication/walreceiver.h b/src/include/replication/walreceiver.h
index 92f73a55b8..f5eca294f5 100644
--- a/src/include/replication/walreceiver.h
+++ b/src/include/replication/walreceiver.h
@@ -28,6 +28,8 @@
 extern int	wal_receiver_status_interval;
 extern int	wal_receiver_timeout;
 extern bool hot_standby_feedback;
+extern bool standby_can_hibernate;
+extern bool subscription_can_hibernate;
 
 /*
  * MAXCONNINFO: maximum size of a connection string.
diff --git a/src/include/storage/latch.h b/src/include/storage/latch.h
index 3aa7b33834..18a8522376 100644
--- a/src/include/storage/latch.h
+++ b/src/include/storage/latch.h
@@ -83,6 +83,30 @@
  * only, so using any latch other than the process latch effectively precludes
  * use of any generic handler.
  *
+ * Since not all servers have a 24/7 duty cycle, if a worker has no current
+ * work then it can be enhanced to eventually hibernate. The goal here is
+ * to reduce the CPU wakeups and thus reduce electrical power consumption,
+ * making a small contribution to reducing global warming. To support
+ * hibernation, we add some additional macros to judge when, and if, to
+ * switch from a normal loop wait to hibernate timeout (default 60s).
+ * Using these macros helps maintain a common design pattern for hibernation.
+ * 1. Add in DECLARE_HIBERNATE_VARS(); to add private vars to this module
+ * 2. Add SET_DELAY_OR_HIBERNATE() to eventually hibernate if idle
+ * 3. Use the variable "cur_timeout" in WaitLatch()
+ * Hibernation for this worker is independent of any other worker.
+ *
+ * long normal_wait_timeout = 100L;
+ * DECLARE_HIBERNATE_VARS();
+ * for (;;)
+ * {
+ *	   if (work to do)
+ *		   Do Stuff(); // in particular, exit loop if some condition satisfied
+ *     SET_DELAY_OR_HIBERNATE(work_to_do, normal_wait_timeout);
+ *	   WaitLatch(cur_timeout);
+ *	   ResetLatch();
+ * }
+ * A working example is shown in src/test/modules/worker_spi/worker_spi.c
+ * as well as in code for workers in src/backend/postmaster et al.
  *
  * WaitEventSets allow to wait for latches being set and additional events -
  * postmaster dying and socket readiness of several sockets currently - at the
@@ -139,6 +163,70 @@ typedef struct Latch
 							 WL_SOCKET_WRITEABLE | \
 							 WL_SOCKET_CONNECTED)
 
+/*
+ * Common design pattern for Hibernation.
+ *
+ * Sleep until we are signaled or normal_delay has elapsed.  If we
+ * haven't done any work for quite some time, lengthen the sleep
+ * time so as to reduce the server's idle power consumption.
+ *
+ * Avoids using static vars, to allow each process to
+ * have its own private, independent counters.
+ */
+
+#define HIBERNATE_DELAY_SEC		60
+#define HIBERNATE_DELAY_MS		(1000L * HIBERNATE_DELAY_SEC)
+#define LOOPS_UNTIL_HIBERNATE	50
+
+#define	SET_DELAY_OR_HIBERNATE(work_done, normal_delay) do { \
+	if (work_done)									\
+	{												\
+		hibernate_logged = false;					\
+		left_till_hibernate = LOOPS_UNTIL_HIBERNATE;\
+	}												\
+	else if (left_till_hibernate > 0)				\
+		left_till_hibernate--;						\
+	if (left_till_hibernate > 0)					\
+		cur_timeout = normal_delay;					\
+	else											\
+	{												\
+		cur_timeout = HIBERNATE_DELAY_MS;			\
+		if (!hibernate_logged)						\
+			elog(DEBUG3, "hibernating for %ld",		\
+							cur_timeout);			\
+		hibernate_logged = true;					\
+	}} while (0)
+
+#define	SET_DELAY_OR_HIBERNATE_OPT(work_done, normal_delay, hib_delay) do { \
+	if (work_done)									\
+	{												\
+		hibernate_logged = false;					\
+		left_till_hibernate = LOOPS_UNTIL_HIBERNATE;\
+	}												\
+	else if (left_till_hibernate > 0)				\
+		left_till_hibernate--;						\
+	if (left_till_hibernate > 0)					\
+		cur_timeout = normal_delay;					\
+	else											\
+	{												\
+		cur_timeout = hib_delay;					\
+		if (!hibernate_logged)						\
+			elog(DEBUG3, "hibernating for %ld",		\
+							cur_timeout);			\
+		hibernate_logged = true;					\
+	}} while (0)
+
+#define AM_HIBERNATING()	(left_till_hibernate == 0)
+
+#define DECLARE_HIBERNATE_VARS()					\
+bool	hibernate_logged = false;					\
+int		left_till_hibernate = LOOPS_UNTIL_HIBERNATE;\
+long	cur_timeout;
+
+#define RESET_TO_NON_HIBERNATE()					\
+		left_till_hibernate = LOOPS_UNTIL_HIBERNATE;
+
+
 typedef struct WaitEvent
 {
 	int			pos;			/* position in the event data structure */
diff --git a/src/test/modules/worker_spi/worker_spi.c b/src/test/modules/worker_spi/worker_spi.c
index 05ced63780..1d05634b86 100644
--- a/src/test/modules/worker_spi/worker_spi.c
+++ b/src/test/modules/worker_spi/worker_spi.c
@@ -46,6 +46,8 @@ PG_MODULE_MAGIC;
 
 PG_FUNCTION_INFO_V1(worker_spi_launch);
 
+DECLARE_HIBERNATE_VARS();
+
 void		_PG_init(void);
 void		worker_spi_main(Datum) pg_attribute_noreturn();
 
@@ -137,6 +139,7 @@ worker_spi_main(Datum main_arg)
 	worktable  *table;
 	StringInfoData buf;
 	char		name[20];
+	bool		work_done = true;
 
 	table = palloc(sizeof(worktable));
 	sprintf(name, "schema%d", index);
@@ -191,6 +194,13 @@ worker_spi_main(Datum main_arg)
 	{
 		int			ret;
 
+		/*
+		 * Use the standard design pattern for wait time/hibernation.
+		 * After 50 consecutive loops with work_done=false the wait time
+		 * will be set to the standard hibernation timeout of 60s.
+		 */
+		SET_DELAY_OR_HIBERNATE(work_done, worker_spi_naptime * 1000L);
+
 		/*
 		 * Background workers mustn't call usleep() or any direct equivalent:
 		 * instead, they may wait on their process latch, which sleeps as
@@ -199,7 +209,7 @@ worker_spi_main(Datum main_arg)
 		 */
 		(void) WaitLatch(MyLatch,
 						 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
-						 worker_spi_naptime * 1000L,
+						 cur_timeout,
 						 PG_WAIT_EXTENSION);
 		ResetLatch(MyLatch);
 
@@ -256,7 +266,10 @@ worker_spi_main(Datum main_arg)
 				elog(LOG, "%s: count in %s.%s is now %d",
 					 MyBgworkerEntry->bgw_name,
 					 table->schema, table->name, val);
+			work_done = true;
 		}
+		else
+			work_done = false;
 
 		/*
 		 * And finish our transaction.
#2 Andres Freund
andres@anarazel.de
In reply to: Simon Riggs (#1)
Re: Reducing power consumption on idle servers

Hi,

On 2022-02-19 14:10:39 +0000, Simon Riggs wrote:

Some years ago we did a pass through the various worker processes to
add hibernation as a mechanism to reduce power consumption on an idle
server. Replication never got the memo, so power reduction on an
idle standby or logical subscriber is not very effective.
The code and timing for hibernation is also different for each worker,
which is confusing.

Yea, I think it's high time that we fix this.

It's not even just power consumption:
- the short timeouts hide all kinds of bugs around missed wakeups / racy
wakeup sequences
- debugging problems gets harder if there's lots of frequent activity

CURRENT STATE

These servers naturally sleep for long periods when inactive:
* postmaster - 60s
* checkpointer - checkpoint_timeout
* syslogger - log_rotation_age
* pgarch - 60s

Why do we need *any* timeout here? It seems one of those bug/race hiding
things? Imo we should only use a timeout when a prior archive failed and rely
on wakeups otherwise.

* autovac launcher - autovacuum_naptime

On production systems autovacuum_naptime often can't be a large value,
otherwise it's easy to not keep up on small busy tables. That's fine for
actually busy servers, but with the increase in hosted PG offerings, the
defaults in those offerings needs to cater to a broader audience.

These servers don't try to hibernate at all:
* logical worker - 1s

Not great.

* logical launcher - wal_retrieve_retry_interval (undocumented)

I think it's actually 180s in the happy path. The wal_retrieve_retry_interval
is about how often workers get restarted. But if there's no need to restart,
it sleeps longer.

* startup - hardcoded 5s when streaming, wal_receiver_retry_interval
for WAL files

* wal_receiver - 100ms, currently gets woken when WAL arrives

This is a fairly insane one. We should compute a precise timeout based on
wal_receiver_timeout.

And it's not just one syscall every 100ms, it's

recvfrom(4, 0x7fd66134b960, 16384, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
epoll_create1(EPOLL_CLOEXEC) = 6
epoll_ctl(6, EPOLL_CTL_ADD, 9, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=1630593560, u64=140558730322456}}) = 0
epoll_ctl(6, EPOLL_CTL_ADD, 3, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=1630593584, u64=140558730322480}}) = 0
epoll_ctl(6, EPOLL_CTL_ADD, 4, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=1630593608, u64=140558730322504}}) = 0
epoll_wait(6, [], 1, 100) = 0
close(6) = 0

PROPOSED CHANGES

1. Standardize the hibernation time at 60s, using a #define
HIBERNATE_DELAY_SEC 60

I don't think the hibernation stuff is a great pattern. When hibernation was
introduced we neither had latches nor condition variables, so we couldn't
easily do better. Today we have, so we should do better.

We should either not use timeouts at all (e.g. pgarch without a preceding
failure) and rely on being woken up when new work arrives.

Or use precisely calculated timeouts (e.g. NOW() - last_recv_timestamp +
wal_receiver_timeout) when there's a good reason to wake up (like needing to
send a status message).

IMO we should actually consider moving *away* from hibernation for the cases
where we currently use it and rely much more heavily on being notified when
there's work to do, without a timeout when not.

5. Startup process has a hardcoded 5s loop because it checks for
trigger file to promote it. So hibernating would mean that it would
promote more slowly, and/or restart failing walreceiver more slowly,
so this requires user approval, and hence add new GUCs to approve that
choice. This is a valid choice because a long-term idle server is
obviously not in current use, so waiting 60s for failover or restart
is very unlikely to cause significant issue.

There's plenty of databases that are close to read-only but business critical,
so I don't buy that argument.

IMO we should instead consider either deprecating file based promotion, or
adding an optional dependency on filesystem monitoring APIs (i.e. inotify etc)
that avoid the need to poll for file creation.

Greetings,

Andres Freund

#3 Thomas Munro
thomas.munro@gmail.com
In reply to: Andres Freund (#2)
Re: Reducing power consumption on idle servers

On Sun, Feb 20, 2022 at 6:03 AM Andres Freund <andres@anarazel.de> wrote:

On 2022-02-19 14:10:39 +0000, Simon Riggs wrote:

* wal_receiver - 100ms, currently gets woken when WAL arrives

This is a fairly insane one. We should compute a precise timeout based on
wal_receiver_timeout.

I proposed a patch to do that here: https://commitfest.postgresql.org/37/3520/

It needs a couple more tweaks (I realised it needs to round
microsecond sleep times up to the nearest whole millisecond,
explaining a few spurious wakeups, and Horiguchi-san had some more
feedback for me that I haven't got to yet), but it seems close.

And it's not just one syscall every 100ms, it's

recvfrom(4, 0x7fd66134b960, 16384, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
epoll_create1(EPOLL_CLOEXEC) = 6
epoll_ctl(6, EPOLL_CTL_ADD, 9, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=1630593560, u64=140558730322456}}) = 0
epoll_ctl(6, EPOLL_CTL_ADD, 3, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=1630593584, u64=140558730322480}}) = 0
epoll_ctl(6, EPOLL_CTL_ADD, 4, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=1630593608, u64=140558730322504}}) = 0
epoll_wait(6, [], 1, 100) = 0
close(6) = 0

Yeah, I have a patch for that (no CF entry yet, will create shortly),
and together these two patches get walreceiver's wait loop down to a
single epoll_wait() call (or local equivalent) that waits the exact
amount of time required to perform the next periodic action.

5. Startup process has a hardcoded 5s loop because it checks for
trigger file to promote it. So hibernating would mean that it would
promote more slowly, and/or restart failing walreceiver more slowly,
so this requires user approval, and hence add new GUCs to approve that
choice. This is a valid choice because a long-term idle server is
obviously not in current use, so waiting 60s for failover or restart
is very unlikely to cause significant issue.

There's plenty of databases that are close to read-only but business critical,
so I don't buy that argument.

IMO we should instead consider either deprecating file based promotion, or
adding an optional dependency on filesystem monitoring APIs (i.e. inotify etc)
that avoid the need to poll for file creation.

Yeah, I pondered inotify/KQ_FILTER_VNODE/FindFirstChangeNotification
for this while working on 600f2f50. I realised I could probably teach
a WaitEventSet to wake up when a file is added on most OSes, but I
didn't try to prototype it... I agree that the whole file
system-watching concept feels pretty clunky, so if just getting rid of
it is an option...

It would be nice to tame the walwriter's wakeups...

#4Simon Riggs
simon.riggs@enterprisedb.com
In reply to: Andres Freund (#2)
1 attachment(s)
Re: Reducing power consumption on idle servers

On Sat, 19 Feb 2022 at 17:03, Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2022-02-19 14:10:39 +0000, Simon Riggs wrote:

Some years ago we did a pass through the various worker processes to
add hibernation as a mechanism to reduce power consumption on an idle
server. Replication never got the memo, so power consumption on an
idle server is not very effective on standby or logical subscribers.
The code and timing for hibernation is also different for each worker,
which is confusing.

Yea, I think it's high time that we fix this.

Good to have your support. There are millions of servers running idle
for some part of their duty cycle.

This patch seeks to change the situation for the better in PG15, i.e.
soon, so the changes proposed are deliberately light. It also seeks to
provide a framework that writers of background worker processes can
follow, since we can't just fix core, we need to fix all the various
bgworkers in use as well.

IMO we should instead consider either deprecating file based promotion, or
adding an optional dependency on filesystem monitoring APIs (i.e. inotify etc)
that avoid the need to poll for file creation.

Deprecating explicit file-based promotion is possible and simple, so
that is the approach in the latest version of the patch.

Thanks for the suggestion.

I've re-analyzed all of the code around startup/walreceiver
interactions and there isn't any need for the 5s delay in the startup
process, IMHO, so the startup process can sleep longer.

IMO we should actually consider moving *away* from hibernation for the cases
where we currently use it and rely much more heavily on being notified when
there's work to do, without a timeout when not.

I don't think that is a practical approach this close to end of PG15.
This would require changing behavior of bgwriter, walwriter, pgarch
when they are not broken. The likelihood that we would break something
is too high.

What is an issue is that the sleep times of various procs are on
completely different cycles, which is why I am proposing normalizing
them so that Postgres can actually sleep effectively.

* autovac launcher - autovacuum_naptime

On production systems autovacuum_naptime often can't be a large value,
otherwise it's easy to not keep up on small busy tables. That's fine for
actually busy servers, but with the increase in hosted PG offerings, the
defaults in those offerings needs to cater to a broader audience.

Autovac varies its wakeup cycle according to how much work is done, so
autovacuum_naptime can be set low without affecting power consumption
when the server is idle.

Idle for autovac is defined slightly differently, since if all user
work completes then there may still be a lot of vacuuming to do before
it goes fully idle. But my observation is that there are many servers
that go idle for more than 50% of each week, when operating 8-12 hours
per day, 5 days per week, so we can still save a lot of power.

This patch doesn't change how autovac works, it just uses a common
setting for the hibernation that eventually occurs.

These servers don't try to hibernate at all:
* logical worker - 1s

Not great.

Agreed, the patch improves this, roughly same as walreceiver.

* logical launcher - wal_retrieve_retry_interval (undocumented)

I think it's actually 180s in the happy path. The wal_retrieve_retry_interval
is about how often workers get restarted. But if there's no need to restart,
it sleeps longer.

I propose normalizing all of the various hibernation times to the same value.

* wal_receiver - 100ms, currently gets woken when WAL arrives

This is a fairly insane one. We should compute a precise timeout based on
wal_receiver_timeout.

That is exactly what the patch does, when it hibernates.

I wasn't aware of Thomas' work, but now that I am we can choose which
of those approaches to use for WAL receiver. I hope that we can fix
logical worker and wal receiver to use the same algorithm. The rest of
this patch would still be valid, whatever we do for those two procs.

--
Simon Riggs http://www.EnterpriseDB.com/

Attachments:

hibernate_to_reduce_power_consumption.v2.patch (application/octet-stream)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index fc63172efd..bd5835df9a 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4517,9 +4517,12 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
         <listitem>
          <para>
           Specifies a trigger file whose presence ends recovery in the
-          standby.  Even if this value is not set, you can still promote
-          the standby using <command>pg_ctl promote</command> or calling
-          <function>pg_promote()</function>.
+          standby, though it may take up to 300 seconds for the presence
+          of the file to be noticed.
+          This mechanism is now deprecated and will be removed in a later
+          release. The preferred way to promote the standby is to run
+          <command>pg_ctl promote</command> or call
+          <function>pg_promote()</function>, both of which take effect much
+          faster than this parameter.
           This parameter can only be set in the <filename>postgresql.conf</filename>
           file or on the server command line.
          </para>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 958220c495..cc8021a454 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -13013,14 +13013,17 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 					}
 
 					/*
-					 * Wait for more WAL to arrive. Time out after 5 seconds
-					 * to react to a trigger file promptly and to check if the
-					 * WAL receiver is still active.
+					 * Wait for more WAL to arrive, when we will be woken
+					 * immediately by the WAL receiver.  Hibernate, but time out
+					 * to react to a trigger file.  Direct use of the trigger
+					 * file is now deprecated and promote_trigger_file will be
+					 * removed in a later release.
 					 */
 					(void) WaitLatch(&XLogCtl->recoveryWakeupLatch,
 									 WL_LATCH_SET | WL_TIMEOUT |
 									 WL_EXIT_ON_PM_DEATH,
-									 5000L, WAIT_EVENT_RECOVERY_WAL_STREAM);
+									 HIBERNATE_DELAY_MS,
+									 WAIT_EVENT_RECOVERY_WAL_STREAM);
 					ResetLatch(&XLogCtl->recoveryWakeupLatch);
 					break;
 				}
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 681ef91b81..942cc4a07f 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -134,7 +134,6 @@ int			Log_autovacuum_min_duration = 600000;
 
 /* the minimum allowed time between two awakenings of the launcher */
 #define MIN_AUTOVAC_SLEEPTIME 100.0 /* milliseconds */
-#define MAX_AUTOVAC_SLEEPTIME 300	/* seconds */
 
 /* Flags to tell if we are in an autovacuum process */
 static bool am_autovacuum_launcher = false;
@@ -932,8 +931,8 @@ launcher_determine_sleep(bool canlaunch, bool recursing, struct timeval *nap)
 	 * infinite sleep in strange cases like the system clock going backwards a
 	 * few years.
 	 */
-	if (nap->tv_sec > MAX_AUTOVAC_SLEEPTIME)
-		nap->tv_sec = MAX_AUTOVAC_SLEEPTIME;
+	if (nap->tv_sec > HIBERNATE_DELAY_SEC)
+		nap->tv_sec = HIBERNATE_DELAY_SEC;
 }
 
 /*
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index d1f5d12eff..9566fd2b3d 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -60,12 +60,6 @@
  */
 int			BgWriterDelay = 200;
 
-/*
- * Multiplier to apply to BgWriterDelay when we decide to hibernate.
- * (Perhaps this needs to be configurable?)
- */
-#define HIBERNATE_FACTOR			50
-
 /*
  * Interval in which standby snapshots are logged into the WAL stream, in
  * milliseconds.
@@ -337,7 +331,7 @@ BackgroundWriterMain(void)
 			/* Sleep ... */
 			(void) WaitLatch(MyLatch,
 							 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
-							 BgWriterDelay * HIBERNATE_FACTOR,
+							 HIBERNATE_DELAY_MS,
 							 WAIT_EVENT_BGWRITER_HIBERNATE);
 			/* Reset the notification request in case we timed out */
 			StrategyNotifyBgWriter(-1);
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index d916ed39a8..156a4b6404 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -53,8 +53,6 @@
  * Timer definitions.
  * ----------
  */
-#define PGARCH_AUTOWAKE_INTERVAL 60 /* How often to force a poll of the
-									 * archive status directory; in seconds. */
 #define PGARCH_RESTART_INTERVAL 10	/* How often to attempt to restart a
 									 * failed archiver; in seconds. */
 
@@ -346,7 +344,7 @@ pgarch_MainLoop(void)
 
 		/*
 		 * Sleep until a signal is received, or until a poll is forced by
-		 * PGARCH_AUTOWAKE_INTERVAL having passed since last_copy_time, or
+		 * HIBERNATE_DELAY_SEC having passed since last_copy_time, or
 		 * until postmaster dies.
 		 */
 		if (!time_to_stop)		/* Don't wait during last iteration */
@@ -354,7 +352,7 @@ pgarch_MainLoop(void)
 			pg_time_t	curtime = (pg_time_t) time(NULL);
 			int			timeout;
 
-			timeout = PGARCH_AUTOWAKE_INTERVAL - (curtime - last_copy_time);
+			timeout = HIBERNATE_DELAY_SEC - (curtime - last_copy_time);
 			if (timeout > 0)
 			{
 				int			rc;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index ce90877154..6de5cb3a9f 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1636,7 +1636,7 @@ DetermineSleepTime(struct timeval *timeout)
 		}
 		else
 		{
-			timeout->tv_sec = 60;
+			timeout->tv_sec = HIBERNATE_DELAY_SEC;
 			timeout->tv_usec = 0;
 		}
 		return;
@@ -1694,15 +1694,15 @@ DetermineSleepTime(struct timeval *timeout)
 		timeout->tv_usec = microsecs;
 
 		/* Ensure we don't exceed one minute */
-		if (timeout->tv_sec > 60)
+		if (timeout->tv_sec > HIBERNATE_DELAY_SEC)
 		{
-			timeout->tv_sec = 60;
+			timeout->tv_sec = HIBERNATE_DELAY_SEC;
 			timeout->tv_usec = 0;
 		}
 	}
 	else
 	{
-		timeout->tv_sec = 60;
+		timeout->tv_sec = HIBERNATE_DELAY_SEC;
 		timeout->tv_usec = 0;
 	}
 }
diff --git a/src/backend/postmaster/walwriter.c b/src/backend/postmaster/walwriter.c
index 102fa2a089..7f0b74d0de 100644
--- a/src/backend/postmaster/walwriter.c
+++ b/src/backend/postmaster/walwriter.c
@@ -70,13 +70,7 @@
 int			WalWriterDelay = 200;
 int			WalWriterFlushAfter = 128;
 
-/*
- * Number of do-nothing loops before lengthening the delay time, and the
- * multiplier to apply to WalWriterDelay when we do decide to hibernate.
- * (Perhaps these need to be configurable?)
- */
-#define LOOPS_UNTIL_HIBERNATE		50
-#define HIBERNATE_FACTOR			25
+DECLARE_HIBERNATE_VARS();
 
 /* Prototypes for private functions */
 static void HandleWalWriterInterrupts(void);
@@ -210,7 +204,7 @@ WalWriterMain(void)
 	/*
 	 * Reset hibernation state after any error.
 	 */
-	left_till_hibernate = LOOPS_UNTIL_HIBERNATE;
+	RESET_TO_NON_HIBERNATE();
 	hibernating = false;
 	SetWalWriterSleeping(false);
 
@@ -225,8 +219,6 @@ WalWriterMain(void)
 	 */
 	for (;;)
 	{
-		long		cur_timeout;
-
 		/*
 		 * Advertise whether we might hibernate in this cycle.  We do this
 		 * before resetting the latch to ensure that any async commits will
@@ -252,24 +244,11 @@ WalWriterMain(void)
 		 * Do what we're here for; then, if XLogBackgroundFlush() found useful
 		 * work to do, reset hibernation counter.
 		 */
-		if (XLogBackgroundFlush())
-			left_till_hibernate = LOOPS_UNTIL_HIBERNATE;
-		else if (left_till_hibernate > 0)
-			left_till_hibernate--;
+		SET_DELAY_OR_HIBERNATE(XLogBackgroundFlush(), WalWriterDelay);
 
 		/* Send WAL statistics to the stats collector */
 		pgstat_send_wal(false);
 
-		/*
-		 * Sleep until we are signaled or WalWriterDelay has elapsed.  If we
-		 * haven't done anything useful for quite some time, lengthen the
-		 * sleep time so as to reduce the server's idle power consumption.
-		 */
-		if (left_till_hibernate > 0)
-			cur_timeout = WalWriterDelay;	/* in ms */
-		else
-			cur_timeout = WalWriterDelay * HIBERNATE_FACTOR;
-
 		(void) WaitLatch(MyLatch,
 						 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
 						 cur_timeout,
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 7b473903a6..2800f57d77 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -48,9 +48,6 @@
 #include "utils/snapmgr.h"
 #include "utils/timeout.h"
 
-/* max sleep time between cycles (3min) */
-#define DEFAULT_NAPTIME_PER_CYCLE 180000L
-
 int			max_logical_replication_workers = 4;
 int			max_sync_workers_per_subscription = 2;
 
@@ -802,6 +799,7 @@ void
 ApplyLauncherMain(Datum main_arg)
 {
 	TimestampTz last_start_time = 0;
+	DECLARE_HIBERNATE_VARS();
 
 	ereport(DEBUG1,
 			(errmsg_internal("logical replication launcher started")));
@@ -831,11 +829,12 @@ ApplyLauncherMain(Datum main_arg)
 		MemoryContext subctx;
 		MemoryContext oldctx;
 		TimestampTz now;
-		long		wait_time = DEFAULT_NAPTIME_PER_CYCLE;
+		bool		work_done;
 
 		CHECK_FOR_INTERRUPTS();
 
 		now = GetCurrentTimestamp();
+		work_done = false;
 
 		/* Limit the start retry to once a wal_retrieve_retry_interval */
 		if (TimestampDifferenceExceeds(last_start_time, now,
@@ -866,7 +865,7 @@ ApplyLauncherMain(Datum main_arg)
 				if (w == NULL)
 				{
 					last_start_time = now;
-					wait_time = wal_retrieve_retry_interval;
+					work_done = true;
 
 					logicalrep_worker_launch(sub->dbid, sub->oid, sub->name,
 											 sub->owner, InvalidOid);
@@ -886,13 +885,16 @@ ApplyLauncherMain(Datum main_arg)
 			 * usually means crash of the worker, so we should retry in
 			 * wal_retrieve_retry_interval again.
 			 */
-			wait_time = wal_retrieve_retry_interval;
+			work_done = true;
 		}
 
+		SET_DELAY_OR_HIBERNATE(work_done,
+							   wal_retrieve_retry_interval);
+
 		/* Wait for more work. */
 		rc = WaitLatch(MyLatch,
 					   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
-					   wait_time,
+					   cur_timeout,
 					   WAIT_EVENT_LOGICAL_LAUNCHER_MAIN);
 
 		if (rc & WL_LATCH_SET)
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index d77bb32bb9..6d70326907 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -194,8 +194,6 @@
 #include "utils/syscache.h"
 #include "utils/timeout.h"
 
-#define NAPTIME_PER_CYCLE 1000	/* max sleep time between cycles (1s) */
-
 typedef struct FlushPosition
 {
 	dlist_node	node;
@@ -2566,6 +2564,7 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
 	bool		ping_sent = false;
 	TimeLineID	tli;
 	ErrorContextCallback errcallback;
+	DECLARE_HIBERNATE_VARS();
 
 	/*
 	 * Init the ApplyMessageContext which we clean up after each replication
@@ -2602,7 +2601,6 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
 		int			len;
 		char	   *buf = NULL;
 		bool		endofstream = false;
-		long		wait_time;
 
 		CHECK_FOR_INTERRUPTS();
 
@@ -2636,6 +2634,7 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
 					/* Reset timeout. */
 					last_recv_timestamp = GetCurrentTimestamp();
 					ping_sent = false;
+					RESET_TO_NON_HIBERNATE();
 
 					/* Ensure we are reading the data into our memory context. */
 					MemoryContextSwitchTo(ApplyMessageContext);
@@ -2724,15 +2723,15 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
 		 * no particular urgency about waking up unless we get data or a
 		 * signal.
 		 */
-		if (!dlist_is_empty(&lsn_mapping))
-			wait_time = WalWriterDelay;
-		else
-			wait_time = NAPTIME_PER_CYCLE;
+		SET_DELAY_OR_HIBERNATE_OPT(!dlist_is_empty(&lsn_mapping),
+									WalWriterDelay,
+									wal_receiver_timeout / 2);
 
 		rc = WaitLatchOrSocket(MyLatch,
 							   WL_SOCKET_READABLE | WL_LATCH_SET |
 							   WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
-							   fd, wait_time,
+							   fd,
+							   cur_timeout,
 							   WAIT_EVENT_LOGICAL_APPLY_MAIN);
 
 		if (rc & WL_LATCH_SET)
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index b39fce8c23..23ed7f9ce9 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -94,7 +94,7 @@ bool		hot_standby_feedback;
 static WalReceiverConn *wrconn = NULL;
 WalReceiverFunctionsType *WalReceiverFunctions = NULL;
 
-#define NAPTIME_PER_CYCLE 100	/* max sleep time between cycles (100ms) */
+#define WALRECEIVER_WAIT_PER_CYCLE 100L	/* normal sleep time between cycles (100ms) */
 
 /*
  * These variables are used similarly to openLogFile/SegNo,
@@ -512,7 +512,7 @@ WalReceiverMain(void)
 									   WL_EXIT_ON_PM_DEATH | WL_SOCKET_READABLE |
 									   WL_TIMEOUT | WL_LATCH_SET,
 									   wait_fd,
-									   NAPTIME_PER_CYCLE,
+									   wal_receiver_timeout / 2,
 									   WAIT_EVENT_WAL_RECEIVER_MAIN);
 				if (rc & WL_LATCH_SET)
 				{
diff --git a/src/backend/replication/walreceiverfuncs.c b/src/backend/replication/walreceiverfuncs.c
index c50728ea22..3903d25c2f 100644
--- a/src/backend/replication/walreceiverfuncs.c
+++ b/src/backend/replication/walreceiverfuncs.c
@@ -32,12 +32,6 @@
 
 WalRcvData *WalRcv = NULL;
 
-/*
- * How long to wait for walreceiver to start up after requesting
- * postmaster to launch it. In seconds.
- */
-#define WALRCV_STARTUP_TIMEOUT 10
-
 /* Report shared memory space needed by WalRcvShmemInit */
 Size
 WalRcvShmemSize(void)
@@ -95,7 +89,7 @@ WalRcvRunning(void)
 	{
 		pg_time_t	now = (pg_time_t) time(NULL);
 
-		if ((now - startTime) > WALRCV_STARTUP_TIMEOUT)
+		if ((now - startTime) > wal_receiver_timeout)
 		{
 			bool		stopped = false;
 
@@ -146,7 +140,7 @@ WalRcvStreaming(void)
 	{
 		pg_time_t	now = (pg_time_t) time(NULL);
 
-		if ((now - startTime) > WALRCV_STARTUP_TIMEOUT)
+		if ((now - startTime) > wal_receiver_timeout)
 		{
 			bool		stopped = false;
 
diff --git a/src/include/storage/latch.h b/src/include/storage/latch.h
index 3aa7b33834..2b207f2807 100644
--- a/src/include/storage/latch.h
+++ b/src/include/storage/latch.h
@@ -83,6 +83,30 @@
  * only, so using any latch other than the process latch effectively precludes
  * use of any generic handler.
  *
+ * Since not all servers have a 24/7 duty cycle, if a worker has no current
+ * work then it can be enhanced to eventually hibernate. The goal here is
+ * to reduce the CPU wakeups and thus reduce electrical power consumption,
+ * making a small contribution to reducing global warming. To support
+ * hibernation, we add some additional macros to judge when, and if, to
+ * switch from a normal loop wait to hibernate timeout.
+ * Using these macros helps keep a common design pattern for hibernation:
+ * 1. Use DECLARE_HIBERNATE_VARS(); to declare the private vars in a module
+ * 2. Add SET_DELAY_OR_HIBERNATE() to eventually hibernate if idle
+ * 3. Pass the variable "cur_timeout" to WaitLatch()
+ * Hibernation for this worker is independent of any other worker.
+ *
+ * #define	normal_wait_timeout_ms 100L
+ * DECLARE_HIBERNATE_VARS();
+ * for (;;)
+ * {
+ *	   if (work to do)
+ *		   Do Stuff(); // in particular, exit loop if some condition satisfied
+ *     SET_DELAY_OR_HIBERNATE(work_to_do, normal_wait_timeout_ms);
+ *	   WaitLatch(cur_timeout);
+ *	   ResetLatch();
+ * }
+ * A working example is shown in src/test/modules/worker_spi/worker_spi.c
+ * as well as in code for workers in src/backend/postmaster et al.
  *
  * WaitEventSets allow to wait for latches being set and additional events -
  * postmaster dying and socket readiness of several sockets currently - at the
@@ -139,6 +163,51 @@ typedef struct Latch
 							 WL_SOCKET_WRITEABLE | \
 							 WL_SOCKET_CONNECTED)
 
+/*
+ * Common design pattern for Hibernation.
+ *
+ * Sleep until we are signaled or normal_delay has elapsed.  If we
+ * haven't done any work for quite some time, lengthen the sleep
+ * time so as to reduce the server's idle power consumption.
+ *
+ * Static variables are avoided so that each process has its own
+ * private, independent counters.
+ */
+
+#define HIBERNATE_DELAY_SEC		300
+#define HIBERNATE_DELAY_MS		(1000L * HIBERNATE_DELAY_SEC)
+#define LOOPS_UNTIL_HIBERNATE	50
+
+#define	SET_DELAY_OR_HIBERNATE(work_done, normal_delay)	\
+	if (work_done)									\
+		left_till_hibernate = LOOPS_UNTIL_HIBERNATE;\
+	else if (left_till_hibernate > 0)				\
+		left_till_hibernate--;						\
+	if (left_till_hibernate > 0)					\
+		cur_timeout = normal_delay;					\
+	else											\
+		cur_timeout = HIBERNATE_DELAY_MS;
+
+#define	SET_DELAY_OR_HIBERNATE_OPT(work_done, normal_delay, hib_delay) \
+	if (work_done)									\
+		left_till_hibernate = LOOPS_UNTIL_HIBERNATE;\
+	else if (left_till_hibernate > 0)				\
+		left_till_hibernate--;						\
+	if (left_till_hibernate > 0)					\
+		cur_timeout = normal_delay;					\
+	else											\
+		cur_timeout = hib_delay;
+
+#define AM_HIBERNATING()	(left_till_hibernate == 0)
+
+#define DECLARE_HIBERNATE_VARS()					\
+int		left_till_hibernate = LOOPS_UNTIL_HIBERNATE;\
+long	cur_timeout;
+
+#define RESET_TO_NON_HIBERNATE()					\
+		left_till_hibernate = LOOPS_UNTIL_HIBERNATE;
+
+
 typedef struct WaitEvent
 {
 	int			pos;			/* position in the event data structure */
diff --git a/src/test/modules/worker_spi/worker_spi.c b/src/test/modules/worker_spi/worker_spi.c
index 05ced63780..1d05634b86 100644
--- a/src/test/modules/worker_spi/worker_spi.c
+++ b/src/test/modules/worker_spi/worker_spi.c
@@ -46,6 +46,8 @@ PG_MODULE_MAGIC;
 
 PG_FUNCTION_INFO_V1(worker_spi_launch);
 
+DECLARE_HIBERNATE_VARS();
+
 void		_PG_init(void);
 void		worker_spi_main(Datum) pg_attribute_noreturn();
 
@@ -137,6 +139,7 @@ worker_spi_main(Datum main_arg)
 	worktable  *table;
 	StringInfoData buf;
 	char		name[20];
+	bool		work_done = true;
 
 	table = palloc(sizeof(worktable));
 	sprintf(name, "schema%d", index);
@@ -191,6 +194,13 @@ worker_spi_main(Datum main_arg)
 	{
 		int			ret;
 
+		/*
+		 * Use the standard design pattern for wait time/hibernation.
+		 * After 50 consecutive loops with work_done=true the wait time
+		 * will be set to the standard hibernation timeout of 60s.
+		 */
+		SET_DELAY_OR_HIBERNATE(work_done, worker_spi_naptime * 1000L);
+
 		/*
 		 * Background workers mustn't call usleep() or any direct equivalent:
 		 * instead, they may wait on their process latch, which sleeps as
@@ -199,7 +209,7 @@ worker_spi_main(Datum main_arg)
 		 */
 		(void) WaitLatch(MyLatch,
 						 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
-						 worker_spi_naptime * 1000L,
+						 cur_timeout,
 						 PG_WAIT_EXTENSION);
 		ResetLatch(MyLatch);
 
@@ -256,7 +266,10 @@ worker_spi_main(Datum main_arg)
 				elog(LOG, "%s: count in %s.%s is now %d",
 					 MyBgworkerEntry->bgw_name,
 					 table->schema, table->name, val);
+			work_done = true;
 		}
+		else
+			work_done = false;
 
 		/*
 		 * And finish our transaction.
#5Chapman Flack
chap@anastigmatix.net
In reply to: Simon Riggs (#4)
Re: Reducing power consumption on idle servers

Hi,

On 02/21/22 11:11, Simon Riggs wrote:

This patch seeks to change the situation for the better in PG15, i.e.
soon, so the changes proposed are deliberately light. It also seeks to
provide a framework that writers of background worker processes can
follow, since we can't just fix core, we need to fix all the various
bgworkers in use as well.

I think there might be a typo in the worker_spi.c example:

+	/*
+	 * Use the standard design pattern for wait time/hibernation.
+	 * After 50 consecutive loops with work_done=true the wait time
+	 * will be set to the standard hibernation timeout of 60s.
+	 */
+	SET_DELAY_OR_HIBERNATE(work_done, worker_spi_naptime * 1000L);

Shouldn't the comment be "with work_done=false" ?

Regards,
-Chap

#6Simon Riggs
simon.riggs@enterprisedb.com
In reply to: Chapman Flack (#5)
1 attachment(s)
Re: Reducing power consumption on idle servers

On Mon, 21 Feb 2022 at 16:49, Chapman Flack <chap@anastigmatix.net> wrote:

Shouldn't the comment be "with work_done=false" ?

Good catch, thanks.

I've also added docs to say that "promote_trigger_file" is now
deprecated. There were no tests for that functionality, so just as
well it is being removed.

v3 attached.

--
Simon Riggs http://www.EnterpriseDB.com/

Attachments:

hibernate_to_reduce_power_consumption.v3.patch (application/octet-stream)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index d99bf38e67..2a7bcece49 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4516,9 +4516,12 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
         <listitem>
          <para>
           Specifies a trigger file whose presence ends recovery in the
-          standby.  Even if this value is not set, you can still promote
-          the standby using <command>pg_ctl promote</command> or calling
-          <function>pg_promote()</function>.
+          standby, though it may take up to 300 seconds for the presence
+          of the file to be noticed.
+          This mechanism is now deprecated and will be removed in a later
+          release. The preferred way to promote the standby is to run
+          <command>pg_ctl promote</command> or call
+          <function>pg_promote()</function>, both of which take effect much
+          faster than this parameter.
           This parameter can only be set in the <filename>postgresql.conf</filename>
           file or on the server command line.
          </para>
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index b5b6042104..7c41cc4c58 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -653,11 +653,11 @@ protocol to make nodes agree on a serializable transactional order.
 
    <para>
     Standby mode is exited and the server switches to normal operation
-    when <command>pg_ctl promote</command> is run,
-    <function>pg_promote()</function> is called, or a trigger file is found
-    (<varname>promote_trigger_file</varname>). Before failover,
-    any WAL immediately available in the archive or in <filename>pg_wal</filename> will be
-    restored, but no attempt is made to connect to the primary.
+    when <command>pg_ctl promote</command> is run, or
+    <function>pg_promote()</function> is called.  Use of
+    <varname>promote_trigger_file</varname> is deprecated. Before failover,
+    any WAL immediately available in the archive or in <filename>pg_wal</filename>
+    will be restored, but no attempt is made to connect to the primary.
    </para>
   </sect2>
 
@@ -1483,12 +1483,8 @@ synchronous_standby_names = 'ANY 2 (s1, s2, s3)'
 
    <para>
     To trigger failover of a log-shipping standby server, run
-    <command>pg_ctl promote</command>, call <function>pg_promote()</function>,
-    or create a trigger file with the file name and path specified by the
-    <varname>promote_trigger_file</varname>. If you're planning to use
-    <command>pg_ctl promote</command> or to call
-    <function>pg_promote()</function> to fail over,
-    <varname>promote_trigger_file</varname> is not required. If you're
+    <command>pg_ctl promote</command> or call <function>pg_promote()</function>.
+    Use of <varname>promote_trigger_file</varname> is deprecated. If you're
     setting up the reporting servers that are only used to offload read-only
     queries from the primary, not for high availability purposes, you don't
     need to promote it.
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index f9f212680b..7b713c6b21 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -3709,14 +3709,17 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 					}
 
 					/*
-					 * Wait for more WAL to arrive. Time out after 5 seconds
-					 * to react to a trigger file promptly and to check if the
-					 * WAL receiver is still active.
+					 * Wait for more WAL to arrive, when we will be woken
+					 * immediately by the WAL receiver.  Hibernate, but time out
+					 * to react to a trigger file.  Direct use of the trigger
+					 * file is now deprecated and promote_trigger_file will be
+					 * removed in a later release.
 					 */
 					(void) WaitLatch(&XLogRecoveryCtl->recoveryWakeupLatch,
 									 WL_LATCH_SET | WL_TIMEOUT |
 									 WL_EXIT_ON_PM_DEATH,
-									 5000L, WAIT_EVENT_RECOVERY_WAL_STREAM);
+									 HIBERNATE_DELAY_MS,
+									 WAIT_EVENT_RECOVERY_WAL_STREAM);
 					ResetLatch(&XLogRecoveryCtl->recoveryWakeupLatch);
 					break;
 				}
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 681ef91b81..942cc4a07f 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -134,7 +134,6 @@ int			Log_autovacuum_min_duration = 600000;
 
 /* the minimum allowed time between two awakenings of the launcher */
 #define MIN_AUTOVAC_SLEEPTIME 100.0 /* milliseconds */
-#define MAX_AUTOVAC_SLEEPTIME 300	/* seconds */
 
 /* Flags to tell if we are in an autovacuum process */
 static bool am_autovacuum_launcher = false;
@@ -932,8 +931,8 @@ launcher_determine_sleep(bool canlaunch, bool recursing, struct timeval *nap)
 	 * infinite sleep in strange cases like the system clock going backwards a
 	 * few years.
 	 */
-	if (nap->tv_sec > MAX_AUTOVAC_SLEEPTIME)
-		nap->tv_sec = MAX_AUTOVAC_SLEEPTIME;
+	if (nap->tv_sec > HIBERNATE_DELAY_SEC)
+		nap->tv_sec = HIBERNATE_DELAY_SEC;
 }
 
 /*
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index d1f5d12eff..9566fd2b3d 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -60,12 +60,6 @@
  */
 int			BgWriterDelay = 200;
 
-/*
- * Multiplier to apply to BgWriterDelay when we decide to hibernate.
- * (Perhaps this needs to be configurable?)
- */
-#define HIBERNATE_FACTOR			50
-
 /*
  * Interval in which standby snapshots are logged into the WAL stream, in
  * milliseconds.
@@ -337,7 +331,7 @@ BackgroundWriterMain(void)
 			/* Sleep ... */
 			(void) WaitLatch(MyLatch,
 							 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
-							 BgWriterDelay * HIBERNATE_FACTOR,
+							 HIBERNATE_DELAY_MS,
 							 WAIT_EVENT_BGWRITER_HIBERNATE);
 			/* Reset the notification request in case we timed out */
 			StrategyNotifyBgWriter(-1);
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index d916ed39a8..156a4b6404 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -53,8 +53,6 @@
  * Timer definitions.
  * ----------
  */
-#define PGARCH_AUTOWAKE_INTERVAL 60 /* How often to force a poll of the
-									 * archive status directory; in seconds. */
 #define PGARCH_RESTART_INTERVAL 10	/* How often to attempt to restart a
 									 * failed archiver; in seconds. */
 
@@ -346,7 +344,7 @@ pgarch_MainLoop(void)
 
 		/*
 		 * Sleep until a signal is received, or until a poll is forced by
-		 * PGARCH_AUTOWAKE_INTERVAL having passed since last_copy_time, or
+		 * HIBERNATE_DELAY_SEC having passed since last_copy_time, or
 		 * until postmaster dies.
 		 */
 		if (!time_to_stop)		/* Don't wait during last iteration */
@@ -354,7 +352,7 @@ pgarch_MainLoop(void)
 			pg_time_t	curtime = (pg_time_t) time(NULL);
 			int			timeout;
 
-			timeout = PGARCH_AUTOWAKE_INTERVAL - (curtime - last_copy_time);
+			timeout = HIBERNATE_DELAY_SEC - (curtime - last_copy_time);
 			if (timeout > 0)
 			{
 				int			rc;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 80bb269599..e998896917 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1637,7 +1637,7 @@ DetermineSleepTime(struct timeval *timeout)
 		}
 		else
 		{
-			timeout->tv_sec = 60;
+			timeout->tv_sec = HIBERNATE_DELAY_SEC;
 			timeout->tv_usec = 0;
 		}
 		return;
@@ -1695,15 +1695,15 @@ DetermineSleepTime(struct timeval *timeout)
 		timeout->tv_usec = microsecs;
 
 		/* Ensure we don't exceed one minute */
-		if (timeout->tv_sec > 60)
+		if (timeout->tv_sec > HIBERNATE_DELAY_SEC)
 		{
-			timeout->tv_sec = 60;
+			timeout->tv_sec = HIBERNATE_DELAY_SEC;
 			timeout->tv_usec = 0;
 		}
 	}
 	else
 	{
-		timeout->tv_sec = 60;
+		timeout->tv_sec = HIBERNATE_DELAY_SEC;
 		timeout->tv_usec = 0;
 	}
 }
diff --git a/src/backend/postmaster/walwriter.c b/src/backend/postmaster/walwriter.c
index 102fa2a089..7f0b74d0de 100644
--- a/src/backend/postmaster/walwriter.c
+++ b/src/backend/postmaster/walwriter.c
@@ -70,13 +70,7 @@
 int			WalWriterDelay = 200;
 int			WalWriterFlushAfter = 128;
 
-/*
- * Number of do-nothing loops before lengthening the delay time, and the
- * multiplier to apply to WalWriterDelay when we do decide to hibernate.
- * (Perhaps these need to be configurable?)
- */
-#define LOOPS_UNTIL_HIBERNATE		50
-#define HIBERNATE_FACTOR			25
+DECLARE_HIBERNATE_VARS();
 
 /* Prototypes for private functions */
 static void HandleWalWriterInterrupts(void);
@@ -210,7 +204,7 @@ WalWriterMain(void)
 	/*
 	 * Reset hibernation state after any error.
 	 */
-	left_till_hibernate = LOOPS_UNTIL_HIBERNATE;
+	RESET_TO_NON_HIBERNATE();
 	hibernating = false;
 	SetWalWriterSleeping(false);
 
@@ -225,8 +219,6 @@ WalWriterMain(void)
 	 */
 	for (;;)
 	{
-		long		cur_timeout;
-
 		/*
 		 * Advertise whether we might hibernate in this cycle.  We do this
 		 * before resetting the latch to ensure that any async commits will
@@ -252,24 +244,11 @@ WalWriterMain(void)
 		 * Do what we're here for; then, if XLogBackgroundFlush() found useful
 		 * work to do, reset hibernation counter.
 		 */
-		if (XLogBackgroundFlush())
-			left_till_hibernate = LOOPS_UNTIL_HIBERNATE;
-		else if (left_till_hibernate > 0)
-			left_till_hibernate--;
+		SET_DELAY_OR_HIBERNATE(XLogBackgroundFlush(), WalWriterDelay);
 
 		/* Send WAL statistics to the stats collector */
 		pgstat_send_wal(false);
 
-		/*
-		 * Sleep until we are signaled or WalWriterDelay has elapsed.  If we
-		 * haven't done anything useful for quite some time, lengthen the
-		 * sleep time so as to reduce the server's idle power consumption.
-		 */
-		if (left_till_hibernate > 0)
-			cur_timeout = WalWriterDelay;	/* in ms */
-		else
-			cur_timeout = WalWriterDelay * HIBERNATE_FACTOR;
-
 		(void) WaitLatch(MyLatch,
 						 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
 						 cur_timeout,
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 5a68d6dead..4d29331a38 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -48,9 +48,6 @@
 #include "utils/snapmgr.h"
 #include "utils/timeout.h"
 
-/* max sleep time between cycles (3min) */
-#define DEFAULT_NAPTIME_PER_CYCLE 180000L
-
 int			max_logical_replication_workers = 4;
 int			max_sync_workers_per_subscription = 2;
 
@@ -802,6 +799,7 @@ void
 ApplyLauncherMain(Datum main_arg)
 {
 	TimestampTz last_start_time = 0;
+	DECLARE_HIBERNATE_VARS();
 
 	ereport(DEBUG1,
 			(errmsg_internal("logical replication launcher started")));
@@ -831,11 +829,12 @@ ApplyLauncherMain(Datum main_arg)
 		MemoryContext subctx;
 		MemoryContext oldctx;
 		TimestampTz now;
-		long		wait_time = DEFAULT_NAPTIME_PER_CYCLE;
+		bool		work_done;
 
 		CHECK_FOR_INTERRUPTS();
 
 		now = GetCurrentTimestamp();
+		work_done = false;
 
 		/* Limit the start retry to once a wal_retrieve_retry_interval */
 		if (TimestampDifferenceExceeds(last_start_time, now,
@@ -866,7 +865,7 @@ ApplyLauncherMain(Datum main_arg)
 				if (w == NULL)
 				{
 					last_start_time = now;
-					wait_time = wal_retrieve_retry_interval;
+					work_done = true;
 
 					logicalrep_worker_launch(sub->dbid, sub->oid, sub->name,
 											 sub->owner, InvalidOid);
@@ -886,13 +885,16 @@ ApplyLauncherMain(Datum main_arg)
 			 * usually means crash of the worker, so we should retry in
 			 * wal_retrieve_retry_interval again.
 			 */
-			wait_time = wal_retrieve_retry_interval;
+			work_done = true;
 		}
 
+		SET_DELAY_OR_HIBERNATE(work_done,
+							   wal_retrieve_retry_interval);
+
 		/* Wait for more work. */
 		rc = WaitLatch(MyLatch,
 					   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
-					   wait_time,
+					   cur_timeout,
 					   WAIT_EVENT_LOGICAL_LAUNCHER_MAIN);
 
 		if (rc & WL_LATCH_SET)
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 5d9acc6173..e405b31c2c 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -194,8 +194,6 @@
 #include "utils/syscache.h"
 #include "utils/timeout.h"
 
-#define NAPTIME_PER_CYCLE 1000	/* max sleep time between cycles (1s) */
-
 typedef struct FlushPosition
 {
 	dlist_node	node;
@@ -2566,6 +2564,7 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
 	bool		ping_sent = false;
 	TimeLineID	tli;
 	ErrorContextCallback errcallback;
+	DECLARE_HIBERNATE_VARS();
 
 	/*
 	 * Init the ApplyMessageContext which we clean up after each replication
@@ -2602,7 +2601,6 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
 		int			len;
 		char	   *buf = NULL;
 		bool		endofstream = false;
-		long		wait_time;
 
 		CHECK_FOR_INTERRUPTS();
 
@@ -2636,6 +2634,7 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
 					/* Reset timeout. */
 					last_recv_timestamp = GetCurrentTimestamp();
 					ping_sent = false;
+					RESET_TO_NON_HIBERNATE();
 
 					/* Ensure we are reading the data into our memory context. */
 					MemoryContextSwitchTo(ApplyMessageContext);
@@ -2724,15 +2723,15 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
 		 * no particular urgency about waking up unless we get data or a
 		 * signal.
 		 */
-		if (!dlist_is_empty(&lsn_mapping))
-			wait_time = WalWriterDelay;
-		else
-			wait_time = NAPTIME_PER_CYCLE;
+		SET_DELAY_OR_HIBERNATE_OPT(!dlist_is_empty(&lsn_mapping),
+									WalWriterDelay,
+									wal_receiver_timeout / 2);
 
 		rc = WaitLatchOrSocket(MyLatch,
 							   WL_SOCKET_READABLE | WL_LATCH_SET |
 							   WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
-							   fd, wait_time,
+							   fd,
+							   cur_timeout,
 							   WAIT_EVENT_LOGICAL_APPLY_MAIN);
 
 		if (rc & WL_LATCH_SET)
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index ceaff097b9..f351038aca 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -95,7 +95,7 @@ bool		hot_standby_feedback;
 static WalReceiverConn *wrconn = NULL;
 WalReceiverFunctionsType *WalReceiverFunctions = NULL;
 
-#define NAPTIME_PER_CYCLE 100	/* max sleep time between cycles (100ms) */
+#define WALRECEIVER_WAIT_PER_CYCLE 100L	/* normal sleep time between cycles (100ms) */
 
 /*
  * These variables are used similarly to openLogFile/SegNo,
@@ -513,7 +513,7 @@ WalReceiverMain(void)
 									   WL_EXIT_ON_PM_DEATH | WL_SOCKET_READABLE |
 									   WL_TIMEOUT | WL_LATCH_SET,
 									   wait_fd,
-									   NAPTIME_PER_CYCLE,
+									   wal_receiver_timeout / 2,
 									   WAIT_EVENT_WAL_RECEIVER_MAIN);
 				if (rc & WL_LATCH_SET)
 				{
diff --git a/src/backend/replication/walreceiverfuncs.c b/src/backend/replication/walreceiverfuncs.c
index 90798b9d53..dfb2d9836b 100644
--- a/src/backend/replication/walreceiverfuncs.c
+++ b/src/backend/replication/walreceiverfuncs.c
@@ -33,12 +33,6 @@
 
 WalRcvData *WalRcv = NULL;
 
-/*
- * How long to wait for walreceiver to start up after requesting
- * postmaster to launch it. In seconds.
- */
-#define WALRCV_STARTUP_TIMEOUT 10
-
 /* Report shared memory space needed by WalRcvShmemInit */
 Size
 WalRcvShmemSize(void)
@@ -96,7 +90,7 @@ WalRcvRunning(void)
 	{
 		pg_time_t	now = (pg_time_t) time(NULL);
 
-		if ((now - startTime) > WALRCV_STARTUP_TIMEOUT)
+		if ((now - startTime) > wal_receiver_timeout)
 		{
 			bool		stopped = false;
 
@@ -147,7 +141,7 @@ WalRcvStreaming(void)
 	{
 		pg_time_t	now = (pg_time_t) time(NULL);
 
-		if ((now - startTime) > WALRCV_STARTUP_TIMEOUT)
+		if ((now - startTime) > wal_receiver_timeout)
 		{
 			bool		stopped = false;
 
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 4a094bb38b..3186ad7ad3 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -325,7 +325,6 @@
 
 #primary_conninfo = ''			# connection string to sending server
 #primary_slot_name = ''			# replication slot on sending server
-#promote_trigger_file = ''		# file name whose presence ends recovery
 #hot_standby = on			# "off" disallows queries during recovery
 					# (change requires restart)
 #max_standby_archive_delay = 30s	# max delay before canceling queries
diff --git a/src/include/storage/latch.h b/src/include/storage/latch.h
index 0dd79d73fa..4301f189f5 100644
--- a/src/include/storage/latch.h
+++ b/src/include/storage/latch.h
@@ -83,6 +83,30 @@
  * only, so using any latch other than the process latch effectively precludes
  * use of any generic handler.
  *
+ * Since not all servers have a 24/7 duty cycle, a worker with no current
+ * work can eventually hibernate. The goal is to reduce CPU wakeups and
+ * thus electrical power consumption, making a small contribution to
+ * reducing global warming. To support hibernation, we add some macros
+ * to judge when, and whether, to switch from the normal loop wait to a
+ * hibernation timeout.
+ * Using these macros establishes a common design pattern for hibernation.
+ * 1. Add in DECLARE_HIBERNATE_VARS(); to add private vars to this module
+ * 2. Add SET_DELAY_OR_HIBERNATE() to eventually hibernate if idle
+ * 3. Use the variable "cur_timeout" as the timeout in WaitLatch()
+ * Hibernation for this worker is independent of any other worker.
+ *
+ * #define normal_wait_timeout_ms 100L
+ * DECLARE_HIBERNATE_VARS();
+ * for (;;)
+ * {
+ *	   if (work to do)
+ *		   Do Stuff(); // in particular, exit loop if some condition satisfied
+ *     SET_DELAY_OR_HIBERNATE(work_to_do, normal_wait_timeout_ms);
+ *	   WaitLatch(cur_timeout);
+ *	   ResetLatch();
+ * }
+ * A working example is shown in src/test/modules/worker_spi/worker_spi.c
+ * as well as in code for workers in src/backend/postmaster et al.
  *
  * WaitEventSets allow to wait for latches being set and additional events -
  * postmaster dying and socket readiness of several sockets currently - at the
@@ -140,6 +164,51 @@ typedef struct Latch
 							 WL_SOCKET_CONNECTED | \
 							 WL_SOCKET_CLOSED)
 
+/*
+ * Common design pattern for Hibernation.
+ *
+ * Sleep until we are signaled or normal_delay has elapsed.  If we
+ * haven't done any work for quite some time, lengthen the sleep
+ * time so as to reduce the server's idle power consumption.
+ *
+ * Avoids use of static vars to allow each process to
+ * have its own private, independent counters.
+ */
+
+#define HIBERNATE_DELAY_SEC		300
+#define HIBERNATE_DELAY_MS		(1000L * HIBERNATE_DELAY_SEC)
+#define LOOPS_UNTIL_HIBERNATE	50
+
+#define	SET_DELAY_OR_HIBERNATE(work_done, normal_delay)	\
+	do { if (work_done)								\
+		left_till_hibernate = LOOPS_UNTIL_HIBERNATE;\
+	else if (left_till_hibernate > 0)				\
+		left_till_hibernate--;						\
+	if (left_till_hibernate > 0)					\
+		cur_timeout = normal_delay;					\
+	else											\
+		cur_timeout = HIBERNATE_DELAY_MS; } while (0)
+
+#define	SET_DELAY_OR_HIBERNATE_OPT(work_done, normal_delay, hib_delay) \
+	do { if (work_done)								\
+		left_till_hibernate = LOOPS_UNTIL_HIBERNATE;\
+	else if (left_till_hibernate > 0)				\
+		left_till_hibernate--;						\
+	if (left_till_hibernate > 0)					\
+		cur_timeout = normal_delay;					\
+	else											\
+		cur_timeout = hib_delay; } while (0)
+
+#define AM_HIBERNATING()	(left_till_hibernate == 0)
+
+#define DECLARE_HIBERNATE_VARS()					\
+int		left_till_hibernate = LOOPS_UNTIL_HIBERNATE;\
+long	cur_timeout;
+
+#define RESET_TO_NON_HIBERNATE()					\
+		left_till_hibernate = LOOPS_UNTIL_HIBERNATE;
+
+
 typedef struct WaitEvent
 {
 	int			pos;			/* position in the event data structure */
diff --git a/src/test/modules/worker_spi/worker_spi.c b/src/test/modules/worker_spi/worker_spi.c
index 05ced63780..31d626a122 100644
--- a/src/test/modules/worker_spi/worker_spi.c
+++ b/src/test/modules/worker_spi/worker_spi.c
@@ -46,6 +46,8 @@ PG_MODULE_MAGIC;
 
 PG_FUNCTION_INFO_V1(worker_spi_launch);
 
+DECLARE_HIBERNATE_VARS();
+
 void		_PG_init(void);
 void		worker_spi_main(Datum) pg_attribute_noreturn();
 
@@ -137,6 +139,7 @@ worker_spi_main(Datum main_arg)
 	worktable  *table;
 	StringInfoData buf;
 	char		name[20];
+	bool		work_done = true;
 
 	table = palloc(sizeof(worktable));
 	sprintf(name, "schema%d", index);
@@ -191,6 +194,13 @@ worker_spi_main(Datum main_arg)
 	{
 		int			ret;
 
+		/*
+		 * Use the standard design pattern for wait time/hibernation.
+		 * After 50 consecutive loops with work_done=false the wait time
+		 * will be set to the standard hibernation timeout.
+		 */
+		SET_DELAY_OR_HIBERNATE(work_done, worker_spi_naptime * 1000L);
+
 		/*
 		 * Background workers mustn't call usleep() or any direct equivalent:
 		 * instead, they may wait on their process latch, which sleeps as
@@ -199,7 +209,7 @@ worker_spi_main(Datum main_arg)
 		 */
 		(void) WaitLatch(MyLatch,
 						 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
-						 worker_spi_naptime * 1000L,
+						 cur_timeout,
 						 PG_WAIT_EXTENSION);
 		ResetLatch(MyLatch);
 
@@ -256,7 +266,10 @@ worker_spi_main(Datum main_arg)
 				elog(LOG, "%s: count in %s.%s is now %d",
 					 MyBgworkerEntry->bgw_name,
 					 table->schema, table->name, val);
+			work_done = true;
 		}
+		else
+			work_done = false;
 
 		/*
 		 * And finish our transaction.
#7Magnus Hagander
magnus@hagander.net
In reply to: Simon Riggs (#4)
Re: Reducing power consumption on idle servers

On Mon, Feb 21, 2022 at 5:11 PM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:

On Sat, 19 Feb 2022 at 17:03, Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2022-02-19 14:10:39 +0000, Simon Riggs wrote:
IMO we should instead consider either deprecating file based promotion, or
adding an optional dependency on filesystem monitoring APIs (i.e. inotify etc)
that avoid the need to poll for file creation.

Came here to suggest exactly that :)

Deprecating explicit file-based promotion is possible and simple, so
that is the approach in the latest version of the patch.

Is there any actual use-case for this other than backwards
compatibility? The alternative has certainly been around for some time
now, so if we don't know a specific use-case for the file-based one
it's definitely time to deprecate it properly.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#7)
Re: Reducing power consumption on idle servers

Magnus Hagander <magnus@hagander.net> writes:

Deprecating explicit file-based promotion is possible and simple, so
that is the approach in the latest version of the patch.

Is there any actual use-case for this other than backwards
compatibility?

The fundamental problem with signal-based promotion is that it's
an edge-triggered rather than level-triggered condition, ie if you
miss the single notification event you're screwed. I'm not sure
why we'd want to go there, considering all the problems we're having
with edge-triggered APIs in nearby threads.

We could combine the two approaches, perhaps: have "pg_ctl promote"
create a trigger file and then send a signal. The trigger file
would record the existence of the promotion request, so that if the
postmaster isn't running right now or misses the signal, it'd still
promote when restarted or otherwise at the next opportunity.
But with an approach like this, we could expect quick response to
the signal in normal conditions, without need for constant wakeups
to check for the file. Also, this route would allow overloading
of the signal, since it would just mean "wake up, I asked you to
do something" rather than needing to carry the full semantics of
what is to be done.

regards, tom lane

#9Jim Nasby
nasbyj@amazon.com
In reply to: Simon Riggs (#4)
Re: Reducing power consumption on idle servers

On 2/21/22 10:11 AM, Simon Riggs wrote:

* autovac launcher - autovacuum_naptime

On production systems autovacuum_naptime often can't be a large value,
otherwise it's easy to not keep up on small busy tables. That's fine for
actually busy servers, but with the increase in hosted PG offerings, the
defaults in those offerings need to cater to a broader audience.

Autovac varies its wakeup cycle according to how much work is done. It
is OK to set autovacuum_naptime without affecting power consumption
when idle.

I'm wondering how many people understand that. I've seen a number of
servers running very low values of autovacuum_naptime in order to make
things more responsive.

Idle for autovac is defined slightly differently, since if all user
work completes then there may still be a lot of vacuuming to do before
it goes fully idle. But my observation is that there are many servers
that go idle for more than 50% of each week, when operating 8-12 hours
per day, 5 days per week, so we can still save a lot of power.

This patch doesn't change how autovac works, it just uses a common
setting for the hibernation that eventually occurs.

I'm wondering if it'd be worth linking autovac wakeup from a truly idle
state to the stats collector. If there's no stats messages coming in
clearly there's nothing new for autovac.

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jim Nasby (#9)
Re: Reducing power consumption on idle servers

Jim Nasby <nasbyj@amazon.com> writes:

I'm wondering if it'd be worth linking autovac wakeup from a truly idle
state to the stats collector. If there's no stats messages coming in
clearly there's nothing new for autovac.

That seems pretty scary in the current system design, where the
stats collector is intentionally not 100% reliable (and sometimes,
less intentionally, it fails completely). If we get to a place
where we're willing to bank on stats being collected 100% of the
time, it might make sense.

regards, tom lane

#11Zheng Li
zhengli10@gmail.com
In reply to: Tom Lane (#10)
Re: Reducing power consumption on idle servers

Hi,

Replication never got the memo, so power consumption on an
idle server is not very effective on standby or logical subscribers.
The code and timing for hibernation is also different for each worker,
which is confusing.

Agree, this patch makes it easier to understand the hibernation
behavior of various workers.

1. Standardize the hibernation time at 60s, using a #define
HIBERNATE_DELAY_SEC 60

I notice in patch 3 HIBERNATE_DELAY_SEC has been increased to 300
seconds, what’s the reasoning behind it? Is longer hibernation delay
better? If so can we set it to INT_MAX (the max timeout allowed by
WaitLatch()) in which case a worker in hibernation only relies on
wakeup? I think it would be nice to run experiments to verify that the
patch reduces power consumption while varying the value of
HIBERNATE_DELAY_SEC.

Regards,
Zheng Li
Amazon RDS/Aurora for PostgreSQL

Show quoted text

On Thu, Mar 3, 2022 at 5:49 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Jim Nasby <nasbyj@amazon.com> writes:

I'm wondering if it'd be worth linking autovac wakeup from a truly idle
state to the stats collector. If there's no stats messages coming in
clearly there's nothing new for autovac.

That seems pretty scary in the current system design, where the
stats collector is intentionally not 100% reliable (and sometimes,
less intentionally, it fails completely). If we get to a place
where we're willing to bank on stats being collected 100% of the
time, it might make sense.

regards, tom lane

#12Simon Riggs
simon.riggs@enterprisedb.com
In reply to: Tom Lane (#8)
Re: Reducing power consumption on idle servers

On Sat, 26 Feb 2022 at 17:44, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Magnus Hagander <magnus@hagander.net> writes:

Deprecating explicit file-based promotion is possible and simple, so
that is the approach in the latest version of the patch.

Is there any actual use-case for this other than backwards
compatibility?

The fundamental problem with signal-based promotion is that it's
an edge-triggered rather than level-triggered condition, ie if you
miss the single notification event you're screwed. I'm not sure
why we'd want to go there, considering all the problems we're having
with edge-triggered APIs in nearby threads.

We could combine the two approaches, perhaps: have "pg_ctl promote"
create a trigger file and then send a signal. The trigger file
would record the existence of the promotion request, so that if the
postmaster isn't running right now or misses the signal, it'd still
promote when restarted or otherwise at the next opportunity.
But with an approach like this, we could expect quick response to
the signal in normal conditions, without need for constant wakeups
to check for the file. Also, this route would allow overloading
of the signal, since it would just mean "wake up, I asked you to
do something" rather than needing to carry the full semantics of
what is to be done.

The above is how this works now, luckily, but few comments explain why.

This current state evolved because I first added file-based promotion,
for the reasons you give, but it was slow, so the signal approach was
added later, with new API.

What we are discussing deprecating is the ability to create the
trigger file directly (the original API). Once that is gone the
mechanism would be to request promotion via pg_ctl promote (the new
API), which then creates the trigger file and sends a signal. So in
both APIs the trigger file is still used.

In this patch, if you create the trigger file
directly/explicitly/yourself (the old API) then it will still trigger
when hibernating, but it will be slower. The new API is unaffected by
this change.

--
Simon Riggs http://www.EnterpriseDB.com/

#13Simon Riggs
simon.riggs@enterprisedb.com
In reply to: Zheng Li (#11)
Re: Reducing power consumption on idle servers

On Wed, 9 Mar 2022 at 01:16, Zheng Li <zhengli10@gmail.com> wrote:

1. Standardize the hibernation time at 60s, using a #define
HIBERNATE_DELAY_SEC 60

I notice in patch 3 HIBERNATE_DELAY_SEC has been increased to 300
seconds, what’s the reasoning behind it? Is longer hibernation delay
better? If so can we set it to INT_MAX (the max timeout allowed by
WaitLatch()) in which case a worker in hibernation only relies on
wakeup? I think it would be nice to run experiments to verify that the
patch reduces power consumption while varying the value of
HIBERNATE_DELAY_SEC.

Setting it to INT_MAX would be the same as not allowing a timeout,
which changes a lot of current behavior and makes it less robust.

Waking once per minute is what we do in various cases, so 60sec is a
good choice.

In the case of logical rep launcher we currently use 300sec, so using
60s would decrease this.

I don't see much difference between power consumption with timeouts of
60s and 300s.

In the latest patch, I chose 300s. Does anyone have an opinion on the
value here?

--
Simon Riggs http://www.EnterpriseDB.com/

#14Andres Freund
andres@anarazel.de
In reply to: Simon Riggs (#13)
Re: Reducing power consumption on idle servers

Hi,

On 2022-03-10 17:50:47 +0000, Simon Riggs wrote:

On Wed, 9 Mar 2022 at 01:16, Zheng Li <zhengli10@gmail.com> wrote:

1. Standardize the hibernation time at 60s, using a #define
HIBERNATE_DELAY_SEC 60

I notice in patch 3 HIBERNATE_DELAY_SEC has been increased to 300
seconds, what’s the reasoning behind it? Is longer hibernation delay
better? If so can we set it to INT_MAX (the max timeout allowed by
WaitLatch()) in which case a worker in hibernation only relies on
wakeup? I think it would be nice to run experiments to verify that the
patch reduces power consumption while varying the value of
HIBERNATE_DELAY_SEC.

Setting it to INT_MAX would be the same as not allowing a timeout,
which changes a lot of current behavior and makes it less robust.

Most of these timeouts are a bad idea and should not exist. We repeatedly have
had bugs where we were missing wakeups etc but those bugs were harder to
notice because of timeouts. I'm against entrenching this stuff further.

Greetings,

Andres Freund

#15Andres Freund
andres@anarazel.de
In reply to: Simon Riggs (#6)
Re: Reducing power consumption on idle servers

On 2022-02-21 21:04:19 +0000, Simon Riggs wrote:

On Mon, 21 Feb 2022 at 16:49, Chapman Flack <chap@anastigmatix.net> wrote:

Shouldn't the comment be "with work_done=false" ?

Good catch, thanks.

I've also added docs to say that "promote_trigger_file" is now
deprecated. There were no tests for that functionality, so just as
well it is being removed.

Doesn't pass tests, and hasn't for weeks from what I can see: https://cirrus-ci.com/task/5925633648754688?logs=test_world#L1153

Marked as waiting-on-author.

- Andres

#16Simon Riggs
simon.riggs@enterprisedb.com
In reply to: Andres Freund (#15)
1 attachment(s)
Re: Reducing power consumption on idle servers

On Tue, 22 Mar 2022 at 00:54, Andres Freund <andres@anarazel.de> wrote:

On 2022-02-21 21:04:19 +0000, Simon Riggs wrote:

On Mon, 21 Feb 2022 at 16:49, Chapman Flack <chap@anastigmatix.net> wrote:

Shouldn't the comment be "with work_done=false" ?

Good catch, thanks.

I've also added docs to say that "promote_trigger_file" is now
deprecated. There were no tests for that functionality, so just as
well it is being removed.

Doesn't pass tests, and hasn't for weeks from what I can see: https://cirrus-ci.com/task/5925633648754688?logs=test_world#L1153

Marked as waiting-on-author.

Hmm, just the not-in-sample test again. Fixed by marking GUC_NOT_IN_SAMPLE.

--
Simon Riggs http://www.EnterpriseDB.com/

Attachments:

hibernate_to_reduce_power_consumption.v4.patchapplication/octet-stream; name=hibernate_to_reduce_power_consumption.v4.patchDownload
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 7a48973b3c..f2d4119cce 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4519,9 +4519,12 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
         <listitem>
          <para>
           Specifies a trigger file whose presence ends recovery in the
-          standby.  Even if this value is not set, you can still promote
-          the standby using <command>pg_ctl promote</command> or calling
-          <function>pg_promote()</function>.
+          standby, though this may take as long as 300s to take effect.
+          This mechanism is now deprecated and will be removed in a later
+          release.  The preferred way to promote the standby is to run
+          <command>pg_ctl promote</command> or call
+          <function>pg_promote()</function>, both of which take effect
+          much faster than this parameter.
           This parameter can only be set in the <filename>postgresql.conf</filename>
           file or on the server command line.
          </para>
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index 81fa26f985..9174dc9318 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -653,11 +653,11 @@ protocol to make nodes agree on a serializable transactional order.
 
    <para>
     Standby mode is exited and the server switches to normal operation
-    when <command>pg_ctl promote</command> is run,
-    <function>pg_promote()</function> is called, or a trigger file is found
-    (<varname>promote_trigger_file</varname>). Before failover,
-    any WAL immediately available in the archive or in <filename>pg_wal</filename> will be
-    restored, but no attempt is made to connect to the primary.
+    when <command>pg_ctl promote</command> is run, or
+    <function>pg_promote()</function> is called.  Use of
+    <varname>promote_trigger_file</varname> is deprecated. Before failover,
+    any WAL immediately available in the archive or in <filename>pg_wal</filename>
+    will be restored, but no attempt is made to connect to the primary.
    </para>
   </sect2>
 
@@ -1483,12 +1483,8 @@ synchronous_standby_names = 'ANY 2 (s1, s2, s3)'
 
    <para>
     To trigger failover of a log-shipping standby server, run
-    <command>pg_ctl promote</command>, call <function>pg_promote()</function>,
-    or create a trigger file with the file name and path specified by the
-    <varname>promote_trigger_file</varname>. If you're planning to use
-    <command>pg_ctl promote</command> or to call
-    <function>pg_promote()</function> to fail over,
-    <varname>promote_trigger_file</varname> is not required. If you're
+    <command>pg_ctl promote</command> or call <function>pg_promote()</function>.
+    Use of <varname>promote_trigger_file</varname> is deprecated. If you're
     setting up the reporting servers that are only used to offload read-only
     queries from the primary, not for high availability purposes, you don't
     need to promote it.
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index 9feea3e6ec..1eadc448a6 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -3709,14 +3709,17 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 					}
 
 					/*
-					 * Wait for more WAL to arrive. Time out after 5 seconds
-					 * to react to a trigger file promptly and to check if the
-					 * WAL receiver is still active.
+					 * Wait for more WAL to arrive, when we will be woken
+					 * immediately by the WAL receiver.  Hibernate, but time out
+					 * to react to a trigger file.  Direct use of a trigger file
+					 * is now deprecated and promote_trigger_file will be
+					 * removed in a later release.
 					 */
 					(void) WaitLatch(&XLogRecoveryCtl->recoveryWakeupLatch,
 									 WL_LATCH_SET | WL_TIMEOUT |
 									 WL_EXIT_ON_PM_DEATH,
-									 5000L, WAIT_EVENT_RECOVERY_WAL_STREAM);
+									 HIBERNATE_DELAY_MS,
+									 WAIT_EVENT_RECOVERY_WAL_STREAM);
 					ResetLatch(&XLogRecoveryCtl->recoveryWakeupLatch);
 					break;
 				}
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 681ef91b81..942cc4a07f 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -134,7 +134,6 @@ int			Log_autovacuum_min_duration = 600000;
 
 /* the minimum allowed time between two awakenings of the launcher */
 #define MIN_AUTOVAC_SLEEPTIME 100.0 /* milliseconds */
-#define MAX_AUTOVAC_SLEEPTIME 300	/* seconds */
 
 /* Flags to tell if we are in an autovacuum process */
 static bool am_autovacuum_launcher = false;
@@ -932,8 +931,8 @@ launcher_determine_sleep(bool canlaunch, bool recursing, struct timeval *nap)
 	 * infinite sleep in strange cases like the system clock going backwards a
 	 * few years.
 	 */
-	if (nap->tv_sec > MAX_AUTOVAC_SLEEPTIME)
-		nap->tv_sec = MAX_AUTOVAC_SLEEPTIME;
+	if (nap->tv_sec > HIBERNATE_DELAY_SEC)
+		nap->tv_sec = HIBERNATE_DELAY_SEC;
 }
 
 /*
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index d1f5d12eff..9566fd2b3d 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -60,12 +60,6 @@
  */
 int			BgWriterDelay = 200;
 
-/*
- * Multiplier to apply to BgWriterDelay when we decide to hibernate.
- * (Perhaps this needs to be configurable?)
- */
-#define HIBERNATE_FACTOR			50
-
 /*
  * Interval in which standby snapshots are logged into the WAL stream, in
  * milliseconds.
@@ -337,7 +331,7 @@ BackgroundWriterMain(void)
 			/* Sleep ... */
 			(void) WaitLatch(MyLatch,
 							 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
-							 BgWriterDelay * HIBERNATE_FACTOR,
+							 HIBERNATE_DELAY_MS,
 							 WAIT_EVENT_BGWRITER_HIBERNATE);
 			/* Reset the notification request in case we timed out */
 			StrategyNotifyBgWriter(-1);
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index d916ed39a8..156a4b6404 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -53,8 +53,6 @@
  * Timer definitions.
  * ----------
  */
-#define PGARCH_AUTOWAKE_INTERVAL 60 /* How often to force a poll of the
-									 * archive status directory; in seconds. */
 #define PGARCH_RESTART_INTERVAL 10	/* How often to attempt to restart a
 									 * failed archiver; in seconds. */
 
@@ -346,7 +344,7 @@ pgarch_MainLoop(void)
 
 		/*
 		 * Sleep until a signal is received, or until a poll is forced by
-		 * PGARCH_AUTOWAKE_INTERVAL having passed since last_copy_time, or
+		 * HIBERNATE_DELAY_SEC having passed since last_copy_time, or
 		 * until postmaster dies.
 		 */
 		if (!time_to_stop)		/* Don't wait during last iteration */
@@ -354,7 +352,7 @@ pgarch_MainLoop(void)
 			pg_time_t	curtime = (pg_time_t) time(NULL);
 			int			timeout;
 
-			timeout = PGARCH_AUTOWAKE_INTERVAL - (curtime - last_copy_time);
+			timeout = HIBERNATE_DELAY_SEC - (curtime - last_copy_time);
 			if (timeout > 0)
 			{
 				int			rc;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 80bb269599..e998896917 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1637,7 +1637,7 @@ DetermineSleepTime(struct timeval *timeout)
 		}
 		else
 		{
-			timeout->tv_sec = 60;
+			timeout->tv_sec = HIBERNATE_DELAY_SEC;
 			timeout->tv_usec = 0;
 		}
 		return;
@@ -1695,15 +1695,15 @@ DetermineSleepTime(struct timeval *timeout)
 		timeout->tv_usec = microsecs;
 
 		/* Ensure we don't exceed one minute */
-		if (timeout->tv_sec > 60)
+		if (timeout->tv_sec > HIBERNATE_DELAY_SEC)
 		{
-			timeout->tv_sec = 60;
+			timeout->tv_sec = HIBERNATE_DELAY_SEC;
 			timeout->tv_usec = 0;
 		}
 	}
 	else
 	{
-		timeout->tv_sec = 60;
+		timeout->tv_sec = HIBERNATE_DELAY_SEC;
 		timeout->tv_usec = 0;
 	}
 }
diff --git a/src/backend/postmaster/walwriter.c b/src/backend/postmaster/walwriter.c
index 102fa2a089..7f0b74d0de 100644
--- a/src/backend/postmaster/walwriter.c
+++ b/src/backend/postmaster/walwriter.c
@@ -70,13 +70,7 @@
 int			WalWriterDelay = 200;
 int			WalWriterFlushAfter = 128;
 
-/*
- * Number of do-nothing loops before lengthening the delay time, and the
- * multiplier to apply to WalWriterDelay when we do decide to hibernate.
- * (Perhaps these need to be configurable?)
- */
-#define LOOPS_UNTIL_HIBERNATE		50
-#define HIBERNATE_FACTOR			25
+DECLARE_HIBERNATE_VARS();
 
 /* Prototypes for private functions */
 static void HandleWalWriterInterrupts(void);
@@ -210,7 +204,7 @@ WalWriterMain(void)
 	/*
 	 * Reset hibernation state after any error.
 	 */
-	left_till_hibernate = LOOPS_UNTIL_HIBERNATE;
+	RESET_TO_NON_HIBERNATE();
 	hibernating = false;
 	SetWalWriterSleeping(false);
 
@@ -225,8 +219,6 @@ WalWriterMain(void)
 	 */
 	for (;;)
 	{
-		long		cur_timeout;
-
 		/*
 		 * Advertise whether we might hibernate in this cycle.  We do this
 		 * before resetting the latch to ensure that any async commits will
@@ -252,24 +244,11 @@ WalWriterMain(void)
 		 * Do what we're here for; then, if XLogBackgroundFlush() found useful
 		 * work to do, reset hibernation counter.
 		 */
-		if (XLogBackgroundFlush())
-			left_till_hibernate = LOOPS_UNTIL_HIBERNATE;
-		else if (left_till_hibernate > 0)
-			left_till_hibernate--;
+		SET_DELAY_OR_HIBERNATE(XLogBackgroundFlush(), WalWriterDelay);
 
 		/* Send WAL statistics to the stats collector */
 		pgstat_send_wal(false);
 
-		/*
-		 * Sleep until we are signaled or WalWriterDelay has elapsed.  If we
-		 * haven't done anything useful for quite some time, lengthen the
-		 * sleep time so as to reduce the server's idle power consumption.
-		 */
-		if (left_till_hibernate > 0)
-			cur_timeout = WalWriterDelay;	/* in ms */
-		else
-			cur_timeout = WalWriterDelay * HIBERNATE_FACTOR;
-
 		(void) WaitLatch(MyLatch,
 						 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
 						 cur_timeout,
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 6f25b2c2ad..6ae178c1a7 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -48,9 +48,6 @@
 #include "utils/snapmgr.h"
 #include "utils/timeout.h"
 
-/* max sleep time between cycles (3min) */
-#define DEFAULT_NAPTIME_PER_CYCLE 180000L
-
 int			max_logical_replication_workers = 4;
 int			max_sync_workers_per_subscription = 2;
 
@@ -802,6 +799,7 @@ void
 ApplyLauncherMain(Datum main_arg)
 {
 	TimestampTz last_start_time = 0;
+	DECLARE_HIBERNATE_VARS();
 
 	ereport(DEBUG1,
 			(errmsg_internal("logical replication launcher started")));
@@ -831,11 +829,12 @@ ApplyLauncherMain(Datum main_arg)
 		MemoryContext subctx;
 		MemoryContext oldctx;
 		TimestampTz now;
-		long		wait_time = DEFAULT_NAPTIME_PER_CYCLE;
+		bool		work_done;
 
 		CHECK_FOR_INTERRUPTS();
 
 		now = GetCurrentTimestamp();
+		work_done = false;
 
 		/* Limit the start retry to once a wal_retrieve_retry_interval */
 		if (TimestampDifferenceExceeds(last_start_time, now,
@@ -866,7 +865,7 @@ ApplyLauncherMain(Datum main_arg)
 				if (w == NULL)
 				{
 					last_start_time = now;
-					wait_time = wal_retrieve_retry_interval;
+					work_done = true;
 
 					logicalrep_worker_launch(sub->dbid, sub->oid, sub->name,
 											 sub->owner, InvalidOid);
@@ -886,13 +885,16 @@ ApplyLauncherMain(Datum main_arg)
 			 * usually means crash of the worker, so we should retry in
 			 * wal_retrieve_retry_interval again.
 			 */
-			wait_time = wal_retrieve_retry_interval;
+			work_done = true;
 		}
 
+		SET_DELAY_OR_HIBERNATE(work_done,
+							   wal_retrieve_retry_interval);
+
 		/* Wait for more work. */
 		rc = WaitLatch(MyLatch,
 					   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
-					   wait_time,
+					   cur_timeout,
 					   WAIT_EVENT_LOGICAL_LAUNCHER_MAIN);
 
 		if (rc & WL_LATCH_SET)
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 82dcffc2db..deb4ed0583 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -196,8 +196,6 @@
 #include "utils/syscache.h"
 #include "utils/timeout.h"
 
-#define NAPTIME_PER_CYCLE 1000	/* max sleep time between cycles (1s) */
-
 typedef struct FlushPosition
 {
 	dlist_node	node;
@@ -2656,6 +2654,7 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
 	bool		ping_sent = false;
 	TimeLineID	tli;
 	ErrorContextCallback errcallback;
+	DECLARE_HIBERNATE_VARS();
 
 	/*
 	 * Init the ApplyMessageContext which we clean up after each replication
@@ -2692,7 +2691,6 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
 		int			len;
 		char	   *buf = NULL;
 		bool		endofstream = false;
-		long		wait_time;
 
 		CHECK_FOR_INTERRUPTS();
 
@@ -2726,6 +2724,7 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
 					/* Reset timeout. */
 					last_recv_timestamp = GetCurrentTimestamp();
 					ping_sent = false;
+					RESET_TO_NON_HIBERNATE();
 
 					/* Ensure we are reading the data into our memory context. */
 					MemoryContextSwitchTo(ApplyMessageContext);
@@ -2814,15 +2813,15 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
 		 * no particular urgency about waking up unless we get data or a
 		 * signal.
 		 */
-		if (!dlist_is_empty(&lsn_mapping))
-			wait_time = WalWriterDelay;
-		else
-			wait_time = NAPTIME_PER_CYCLE;
+		SET_DELAY_OR_HIBERNATE_OPT(!dlist_is_empty(&lsn_mapping),
+									WalWriterDelay,
+									wal_receiver_timeout / 2);
 
 		rc = WaitLatchOrSocket(MyLatch,
 							   WL_SOCKET_READABLE | WL_LATCH_SET |
 							   WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
-							   fd, wait_time,
+							   fd,
+							   cur_timeout,
 							   WAIT_EVENT_LOGICAL_APPLY_MAIN);
 
 		if (rc & WL_LATCH_SET)
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index ceaff097b9..f351038aca 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -95,7 +95,7 @@ bool		hot_standby_feedback;
 static WalReceiverConn *wrconn = NULL;
 WalReceiverFunctionsType *WalReceiverFunctions = NULL;
 
-#define NAPTIME_PER_CYCLE 100	/* max sleep time between cycles (100ms) */
+#define WALRECEIVER_WAIT_PER_CYCLE 100L	/* normal sleep time between cycles (100ms) */
 
 /*
  * These variables are used similarly to openLogFile/SegNo,
@@ -513,7 +513,7 @@ WalReceiverMain(void)
 									   WL_EXIT_ON_PM_DEATH | WL_SOCKET_READABLE |
 									   WL_TIMEOUT | WL_LATCH_SET,
 									   wait_fd,
-									   NAPTIME_PER_CYCLE,
+									   wal_receiver_timeout / 2,
 									   WAIT_EVENT_WAL_RECEIVER_MAIN);
 				if (rc & WL_LATCH_SET)
 				{
diff --git a/src/backend/replication/walreceiverfuncs.c b/src/backend/replication/walreceiverfuncs.c
index 90798b9d53..dfb2d9836b 100644
--- a/src/backend/replication/walreceiverfuncs.c
+++ b/src/backend/replication/walreceiverfuncs.c
@@ -33,12 +33,6 @@
 
 WalRcvData *WalRcv = NULL;
 
-/*
- * How long to wait for walreceiver to start up after requesting
- * postmaster to launch it. In seconds.
- */
-#define WALRCV_STARTUP_TIMEOUT 10
-
 /* Report shared memory space needed by WalRcvShmemInit */
 Size
 WalRcvShmemSize(void)
@@ -96,7 +90,7 @@ WalRcvRunning(void)
 	{
 		pg_time_t	now = (pg_time_t) time(NULL);
 
-		if ((now - startTime) > WALRCV_STARTUP_TIMEOUT)
+		if ((now - startTime) > wal_receiver_timeout)
 		{
 			bool		stopped = false;
 
@@ -147,7 +141,7 @@ WalRcvStreaming(void)
 	{
 		pg_time_t	now = (pg_time_t) time(NULL);
 
-		if ((now - startTime) > WALRCV_STARTUP_TIMEOUT)
+		if ((now - startTime) > wal_receiver_timeout)
 		{
 			bool		stopped = false;
 
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 932aefc777..f42bd18aa2 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3995,7 +3995,8 @@ static struct config_string ConfigureNamesString[] =
 	{
 		{"promote_trigger_file", PGC_SIGHUP, REPLICATION_STANDBY,
 			gettext_noop("Specifies a file name whose presence ends recovery in the standby."),
-			NULL
+			NULL,
+			GUC_NOT_IN_SAMPLE
 		},
 		&PromoteTriggerFile,
 		"",
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 4cf5b26a36..e61a361e80 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -325,7 +325,6 @@
 
 #primary_conninfo = ''			# connection string to sending server
 #primary_slot_name = ''			# replication slot on sending server
-#promote_trigger_file = ''		# file name whose presence ends recovery
 #hot_standby = on			# "off" disallows queries during recovery
 					# (change requires restart)
 #max_standby_archive_delay = 30s	# max delay before canceling queries
diff --git a/src/include/storage/latch.h b/src/include/storage/latch.h
index 0dd79d73fa..4301f189f5 100644
--- a/src/include/storage/latch.h
+++ b/src/include/storage/latch.h
@@ -83,6 +83,30 @@
  * only, so using any latch other than the process latch effectively precludes
  * use of any generic handler.
  *
+ * Since not all servers have a 24/7 duty cycle, a worker that has no
+ * current work can be enhanced to eventually hibernate.  The goal is to
+ * reduce CPU wakeups and thus electrical power consumption, making a
+ * small contribution to reducing global warming.  To support
+ * hibernation, we add some additional macros that judge when, and
+ * whether, to switch from the normal loop timeout to the hibernation
+ * timeout.  Using these macros gives a common design pattern:
+ * 1. Add DECLARE_HIBERNATE_VARS(); to declare private vars in this module
+ * 2. Add SET_DELAY_OR_HIBERNATE() to eventually hibernate when idle
+ * 3. Use the variable "cur_timeout" in WaitLatch()
+ * Hibernation for each worker is independent of any other worker.
+ *
+ * #define NORMAL_WAIT_TIMEOUT_MS 100L
+ * DECLARE_HIBERNATE_VARS();
+ * for (;;)
+ * {
+ *	   if (work to do)
+ *		   DoStuff();	// in particular, exit loop if some condition is met
+ *     SET_DELAY_OR_HIBERNATE(work_to_do, NORMAL_WAIT_TIMEOUT_MS);
+ *	   WaitLatch(cur_timeout);
+ *	   ResetLatch();
+ * }
+ * A working example is shown in src/test/modules/worker_spi/worker_spi.c,
+ * as well as in the code for workers in src/backend/postmaster et al.
  *
  * WaitEventSets allow to wait for latches being set and additional events -
  * postmaster dying and socket readiness of several sockets currently - at the
@@ -140,6 +164,51 @@ typedef struct Latch
 							 WL_SOCKET_CONNECTED | \
 							 WL_SOCKET_CLOSED)
 
+/*
+ * Common design pattern for Hibernation.
+ *
+ * Sleep until we are signaled or normal_delay has elapsed.  If we
+ * haven't done any work for quite some time, lengthen the sleep
+ * time so as to reduce the server's idle power consumption.
+ *
+ * Static variables are deliberately avoided so that each process
+ * has its own private, independent counters.
+ */
+
+#define HIBERNATE_DELAY_SEC		300
+#define HIBERNATE_DELAY_MS		(1000L * HIBERNATE_DELAY_SEC)
+#define LOOPS_UNTIL_HIBERNATE	50
+
+#define	SET_DELAY_OR_HIBERNATE(work_done, normal_delay)	\
+	do {											\
+		if (work_done)								\
+			left_till_hibernate = LOOPS_UNTIL_HIBERNATE; \
+		else if (left_till_hibernate > 0)			\
+			left_till_hibernate--;					\
+		cur_timeout = (left_till_hibernate > 0) ?	\
+			(normal_delay) : HIBERNATE_DELAY_MS;	\
+	} while (0)
+
+#define	SET_DELAY_OR_HIBERNATE_OPT(work_done, normal_delay, hib_delay) \
+	do {											\
+		if (work_done)								\
+			left_till_hibernate = LOOPS_UNTIL_HIBERNATE; \
+		else if (left_till_hibernate > 0)			\
+			left_till_hibernate--;					\
+		cur_timeout = (left_till_hibernate > 0) ?	\
+			(normal_delay) : (hib_delay);			\
+	} while (0)
+
+#define AM_HIBERNATING()	(left_till_hibernate == 0)
+
+#define DECLARE_HIBERNATE_VARS()					\
+	int		left_till_hibernate = LOOPS_UNTIL_HIBERNATE; \
+	long	cur_timeout
+
+#define RESET_TO_NON_HIBERNATE() \
+	(left_till_hibernate = LOOPS_UNTIL_HIBERNATE)
+
+
 typedef struct WaitEvent
 {
 	int			pos;			/* position in the event data structure */
diff --git a/src/test/modules/worker_spi/worker_spi.c b/src/test/modules/worker_spi/worker_spi.c
index 48829df29c..fd907a6511 100644
--- a/src/test/modules/worker_spi/worker_spi.c
+++ b/src/test/modules/worker_spi/worker_spi.c
@@ -46,6 +46,8 @@ PG_MODULE_MAGIC;
 
 PG_FUNCTION_INFO_V1(worker_spi_launch);
 
+DECLARE_HIBERNATE_VARS();
+
 void		_PG_init(void);
 void		worker_spi_main(Datum) pg_attribute_noreturn();
 
@@ -137,6 +139,7 @@ worker_spi_main(Datum main_arg)
 	worktable  *table;
 	StringInfoData buf;
 	char		name[20];
+	bool		work_done = true;
 
 	table = palloc(sizeof(worktable));
 	sprintf(name, "schema%d", index);
@@ -191,6 +194,13 @@ worker_spi_main(Datum main_arg)
 	{
 		int			ret;
 
+		/*
+		 * Use the standard design pattern for wait time/hibernation.
+		 * After 50 consecutive loops with work_done=false the wait time
+		 * will be set to the standard hibernation timeout.
+		 */
+		SET_DELAY_OR_HIBERNATE(work_done, worker_spi_naptime * 1000L);
+
 		/*
 		 * Background workers mustn't call usleep() or any direct equivalent:
 		 * instead, they may wait on their process latch, which sleeps as
@@ -199,7 +209,7 @@ worker_spi_main(Datum main_arg)
 		 */
 		(void) WaitLatch(MyLatch,
 						 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
-						 worker_spi_naptime * 1000L,
+						 cur_timeout,
 						 PG_WAIT_EXTENSION);
 		ResetLatch(MyLatch);
 
@@ -256,7 +266,10 @@ worker_spi_main(Datum main_arg)
 				elog(LOG, "%s: count in %s.%s is now %d",
 					 MyBgworkerEntry->bgw_name,
 					 table->schema, table->name, val);
+			work_done = true;
 		}
+		else
+			work_done = false;
 
 		/*
 		 * And finish our transaction.
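[Editor's note: as a quick sanity check of the counter logic that SET_DELAY_OR_HIBERNATE implements, the following is a hedged standalone sketch, not part of the patch. The function name next_timeout is illustrative; the constants mirror the v3 patch values.]

```c
#include <assert.h>

/* Illustrative constants mirroring the v3 patch */
#define LOOPS_UNTIL_HIBERNATE 50
#define HIBERNATE_DELAY_MS    (1000L * 300)

/*
 * Function equivalent of the SET_DELAY_OR_HIBERNATE macro: any productive
 * cycle resets the idle counter; after LOOPS_UNTIL_HIBERNATE consecutive
 * idle cycles, the returned timeout switches from the caller's normal
 * delay to the hibernation delay.
 */
static long
next_timeout(int *left_till_hibernate, int work_done, long normal_delay)
{
	if (work_done)
		*left_till_hibernate = LOOPS_UNTIL_HIBERNATE;
	else if (*left_till_hibernate > 0)
		(*left_till_hibernate)--;

	return (*left_till_hibernate > 0) ? normal_delay : HIBERNATE_DELAY_MS;
}
```

With a 200 ms normal delay (the walwriter default), this keeps the short sleep for the first 49 idle cycles, switches to the 300 s hibernation sleep on the 50th, and drops back to 200 ms as soon as one cycle finds work.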
#17Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Andres Freund (#14)
Re: Reducing power consumption on idle servers

At Thu, 10 Mar 2022 11:45:10 -0800, Andres Freund <andres@anarazel.de> wrote in

Hi,

On 2022-03-10 17:50:47 +0000, Simon Riggs wrote:

On Wed, 9 Mar 2022 at 01:16, Zheng Li <zhengli10@gmail.com> wrote:

1. Standardize the hibernation time at 60s, using a #define
HIBERNATE_DELAY_SEC 60

I notice in patch 3 HIBERNATE_DELAY_SEC has been increased to 300
seconds, what’s the reasoning behind it? Is longer hibernation delay
better? If so can we set it to INT_MAX (the max timeout allowed by
WaitLatch()) in which case a worker in hibernation only relies on
wakeup? I think it would be nice to run experiments to verify that the
patch reduces power consumption while varying the value of
HIBERNATE_DELAY_SEC.

Setting it to INT_MAX would be the same as not allowing a timeout,
which changes a lot of current behavior and makes it less robust.

Most of these timeouts are a bad idea and should not exist. We repeatedly have
had bugs where we were missing wakeups etc but those bugs were harder to

I basically agree to this.

notice because of timeouts. I'm against entrenching this stuff further.

For example, pgarch.c theoretically doesn't need hibernate, since it
has an apparent trigger event and a terminal condition. What we need
to do about archiver is to set timeout only when it didn't reach the
latest finished segment at an iteration. (this might need additional
shared memory use, though..)

I'm not sure about bgwriter, walwriter and logical replication stuff...

About walreciver,

-		if ((now - startTime) > WALRCV_STARTUP_TIMEOUT)
+		if ((now - startTime) > wal_receiver_timeout)

This is simply a misuse of the timeout. WALRCV_STARTUP_TIMEOUT is not
used for a sleep and irrelevant to receiving data from the peer
walsender. wal_receiver_timeout's default is 60s but we don't assume
that walreceiver takes that long to start up. (I think Thomas is
working on the walreceiver timeout stuff.)

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#18Simon Riggs
simon.riggs@enterprisedb.com
In reply to: Kyotaro Horiguchi (#17)
Re: Reducing power consumption on idle servers

On Thu, 24 Mar 2022 at 07:16, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:

Most of these timeouts are a bad idea and should not exist. We repeatedly have
had bugs where we were missing wakeups etc but those bugs were harder to

I basically agree to this.

As a general point, maybe. But we have a lot of procs doing a lot of
polling and we are unlikely to change that anytime soon.

notice because of timeouts. I'm against entrenching this stuff further.

For example, pgarch.c theoretically doesn't need hibernate, since it
has an apparent trigger event and a terminal condition. What we need
to do about archiver is to set timeout only when it didn't reach the
lastest finished segment at an iteration. (this might need additional
shared memory use, though..)

archiver is not the important thing here. If there is objection to
that section of code we can remove that.

I'm not sure about bgwriter, walwriter and logical replication stuff...

I am sure. bgwriter, walwriter and logical worker need action from us
to allow them to hibernate in a sensible way that allows powersaving.

I think Thomas is working on the walreceiver timeout stuff.

Yes, he is. I have no problem going with what he advocates, if others agree.

However, that fix does not solve the whole problem, since many other
procs also need fixing.

The proposals of this patch are the following, each of which can be
independently accepted/rejected:
1. fix the sleep pattern of bgwriter, walwriter and logical worker
(directly affects powersave)
2. deprecate promote_trigger_file, which then allows us to fix the
sleep for startup process (directly affects powersave)
3. treat hibernation in all procs the same, for simplicity, and to
make sure we don't forget one later
4. provide a design pattern for background worker extensions to
follow, so as to encourage powersaving

Please don't reject the whole patch because you disagree with part
(3), it is not that important.

Thanks

--
Simon Riggs http://www.EnterpriseDB.com/

#19Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#18)
Re: Reducing power consumption on idle servers

On Thu, Mar 24, 2022 at 6:59 AM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:

The proposals of this patch are the following, each of which can be
independently accepted/rejected:
1. fix the sleep pattern of bgwriter, walwriter and logical worker
(directly affects powersave)
2. deprecate promote_trigger_file, which then allows us to fix the
sleep for startup process (directly affects powersave)
3. treat hibernation in all procs the same, for simplicity, and to
make sure we don't forget one later
4. provide a design pattern for background worker extensions to
follow, so as to encourage powersaving

Unfortunately, the patch isn't split in a way that corresponds to this
division. I think I'm +1 on change #2 -- deprecating
promote_trigger_file seems like a good idea to me independently of any
power-saving considerations. But I'm not sure that I am on board with
any of the other changes. I do agree with the basic goal of trying to
reduce unnecessary wakeups, but I think the rest of the patch is
taking a bit of a one-size-fits-all approach where I think that we
might want to be more nuanced. I think there are a couple of different
kinds of cases here.

In some places, like DetermineSleepTime(), we're already willing to
sleep for pretty long periods of time, like a minute. I think in those
cases we could consider doing nothing, on the theory that one wakeup
per minute is already not very much. If we do want to do something, I
think it should be in the direction of convincing ourselves that we
don't need a timeout at all. Having a timeout is a bit like insurance:
it guarantees that if for some reason the event by which we expect to
be awoken doesn't actually wake us up even though something meaningful
has happened, we won't hang forever. But if we think a wakeup per
minute is meaningful and we don't think there's any possibility of a
missed wakeup, let's just wait forever. I don't think we'll avoid many
user complaints by recovering from a missed wakeup in just under an
hour.

In other places, like WalWriterMain, we're basically polling. One
question in these kinds of cases is whether we can avoid polling in
favor of having some other process wake us up if the event that we
care about happens. This is unlikely to be practical in all cases -
e.g. we can't wait for a promotion trigger file to show up unless we
want to use inotify or something like that. However, we may be able to
avoid polling in some instances. When we can't, then I think it makes
sense to increase the wait time when the system appears to be idle. In
that subset of cases - we're polling and we can't avoid polling - this
kind of approach seems fairly reasonable.

I do have some concerns about the idea that the right strategy in
general, or even in particular cases, is to multiply by 50. This patch
hasn't invented that problem; it's already there. My concern is that
multiplying a small number by 50 seems very different than multiplying
a large number by 50. If we normally wait for 100ms before checking
for something to happen, and we wait for 5s, it's probably not going
to be a huge issue, because 5s is still a short amount of time for
human beings. If we normally wait for a minute before checking for
something to happen and we wait for 50 minutes, that's much more
likely to make a human being unhappy. Therefore, it's very unclear to
me that those cases should be treated the same way.

There's a more technical issue with this strategy, too: if we multiply
both short and long timeouts by 50, I think we are going to get pretty
much the same result as if we only multiply the short timeouts by 50.
Why even bother reducing the frequency of wakeups from minutes to
hours if we're elsewhere reducing the frequency from seconds to
minutes? If process A is still waking up every minute, putting process
B in the deep freeze isn't going to do a whole lot in terms of letting
the system go into any kind of deeper sleep.
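
[Editor's note: a bit of arithmetic makes this concrete, with illustrative numbers rather than measurements. Two processes each waking once a minute cost 120 wakeups per hour; multiplying one timeout by 50 (to 50 minutes) still leaves roughly 61 wakeups per hour, because the un-hibernated process dominates.]

```c
/*
 * Total timer wakeups per hour for a set of processes, each polling at
 * a fixed interval given in seconds.  Purely illustrative arithmetic.
 */
static double
wakeups_per_hour(const double *intervals_sec, int n)
{
	double		total = 0.0;

	for (int i = 0; i < n; i++)
		total += 3600.0 / intervals_sec[i];
	return total;
}
```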

All in all I feel that encouraging developers to adopt a
multiply-by-50 rule in all cases seems too simplistic to me. It may be
better than what we're doing right now, but it doesn't really seem
like the right policy.

--
Robert Haas
EDB: http://www.enterprisedb.com

#20Simon Riggs
simon.riggs@enterprisedb.com
In reply to: Robert Haas (#19)
1 attachment(s)
Re: Reducing power consumption on idle servers

On Thu, 24 Mar 2022 at 15:39, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Mar 24, 2022 at 6:59 AM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:

The proposals of this patch are the following, each of which can be
independently accepted/rejected:
1. fix the sleep pattern of bgwriter, walwriter and logical worker
(directly affects powersave)
2. deprecate promote_trigger_file, which then allows us to fix the
sleep for startup process (directly affects powersave)
3. treat hibernation in all procs the same, for simplicity, and to
make sure we don't forget one later
4. provide a design pattern for background worker extensions to
follow, so as to encourage powersaving

Unfortunately, the patch isn't split in a way that corresponds to this
division. I think I'm +1 on change #2 -- deprecating
promote_trigger_file seems like a good idea to me independently of any
power-saving considerations. But I'm not sure that I am on board with
any of the other changes.

OK, so change (2) is good. Separate patch attached.

I do agree with the basic goal of trying to
reduce unnecessary wakeups, but I think the rest of the patch is
taking a bit of a one-size-fits-all approach where I think that we
might want to be more nuanced. I think there are a couple of different
kinds of cases here.

OK, so you disagree with (3) and probably (4). No problem.

What about (1)? That directly affects the powersave capability. I
didn't read anything specific to that.

If we don't fix (1) as well, the changes for startup and walreceiver
will be ineffective for powersaving.

What changes will be acceptable for bgwriter, walwriter and logical worker?

--
Simon Riggs http://www.EnterpriseDB.com/

Attachments:

hibernate_startup.v5.patchapplication/octet-stream; name=hibernate_startup.v5.patchDownload
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 7a48973b3c..b2018cad0e 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4519,9 +4519,12 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
         <listitem>
          <para>
           Specifies a trigger file whose presence ends recovery in the
-          standby.  Even if this value is not set, you can still promote
-          the standby using <command>pg_ctl promote</command> or calling
-          <function>pg_promote()</function>.
+          standby, though it may take up to 60 seconds for the file's
+          presence to be noticed.  This mechanism is deprecated and will
+          be removed in a later release.  The preferred way to promote
+          the standby is to run <command>pg_ctl promote</command> or to
+          call <function>pg_promote()</function>, both of which act much
+          faster than this parameter.
           This parameter can only be set in the <filename>postgresql.conf</filename>
           file or on the server command line.
          </para>
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index 81fa26f985..9174dc9318 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -653,11 +653,11 @@ protocol to make nodes agree on a serializable transactional order.
 
    <para>
     Standby mode is exited and the server switches to normal operation
-    when <command>pg_ctl promote</command> is run,
-    <function>pg_promote()</function> is called, or a trigger file is found
-    (<varname>promote_trigger_file</varname>). Before failover,
-    any WAL immediately available in the archive or in <filename>pg_wal</filename> will be
-    restored, but no attempt is made to connect to the primary.
+    when <command>pg_ctl promote</command> is run, or
+    <function>pg_promote()</function> is called.  Use of
+    <varname>promote_trigger_file</varname> is deprecated. Before failover,
+    any WAL immediately available in the archive or in <filename>pg_wal</filename>
+    will be restored, but no attempt is made to connect to the primary.
    </para>
   </sect2>
 
@@ -1483,12 +1483,8 @@ synchronous_standby_names = 'ANY 2 (s1, s2, s3)'
 
    <para>
     To trigger failover of a log-shipping standby server, run
-    <command>pg_ctl promote</command>, call <function>pg_promote()</function>,
-    or create a trigger file with the file name and path specified by the
-    <varname>promote_trigger_file</varname>. If you're planning to use
-    <command>pg_ctl promote</command> or to call
-    <function>pg_promote()</function> to fail over,
-    <varname>promote_trigger_file</varname> is not required. If you're
+    <command>pg_ctl promote</command> or call <function>pg_promote()</function>.
+    Use of <varname>promote_trigger_file</varname> is deprecated. If you're
     setting up the reporting servers that are only used to offload read-only
     queries from the primary, not for high availability purposes, you don't
     need to promote it.
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index 8d2395dae2..fcb028e4a3 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -3713,14 +3713,17 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 					}
 
 					/*
-					 * Wait for more WAL to arrive. Time out after 5 seconds
-					 * to react to a trigger file promptly and to check if the
-					 * WAL receiver is still active.
+					 * Wait for more WAL to arrive, when we will be woken
+					 * immediately by the WAL receiver.  Hibernate, but time out
+					 * to react to a trigger file.  Direct use of trigger file
+					 * is now deprecated and the promote_trigger_file will be
+					 * removed in a later release.
 					 */
 					(void) WaitLatch(&XLogRecoveryCtl->recoveryWakeupLatch,
 									 WL_LATCH_SET | WL_TIMEOUT |
 									 WL_EXIT_ON_PM_DEATH,
-									 5000L, WAIT_EVENT_RECOVERY_WAL_STREAM);
+									 60000L,
+									 WAIT_EVENT_RECOVERY_WAL_STREAM);
 					ResetLatch(&XLogRecoveryCtl->recoveryWakeupLatch);
 					break;
 				}
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index f70f7f5c01..b55e5b6216 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3995,7 +3995,8 @@ static struct config_string ConfigureNamesString[] =
 	{
 		{"promote_trigger_file", PGC_SIGHUP, REPLICATION_STANDBY,
 			gettext_noop("Specifies a file name whose presence ends recovery in the standby."),
-			NULL
+			NULL,
+			GUC_NOT_IN_SAMPLE
 		},
 		&PromoteTriggerFile,
 		"",
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 4cf5b26a36..e61a361e80 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -325,7 +325,6 @@
 
 #primary_conninfo = ''			# connection string to sending server
 #primary_slot_name = ''			# replication slot on sending server
-#promote_trigger_file = ''		# file name whose presence ends recovery
 #hot_standby = on			# "off" disallows queries during recovery
 					# (change requires restart)
 #max_standby_archive_delay = 30s	# max delay before canceling queries
#21Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#20)
Re: Reducing power consumption on idle servers

On Thu, Mar 24, 2022 at 12:02 PM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:

What about (1)? That directly affects the powersave capability. I
didn't read anything specific to that.

If we don't fix (1) as well, the changes for startup and walreceiver
will be ineffective for powersaving.

What changes will be acceptable for bgwriter, walwriter and logical worker?

Hmm, I think it would be fine to introduce some kind of hibernation
mechanism for logical workers. bgwriter and wal writer already have a
hibernation mechanism, so I'm not sure what your concern is there
exactly. In your initial email you said you weren't proposing changes
there, but maybe that changed somewhere in the course of the
subsequent discussion. If you could catch me up on your present
thinking that would be helpful.

Thanks,

--
Robert Haas
EDB: http://www.enterprisedb.com

#22Ian Lawrence Barwick
barwick@gmail.com
In reply to: Simon Riggs (#20)
Re: Reducing power consumption on idle servers

On Fri, 25 Mar 2022 at 01:03, Simon Riggs <simon.riggs@enterprisedb.com> wrote:

On Thu, 24 Mar 2022 at 15:39, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Mar 24, 2022 at 6:59 AM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:

The proposals of this patch are the following, each of which can be
independently accepted/rejected:
1. fix the sleep pattern of bgwriter, walwriter and logical worker
(directly affects powersave)
2. deprecate promote_trigger_file, which then allows us to fix the
sleep for startup process (directly affects powersave)
3. treat hibernation in all procs the same, for simplicity, and to
make sure we don't forget one later
4. provide a design pattern for background worker extensions to
follow, so as to encourage powersaving

Unfortunately, the patch isn't split in a way that corresponds to this
division. I think I'm +1 on change #2 -- deprecating
promote_trigger_file seems like a good idea to me independently of any
power-saving considerations. But I'm not sure that I am on board with
any of the other changes.

OK, so change (2) is good. Separate patch attached.

Hi

cfbot reports the patch no longer applies [1]. As CommitFest 2022-11 is
currently underway, this would be an excellent time to update the patch.

[1]: http://cfbot.cputube.org/patch_40_3566.log

Thanks

Ian Barwick

#23Simon Riggs
simon.riggs@enterprisedb.com
In reply to: Simon Riggs (#20)
1 attachment(s)
Re: Reducing power consumption on idle servers

On Thu, 24 Mar 2022 at 16:02, Simon Riggs <simon.riggs@enterprisedb.com> wrote:

On Thu, 24 Mar 2022 at 15:39, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Mar 24, 2022 at 6:59 AM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:

The proposals of this patch are the following, each of which can be
independently accepted/rejected:
1. fix the sleep pattern of bgwriter, walwriter and logical worker
(directly affects powersave)
2. deprecate promote_trigger_file, which then allows us to fix the
sleep for startup process (directly affects powersave)
3. treat hibernation in all procs the same, for simplicity, and to
make sure we don't forget one later
4. provide a design pattern for background worker extensions to
follow, so as to encourage powersaving

Unfortunately, the patch isn't split in a way that corresponds to this
division. I think I'm +1 on change #2 -- deprecating
promote_trigger_file seems like a good idea to me independently of any
power-saving considerations. But I'm not sure that I am on board with
any of the other changes.

OK, so change (2) is good. Separate patch attached.

Thanks to Ian for prompting me to pick up this thread again; apologies
for getting distracted.

The attached patch is a reduced version of the original. It covers only:
* deprecation of the promote_trigger_file - there are no tests that
use that, hence why there is no test coverage for the patch
* changing the sleep time of the startup process to 60s
* docs and comments

Other points will be discussed in another branch of this thread.

--
Simon Riggs http://www.EnterpriseDB.com/

Attachments:

hibernate_startup.v6.patch (application/octet-stream)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 559eb898a9..391bcb4306 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4615,9 +4615,12 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
         <listitem>
          <para>
           Specifies a trigger file whose presence ends recovery in the
-          standby.  Even if this value is not set, you can still promote
-          the standby using <command>pg_ctl promote</command> or calling
-          <function>pg_promote()</function>.
+          standby, though this may take as long as 60s to take effect.
+          This mechanism is now deprecated and will be removed in a later
+          release. The preferred mechanism by which to promote the standby
+          is by using <command>pg_ctl promote</command> or calling
+          <function>pg_promote()</function>, both of which promote much
+          faster than using this parameter.
           This parameter can only be set in the <filename>postgresql.conf</filename>
           file or on the server command line.
          </para>
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index b2b3129397..d85788338c 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -653,11 +653,11 @@ protocol to make nodes agree on a serializable transactional order.
 
    <para>
     Standby mode is exited and the server switches to normal operation
-    when <command>pg_ctl promote</command> is run,
-    <function>pg_promote()</function> is called, or a trigger file is found
-    (<varname>promote_trigger_file</varname>). Before failover,
-    any WAL immediately available in the archive or in <filename>pg_wal</filename> will be
-    restored, but no attempt is made to connect to the primary.
+    when <command>pg_ctl promote</command> is run, or
+    <function>pg_promote()</function> is called.  Use of
+    <varname>promote_trigger_file</varname> is deprecated. Before failover,
+    any WAL immediately available in the archive or in <filename>pg_wal</filename>
+    will be restored, but no attempt is made to connect to the primary.
    </para>
   </sect2>
 
@@ -1483,12 +1483,8 @@ synchronous_standby_names = 'ANY 2 (s1, s2, s3)'
 
    <para>
     To trigger failover of a log-shipping standby server, run
-    <command>pg_ctl promote</command>, call <function>pg_promote()</function>,
-    or create a trigger file with the file name and path specified by the
-    <varname>promote_trigger_file</varname>. If you're planning to use
-    <command>pg_ctl promote</command> or to call
-    <function>pg_promote()</function> to fail over,
-    <varname>promote_trigger_file</varname> is not required. If you're
+    <command>pg_ctl promote</command> or call <function>pg_promote()</function>.
+    Use of <varname>promote_trigger_file</varname> is deprecated. If you're
     setting up the reporting servers that are only used to offload read-only
     queries from the primary, not for high availability purposes, you don't
     need to promote it.
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index cb07694aea..2ba81e91eb 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -3840,14 +3840,17 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 					XLogPrefetcherComputeStats(xlogprefetcher);
 
 					/*
-					 * Wait for more WAL to arrive. Time out after 5 seconds
-					 * to react to a trigger file promptly and to check if the
-					 * WAL receiver is still active.
+					 * Wait for more WAL to arrive, when we will be woken
+					 * immediately by the WAL receiver.  Hibernate, but time out
+					 * to react to a trigger file.  Direct use of trigger file
+					 * is now deprecated and the promote_trigger_file will be
+					 * removed in a later release.
 					 */
 					(void) WaitLatch(&XLogRecoveryCtl->recoveryWakeupLatch,
 									 WL_LATCH_SET | WL_TIMEOUT |
 									 WL_EXIT_ON_PM_DEATH,
-									 5000L, WAIT_EVENT_RECOVERY_WAL_STREAM);
+									 60000L,
+									 WAIT_EVENT_RECOVERY_WAL_STREAM);
 					ResetLatch(&XLogRecoveryCtl->recoveryWakeupLatch);
 					break;
 				}
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 836b49484a..38aca5dac4 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -3802,7 +3802,8 @@ struct config_string ConfigureNamesString[] =
 	{
 		{"promote_trigger_file", PGC_SIGHUP, REPLICATION_STANDBY,
 			gettext_noop("Specifies a file name whose presence ends recovery in the standby."),
-			NULL
+			NULL,
+			GUC_NOT_IN_SAMPLE
 		},
 		&PromoteTriggerFile,
 		"",
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 868d21c351..043864597f 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -331,7 +331,6 @@
 
 #primary_conninfo = ''			# connection string to sending server
 #primary_slot_name = ''			# replication slot on sending server
-#promote_trigger_file = ''		# file name whose presence ends recovery
 #hot_standby = on			# "off" disallows queries during recovery
 					# (change requires restart)
 #max_standby_archive_delay = 30s	# max delay before canceling queries
#24Thomas Munro
thomas.munro@gmail.com
In reply to: Simon Riggs (#23)
Re: Reducing power consumption on idle servers

On Mon, Nov 14, 2022 at 5:52 AM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:

The attached patch is a reduced version of the original. It covers only:
* deprecation of the promote_trigger_file - there are no tests that
use that, hence why there is no test coverage for the patch
* changing the sleep time of the startup process to 60s
* docs and comments

LPGTM. If we also fix the bogus SIGALRM wakeups[1], then finally a
completely idle recovery process looks like:

kevent(8,0x0,0,{ },1,{ 60.000000000 }) = 0 (0x0)
kevent(8,0x0,0,{ },1,{ 60.000000000 }) = 0 (0x0)
kevent(8,0x0,0,{ },1,{ 60.000000000 }) = 0 (0x0)

Presumably it would have no timeout at all in the next release.

[1]: /messages/by-id/CALj2ACUiYn+ZmPGUVmGeoY1u7ino2qsvqrnufk8sWPvK3A8yJA@mail.gmail.com

#25Simon Riggs
simon.riggs@enterprisedb.com
In reply to: Thomas Munro (#24)
1 attachment(s)
Re: Reducing power consumption on idle servers

On Sun, 13 Nov 2022 at 21:28, Thomas Munro <thomas.munro@gmail.com> wrote:

On Mon, Nov 14, 2022 at 5:52 AM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:

The attached patch is a reduced version of the original. It covers only:
* deprecation of the promote_trigger_file - there are no tests that
use that, hence why there is no test coverage for the patch
* changing the sleep time of the startup process to 60s
* docs and comments

LPGTM. If we also fix the bogus SIGALRM wakeups[1], then finally a
completely idle recovery process looks like:

kevent(8,0x0,0,{ },1,{ 60.000000000 }) = 0 (0x0)
kevent(8,0x0,0,{ },1,{ 60.000000000 }) = 0 (0x0)
kevent(8,0x0,0,{ },1,{ 60.000000000 }) = 0 (0x0)

Presumably it would have no timeout at all in the next release.

[1] /messages/by-id/CALj2ACUiYn+ZmPGUVmGeoY1u7ino2qsvqrnufk8sWPvK3A8yJA@mail.gmail.com

Clearly, I haven't been watching Hackers! Thanks for the nudge.

See if this does the trick?

--
Simon Riggs http://www.EnterpriseDB.com/

Attachments:

hibernate_startup.v7.patch (application/octet-stream)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 559eb898a9..391bcb4306 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4615,9 +4615,12 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
         <listitem>
          <para>
           Specifies a trigger file whose presence ends recovery in the
-          standby.  Even if this value is not set, you can still promote
-          the standby using <command>pg_ctl promote</command> or calling
-          <function>pg_promote()</function>.
+          standby, though this may take as long as 60s to take effect.
+          This mechanism is now deprecated and will be removed in a later
+          release. The preferred mechanism by which to promote the standby
+          is by using <command>pg_ctl promote</command> or calling
+          <function>pg_promote()</function>, both of which promote much
+          faster than using this parameter.
           This parameter can only be set in the <filename>postgresql.conf</filename>
           file or on the server command line.
          </para>
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index b2b3129397..d85788338c 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -653,11 +653,11 @@ protocol to make nodes agree on a serializable transactional order.
 
    <para>
     Standby mode is exited and the server switches to normal operation
-    when <command>pg_ctl promote</command> is run,
-    <function>pg_promote()</function> is called, or a trigger file is found
-    (<varname>promote_trigger_file</varname>). Before failover,
-    any WAL immediately available in the archive or in <filename>pg_wal</filename> will be
-    restored, but no attempt is made to connect to the primary.
+    when <command>pg_ctl promote</command> is run, or
+    <function>pg_promote()</function> is called.  Use of
+    <varname>promote_trigger_file</varname> is deprecated. Before failover,
+    any WAL immediately available in the archive or in <filename>pg_wal</filename>
+    will be restored, but no attempt is made to connect to the primary.
    </para>
   </sect2>
 
@@ -1483,12 +1483,8 @@ synchronous_standby_names = 'ANY 2 (s1, s2, s3)'
 
    <para>
     To trigger failover of a log-shipping standby server, run
-    <command>pg_ctl promote</command>, call <function>pg_promote()</function>,
-    or create a trigger file with the file name and path specified by the
-    <varname>promote_trigger_file</varname>. If you're planning to use
-    <command>pg_ctl promote</command> or to call
-    <function>pg_promote()</function> to fail over,
-    <varname>promote_trigger_file</varname> is not required. If you're
+    <command>pg_ctl promote</command> or call <function>pg_promote()</function>.
+    Use of <varname>promote_trigger_file</varname> is deprecated. If you're
     setting up the reporting servers that are only used to offload read-only
     queries from the primary, not for high availability purposes, you don't
     need to promote it.
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index cb07694aea..2dbe1477ac 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -3840,14 +3840,21 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 					XLogPrefetcherComputeStats(xlogprefetcher);
 
 					/*
-					 * Wait for more WAL to arrive. Time out after 5 seconds
-					 * to react to a trigger file promptly and to check if the
-					 * WAL receiver is still active.
+					 * Wait for more WAL to arrive, when we will be woken
+					 * immediately by the WAL receiver.  Hibernate, but time out
+					 * to react to a trigger file.  Direct use of trigger file
+					 * is now deprecated and the promote_trigger_file will be
+					 * removed in a later release.
+					 * Disable the startup progress timeout, since we aren't
+					 * actually doing anything that needs to be reported on.
 					 */
+					disable_startup_progress_timeout();
 					(void) WaitLatch(&XLogRecoveryCtl->recoveryWakeupLatch,
 									 WL_LATCH_SET | WL_TIMEOUT |
 									 WL_EXIT_ON_PM_DEATH,
-									 5000L, WAIT_EVENT_RECOVERY_WAL_STREAM);
+									 60000L,
+									 WAIT_EVENT_RECOVERY_WAL_STREAM);
+					enable_startup_progress_timeout();
 					ResetLatch(&XLogRecoveryCtl->recoveryWakeupLatch);
 					break;
 				}
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index f99186eab7..28926d0899 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -335,6 +335,28 @@ begin_startup_progress_phase(void)
 						 log_startup_progress_interval);
 }
 
+void
+disable_startup_progress_timeout(void)
+{
+	disable_timeout(STARTUP_PROGRESS_TIMEOUT, false);
+	startup_progress_timer_expired = false;
+}
+
+void
+enable_startup_progress_timeout(void)
+{
+	TimestampTz fin_time;
+
+	if (log_startup_progress_interval == 0)
+		return;
+
+	startup_progress_phase_start_time = GetCurrentTimestamp();
+	fin_time = TimestampTzPlusMilliseconds(startup_progress_phase_start_time,
+										   log_startup_progress_interval);
+	enable_timeout_every(STARTUP_PROGRESS_TIMEOUT, fin_time,
+						 log_startup_progress_interval);
+}
+
 /*
  * Report whether startup progress timeout has occurred. Reset the timer flag
  * if it did, set the elapsed time to the out parameters and return true,
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 836b49484a..38aca5dac4 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -3802,7 +3802,8 @@ struct config_string ConfigureNamesString[] =
 	{
 		{"promote_trigger_file", PGC_SIGHUP, REPLICATION_STANDBY,
 			gettext_noop("Specifies a file name whose presence ends recovery in the standby."),
-			NULL
+			NULL,
+			GUC_NOT_IN_SAMPLE
 		},
 		&PromoteTriggerFile,
 		"",
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 868d21c351..043864597f 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -331,7 +331,6 @@
 
 #primary_conninfo = ''			# connection string to sending server
 #primary_slot_name = ''			# replication slot on sending server
-#promote_trigger_file = ''		# file name whose presence ends recovery
 #hot_standby = on			# "off" disallows queries during recovery
 					# (change requires restart)
 #max_standby_archive_delay = 30s	# max delay before canceling queries
diff --git a/src/include/postmaster/startup.h b/src/include/postmaster/startup.h
index d66ec1fcb1..5f0b7b7d69 100644
--- a/src/include/postmaster/startup.h
+++ b/src/include/postmaster/startup.h
@@ -32,6 +32,8 @@ extern void PostRestoreCommand(void);
 extern bool IsPromoteSignaled(void);
 extern void ResetPromoteSignaled(void);
 
+extern void disable_startup_progress_timeout(void);
+extern void enable_startup_progress_timeout(void);
 extern void begin_startup_progress_phase(void);
 extern void startup_progress_timeout_handler(void);
 extern bool has_startup_progress_timeout_expired(long *secs, int *usecs);
#26Simon Riggs
simon.riggs@enterprisedb.com
In reply to: Simon Riggs (#25)
1 attachment(s)
Re: Reducing power consumption on idle servers

On Sun, 13 Nov 2022 at 23:07, Simon Riggs <simon.riggs@enterprisedb.com> wrote:

On Sun, 13 Nov 2022 at 21:28, Thomas Munro <thomas.munro@gmail.com> wrote:

On Mon, Nov 14, 2022 at 5:52 AM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:

The attached patch is a reduced version of the original. It covers only:
* deprecation of the promote_trigger_file - there are no tests that
use that, hence why there is no test coverage for the patch
* changing the sleep time of the startup process to 60s
* docs and comments

LPGTM. If we also fix the bogus SIGALRM wakeups[1], then finally a
completely idle recovery process looks like:

kevent(8,0x0,0,{ },1,{ 60.000000000 }) = 0 (0x0)
kevent(8,0x0,0,{ },1,{ 60.000000000 }) = 0 (0x0)
kevent(8,0x0,0,{ },1,{ 60.000000000 }) = 0 (0x0)

Presumably it would have no timeout at all in the next release.

[1] /messages/by-id/CALj2ACUiYn+ZmPGUVmGeoY1u7ino2qsvqrnufk8sWPvK3A8yJA@mail.gmail.com

Clearly, I haven't been watching Hackers! Thanks for the nudge.

See if this does the trick?

Looks like the SIGALRM wakeups will be silenced on the other thread,
so we can just revert back to using v6 of this patch.

Reposting v6 now so that patch tester doesn't think this has failed
when the patch on other thread gets applied.

--
Simon Riggs http://www.EnterpriseDB.com/

Attachments:

hibernate_startup.v6.patch (application/octet-stream)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 559eb898a9..391bcb4306 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4615,9 +4615,12 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
         <listitem>
          <para>
           Specifies a trigger file whose presence ends recovery in the
-          standby.  Even if this value is not set, you can still promote
-          the standby using <command>pg_ctl promote</command> or calling
-          <function>pg_promote()</function>.
+          standby, though this may take as long as 60s to take effect.
+          This mechanism is now deprecated and will be removed in a later
+          release. The preferred mechanism by which to promote the standby
+          is by using <command>pg_ctl promote</command> or calling
+          <function>pg_promote()</function>, both of which promote much
+          faster than using this parameter.
           This parameter can only be set in the <filename>postgresql.conf</filename>
           file or on the server command line.
          </para>
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index b2b3129397..d85788338c 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -653,11 +653,11 @@ protocol to make nodes agree on a serializable transactional order.
 
    <para>
     Standby mode is exited and the server switches to normal operation
-    when <command>pg_ctl promote</command> is run,
-    <function>pg_promote()</function> is called, or a trigger file is found
-    (<varname>promote_trigger_file</varname>). Before failover,
-    any WAL immediately available in the archive or in <filename>pg_wal</filename> will be
-    restored, but no attempt is made to connect to the primary.
+    when <command>pg_ctl promote</command> is run, or
+    <function>pg_promote()</function> is called.  Use of
+    <varname>promote_trigger_file</varname> is deprecated. Before failover,
+    any WAL immediately available in the archive or in <filename>pg_wal</filename>
+    will be restored, but no attempt is made to connect to the primary.
    </para>
   </sect2>
 
@@ -1483,12 +1483,8 @@ synchronous_standby_names = 'ANY 2 (s1, s2, s3)'
 
    <para>
     To trigger failover of a log-shipping standby server, run
-    <command>pg_ctl promote</command>, call <function>pg_promote()</function>,
-    or create a trigger file with the file name and path specified by the
-    <varname>promote_trigger_file</varname>. If you're planning to use
-    <command>pg_ctl promote</command> or to call
-    <function>pg_promote()</function> to fail over,
-    <varname>promote_trigger_file</varname> is not required. If you're
+    <command>pg_ctl promote</command> or call <function>pg_promote()</function>.
+    Use of <varname>promote_trigger_file</varname> is deprecated. If you're
     setting up the reporting servers that are only used to offload read-only
     queries from the primary, not for high availability purposes, you don't
     need to promote it.
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index cb07694aea..2ba81e91eb 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -3840,14 +3840,17 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 					XLogPrefetcherComputeStats(xlogprefetcher);
 
 					/*
-					 * Wait for more WAL to arrive. Time out after 5 seconds
-					 * to react to a trigger file promptly and to check if the
-					 * WAL receiver is still active.
+					 * Wait for more WAL to arrive, when we will be woken
+					 * immediately by the WAL receiver.  Hibernate, but time out
+					 * to react to a trigger file.  Direct use of trigger file
+					 * is now deprecated and the promote_trigger_file will be
+					 * removed in a later release.
 					 */
 					(void) WaitLatch(&XLogRecoveryCtl->recoveryWakeupLatch,
 									 WL_LATCH_SET | WL_TIMEOUT |
 									 WL_EXIT_ON_PM_DEATH,
-									 5000L, WAIT_EVENT_RECOVERY_WAL_STREAM);
+									 60000L,
+									 WAIT_EVENT_RECOVERY_WAL_STREAM);
 					ResetLatch(&XLogRecoveryCtl->recoveryWakeupLatch);
 					break;
 				}
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 836b49484a..38aca5dac4 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -3802,7 +3802,8 @@ struct config_string ConfigureNamesString[] =
 	{
 		{"promote_trigger_file", PGC_SIGHUP, REPLICATION_STANDBY,
 			gettext_noop("Specifies a file name whose presence ends recovery in the standby."),
-			NULL
+			NULL,
+			GUC_NOT_IN_SAMPLE
 		},
 		&PromoteTriggerFile,
 		"",
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 868d21c351..043864597f 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -331,7 +331,6 @@
 
 #primary_conninfo = ''			# connection string to sending server
 #primary_slot_name = ''			# replication slot on sending server
-#promote_trigger_file = ''		# file name whose presence ends recovery
 #hot_standby = on			# "off" disallows queries during recovery
 					# (change requires restart)
 #max_standby_archive_delay = 30s	# max delay before canceling queries
#27Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Simon Riggs (#26)
Re: Reducing power consumption on idle servers

On Wed, Nov 16, 2022 at 2:34 PM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:

Reposting v6 now so that patch tester doesn't think this has failed
when the patch on other thread gets applied.

Intention of the patch, that is, to get rid of promote_trigger_file
GUC sometime in future, looks good to me. However, the timeout change
to 60 sec from 5 sec seems far-reaching. While it discourages the use
of the GUC, it can impact many existing production servers that still
rely on promote_trigger_file as it can increase the failover times
impacting SLAs around failover.

How about retaining 5 sec as-is and adding a WARNING in
promote_trigger_file check/assign and in show GUC, in
CheckForStandbyTrigger() whenever PromoteTriggerFile is detected and
mentioning the deprecation in the GUC's description?

+                     * to react to a trigger file.  Direct use of trigger file
+                     * is now deprecated and the promote_trigger_file will be
+                     * removed in a later release.
I think adding the 'Direct use of trigger file .....' text on a new
line starting with XXX (which typically marks a TODO item) would be
good, no?

Also, do we need to add a TODO in the postgresql wiki
(https://wiki.postgresql.org/wiki/Todo), possibly under a new section
such as 'Deprecated Features' or 'Features To Be Removed In the Near
Future' (admittedly vague, but it would start tracking such deprecated
items), so that we don't miss removing promote_trigger_file in a
future release?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#28Simon Riggs
simon.riggs@enterprisedb.com
In reply to: Bharath Rupireddy (#27)
Re: Reducing power consumption on idle servers

On Wed, 16 Nov 2022 at 12:47, Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Wed, Nov 16, 2022 at 2:34 PM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:

Reposting v6 now so that patch tester doesn't think this has failed
when the patch on other thread gets applied.

Intention of the patch, that is, to get rid of promote_trigger_file
GUC sometime in future, looks good to me. However, the timeout change
to 60 sec from 5 sec seems far-reaching. While it discourages the use
of the GUC, it can impact many existing production servers that still
rely on promote_trigger_file as it can increase the failover times
impacting SLAs around failover.

The purpose of 60s is to allow for power reduction, so 5s won't do.

promote_trigger_file is not tested and there are better ways, so
deprecating it in this release is fine.

Anyone that relies on it can update their mechanisms to a supported
one with a one-line change. Realistically, anyone using it won't be on
the latest release anyway, at least for a long time, since if they use
manual methods then they are well behind the times.

--
Simon Riggs http://www.EnterpriseDB.com/

#29Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Simon Riggs (#28)
Re: Reducing power consumption on idle servers

On Wed, Nov 16, 2022 at 8:35 PM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:

On Wed, 16 Nov 2022 at 12:47, Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Wed, Nov 16, 2022 at 2:34 PM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:

Reposting v6 now so that patch tester doesn't think this has failed
when the patch on other thread gets applied.

Intention of the patch, that is, to get rid of promote_trigger_file
GUC sometime in future, looks good to me. However, the timeout change
to 60 sec from 5 sec seems far-reaching. While it discourages the use
of the GUC, it can impact many existing production servers that still
rely on promote_trigger_file as it can increase the failover times
impacting SLAs around failover.

The purpose of 60s is to allow for power reduction, so 5s won't do.

I understand. I know it's a bit hard to measure the power savings; I'm
wondering if there's any info, maybe not necessarily related to
postgres, but in general on how much power gets saved if a certain
number of waits/polls/system calls is reduced.

promote_trigger_file is not tested and there are better ways, so
deprecating it in this release is fine.

Hm, but..

Anyone that relies on it can update their mechanisms to a supported
one with a one-line change. Realistically, anyone using it won't be on
the latest release anyway, at least for a long time, since if they use
manual methods then they are well behind the times.

I may be overly pessimistic here - the change from 5 sec to 60 sec for
detecting promote_trigger_file will have a direct impact on failovers
I believe. After upgrading to a version with the 60 sec wait,
there'll be an increase in failover times if developers miss the
deprecation notice about promote_trigger_file. I'm not sure
what's the best way to discourage using promote_trigger_file. Maybe
the change from 5 sec to 60 sec is the way. I'm not really sure.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#30Simon Riggs
simon.riggs@enterprisedb.com
In reply to: Bharath Rupireddy (#29)
Re: Reducing power consumption on idle servers

On Thu, 17 Nov 2022 at 07:36, Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

promote_trigger_file is not tested and there are better ways, so
deprecating it in this release is fine.

Hm, but..

Anyone that relies on it can update their mechanisms to a supported
one with a one-line change. Realistically, anyone using it won't be on
the latest release anyway, at least for a long time, since if they use
manual methods then they are well behind the times.

I may be overly pessimistic here - the change from 5 sec to 60 sec for
detecting promote_trigger_file will have a direct impact on failovers
I believe.

No, it will have a direct effect only on people using promote_trigger_file
who do not read and act upon the deprecation notice before upgrading
by making a one line change to their failover scripts.

Since pretty much everyone doing HA uses external HA software (cloud
or otherwise) this shouldn't affect anyone.

--
Simon Riggs http://www.EnterpriseDB.com/

#31Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#30)
Re: Reducing power consumption on idle servers

On Thu, Nov 17, 2022 at 2:55 AM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:

No, it will have a direct effect only on people using promote_trigger_file
who do not read and act upon the deprecation notice before upgrading
by making a one line change to their failover scripts.

TBH, I wonder if we shouldn't just nuke promote_trigger_file.
pg_promote was added in 2018, and pg_ctl promote was added in 2011. So
even if we haven't said promote_trigger_file was deprecated, it hasn't
been the state-of-the-art way of doing this in a really long time. If
we think that there are still a lot of people using it, or if popular
tools are relying on it, then perhaps a deprecation period is
warranted anyway. But I think we should at least consider the
possibility that it's OK to just get rid of it right away.

--
Robert Haas
EDB: http://www.enterprisedb.com

#32Andres Freund
andres@anarazel.de
In reply to: Bharath Rupireddy (#29)
Re: Reducing power consumption on idle servers

Hi,

On 2022-11-17 13:06:23 +0530, Bharath Rupireddy wrote:

I understand. I know it's a bit hard to measure the power savings, I'm
wondering if there's any info, maybe not necessarily related to
postgres, but in general how much power gets saved if a certain number
of waits/polls/system calls are reduced.

It's heavily hardware and hardware settings dependent.

On some systems you can get an *approximate* idea of the power usage even
while plugged in. On others you can only see it while on battery power.

On systems with RAPL support (most Intel and I think newer AMD CPUs) you can
query power usage with:
powerstat -R 1 (adding -D provides a bit more detail)

But note that RAPL typically severely undercounts power usage because it
doesn't cover a lot of sources of power usage (display, sometimes memory,
all other peripherals).

Sometimes powerstat -R -D can split power usage up more granularly,
e.g. between the different CPU sockets and memory.

On laptops you can often measure the discharge rate when not plugged in, with
powerstat -d 0 1. But IME the latency till the values update makes it harder
to interpret.

On some workstations / servers you can read the power usage via ACPI. E.g. on
my workstation 'sensors' shows a power_meter-acpi-0
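For a rough sense of how tools like powerstat derive these numbers, here is a minimal sketch that reads the Linux powercap/RAPL energy counter directly. This is illustrative only, not part of any patch: the sysfs path is the usual location but may differ or be absent on a given machine, and the counter width is assumed to be 32 bits.

```python
import time

# Assumed path for the standard Linux powercap/RAPL package-energy counter;
# it may differ (or be absent) on your machine.
RAPL_PATH = "/sys/class/powercap/intel-rapl:0/energy_uj"
WRAP_UJ = 2**32  # the counter wraps; actual width is CPU-specific, 32 bits assumed

def rapl_watts(e0_uj, e1_uj, dt_s, wrap_uj=WRAP_UJ):
    """Average power between two energy_uj readings, tolerating one wraparound."""
    delta_uj = (e1_uj - e0_uj) % wrap_uj
    return delta_uj / dt_s / 1_000_000  # microjoules per second -> watts

def sample_package_power(interval_s=1.0):
    """Read the counter twice, interval_s apart, and return average watts."""
    with open(RAPL_PATH) as f:
        e0 = int(f.read())
    time.sleep(interval_s)
    with open(RAPL_PATH) as f:
        e1 = int(f.read())
    return rapl_watts(e0, e1, interval_s)
```

Like powerstat -R, this covers only what RAPL meters (the CPU package and sometimes memory), so it undercounts whole-system draw for the reasons given above.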

As an example of the difference it can make, here's the output of
powerstat -D 1
on my laptop.

Time User Nice Sys Idle IO Run Ctxt/s IRQ/s Fork Exec Exit Watts
16:41:55 0.1 0.0 0.1 99.8 0.0 1 403 246 1 0 1 2.62
16:41:56 0.1 0.0 0.1 99.8 0.0 1 357 196 1 0 1 2.72
16:41:57 0.1 0.0 0.1 99.6 0.2 1 510 231 4 0 4 2.64
16:41:58 0.1 0.0 0.1 99.9 0.0 2 1350 758 64 62 63 4.06
16:41:59 0.3 0.0 1.0 98.7 0.0 2 4166 2406 244 243 244 7.20
16:42:00 0.2 0.0 0.7 99.1 0.0 2 4203 2353 247 246 247 7.21
16:42:01 0.5 0.0 1.6 98.0 0.0 2 4079 2395 240 239 240 7.08
16:42:02 0.5 0.0 0.9 98.7 0.0 2 4097 2405 245 243 245 7.20
16:42:03 0.4 0.0 1.3 98.3 0.0 2 4117 2311 243 242 243 7.14
16:42:04 0.1 0.0 0.4 99.4 0.1 1 1721 1152 70 70 71 4.48
16:42:05 0.1 0.0 0.2 99.8 0.0 1 433 250 1 0 1 2.92
16:42:06 0.0 0.0 0.3 99.7 0.0 1 400 231 1 0 1 2.66

In the period of higher power etc usage I ran
while true;do sleep 0.001;done
and then interrupted that after a bit with ctrl-c.

Same thing on my workstation:

Time User Nice Sys Idle IO Run Ctxt/s IRQ/s Fork Exec Exit Watts
16:43:48 1.0 0.0 0.2 98.7 0.1 1 8218 2354 0 0 0 46.43
16:43:49 1.1 0.0 0.3 98.7 0.0 1 7866 2477 0 0 0 45.99
16:43:50 1.1 0.0 0.4 98.5 0.0 2 7753 2996 0 0 0 48.93
16:43:51 0.8 0.0 1.7 97.5 0.0 1 9395 5285 0 0 0 75.48
16:43:52 0.5 0.0 1.7 97.8 0.0 1 9141 4806 0 0 0 75.30
16:43:53 1.1 0.0 1.8 97.1 0.0 2 10065 5504 0 0 0 76.27
16:43:54 1.3 0.0 1.5 97.2 0.0 1 10962 5165 0 0 0 76.33
16:43:55 0.9 0.0 0.8 98.3 0.0 1 8452 3939 0 0 0 61.99
16:43:56 0.6 0.0 0.1 99.3 0.0 2 6541 1999 0 0 0 40.92
16:43:57 0.9 0.0 0.2 98.9 0.0 2 8199 2477 0 0 0 42.91

And if I query the power supply via ACPI instead:

while true;do sensors power_meter-acpi-0|grep power1|awk '{print $2, $3}';sleep 1;done
...
163.00 W
173.00 W
173.00 W
172.00 W
203.00 W
206.00 W
206.00 W
206.00 W
209.00 W
205.00 W
211.00 W
213.00 W
203.00 W
175.00 W
166.00 W
...

As you can see the difference is quite substantial. This is solely due to a
1ms sleep loop (albeit one where we fork a process after each sleep, which
likely is a good chunk of the CPU usage).

However, the difference in power usage will be far smaller if the system
already busy, because the CPU will already run at a high frequency.

Greetings,

Andres Freund

#33Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Andres Freund (#32)
Re: Reducing power consumption on idle servers

On Fri, Nov 18, 2022 at 6:29 AM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2022-11-17 13:06:23 +0530, Bharath Rupireddy wrote:

I understand. I know it's a bit hard to measure the power savings, I'm
wondering if there's any info, maybe not necessarily related to
postgres, but in general how much power gets saved if a certain number
of waits/polls/system calls are reduced.

It's heavily hardware and hardware settings dependent.

On some systems you can get an *approximate* idea of the power usage even
while plugged in. On others you can only see it while on battery power.

On systems with RAPL support (most Intel and I think newer AMD CPUs) you can
query power usage with:
powerstat -R 1 (adding -D provides a bit more detail)

But note that RAPL typically severely undercounts power usage because it
doesn't cover a lot of sources of power usage (display, sometimes memory,
all other peripherals).

Sometimes powerstat -R -D can split power usage up more granularly,
e.g. between the different CPU sockets and memory.

On laptops you can often measure the discharge rate when not plugged in, with
powerstat -d 0 1. But IME the latency till the values update makes it harder
to interpret.

On some workstations / servers you can read the power usage via ACPI. E.g. on
my workstation 'sensors' shows a power_meter-acpi-0

As an example of the difference it can make, here's the output of
powerstat -D 1
on my laptop.

Time User Nice Sys Idle IO Run Ctxt/s IRQ/s Fork Exec Exit Watts
16:41:55 0.1 0.0 0.1 99.8 0.0 1 403 246 1 0 1 2.62
16:41:56 0.1 0.0 0.1 99.8 0.0 1 357 196 1 0 1 2.72
16:41:57 0.1 0.0 0.1 99.6 0.2 1 510 231 4 0 4 2.64
16:41:58 0.1 0.0 0.1 99.9 0.0 2 1350 758 64 62 63 4.06
16:41:59 0.3 0.0 1.0 98.7 0.0 2 4166 2406 244 243 244 7.20
16:42:00 0.2 0.0 0.7 99.1 0.0 2 4203 2353 247 246 247 7.21
16:42:01 0.5 0.0 1.6 98.0 0.0 2 4079 2395 240 239 240 7.08
16:42:02 0.5 0.0 0.9 98.7 0.0 2 4097 2405 245 243 245 7.20
16:42:03 0.4 0.0 1.3 98.3 0.0 2 4117 2311 243 242 243 7.14
16:42:04 0.1 0.0 0.4 99.4 0.1 1 1721 1152 70 70 71 4.48
16:42:05 0.1 0.0 0.2 99.8 0.0 1 433 250 1 0 1 2.92
16:42:06 0.0 0.0 0.3 99.7 0.0 1 400 231 1 0 1 2.66

In the period of higher power etc usage I ran
while true;do sleep 0.001;done
and then interrupted that after a bit with ctrl-c.

Same thing on my workstation:

Time User Nice Sys Idle IO Run Ctxt/s IRQ/s Fork Exec Exit Watts
16:43:48 1.0 0.0 0.2 98.7 0.1 1 8218 2354 0 0 0 46.43
16:43:49 1.1 0.0 0.3 98.7 0.0 1 7866 2477 0 0 0 45.99
16:43:50 1.1 0.0 0.4 98.5 0.0 2 7753 2996 0 0 0 48.93
16:43:51 0.8 0.0 1.7 97.5 0.0 1 9395 5285 0 0 0 75.48
16:43:52 0.5 0.0 1.7 97.8 0.0 1 9141 4806 0 0 0 75.30
16:43:53 1.1 0.0 1.8 97.1 0.0 2 10065 5504 0 0 0 76.27
16:43:54 1.3 0.0 1.5 97.2 0.0 1 10962 5165 0 0 0 76.33
16:43:55 0.9 0.0 0.8 98.3 0.0 1 8452 3939 0 0 0 61.99
16:43:56 0.6 0.0 0.1 99.3 0.0 2 6541 1999 0 0 0 40.92
16:43:57 0.9 0.0 0.2 98.9 0.0 2 8199 2477 0 0 0 42.91

And if I query the power supply via ACPI instead:

while true;do sensors power_meter-acpi-0|grep power1|awk '{print $2, $3}';sleep 1;done
...
163.00 W
173.00 W
173.00 W
172.00 W
203.00 W
206.00 W
206.00 W
206.00 W
209.00 W
205.00 W
211.00 W
213.00 W
203.00 W
175.00 W
166.00 W
...

As you can see the difference is quite substantial. This is solely due to a
1ms sleep loop (albeit one where we fork a process after each sleep, which
likely is a good chunk of the CPU usage).

However, the difference in power usage will be far smaller if the system
already busy, because the CPU will already run at a high frequency.

Thanks a lot for sharing the details.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#34Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Robert Haas (#31)
Re: Reducing power consumption on idle servers

On Fri, Nov 18, 2022 at 2:08 AM Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Nov 17, 2022 at 2:55 AM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:

No, it will have a direct effect only on people using promote_trigger_file
who do not read and act upon the deprecation notice before upgrading
by making a one line change to their failover scripts.

TBH, I wonder if we shouldn't just nuke promote_trigger_file.
pg_promote was added in 2018, and pg_ctl promote was added in 2011. So
even if we haven't said promote_trigger_file was deprecated, it hasn't
been the state-of-the-art way of doing this in a really long time. If
we think that there are still a lot of people using it, or if popular
tools are relying on it, then perhaps a deprecation period is
warranted anyway. But I think we should at least consider the
possibility that it's OK to just get rid of it right away.

I agree with Robert here. It's better to do away with
promote_trigger_file; that beats deprecating it now and removing it
in some later release. However, I'm a bit worried about how it'll
affect the tools/providers/extensions that depend on it. Then again,
that worry applies to every feature one considers getting rid of.

If we go the route of doing away with promote_trigger_file, I think
we need to discuss it in a separate thread, as the subject of this
thread doesn't match what we're discussing here. That way, we'll get
thoughts from other hackers.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#35Simon Riggs
simon.riggs@enterprisedb.com
In reply to: Bharath Rupireddy (#34)
Re: Reducing power consumption on idle servers

On Fri, 18 Nov 2022 at 08:55, Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

However, I'm a bit
worried about how it'll affect the tools/providers/extensions that
depend on it.

Who is that? Which ones depend upon it?

--
Simon Riggs http://www.EnterpriseDB.com/

#36Simon Riggs
simon.riggs@enterprisedb.com
In reply to: Robert Haas (#31)
1 attachment(s)
Re: Reducing power consumption on idle servers

On Thu, 17 Nov 2022 at 20:38, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Nov 17, 2022 at 2:55 AM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:

No, it will have a direct effect only on people using promote_trigger_file
who do not read and act upon the deprecation notice before upgrading
by making a one line change to their failover scripts.

TBH, I wonder if we shouldn't just nuke promote_trigger_file.
pg_promote was added in 2018, and pg_ctl promote was added in 2011. So
even if we haven't said promote_trigger_file was deprecated, it hasn't
been the state-of-the-art way of doing this in a really long time. If
we think that there are still a lot of people using it, or if popular
tools are relying on it, then perhaps a deprecation period is
warranted anyway. But I think we should at least consider the
possibility that it's OK to just get rid of it right away.

I agree. I can't see a reason to keep it anymore.

I'm nervous about not having any wakeup at all, but since we are
removing the parameter there is no other reason not to do as Andres
suggests.
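To illustrate the shape of the change (the attached patch replaces the 5s WaitLatch timeout with an indefinite latch wait that relies on the WAL receiver waking us), here is a toy model outside Postgres, with a Python threading.Event standing in for the process latch; all names here are illustrative:

```python
import threading
import time

def wait_polling(latch, poll_interval_s=5.0):
    """Old style: wake every poll_interval_s to recheck, even when nothing happened."""
    wakeups = 0
    while not latch.is_set():
        latch.wait(timeout=poll_interval_s)  # returns on set() or on timeout
        wakeups += 1
    return wakeups

def wait_event_driven(latch):
    """New style: sleep until explicitly woken; no timer wakeups while idle."""
    latch.wait()  # no timeout: the waiter consumes no CPU until set()
    return 1

latch = threading.Event()
threading.Timer(0.1, latch.set).start()  # stand-in for the walreceiver setting the latch
start = time.monotonic()
wakeups = wait_event_driven(latch)
elapsed = time.monotonic() - start
```

The event-driven waiter returns as soon as the "latch" is set, rather than up to poll_interval_s later, which is why removing the timeout helps both latency and power.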

New version attached, which assumes that the SIGALRMs are silenced on
the other thread.

--
Simon Riggs http://www.EnterpriseDB.com/

Attachments:

hibernate_startup.v8.patch (application/octet-stream)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index bd50ea8e48..211dfc27ac 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4610,24 +4610,6 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
         </listitem>
        </varlistentry>
 
-       <varlistentry id="guc-promote-trigger-file" xreflabel="promote_trigger_file">
-        <term><varname>promote_trigger_file</varname> (<type>string</type>)
-        <indexterm>
-          <primary><varname>promote_trigger_file</varname> configuration parameter</primary>
-        </indexterm>
-        </term>
-        <listitem>
-         <para>
-          Specifies a trigger file whose presence ends recovery in the
-          standby.  Even if this value is not set, you can still promote
-          the standby using <command>pg_ctl promote</command> or calling
-          <function>pg_promote()</function>.
-          This parameter can only be set in the <filename>postgresql.conf</filename>
-          file or on the server command line.
-         </para>
-        </listitem>
-       </varlistentry>
-
      <varlistentry id="guc-hot-standby" xreflabel="hot_standby">
       <term><varname>hot_standby</varname> (<type>boolean</type>)
       <indexterm>
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index b2b3129397..d85788338c 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -653,11 +653,11 @@ protocol to make nodes agree on a serializable transactional order.
 
    <para>
     Standby mode is exited and the server switches to normal operation
-    when <command>pg_ctl promote</command> is run,
-    <function>pg_promote()</function> is called, or a trigger file is found
-    (<varname>promote_trigger_file</varname>). Before failover,
-    any WAL immediately available in the archive or in <filename>pg_wal</filename> will be
-    restored, but no attempt is made to connect to the primary.
+    when <command>pg_ctl promote</command> is run, or
+    <function>pg_promote()</function> is called.  Use of
+    <varname>promote_trigger_file</varname> is deprecated. Before failover,
+    any WAL immediately available in the archive or in <filename>pg_wal</filename>
+    will be restored, but no attempt is made to connect to the primary.
    </para>
   </sect2>
 
@@ -1483,12 +1483,8 @@ synchronous_standby_names = 'ANY 2 (s1, s2, s3)'
 
    <para>
     To trigger failover of a log-shipping standby server, run
-    <command>pg_ctl promote</command>, call <function>pg_promote()</function>,
-    or create a trigger file with the file name and path specified by the
-    <varname>promote_trigger_file</varname>. If you're planning to use
-    <command>pg_ctl promote</command> or to call
-    <function>pg_promote()</function> to fail over,
-    <varname>promote_trigger_file</varname> is not required. If you're
+    <command>pg_ctl promote</command> or call <function>pg_promote()</function>.
+    Use of <varname>promote_trigger_file</varname> is deprecated. If you're
     setting up the reporting servers that are only used to offload read-only
     queries from the primary, not for high availability purposes, you don't
     need to promote it.
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index cb07694aea..41414ec669 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -95,7 +95,6 @@ int			recovery_min_apply_delay = 0;
 /* options formerly taken from recovery.conf for XLOG streaming */
 char	   *PrimaryConnInfo = NULL;
 char	   *PrimarySlotName = NULL;
-char	   *PromoteTriggerFile = NULL;
 bool		wal_receiver_create_temp_slot = false;
 
 /*
@@ -3840,14 +3839,14 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 					XLogPrefetcherComputeStats(xlogprefetcher);
 
 					/*
-					 * Wait for more WAL to arrive. Time out after 5 seconds
-					 * to react to a trigger file promptly and to check if the
-					 * WAL receiver is still active.
+					 * Wait for more WAL to arrive, when we will be woken
+					 * immediately by the WAL receiver. Use of trigger file
+					 * via promote_trigger_file is now deprecated.
 					 */
 					(void) WaitLatch(&XLogRecoveryCtl->recoveryWakeupLatch,
-									 WL_LATCH_SET | WL_TIMEOUT |
-									 WL_EXIT_ON_PM_DEATH,
-									 5000L, WAIT_EVENT_RECOVERY_WAL_STREAM);
+									 WL_LATCH_SET | WL_EXIT_ON_PM_DEATH,
+									 -1L,
+									 WAIT_EVENT_RECOVERY_WAL_STREAM);
 					ResetLatch(&XLogRecoveryCtl->recoveryWakeupLatch);
 					break;
 				}
@@ -4300,8 +4299,6 @@ SetPromoteIsTriggered(void)
 static bool
 CheckForStandbyTrigger(void)
 {
-	struct stat stat_buf;
-
 	if (LocalPromoteIsTriggered)
 		return true;
 
@@ -4314,23 +4311,6 @@ CheckForStandbyTrigger(void)
 		return true;
 	}
 
-	if (PromoteTriggerFile == NULL || strcmp(PromoteTriggerFile, "") == 0)
-		return false;
-
-	if (stat(PromoteTriggerFile, &stat_buf) == 0)
-	{
-		ereport(LOG,
-				(errmsg("promote trigger file found: %s", PromoteTriggerFile)));
-		unlink(PromoteTriggerFile);
-		SetPromoteIsTriggered();
-		return true;
-	}
-	else if (errno != ENOENT)
-		ereport(ERROR,
-				(errcode_for_file_access(),
-				 errmsg("could not stat promote trigger file \"%s\": %m",
-						PromoteTriggerFile)));
-
 	return false;
 }
 
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 836b49484a..2cefdee581 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -3799,16 +3799,6 @@ struct config_string ConfigureNamesString[] =
 		check_recovery_target_lsn, assign_recovery_target_lsn, NULL
 	},
 
-	{
-		{"promote_trigger_file", PGC_SIGHUP, REPLICATION_STANDBY,
-			gettext_noop("Specifies a file name whose presence ends recovery in the standby."),
-			NULL
-		},
-		&PromoteTriggerFile,
-		"",
-		NULL, NULL, NULL
-	},
-
 	{
 		{"primary_conninfo", PGC_SIGHUP, REPLICATION_STANDBY,
 			gettext_noop("Sets the connection string to be used to connect to the sending server."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 868d21c351..043864597f 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -331,7 +331,6 @@
 
 #primary_conninfo = ''			# connection string to sending server
 #primary_slot_name = ''			# replication slot on sending server
-#promote_trigger_file = ''		# file name whose presence ends recovery
 #hot_standby = on			# "off" disallows queries during recovery
 					# (change requires restart)
 #max_standby_archive_delay = 30s	# max delay before canceling queries
#37Thomas Munro
thomas.munro@gmail.com
In reply to: Simon Riggs (#36)
Re: Reducing power consumption on idle servers

On Sat, Nov 19, 2022 at 7:54 AM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:

I agree. I can't see a reason to keep it anymore.

+ Use of <varname>promote_trigger_file</varname> is deprecated. If you're

I think 'deprecated' usually implies that it still works but you
should avoid it. I think you need something stronger.

I'm nervous about not having any wakeup at all, but since we are
removing the parameter there is no other reason not to do as Andres
suggests.

Why? If we're accidentally relying on this timeout for recovery to
not hang in some situation, that's a bug waiting to be discovered and
fixed and it won't be this patch's fault.

New version attached, which assumes that the SIGALRMs are silenced on
the other thread.

I tested this + Bharath's v5 from the other thread. meson test
passes, and tracing the recovery process shows that it is indeed,
finally, completely idle. Huzzah!

#38Simon Riggs
simon.riggs@enterprisedb.com
In reply to: Thomas Munro (#37)
1 attachment(s)
Re: Reducing power consumption on idle servers

On Fri, 18 Nov 2022 at 20:26, Thomas Munro <thomas.munro@gmail.com> wrote:

On Sat, Nov 19, 2022 at 7:54 AM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:

I agree. I can't see a reason to keep it anymore.

+ Use of <varname>promote_trigger_file</varname> is deprecated. If you're

I think 'deprecated' usually implies that it still works but you
should avoid it. I think you need something stronger.

Whisky? Or maybe just reword the sentence...

New version attached.

I'm nervous about not having any wakeup at all, but since we are
removing the parameter there is no other reason not to do as Andres
suggests.

Why? If we're accidentally relying on this timeout for recovery to
not hang in some situation, that's a bug waiting to be discovered and
fixed and it won't be this patch's fault.

New version attached, which assumes that the SIGALRMs are silenced on
the other thread.

I tested this + Bharath's v5 from the other thread. meson test
passes, and tracing the recovery process shows that it is indeed,
finally, completely idle. Huzzah!

Thanks for testing!

Finally completely idle? Good to achieve that.

--
Simon Riggs http://www.EnterpriseDB.com/

Attachments:

hibernate_startup.v9.patch (application/octet-stream)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index bd50ea8e48..211dfc27ac 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4610,24 +4610,6 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
         </listitem>
        </varlistentry>
 
-       <varlistentry id="guc-promote-trigger-file" xreflabel="promote_trigger_file">
-        <term><varname>promote_trigger_file</varname> (<type>string</type>)
-        <indexterm>
-          <primary><varname>promote_trigger_file</varname> configuration parameter</primary>
-        </indexterm>
-        </term>
-        <listitem>
-         <para>
-          Specifies a trigger file whose presence ends recovery in the
-          standby.  Even if this value is not set, you can still promote
-          the standby using <command>pg_ctl promote</command> or calling
-          <function>pg_promote()</function>.
-          This parameter can only be set in the <filename>postgresql.conf</filename>
-          file or on the server command line.
-         </para>
-        </listitem>
-       </varlistentry>
-
      <varlistentry id="guc-hot-standby" xreflabel="hot_standby">
       <term><varname>hot_standby</varname> (<type>boolean</type>)
       <indexterm>
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index b2b3129397..cac8c71a44 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -653,11 +653,11 @@ protocol to make nodes agree on a serializable transactional order.
 
    <para>
     Standby mode is exited and the server switches to normal operation
-    when <command>pg_ctl promote</command> is run,
-    <function>pg_promote()</function> is called, or a trigger file is found
-    (<varname>promote_trigger_file</varname>). Before failover,
-    any WAL immediately available in the archive or in <filename>pg_wal</filename> will be
-    restored, but no attempt is made to connect to the primary.
+    when <command>pg_ctl promote</command> is run, or
+    <function>pg_promote()</function> is called.  The parameter
+    <varname>promote_trigger_file</varname> has been removed. Before failover,
+    any WAL immediately available in the archive or in <filename>pg_wal</filename>
+    will be restored, but no attempt is made to connect to the primary.
    </para>
   </sect2>
 
@@ -1483,15 +1483,11 @@ synchronous_standby_names = 'ANY 2 (s1, s2, s3)'
 
    <para>
     To trigger failover of a log-shipping standby server, run
-    <command>pg_ctl promote</command>, call <function>pg_promote()</function>,
-    or create a trigger file with the file name and path specified by the
-    <varname>promote_trigger_file</varname>. If you're planning to use
-    <command>pg_ctl promote</command> or to call
-    <function>pg_promote()</function> to fail over,
-    <varname>promote_trigger_file</varname> is not required. If you're
-    setting up the reporting servers that are only used to offload read-only
-    queries from the primary, not for high availability purposes, you don't
-    need to promote it.
+    <command>pg_ctl promote</command> or call <function>pg_promote()</function>.
+    The parameter <varname>promote_trigger_file</varname> has been removed.
+    If you're setting up the reporting servers that are only used to offload
+    read-only queries from the primary, not for high availability purposes,
+    you don't need to promote it.
    </para>
   </sect1>
 
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index cb07694aea..b1008c3701 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -95,7 +95,6 @@ int			recovery_min_apply_delay = 0;
 /* options formerly taken from recovery.conf for XLOG streaming */
 char	   *PrimaryConnInfo = NULL;
 char	   *PrimarySlotName = NULL;
-char	   *PromoteTriggerFile = NULL;
 bool		wal_receiver_create_temp_slot = false;
 
 /*
@@ -3840,14 +3839,14 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 					XLogPrefetcherComputeStats(xlogprefetcher);
 
 					/*
-					 * Wait for more WAL to arrive. Time out after 5 seconds
-					 * to react to a trigger file promptly and to check if the
-					 * WAL receiver is still active.
+					 * Wait for more WAL to arrive, when we will be woken
+					 * immediately by the WAL receiver. Use of trigger file
+					 * via promote_trigger_file is now fully removed.
 					 */
 					(void) WaitLatch(&XLogRecoveryCtl->recoveryWakeupLatch,
-									 WL_LATCH_SET | WL_TIMEOUT |
-									 WL_EXIT_ON_PM_DEATH,
-									 5000L, WAIT_EVENT_RECOVERY_WAL_STREAM);
+									 WL_LATCH_SET | WL_EXIT_ON_PM_DEATH,
+									 -1L,
+									 WAIT_EVENT_RECOVERY_WAL_STREAM);
 					ResetLatch(&XLogRecoveryCtl->recoveryWakeupLatch);
 					break;
 				}
@@ -4300,8 +4299,6 @@ SetPromoteIsTriggered(void)
 static bool
 CheckForStandbyTrigger(void)
 {
-	struct stat stat_buf;
-
 	if (LocalPromoteIsTriggered)
 		return true;
 
@@ -4314,23 +4311,6 @@ CheckForStandbyTrigger(void)
 		return true;
 	}
 
-	if (PromoteTriggerFile == NULL || strcmp(PromoteTriggerFile, "") == 0)
-		return false;
-
-	if (stat(PromoteTriggerFile, &stat_buf) == 0)
-	{
-		ereport(LOG,
-				(errmsg("promote trigger file found: %s", PromoteTriggerFile)));
-		unlink(PromoteTriggerFile);
-		SetPromoteIsTriggered();
-		return true;
-	}
-	else if (errno != ENOENT)
-		ereport(ERROR,
-				(errcode_for_file_access(),
-				 errmsg("could not stat promote trigger file \"%s\": %m",
-						PromoteTriggerFile)));
-
 	return false;
 }
 
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 836b49484a..2cefdee581 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -3799,16 +3799,6 @@ struct config_string ConfigureNamesString[] =
 		check_recovery_target_lsn, assign_recovery_target_lsn, NULL
 	},
 
-	{
-		{"promote_trigger_file", PGC_SIGHUP, REPLICATION_STANDBY,
-			gettext_noop("Specifies a file name whose presence ends recovery in the standby."),
-			NULL
-		},
-		&PromoteTriggerFile,
-		"",
-		NULL, NULL, NULL
-	},
-
 	{
 		{"primary_conninfo", PGC_SIGHUP, REPLICATION_STANDBY,
 			gettext_noop("Sets the connection string to be used to connect to the sending server."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 868d21c351..043864597f 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -331,7 +331,6 @@
 
 #primary_conninfo = ''			# connection string to sending server
 #primary_slot_name = ''			# replication slot on sending server
-#promote_trigger_file = ''		# file name whose presence ends recovery
 #hot_standby = on			# "off" disallows queries during recovery
 					# (change requires restart)
 #max_standby_archive_delay = 30s	# max delay before canceling queries
#39Simon Riggs
simon.riggs@enterprisedb.com
In reply to: Simon Riggs (#38)
1 attachment(s)
Re: Reducing power consumption on idle servers

On Sat, 19 Nov 2022 at 10:59, Simon Riggs <simon.riggs@enterprisedb.com> wrote:

New version attached.

Fix for doc xref

--
Simon Riggs http://www.EnterpriseDB.com/

Attachments:

hibernate_startup.v10.patch (application/octet-stream)
diff --git a/doc/src/sgml/appendix-obsolete-recovery-config.sgml b/doc/src/sgml/appendix-obsolete-recovery-config.sgml
index 1cf4913114..8ca519b5f1 100644
--- a/doc/src/sgml/appendix-obsolete-recovery-config.sgml
+++ b/doc/src/sgml/appendix-obsolete-recovery-config.sgml
@@ -35,13 +35,8 @@
 
    <para>
     The
-    <literal>trigger_file</literal>
-    <indexterm>
-     <primary>trigger_file</primary>
-     <see>promote_trigger_file</see>
-    </indexterm>
-    setting has been renamed to
-    <xref linkend="guc-promote-trigger-file"/>.
+    <literal>trigger_file</literal> and <literal>promote_trigger_file</literal>
+    have been removed.
    </para>
 
    <para>
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index bd50ea8e48..211dfc27ac 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4610,24 +4610,6 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
         </listitem>
        </varlistentry>
 
-       <varlistentry id="guc-promote-trigger-file" xreflabel="promote_trigger_file">
-        <term><varname>promote_trigger_file</varname> (<type>string</type>)
-        <indexterm>
-          <primary><varname>promote_trigger_file</varname> configuration parameter</primary>
-        </indexterm>
-        </term>
-        <listitem>
-         <para>
-          Specifies a trigger file whose presence ends recovery in the
-          standby.  Even if this value is not set, you can still promote
-          the standby using <command>pg_ctl promote</command> or calling
-          <function>pg_promote()</function>.
-          This parameter can only be set in the <filename>postgresql.conf</filename>
-          file or on the server command line.
-         </para>
-        </listitem>
-       </varlistentry>
-
      <varlistentry id="guc-hot-standby" xreflabel="hot_standby">
       <term><varname>hot_standby</varname> (<type>boolean</type>)
       <indexterm>
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index b2b3129397..cac8c71a44 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -653,11 +653,11 @@ protocol to make nodes agree on a serializable transactional order.
 
    <para>
     Standby mode is exited and the server switches to normal operation
-    when <command>pg_ctl promote</command> is run,
-    <function>pg_promote()</function> is called, or a trigger file is found
-    (<varname>promote_trigger_file</varname>). Before failover,
-    any WAL immediately available in the archive or in <filename>pg_wal</filename> will be
-    restored, but no attempt is made to connect to the primary.
+    when <command>pg_ctl promote</command> is run, or
+    <function>pg_promote()</function> is called.  The parameter
+    <varname>promote_trigger_file</varname> has been removed. Before failover,
+    any WAL immediately available in the archive or in <filename>pg_wal</filename>
+    will be restored, but no attempt is made to connect to the primary.
    </para>
   </sect2>
 
@@ -1483,15 +1483,11 @@ synchronous_standby_names = 'ANY 2 (s1, s2, s3)'
 
    <para>
     To trigger failover of a log-shipping standby server, run
-    <command>pg_ctl promote</command>, call <function>pg_promote()</function>,
-    or create a trigger file with the file name and path specified by the
-    <varname>promote_trigger_file</varname>. If you're planning to use
-    <command>pg_ctl promote</command> or to call
-    <function>pg_promote()</function> to fail over,
-    <varname>promote_trigger_file</varname> is not required. If you're
-    setting up the reporting servers that are only used to offload read-only
-    queries from the primary, not for high availability purposes, you don't
-    need to promote it.
+    <command>pg_ctl promote</command> or call <function>pg_promote()</function>.
+    The parameter <varname>promote_trigger_file</varname> has been removed.
+    If you're setting up the reporting servers that are only used to offload
+    read-only queries from the primary, not for high availability purposes,
+    you don't need to promote it.
    </para>
   </sect1>
 
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index cb07694aea..b1008c3701 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -95,7 +95,6 @@ int			recovery_min_apply_delay = 0;
 /* options formerly taken from recovery.conf for XLOG streaming */
 char	   *PrimaryConnInfo = NULL;
 char	   *PrimarySlotName = NULL;
-char	   *PromoteTriggerFile = NULL;
 bool		wal_receiver_create_temp_slot = false;
 
 /*
@@ -3840,14 +3839,14 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 					XLogPrefetcherComputeStats(xlogprefetcher);
 
 					/*
-					 * Wait for more WAL to arrive. Time out after 5 seconds
-					 * to react to a trigger file promptly and to check if the
-					 * WAL receiver is still active.
+					 * Wait for more WAL to arrive, when we will be woken
+					 * immediately by the WAL receiver. Use of trigger file
+					 * via promote_trigger_file is now fully removed.
 					 */
 					(void) WaitLatch(&XLogRecoveryCtl->recoveryWakeupLatch,
-									 WL_LATCH_SET | WL_TIMEOUT |
-									 WL_EXIT_ON_PM_DEATH,
-									 5000L, WAIT_EVENT_RECOVERY_WAL_STREAM);
+									 WL_LATCH_SET | WL_EXIT_ON_PM_DEATH,
+									 -1L,
+									 WAIT_EVENT_RECOVERY_WAL_STREAM);
 					ResetLatch(&XLogRecoveryCtl->recoveryWakeupLatch);
 					break;
 				}
@@ -4300,8 +4299,6 @@ SetPromoteIsTriggered(void)
 static bool
 CheckForStandbyTrigger(void)
 {
-	struct stat stat_buf;
-
 	if (LocalPromoteIsTriggered)
 		return true;
 
@@ -4314,23 +4311,6 @@ CheckForStandbyTrigger(void)
 		return true;
 	}
 
-	if (PromoteTriggerFile == NULL || strcmp(PromoteTriggerFile, "") == 0)
-		return false;
-
-	if (stat(PromoteTriggerFile, &stat_buf) == 0)
-	{
-		ereport(LOG,
-				(errmsg("promote trigger file found: %s", PromoteTriggerFile)));
-		unlink(PromoteTriggerFile);
-		SetPromoteIsTriggered();
-		return true;
-	}
-	else if (errno != ENOENT)
-		ereport(ERROR,
-				(errcode_for_file_access(),
-				 errmsg("could not stat promote trigger file \"%s\": %m",
-						PromoteTriggerFile)));
-
 	return false;
 }
 
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 836b49484a..2cefdee581 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -3799,16 +3799,6 @@ struct config_string ConfigureNamesString[] =
 		check_recovery_target_lsn, assign_recovery_target_lsn, NULL
 	},
 
-	{
-		{"promote_trigger_file", PGC_SIGHUP, REPLICATION_STANDBY,
-			gettext_noop("Specifies a file name whose presence ends recovery in the standby."),
-			NULL
-		},
-		&PromoteTriggerFile,
-		"",
-		NULL, NULL, NULL
-	},
-
 	{
 		{"primary_conninfo", PGC_SIGHUP, REPLICATION_STANDBY,
 			gettext_noop("Sets the connection string to be used to connect to the sending server."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 868d21c351..043864597f 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -331,7 +331,6 @@
 
 #primary_conninfo = ''			# connection string to sending server
 #primary_slot_name = ''			# replication slot on sending server
-#promote_trigger_file = ''		# file name whose presence ends recovery
 #hot_standby = on			# "off" disallows queries during recovery
 					# (change requires restart)
 #max_standby_archive_delay = 30s	# max delay before canceling queries
#40Simon Riggs
simon.riggs@enterprisedb.com
In reply to: Robert Haas (#21)
1 attachment(s)
Re: Reducing power consumption on idle servers

On Thu, 24 Mar 2022 at 16:21, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Mar 24, 2022 at 12:02 PM Simon Riggs

What changes will be acceptable for bgwriter, walwriter and logical worker?

Hmm, I think it would be fine to introduce some kind of hibernation
mechanism for logical workers. bgwriter and wal writer already have a
hibernation mechanism, so I'm not sure what your concern is there
exactly. In your initial email you said you weren't proposing changes
there, but maybe that changed somewhere in the course of the
subsequent discussion. If you could catch me up on your present
thinking that would be helpful.

Now that we seem to have solved the problem for Startup process, let's
circle back to the others....

Bgwriter does hibernate currently, but only at 50x the bgwriter_delay.
At default values that is 10s, but could be much less. So we need to
move that up to 60s, same as others.

WALwriter also hibernates currently, but only at 25x the
wal_writer_delay. At default values that is 5s, but could be much
less. So we need to move that up to 60s, same as others. At the same
time, make sure that when we hibernate we use a new WaitEvent,
similarly to the way bgwriter reports its hibernation state (which
also helps test the patch).

As a 3rd patch, I will work on making logical workers hibernate.

--
Simon Riggs http://www.EnterpriseDB.com/

Attachments:

hibernate_bgwriter_walwriter.v5.patch (application/octet-stream)
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index 91e6f6ea18..36414973e2 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -60,12 +60,6 @@
  */
 int			BgWriterDelay = 200;
 
-/*
- * Multiplier to apply to BgWriterDelay when we decide to hibernate.
- * (Perhaps this needs to be configurable?)
- */
-#define HIBERNATE_FACTOR			50
-
 /*
  * Interval in which standby snapshots are logged into the WAL stream, in
  * milliseconds.
@@ -335,7 +329,7 @@ BackgroundWriterMain(void)
 			/* Sleep ... */
 			(void) WaitLatch(MyLatch,
 							 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
-							 BgWriterDelay * HIBERNATE_FACTOR,
+							 HIBERNATE_DELAY_MS,
 							 WAIT_EVENT_BGWRITER_HIBERNATE);
 			/* Reset the notification request in case we timed out */
 			StrategyNotifyBgWriter(-1);
diff --git a/src/backend/postmaster/walwriter.c b/src/backend/postmaster/walwriter.c
index beb46dcb55..08ab783c1f 100644
--- a/src/backend/postmaster/walwriter.c
+++ b/src/backend/postmaster/walwriter.c
@@ -70,14 +70,6 @@
 int			WalWriterDelay = 200;
 int			WalWriterFlushAfter = 128;
 
-/*
- * Number of do-nothing loops before lengthening the delay time, and the
- * multiplier to apply to WalWriterDelay when we do decide to hibernate.
- * (Perhaps these need to be configurable?)
- */
-#define LOOPS_UNTIL_HIBERNATE		50
-#define HIBERNATE_FACTOR			25
-
 /* Prototypes for private functions */
 static void HandleWalWriterInterrupts(void);
 
@@ -225,8 +217,6 @@ WalWriterMain(void)
 	 */
 	for (;;)
 	{
-		long		cur_timeout;
-
 		/*
 		 * Advertise whether we might hibernate in this cycle.  We do this
 		 * before resetting the latch to ensure that any async commits will
@@ -266,14 +256,18 @@ WalWriterMain(void)
 		 * sleep time so as to reduce the server's idle power consumption.
 		 */
 		if (left_till_hibernate > 0)
-			cur_timeout = WalWriterDelay;	/* in ms */
-		else
-			cur_timeout = WalWriterDelay * HIBERNATE_FACTOR;
+		{
+			(void) WaitLatch(MyLatch,
+							 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+							 WalWriterDelay,   /* in ms */
+							 WAIT_EVENT_WAL_WRITER_MAIN);
+			continue;
+		}
 
 		(void) WaitLatch(MyLatch,
 						 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
-						 cur_timeout,
-						 WAIT_EVENT_WAL_WRITER_MAIN);
+						 HIBERNATE_DELAY_MS,
+						 WAIT_EVENT_WAL_WRITER_HIBERNATE);
 	}
 }
 
diff --git a/src/backend/utils/activity/wait_event.c b/src/backend/utils/activity/wait_event.c
index 92f24a6c9b..b4eaba737f 100644
--- a/src/backend/utils/activity/wait_event.c
+++ b/src/backend/utils/activity/wait_event.c
@@ -245,6 +245,9 @@ pgstat_get_wait_activity(WaitEventActivity w)
 		case WAIT_EVENT_WAL_WRITER_MAIN:
 			event_name = "WalWriterMain";
 			break;
+		case WAIT_EVENT_WAL_WRITER_HIBERNATE:
+			event_name = "WalWriterHibernate";
+			break;
 			/* no default case, so that compiler will warn */
 	}
 
diff --git a/src/include/storage/latch.h b/src/include/storage/latch.h
index 68ab740f16..f0a86ede3a 100644
--- a/src/include/storage/latch.h
+++ b/src/include/storage/latch.h
@@ -140,6 +140,11 @@ typedef struct Latch
 							 WL_SOCKET_CONNECTED | \
 							 WL_SOCKET_CLOSED)
 
+/* Hibernation */
+#define HIBERNATE_DELAY_SEC		60
+#define HIBERNATE_DELAY_MS		(1000L * HIBERNATE_DELAY_SEC)
+#define LOOPS_UNTIL_HIBERNATE	50
+
 typedef struct WaitEvent
 {
 	int			pos;			/* position in the event data structure */
diff --git a/src/include/utils/wait_event.h b/src/include/utils/wait_event.h
index 6f2d5612e0..d8a77eabd1 100644
--- a/src/include/utils/wait_event.h
+++ b/src/include/utils/wait_event.h
@@ -46,7 +46,8 @@ typedef enum
 	WAIT_EVENT_SYSLOGGER_MAIN,
 	WAIT_EVENT_WAL_RECEIVER_MAIN,
 	WAIT_EVENT_WAL_SENDER_MAIN,
-	WAIT_EVENT_WAL_WRITER_MAIN
+	WAIT_EVENT_WAL_WRITER_MAIN,
+	WAIT_EVENT_WAL_WRITER_HIBERNATE
 } WaitEventActivity;
 
 /* ----------
#41Thomas Munro
thomas.munro@gmail.com
In reply to: Simon Riggs (#39)
1 attachment(s)
Re: Reducing power consumption on idle servers

On Mon, Nov 21, 2022 at 3:35 AM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:

On Sat, 19 Nov 2022 at 10:59, Simon Riggs <simon.riggs@enterprisedb.com> wrote:

New version attached.

Fix for doc xref

I removed a stray variable declaration from xlogrecovery.h, and wrote
a draft commit message. I'll wait 24 hours before committing, to
provide a last chance for anyone who wants to complain about dropping
promote_trigger_file.

(We could of course change it so that the timeout based wakeup only
happens if you have actually set promote_trigger_file, but I think
we've established that we just don't want the filesystem polling
feature so I'm whispering this in parentheses.)

Attachments:

v11-0001-Remove-promote_trigger_file.patch (text/x-patch; charset=US-ASCII)
From 6a6db8862ceb6618fb5f98c6245dc7d834fe89db Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Mon, 21 Nov 2022 09:37:47 +1300
Subject: [PATCH v11] Remove promote_trigger_file.

Previously an otherwise idle startup (recovery) process would wake up
every 5 seconds to have a chance to poll for promote_trigger_file, even
if that GUC was not configured.  That signaling mechanism was superseded
by pg_ctl promote and pg_promote() a long time ago.  There probably
aren't many users left and it's very easy to change to the modern
mechanisms, so let's remove that feature.

This is part of a campaign to reduce wakeups on idle systems.

Author: Simon Riggs <simon.riggs@enterprisedb.com>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/CANbhV-FsjnzVOQGBpQ589%3DnWuL1Ex0Ykn74Nh1hEjp2usZSR5g%40mail.gmail.com
---
 .../appendix-obsolete-recovery-config.sgml    |  9 ++----
 doc/src/sgml/config.sgml                      | 18 -----------
 doc/src/sgml/high-availability.sgml           | 24 ++++++--------
 src/backend/access/transam/xlogrecovery.c     | 32 ++++---------------
 src/backend/utils/misc/guc_tables.c           | 10 ------
 src/backend/utils/misc/postgresql.conf.sample |  1 -
 src/include/access/xlogrecovery.h             |  1 -
 7 files changed, 18 insertions(+), 77 deletions(-)

diff --git a/doc/src/sgml/appendix-obsolete-recovery-config.sgml b/doc/src/sgml/appendix-obsolete-recovery-config.sgml
index 1cf4913114..8ca519b5f1 100644
--- a/doc/src/sgml/appendix-obsolete-recovery-config.sgml
+++ b/doc/src/sgml/appendix-obsolete-recovery-config.sgml
@@ -35,13 +35,8 @@
 
    <para>
     The
-    <literal>trigger_file</literal>
-    <indexterm>
-     <primary>trigger_file</primary>
-     <see>promote_trigger_file</see>
-    </indexterm>
-    setting has been renamed to
-    <xref linkend="guc-promote-trigger-file"/>.
+    <literal>trigger_file</literal> and <literal>promote_trigger_file</literal>
+    have been removed.
    </para>
 
    <para>
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index bd50ea8e48..211dfc27ac 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4610,24 +4610,6 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
         </listitem>
        </varlistentry>
 
-       <varlistentry id="guc-promote-trigger-file" xreflabel="promote_trigger_file">
-        <term><varname>promote_trigger_file</varname> (<type>string</type>)
-        <indexterm>
-          <primary><varname>promote_trigger_file</varname> configuration parameter</primary>
-        </indexterm>
-        </term>
-        <listitem>
-         <para>
-          Specifies a trigger file whose presence ends recovery in the
-          standby.  Even if this value is not set, you can still promote
-          the standby using <command>pg_ctl promote</command> or calling
-          <function>pg_promote()</function>.
-          This parameter can only be set in the <filename>postgresql.conf</filename>
-          file or on the server command line.
-         </para>
-        </listitem>
-       </varlistentry>
-
      <varlistentry id="guc-hot-standby" xreflabel="hot_standby">
       <term><varname>hot_standby</varname> (<type>boolean</type>)
       <indexterm>
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index b2b3129397..cac8c71a44 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -653,11 +653,11 @@ protocol to make nodes agree on a serializable transactional order.
 
    <para>
     Standby mode is exited and the server switches to normal operation
-    when <command>pg_ctl promote</command> is run,
-    <function>pg_promote()</function> is called, or a trigger file is found
-    (<varname>promote_trigger_file</varname>). Before failover,
-    any WAL immediately available in the archive or in <filename>pg_wal</filename> will be
-    restored, but no attempt is made to connect to the primary.
+    when <command>pg_ctl promote</command> is run, or
+    <function>pg_promote()</function> is called.  The parameter
+    <varname>promote_trigger_file</varname> has been removed. Before failover,
+    any WAL immediately available in the archive or in <filename>pg_wal</filename>
+    will be restored, but no attempt is made to connect to the primary.
    </para>
   </sect2>
 
@@ -1483,15 +1483,11 @@ synchronous_standby_names = 'ANY 2 (s1, s2, s3)'
 
    <para>
     To trigger failover of a log-shipping standby server, run
-    <command>pg_ctl promote</command>, call <function>pg_promote()</function>,
-    or create a trigger file with the file name and path specified by the
-    <varname>promote_trigger_file</varname>. If you're planning to use
-    <command>pg_ctl promote</command> or to call
-    <function>pg_promote()</function> to fail over,
-    <varname>promote_trigger_file</varname> is not required. If you're
-    setting up the reporting servers that are only used to offload read-only
-    queries from the primary, not for high availability purposes, you don't
-    need to promote it.
+    <command>pg_ctl promote</command> or call <function>pg_promote()</function>.
+    The parameter <varname>promote_trigger_file</varname> has been removed.
+    If you're setting up the reporting servers that are only used to offload
+    read-only queries from the primary, not for high availability purposes,
+    you don't need to promote it.
    </para>
   </sect1>
 
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index cb07694aea..b1008c3701 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -95,7 +95,6 @@ int			recovery_min_apply_delay = 0;
 /* options formerly taken from recovery.conf for XLOG streaming */
 char	   *PrimaryConnInfo = NULL;
 char	   *PrimarySlotName = NULL;
-char	   *PromoteTriggerFile = NULL;
 bool		wal_receiver_create_temp_slot = false;
 
 /*
@@ -3840,14 +3839,14 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 					XLogPrefetcherComputeStats(xlogprefetcher);
 
 					/*
-					 * Wait for more WAL to arrive. Time out after 5 seconds
-					 * to react to a trigger file promptly and to check if the
-					 * WAL receiver is still active.
+					 * Wait for more WAL to arrive, when we will be woken
+					 * immediately by the WAL receiver. Use of trigger file
+					 * via promote_trigger_file is now fully removed.
 					 */
 					(void) WaitLatch(&XLogRecoveryCtl->recoveryWakeupLatch,
-									 WL_LATCH_SET | WL_TIMEOUT |
-									 WL_EXIT_ON_PM_DEATH,
-									 5000L, WAIT_EVENT_RECOVERY_WAL_STREAM);
+									 WL_LATCH_SET | WL_EXIT_ON_PM_DEATH,
+									 -1L,
+									 WAIT_EVENT_RECOVERY_WAL_STREAM);
 					ResetLatch(&XLogRecoveryCtl->recoveryWakeupLatch);
 					break;
 				}
@@ -4300,8 +4299,6 @@ SetPromoteIsTriggered(void)
 static bool
 CheckForStandbyTrigger(void)
 {
-	struct stat stat_buf;
-
 	if (LocalPromoteIsTriggered)
 		return true;
 
@@ -4314,23 +4311,6 @@ CheckForStandbyTrigger(void)
 		return true;
 	}
 
-	if (PromoteTriggerFile == NULL || strcmp(PromoteTriggerFile, "") == 0)
-		return false;
-
-	if (stat(PromoteTriggerFile, &stat_buf) == 0)
-	{
-		ereport(LOG,
-				(errmsg("promote trigger file found: %s", PromoteTriggerFile)));
-		unlink(PromoteTriggerFile);
-		SetPromoteIsTriggered();
-		return true;
-	}
-	else if (errno != ENOENT)
-		ereport(ERROR,
-				(errcode_for_file_access(),
-				 errmsg("could not stat promote trigger file \"%s\": %m",
-						PromoteTriggerFile)));
-
 	return false;
 }
 
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 836b49484a..2cefdee581 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -3799,16 +3799,6 @@ struct config_string ConfigureNamesString[] =
 		check_recovery_target_lsn, assign_recovery_target_lsn, NULL
 	},
 
-	{
-		{"promote_trigger_file", PGC_SIGHUP, REPLICATION_STANDBY,
-			gettext_noop("Specifies a file name whose presence ends recovery in the standby."),
-			NULL
-		},
-		&PromoteTriggerFile,
-		"",
-		NULL, NULL, NULL
-	},
-
 	{
 		{"primary_conninfo", PGC_SIGHUP, REPLICATION_STANDBY,
 			gettext_noop("Sets the connection string to be used to connect to the sending server."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 868d21c351..043864597f 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -331,7 +331,6 @@
 
 #primary_conninfo = ''			# connection string to sending server
 #primary_slot_name = ''			# replication slot on sending server
-#promote_trigger_file = ''		# file name whose presence ends recovery
 #hot_standby = on			# "off" disallows queries during recovery
 					# (change requires restart)
 #max_standby_archive_delay = 30s	# max delay before canceling queries
diff --git a/src/include/access/xlogrecovery.h b/src/include/access/xlogrecovery.h
index 0e3e246bd2..f3398425d8 100644
--- a/src/include/access/xlogrecovery.h
+++ b/src/include/access/xlogrecovery.h
@@ -65,7 +65,6 @@ extern PGDLLIMPORT TimestampTz recoveryTargetTime;
 extern PGDLLIMPORT const char *recoveryTargetName;
 extern PGDLLIMPORT XLogRecPtr recoveryTargetLSN;
 extern PGDLLIMPORT RecoveryTargetType recoveryTarget;
-extern PGDLLIMPORT char *PromoteTriggerFile;
 extern PGDLLIMPORT bool wal_receiver_create_temp_slot;
 extern PGDLLIMPORT RecoveryTargetTimeLineGoal recoveryTargetTimeLineGoal;
 extern PGDLLIMPORT TimeLineID recoveryTargetTLIRequested;
-- 
2.38.1

#42Thomas Munro
thomas.munro@gmail.com
In reply to: Simon Riggs (#40)
Re: Reducing power consumption on idle servers

On Mon, Nov 21, 2022 at 9:00 AM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:

As a 3rd patch, I will work on making logical workers hibernate.

Duelling patch warning: Nathan mentioned[1] that he's hacking on a
patch for that, along the lines of the recent walreceiver change IIUC.

[1]: /messages/by-id/CA+hUKGJGhX4r2LPUE3Oy9BX71Eum6PBcS8L3sJpScR9oKaTVaA@mail.gmail.com

#43Nathan Bossart
nathandbossart@gmail.com
In reply to: Thomas Munro (#42)
Re: Reducing power consumption on idle servers

On Mon, Nov 21, 2022 at 10:31:15AM +1300, Thomas Munro wrote:

On Mon, Nov 21, 2022 at 9:00 AM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:

As a 3rd patch, I will work on making logical workers hibernate.

Duelling patch warning: Nathan mentioned[1] that he's hacking on a
patch for that, along the lines of the recent walreceiver change IIUC.

I coded something up last week, but, like the walreceiver patch, it caused
check-world to take much longer [0], and I haven't looked into whether it
could be easily fixed. I'm hoping to make some time for this again in the
near future.

[0]: /messages/by-id/20221113222644.GA1269110@nathanxps13

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#44Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Thomas Munro (#41)
Re: Reducing power consumption on idle servers

On Mon, 2022-11-21 at 10:13 +1300, Thomas Munro wrote:

I'll wait 24 hours before committing, to
provide a last chance for anyone who wants to complain about dropping
promote_trigger_file.

Remove "promote_trigger_file"? Now I have never seen anybody use that
parameter, but I don't think that it is a good idea to deviate from our
usual standard of deprecating a feature for about five years before
actually removing it.

Yours,
Laurenz Albe

#45Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Laurenz Albe (#44)
Re: Reducing power consumption on idle servers

On Mon, Nov 21, 2022 at 10:37 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Mon, 2022-11-21 at 10:13 +1300, Thomas Munro wrote:

I'll wait 24 hours before committing, to
provide a last chance for anyone who wants to complain about dropping
promote_trigger_file.

Remove "promote_trigger_file"? Now I have never seen anybody use that
parameter, but I don't think that it is a good idea to deviate from our
usual standard of deprecating a feature for about five years before
actually removing it.

I'm not sure what the guidelines are here, however years have gone by
since pg_ctl promote [1] in 2011 and pg_promote() [2] in 2018 were
added. With two such alternatives in place for many years, it was sort
of an undeclared deprecation of promote_trigger_file GUC. And the
changes required to move to newer ways from the GUC aren't that hard
for those who're still relying on the GUC. Therefore, I think it's now
time for us to do away with the GUC.

[1]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=4695da5ae97bbb58d274887fd68edbe88d03ebcb
[2]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=10074651e3355e2405015f6253602be8344bc829

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#46Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Thomas Munro (#41)
Re: Reducing power consumption on idle servers

On Mon, Nov 21, 2022 at 2:43 AM Thomas Munro <thomas.munro@gmail.com> wrote:

On Mon, Nov 21, 2022 at 3:35 AM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:

On Sat, 19 Nov 2022 at 10:59, Simon Riggs <simon.riggs@enterprisedb.com> wrote:

New version attached.

Fix for doc xref

I removed a stray variable declaration from xlogrecovery.h, and wrote
a draft commit message. I'll wait 24 hours before committing, to
provide a last chance for anyone who wants to complain about dropping
promote_trigger_file.

(We could of course change it so that the timeout based wakeup only
happens if you have actually set promote_trigger_file, but I think
we've established that we just don't want the filesystem polling
feature so I'm whispering this in parentheses.)

Thanks. The v11 patch mostly looks good to me and it passes cirrus-ci
tests - https://github.com/BRupireddy/postgres/tree/do_away_with_promote_trigger_file.

I have a comment:

-                     * Wait for more WAL to arrive. Time out after 5 seconds
-                     * to react to a trigger file promptly and to check if the
-                     * WAL receiver is still active.
+                     * Wait for more WAL to arrive, when we will be woken
+                     * immediately by the WAL receiver. Use of trigger file
+                     * via promote_trigger_file is now fully removed.
                      */
Do we need to introduce reference to removal of promote_trigger_file
in the code? If left as-is, it'll lie there for many years. Isn't it
enough to specify in appendix-obsolete-recovery-config.sgml?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#47Simon Riggs
simon.riggs@enterprisedb.com
In reply to: Laurenz Albe (#44)
Re: Reducing power consumption on idle servers

On Mon, 21 Nov 2022 at 05:07, Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Mon, 2022-11-21 at 10:13 +1300, Thomas Munro wrote:

I'll wait 24 hours before committing, to
provide a last chance for anyone who wants to complain about dropping
promote_trigger_file.

Remove "promote_trigger_file"? Now I have never seen anybody use that
parameter, but I don't think that it is a good idea to deviate from our
usual standard of deprecating a feature for about five years before
actually removing it.

We aren't removing the ability to promote, just enforcing a change to
a better mechanism, hence I don't see a reason for a long(er)
deprecation period than we have already had.

--
Simon Riggs http://www.EnterpriseDB.com/

#48Simon Riggs
simon.riggs@enterprisedb.com
In reply to: Nathan Bossart (#43)
Re: Reducing power consumption on idle servers

On Sun, 20 Nov 2022 at 22:55, Nathan Bossart <nathandbossart@gmail.com> wrote:

On Mon, Nov 21, 2022 at 10:31:15AM +1300, Thomas Munro wrote:

On Mon, Nov 21, 2022 at 9:00 AM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:

As a 3rd patch, I will work on making logical workers hibernate.

Duelling patch warning: Nathan mentioned[1] that he's hacking on a
patch for that, along the lines of the recent walreceiver change IIUC.

I coded something up last week, but, like the walreceiver patch, it caused
check-world to take much longer [0], and I haven't looked into whether it
could be easily fixed. I'm hoping to make some time for this again in the
near future.

[0] /messages/by-id/20221113222644.GA1269110@nathanxps13

OK, Nathan, will leave this one to you - remembering that we need to
fix ALL processes to get a useful power reduction when idle.

--
Simon Riggs http://www.EnterpriseDB.com/

#49Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Bharath Rupireddy (#45)
Re: Reducing power consumption on idle servers

On Mon, 2022-11-21 at 11:42 +0530, Bharath Rupireddy wrote:

On Mon, Nov 21, 2022 at 10:37 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Mon, 2022-11-21 at 10:13 +1300, Thomas Munro wrote:

I'll wait 24 hours before committing, to
provide a last chance for anyone who wants to complain about dropping
promote_trigger_file.

Remove "promote_trigger_file"?  Now I have never seen anybody use that
parameter, but I don't think that it is a good idea to deviate from our
usual standard of deprecating a feature for about five years before
actually removing it.

I'm not sure what the guidelines are here, however years have gone by
since pg_ctl promote [1] in 2011 and pg_promote() [2] in 2018 were
added. With two such alternatives in place for many years, it was sort
of an undeclared deprecation of promote_trigger_file GUC. And the
changes required to move to newer ways from the GUC aren't that hard
for those who're still relying on the GUC. Therefore, I think it's now
time for us to do away with the GUC.

I disagree. With the same argument, you could rip out "serial", since we
have supported identity columns since v11.

I understand the desire to avoid needless wakeups, but isn't it possible
to only perform the regular poll if "promote_trigger_file" is set?

Yours,
Laurenz Albe

#50Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Simon Riggs (#47)
Re: Reducing power consumption on idle servers

On Mon, 2022-11-21 at 07:36 +0000, Simon Riggs wrote:

On Mon, 21 Nov 2022 at 05:07, Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Mon, 2022-11-21 at 10:13 +1300, Thomas Munro wrote:

I'll wait 24 hours before committing, to
provide a last chance for anyone who wants to complain about dropping
promote_trigger_file.

Remove "promote_trigger_file"?  Now I have never seen anybody use that
parameter, but I don't think that it is a good idea to deviate from our
usual standard of deprecating a feature for about five years before
actually removing it.

We aren't removing the ability to promote, just enforcing a change to
a better mechanism, hence I don't see a reason for a long(er)
deprecation period than we have already had.

We have had a deprecation period? I looked at the documentation, but found
no mention of a deprecation. How hard can it be to leave the GUC and only
poll for the existence of the file if it is set?

I personally don't need the GUC, and I know nobody who does, but I think
we should not be cavalier about introducing unnecessary compatibility breaks.

Yours,
Laurenz Albe

#51Simon Riggs
simon.riggs@enterprisedb.com
In reply to: Laurenz Albe (#50)
Re: Reducing power consumption on idle servers

On Mon, 21 Nov 2022 at 08:40, Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Mon, 2022-11-21 at 07:36 +0000, Simon Riggs wrote:

On Mon, 21 Nov 2022 at 05:07, Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Mon, 2022-11-21 at 10:13 +1300, Thomas Munro wrote:

I'll wait 24 hours before committing, to
provide a last chance for anyone who wants to complain about dropping
promote_trigger_file.

Remove "promote_trigger_file"? Now I have never seen anybody use that
parameter, but I don't think that it is a good idea to deviate from our
usual standard of deprecating a feature for about five years before
actually removing it.

We aren't removing the ability to promote, just enforcing a change to
a better mechanism, hence I don't see a reason for a long(er)
deprecation period than we have already had.

We have had a deprecation period? I looked at the documentation, but found
no mention of a deprecation. How hard can it be to leave the GUC and only
poll for the existence of the file if it is set?

I personally don't need the GUC, and I know nobody who does,

Nobody else does either.

I disagree. With the same argument, you could rip out "serial", since we
have supported identity columns since v11.

...and this is not a user-facing change; only HA systems interface with this.

but I think
we should not be cavalier about introducing unnecessary compatibility breaks.

I agree we should not be cavalier, nor has anyone been so. The first
version of the patch was cautious on this very point, but since many
people think we should remove it, it is not a user-facing feature, and
nobody on this thread knows anybody or anything that uses it, I have
changed my mind and now think we should remove it.

We have two versions to choose from now
* v6 deprecates this option
* v10 removes this option
so it is no effort at all to choose either option.

The main issue is about reducing power usage, so please let's move to
commit something soon, one way or another.

--
Simon Riggs http://www.EnterpriseDB.com/

#52Robert Haas
robertmhaas@gmail.com
In reply to: Laurenz Albe (#50)
Re: Reducing power consumption on idle servers

On Mon, Nov 21, 2022 at 3:40 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

We have had a deprecation period? I looked at the documentation, but found
no mention of a deprecation. How hard can it be to leave the GUC and only
poll for the existence of the file if it is set?

I personally don't need the GUC, and I know nobody who does, but I think
we should not be cavalier about introducing unnecessary compatibility breaks.

As a project, we have a problem with deprecation warnings, simply
because community consensus changes over time as people change their
minds and as participants come and go. Slapping a deprecation notice
on things saying that they will be removed in five years does not mean
they will actually get removed in five years, nor does the failure to
slap a deprecation notice on them mean that they won't be removed in
five years. We have some instances of following through on such
promises, but plenty where what we said we were going to do and what
we actually did were quite different. Furthermore, even if we were
100% on follow-through on such things, I have no faith in the idea
that a mandatory five-year deprecation period in all cases would be
good for the project.

I think it's totally reasonable to phase out a little-used feature
that no one really cares about faster than a feature that is
heavily-relied upon and for which there are no reasonable
alternatives. Something like standard_conforming_strings or the switch
from md5 to SCRAM authentication affects not only database users but
also drivers and applications; time needs to be allowed for the effect
to seep through the whole ecosystem. Backward compatibility is really
important for the transition period. In this case, though, the
alternatives have already existed for a long time and we have reasons
to think most people have already been using them for a long time.
Moreover, if they haven't been, the transition shouldn't be too hard.
For there to be a problem here, we have to imagine a user who can't
run pg_promote() or pg_ctl promote, who can only create a file, and
who furthermore needs that file to be located someplace other than
the data directory. I'm having a hard time envisioning that.

The reason that I pushed back -- not as successfully as I would have
liked -- on the changes to pg_stop_backup / pg_start_backup is that I
know there are people using the old method successfully, and it's not
just a 1:1 substitution. Here I don't, and it is. I'm totally open to
the feedback that such people exist and to hearing why adopting one of
the newer methods would be a problem for them, if that's the case. But
if there's no evidence that such people exist or that changing is a
problem for them, I don't think waiting 5 years on principle is good
for the project.

P.S. One notable example of how bad we are at these things is that
contrib/xml2 has been deprecated since PostgreSQL 8.3. The deprecation
notice says that it isn't needed any more because all of the
functionality is otherwise available, but some people think that's not
true so we haven't removed the module. An obvious alternative would be
to remove the deprecation notice but FWICT other people think that
getting rid of it is the right idea so we haven't done that either.
Whee.

--
Robert Haas
EDB: http://www.enterprisedb.com

#53Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#52)
Re: Reducing power consumption on idle servers

Robert Haas <robertmhaas@gmail.com> writes:

The reason that I pushed back -- not as successfully as I would have
liked -- on the changes to pg_stop_backup / pg_start_backup is that I
know there are people using the old method successfully, and it's not
just a 1:1 substitution. Here I don't, and it is. I'm totally open to
the feedback that such people exist and to hearing why adopting one of
the newer methods would be a problem for them, if that's the case. But
if there's no evidence that such people exist or that changing is a
problem for them, I don't think waiting 5 years on principle is good
for the project.

We make incompatible changes in every release; see the release notes.
Unless somebody can give a plausible use-case where this'd be a
difficult change to deal with, I concur that we don't need to
deprecate it ahead of time.

regards, tom lane

#54Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Tom Lane (#53)
Re: Reducing power consumption on idle servers

On Mon, 2022-11-21 at 12:11 -0500, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

The reason that I pushed back -- not as successfully as I would have
liked -- on the changes to pg_stop_backup / pg_start_backup is that I
know there are people using the old method successfully, and it's not
just a 1:1 substitution. Here I don't, and it is. I'm totally open to
the feedback that such people exist and to hearing why adopting one of
the newer methods would be a problem for them, if that's the case. But
if there's no evidence that such people exist or that changing is a
problem for them, I don't think waiting 5 years on principle is good
for the project.

We make incompatible changes in every release; see the release notes.
Unless somebody can give a plausible use-case where this'd be a
difficult change to deal with, I concur that we don't need to
deprecate it ahead of time.

Since I am the only one who seems to worry, I'll shut up. You are probably
right that the feature won't be missed by many users.

Yours,
Laurenz Albe

#55Ian Lawrence Barwick
barwick@gmail.com
In reply to: Laurenz Albe (#54)
Re: Reducing power consumption on idle servers

On Tue, Nov 22, 2022 at 5:50 Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Mon, 2022-11-21 at 12:11 -0500, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

The reason that I pushed back -- not as successfully as I would have
liked -- on the changes to pg_stop_backup / pg_start_backup is that I
know there are people using the old method successfully, and it's not
just a 1:1 substitution. Here I don't, and it is. I'm totally open to
the feedback that such people exist and to hearing why adopting one of
the newer methods would be a problem for them, if that's the case. But
if there's no evidence that such people exist or that changing is a
problem for them, I don't think waiting 5 years on principle is good
for the project.

We make incompatible changes in every release; see the release notes.
Unless somebody can give a plausible use-case where this'd be a
difficult change to deal with, I concur that we don't need to
deprecate it ahead of time.

Since I am the only one who seems to worry, I'll shut up. You are probably
right that the feature won't be missed by many users.

FWIW, though I prefer to err on the side of caution when removing features
from anything, I am struggling to remember ever having used
"promote_trigger_file"
(or "trigger_file" as it was in Pg11 and earlier); grepping my private notes
brings up a single example from ca. 2012 when I appear to have been
experimenting with replication.

On a quick web search, a large share of the results relate to its change
to a GUC in Pg 12 and/or commented-out entries in sample postgresql.conf files;
most of the rest seem to be blog articles/tutorials on setting up replication.

Regards

Ian Barwick

#56Thomas Munro
thomas.munro@gmail.com
In reply to: Ian Lawrence Barwick (#55)
1 attachment(s)
Re: Reducing power consumption on idle servers

On Sun, Nov 27, 2022 at 6:15 PM Ian Lawrence Barwick <barwick@gmail.com> wrote:

On Tue, Nov 22, 2022 at 5:50 Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Mon, 2022-11-21 at 12:11 -0500, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

The reason that I pushed back -- not as successfully as I would have
liked -- on the changes to pg_stop_backup / pg_start_backup is that I
know there are people using the old method successfully, and it's not
just a 1:1 substitution. Here I don't, and it is. I'm totally open to
the feedback that such people exist and to hearing why adopting one of
the newer methods would be a problem for them, if that's the case. But
if there's no evidence that such people exist or that changing is a
problem for them, I don't think waiting 5 years on principle is good
for the project.

We make incompatible changes in every release; see the release notes.
Unless somebody can give a plausible use-case where this'd be a
difficult change to deal with, I concur that we don't need to
deprecate it ahead of time.

Since I am the only one who seems to worry, I'll shut up. You are probably
right that the feature won't be missed by many users.

FWIW, though I prefer to err on the side of caution when removing features
from anything, I am struggling to remember ever having used
"promote_trigger_file"
(or "trigger_file" as it was in Pg11 and earlier); grepping my private notes
brings up a single example from ca. 2012 when I appear to have been
experimenting with replication.

On a quick web search, a large part of the results are related to its change
to a GUC in Pg 12 and/or commented out entries in sample postgresql.conf files;
most of the rest seem to be blog articles/tutorials on setting up replication.

Thanks Ian, Laurenz, Robert, Tom for the discussion. Seems like we're
all set then. Sorry for delaying over trivialities, but I have a
couple more comments about the documentation before committing:

"The trigger_file and promote_trigger_file have been removed." was
missing some words. I've also added a sentence to say which releases
were involved, to make it like nearby paragraphs about other obsolete
stuff.

The funny thing about that "obsolete" appendix is that it's intended
as a URL-preserving way to document the recovery.conf file's demise in
r12. It doesn't look like the right place to document ongoing changes
to recovery-related GUCs in general. In this particular case it's
sort of OK because the old trigger_file GUC was indeed in
recovery.conf, so it's not completely irrelevant. So maybe it's OK?
Perhaps in future, in a separate commit, we could have a new appendix
"Obsolete settings" that has an alphabetical list of the fallen?

The main documentation of pg_promote() etc now has "The parameter
<varname>promote_trigger_file</varname> has been removed" in the
places where the GUC was previously mentioned. Perhaps we should just
remove the mentions completely (it's somehow either too much and not
enough information), but I'm OK with leaving those in for now until
some better place exists for historical notes.

So this is the version I will push unless someone sees anything more
to tweak about the documentation.

Attachments:

v12-0001-Remove-promote_trigger_file.patch (text/x-patch)
From 15da25b8c46e3e2a57431784d07bb87c6cce5e9f Mon Sep 17 00:00:00 2001
From: Thomas Munro <tmunro@postgresql.org>
Date: Mon, 28 Nov 2022 10:50:44 +1300
Subject: [PATCH v12] Remove promote_trigger_file.

Previously, an idle startup (recovery) process would wake up every 5
seconds to have a chance to poll for promote_trigger_file, even if that
GUC was not configured.  That promotion triggering mechanism was
effectively superseded by pg_ctl promote and pg_promote() a long time
ago.  There probably aren't many users left and it's very easy to change
to the modern mechanisms, so we agreed to remove the feature.

This is part of a campaign to reduce wakeups on idle systems.

Author: Simon Riggs <simon.riggs@enterprisedb.com>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/CANbhV-FsjnzVOQGBpQ589%3DnWuL1Ex0Ykn74Nh1hEjp2usZSR5g%40mail.gmail.com
---
 .../appendix-obsolete-recovery-config.sgml    | 12 +++----
 doc/src/sgml/config.sgml                      | 18 -----------
 doc/src/sgml/high-availability.sgml           | 24 ++++++--------
 src/backend/access/transam/xlogrecovery.c     | 32 ++++---------------
 src/backend/utils/misc/guc_tables.c           | 10 ------
 src/backend/utils/misc/postgresql.conf.sample |  1 -
 src/include/access/xlogrecovery.h             |  1 -
 7 files changed, 20 insertions(+), 78 deletions(-)

diff --git a/doc/src/sgml/appendix-obsolete-recovery-config.sgml b/doc/src/sgml/appendix-obsolete-recovery-config.sgml
index 1cf4913114..2414b5e9ec 100644
--- a/doc/src/sgml/appendix-obsolete-recovery-config.sgml
+++ b/doc/src/sgml/appendix-obsolete-recovery-config.sgml
@@ -34,14 +34,10 @@
    </para>
 
    <para>
-    The
-    <literal>trigger_file</literal>
-    <indexterm>
-     <primary>trigger_file</primary>
-     <see>promote_trigger_file</see>
-    </indexterm>
-    setting has been renamed to
-    <xref linkend="guc-promote-trigger-file"/>.
+    PostgreSQL 15 and below had a setting
+    <literal>promote_trigger_file</literal>, which before PostgreSQL 12 was called <literal>trigger_file</literal>.
+    Use <command>pg_ctl promote</command> or call
+    <function>pg_promote()</function> to promote a standby instead.
    </para>
 
    <para>
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 24b1624bad..baa9b5e712 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4610,24 +4610,6 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
         </listitem>
        </varlistentry>
 
-       <varlistentry id="guc-promote-trigger-file" xreflabel="promote_trigger_file">
-        <term><varname>promote_trigger_file</varname> (<type>string</type>)
-        <indexterm>
-          <primary><varname>promote_trigger_file</varname> configuration parameter</primary>
-        </indexterm>
-        </term>
-        <listitem>
-         <para>
-          Specifies a trigger file whose presence ends recovery in the
-          standby.  Even if this value is not set, you can still promote
-          the standby using <command>pg_ctl promote</command> or calling
-          <function>pg_promote()</function>.
-          This parameter can only be set in the <filename>postgresql.conf</filename>
-          file or on the server command line.
-         </para>
-        </listitem>
-       </varlistentry>
-
      <varlistentry id="guc-hot-standby" xreflabel="hot_standby">
       <term><varname>hot_standby</varname> (<type>boolean</type>)
       <indexterm>
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index b2b3129397..cac8c71a44 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -653,11 +653,11 @@ protocol to make nodes agree on a serializable transactional order.
 
    <para>
     Standby mode is exited and the server switches to normal operation
-    when <command>pg_ctl promote</command> is run,
-    <function>pg_promote()</function> is called, or a trigger file is found
-    (<varname>promote_trigger_file</varname>). Before failover,
-    any WAL immediately available in the archive or in <filename>pg_wal</filename> will be
-    restored, but no attempt is made to connect to the primary.
+    when <command>pg_ctl promote</command> is run, or
+    <function>pg_promote()</function> is called.  The parameter
+    <varname>promote_trigger_file</varname> has been removed. Before failover,
+    any WAL immediately available in the archive or in <filename>pg_wal</filename>
+    will be restored, but no attempt is made to connect to the primary.
    </para>
   </sect2>
 
@@ -1483,15 +1483,11 @@ synchronous_standby_names = 'ANY 2 (s1, s2, s3)'
 
    <para>
     To trigger failover of a log-shipping standby server, run
-    <command>pg_ctl promote</command>, call <function>pg_promote()</function>,
-    or create a trigger file with the file name and path specified by the
-    <varname>promote_trigger_file</varname>. If you're planning to use
-    <command>pg_ctl promote</command> or to call
-    <function>pg_promote()</function> to fail over,
-    <varname>promote_trigger_file</varname> is not required. If you're
-    setting up the reporting servers that are only used to offload read-only
-    queries from the primary, not for high availability purposes, you don't
-    need to promote it.
+    <command>pg_ctl promote</command> or call <function>pg_promote()</function>.
+    The parameter <varname>promote_trigger_file</varname> has been removed.
+    If you're setting up the reporting servers that are only used to offload
+    read-only queries from the primary, not for high availability purposes,
+    you don't need to promote it.
    </para>
   </sect1>
 
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index cb07694aea..b1008c3701 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -95,7 +95,6 @@ int			recovery_min_apply_delay = 0;
 /* options formerly taken from recovery.conf for XLOG streaming */
 char	   *PrimaryConnInfo = NULL;
 char	   *PrimarySlotName = NULL;
-char	   *PromoteTriggerFile = NULL;
 bool		wal_receiver_create_temp_slot = false;
 
 /*
@@ -3840,14 +3839,14 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 					XLogPrefetcherComputeStats(xlogprefetcher);
 
 					/*
-					 * Wait for more WAL to arrive. Time out after 5 seconds
-					 * to react to a trigger file promptly and to check if the
-					 * WAL receiver is still active.
+					 * Wait for more WAL to arrive, when we will be woken
+					 * immediately by the WAL receiver. Use of trigger file
+					 * via promote_trigger_file is now fully removed.
 					 */
 					(void) WaitLatch(&XLogRecoveryCtl->recoveryWakeupLatch,
-									 WL_LATCH_SET | WL_TIMEOUT |
-									 WL_EXIT_ON_PM_DEATH,
-									 5000L, WAIT_EVENT_RECOVERY_WAL_STREAM);
+									 WL_LATCH_SET | WL_EXIT_ON_PM_DEATH,
+									 -1L,
+									 WAIT_EVENT_RECOVERY_WAL_STREAM);
 					ResetLatch(&XLogRecoveryCtl->recoveryWakeupLatch);
 					break;
 				}
@@ -4300,8 +4299,6 @@ SetPromoteIsTriggered(void)
 static bool
 CheckForStandbyTrigger(void)
 {
-	struct stat stat_buf;
-
 	if (LocalPromoteIsTriggered)
 		return true;
 
@@ -4314,23 +4311,6 @@ CheckForStandbyTrigger(void)
 		return true;
 	}
 
-	if (PromoteTriggerFile == NULL || strcmp(PromoteTriggerFile, "") == 0)
-		return false;
-
-	if (stat(PromoteTriggerFile, &stat_buf) == 0)
-	{
-		ereport(LOG,
-				(errmsg("promote trigger file found: %s", PromoteTriggerFile)));
-		unlink(PromoteTriggerFile);
-		SetPromoteIsTriggered();
-		return true;
-	}
-	else if (errno != ENOENT)
-		ereport(ERROR,
-				(errcode_for_file_access(),
-				 errmsg("could not stat promote trigger file \"%s\": %m",
-						PromoteTriggerFile)));
-
 	return false;
 }
 
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 349dd6a537..1bf14eec66 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -3819,16 +3819,6 @@ struct config_string ConfigureNamesString[] =
 		check_recovery_target_lsn, assign_recovery_target_lsn, NULL
 	},
 
-	{
-		{"promote_trigger_file", PGC_SIGHUP, REPLICATION_STANDBY,
-			gettext_noop("Specifies a file name whose presence ends recovery in the standby."),
-			NULL
-		},
-		&PromoteTriggerFile,
-		"",
-		NULL, NULL, NULL
-	},
-
 	{
 		{"primary_conninfo", PGC_SIGHUP, REPLICATION_STANDBY,
 			gettext_noop("Sets the connection string to be used to connect to the sending server."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 868d21c351..043864597f 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -331,7 +331,6 @@
 
 #primary_conninfo = ''			# connection string to sending server
 #primary_slot_name = ''			# replication slot on sending server
-#promote_trigger_file = ''		# file name whose presence ends recovery
 #hot_standby = on			# "off" disallows queries during recovery
 					# (change requires restart)
 #max_standby_archive_delay = 30s	# max delay before canceling queries
diff --git a/src/include/access/xlogrecovery.h b/src/include/access/xlogrecovery.h
index 0e3e246bd2..f3398425d8 100644
--- a/src/include/access/xlogrecovery.h
+++ b/src/include/access/xlogrecovery.h
@@ -65,7 +65,6 @@ extern PGDLLIMPORT TimestampTz recoveryTargetTime;
 extern PGDLLIMPORT const char *recoveryTargetName;
 extern PGDLLIMPORT XLogRecPtr recoveryTargetLSN;
 extern PGDLLIMPORT RecoveryTargetType recoveryTarget;
-extern PGDLLIMPORT char *PromoteTriggerFile;
 extern PGDLLIMPORT bool wal_receiver_create_temp_slot;
 extern PGDLLIMPORT RecoveryTargetTimeLineGoal recoveryTargetTimeLineGoal;
 extern PGDLLIMPORT TimeLineID recoveryTargetTLIRequested;
-- 
2.38.1

#57Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Thomas Munro (#56)
Re: Reducing power consumption on idle servers

On Mon, Nov 28, 2022 at 5:27 AM Thomas Munro <thomas.munro@gmail.com> wrote:

"The trigger_file and promote_trigger_file have been removed." was
missing some words. I've also added a sentence to say which releases
were involved, to make it like nearby paragraphs about other obsolete
stuff.

LGTM.

The funny thing about that "obsolete" appendix is that it's intended
as a URL-preserving way to document the recovery.conf file's demise in
r12. It doesn't look like the right place to document ongoing changes
to recovery-related GUCs in general. In this particular case it's
sort of OK because the old trigger_file GUC was indeed in
recovery.conf, so it's not completely irrelevant. So maybe it's OK?
Perhaps in future, in a separate commit, we could have a new appendix
"Obsolete settings" that has an alphabetical list of the fallen?

Tracking down history to figure out and add all the removed/renamed
settings under a new appendix seems a tedious task. The best is to
look into corresponding release notes to figure out which ones have
been removed/renamed/altered IMO.

FWIW, it's been a while (3 major releases have happened) since
recovery.conf was removed; I think we need to do away with
appendix-obsolete-recovery-config.sgml someday.

The main documentation of pg_promote() etc now has "The parameter
<varname>promote_trigger_file</varname> has been removed" in the
places where the GUC was previously mentioned. Perhaps we should just
remove the mentions completely (it's somehow either too much and not
enough information), but I'm OK with leaving those in for now until
some better place exists for historical notes.

+1 for the way it is in the v12 patch i.e. specifying removal of it in
the docs. However, I don't see any reason to keep the traces of it in
the code, can we remove "Use of trigger file via promote_trigger_file
is now fully removed." sentence in the code below?

+                     * Wait for more WAL to arrive, when we will be woken
+                     * immediately by the WAL receiver. Use of trigger file
+                     * via promote_trigger_file is now fully removed.
                      */

So this is the version I will push unless someone sees anything more
to tweak about the documentation.

Thanks.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#58Robert Haas
robertmhaas@gmail.com
In reply to: Thomas Munro (#56)
Re: Reducing power consumption on idle servers

On Sun, Nov 27, 2022 at 6:57 PM Thomas Munro <thomas.munro@gmail.com> wrote:

The main documentation of pg_promote() etc now has "The parameter
<varname>promote_trigger_file</varname> has been removed" in the
places where the GUC was previously mentioned. Perhaps we should just
remove the mentions completely (it's somehow either too much and not
enough information), but I'm OK with leaving those in for now until
some better place exists for historical notes.

I think we should remove those mentions. Otherwise the documentation
just collects mentions of an increasing number of things that are no
longer relevant.

And, as you say, we can't presume that future readers would know what
promote_trigger_file even is.

--
Robert Haas
EDB: http://www.enterprisedb.com

#59Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#58)
Re: Reducing power consumption on idle servers

Robert Haas <robertmhaas@gmail.com> writes:

I think we should remove those mentions. Otherwise the documentation
just collects mentions of an increasing number of things that are no
longer relevant.

Yeah, I think the same. There will be a release-note entry, and
I don't object to having something about it in appendix-obsolete.sgml;
but we shouldn't clutter the main docs with it.

regards, tom lane

#60Thomas Munro
thomas.munro@gmail.com
In reply to: Tom Lane (#59)
Re: Reducing power consumption on idle servers

I found some more comments and some documentation to word-smith very
lightly, and pushed. The comments were stray references to the
trigger file. It's a little confusing because the remaining mechanism
also uses a file, but since it uses a signal first, it seems better to
refer to promotion requests rather than files.
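For readers following along, the surviving mechanism is signal-driven rather than poll-driven: `pg_ctl promote` drops a `promote` signal file in the data directory and signals the server, which wakes the startup process; the file is only consumed in response to that wakeup, so no periodic polling is needed. A rough illustration of that request/acknowledge shape (plain Python, deliberately simplified — the signal delivery is elided and only the `promote` file name matches the real thing; this is not actual Postgres code):

```python
import os
import tempfile

PROMOTE_SIGNAL_FILE = "promote"   # name matches the real signal file;
                                  # everything else here is simplified

def request_promotion(datadir: str) -> None:
    """Roughly what 'pg_ctl promote' does: create the signal file.
    (The SIGUSR1 that wakes the server is elided in this sketch.)"""
    open(os.path.join(datadir, PROMOTE_SIGNAL_FILE), "w").close()

def check_promote_signal(datadir: str) -> bool:
    """Roughly what the woken startup process does: consume the file."""
    path = os.path.join(datadir, PROMOTE_SIGNAL_FILE)
    if os.path.exists(path):
        os.unlink(path)           # acknowledge the request
        return True
    return False

with tempfile.TemporaryDirectory() as datadir:
    print(check_promote_signal(datadir))   # False: idle, nothing to do
    request_promotion(datadir)
    print(check_promote_signal(datadir))   # True: promotion requested
```

The point of the removed GUC was precisely that it forced the 5-second poll loop; with only the signal-file pair above, the startup process can sleep indefinitely until woken.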

/me wonders how many idle servers there are in the world

#61Simon Riggs
simon.riggs@enterprisedb.com
In reply to: Thomas Munro (#60)
Re: Reducing power consumption on idle servers

On Mon, 28 Nov 2022 at 23:16, Thomas Munro <thomas.munro@gmail.com> wrote:

I found some more comments and some documentation to word-smith very
lightly, and pushed.

Thanks

The comments were stray references to the
trigger file. It's
a little confusing because the remaining mechanism also uses a file,
but it uses a signal first so seems better to refer to promotion
requests rather than files.

..and again

/me wonders how many idle servers there are in the world

My estimate is there are 100 million servers in use worldwide, with
only about 1% of them on a continuously busy duty cycle and most of
them not in the cloud.

If we guess that we save 10W when idle, then that saves some proportion of a GW.
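A back-of-the-envelope check of that estimate (all of these are Simon's guesses from the message above, not measurements):

```python
# Assumed figures from the discussion: ~100M servers worldwide,
# ~99% of them not on a continuously busy duty cycle, ~10 W saved
# per idle server by eliminating periodic wakeups.
servers_total = 100_000_000
idle_fraction = 0.99
watts_saved_per_server = 10

total_watts = servers_total * idle_fraction * watts_saved_per_server
total_gw = total_watts / 1e9
print(f"{total_gw:.2f} GW")   # 0.99 GW — "some proportion of a GW"
```

Even if only a fraction of those servers run PostgreSQL, the aggregate effect of removing unnecessary timeouts is measured in megawatts.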

It's not a huge contribution to the effort, but if by doing this we
inspire others to do the same, we may yet make a difference.

--
Simon Riggs http://www.EnterpriseDB.com/

#62Simon Riggs
simon.riggs@enterprisedb.com
In reply to: Simon Riggs (#40)
1 attachment(s)
Re: Reducing power consumption on idle servers

On Sun, 20 Nov 2022 at 20:00, Simon Riggs <simon.riggs@enterprisedb.com> wrote:

On Thu, 24 Mar 2022 at 16:21, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Mar 24, 2022 at 12:02 PM Simon Riggs

What changes will be acceptable for bgwriter, walwriter and logical worker?

Hmm, I think it would be fine to introduce some kind of hibernation
mechanism for logical workers. bgwriter and wal writer already have a
hibernation mechanism, so I'm not sure what your concern is there
exactly. In your initial email you said you weren't proposing changes
there, but maybe that changed somewhere in the course of the
subsequent discussion. If you could catch me up on your present
thinking that would be helpful.

Now that we seem to have solved the problem for Startup process, let's
circle back to the others....

Thanks for committing changes to the startup process.

Bgwriter does hibernate currently, but only at 50x the bgwriter_delay.
At default values that is 10s, but could be much less. So we need to
move that up to 60s, same as others.

WALwriter also hibernates currently, but only at 25x the
wal_writer_delay. At default values that is 5s, but could be much
less. So we need to move that up to 60s, same as others. At the same
time, make sure that when we hibernate we use a new WaitEvent,
similarly to the way bgwriter reports its hibernation state (which
also helps test the patch).
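The walwriter pattern being generalized here — keep the short delay while work is arriving, and only fall back to a long hibernation timeout after a run of do-nothing loops — can be sketched like this (plain Python for illustration only, not Postgres code; the constant names mirror those in the patch, and the 60s value is the proposed standard):

```python
LOOPS_UNTIL_HIBERNATE = 50      # do-nothing loops before hibernating
HIBERNATE_DELAY_MS = 60_000     # proposed standard 60s hibernation
WAL_WRITER_DELAY_MS = 200       # default wal_writer_delay

def next_timeout(left_till_hibernate: int, did_work: bool):
    """Return (new countdown, sleep timeout in ms) for one loop pass."""
    if did_work:
        # Activity observed: restart the countdown, keep short sleeps.
        return LOOPS_UNTIL_HIBERNATE, WAL_WRITER_DELAY_MS
    if left_till_hibernate > 0:
        # Still counting down idle iterations.
        return left_till_hibernate - 1, WAL_WRITER_DELAY_MS
    # Countdown exhausted: sleep with the long hibernation timeout.
    return 0, HIBERNATE_DELAY_MS

left = LOOPS_UNTIL_HIBERNATE
for _ in range(LOOPS_UNTIL_HIBERNATE + 1):   # simulate an idle server
    left, timeout = next_timeout(left, did_work=False)
print(timeout)  # 60000: hibernating after the idle countdown expires
```

In the real processes the sleep itself is a WaitLatch() call, so an async commit or buffer-allocation notification sets the latch and ends the long sleep immediately; the hibernation delay only matters when there is genuinely nothing to do.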

Re-attaching patch for bgwriter and walwriter, so it is clear this is
not yet committed.

I've added this to the next CF, since the entry was closed when the
startup patch was committed.

--
Simon Riggs http://www.EnterpriseDB.com/

Attachments:

hibernate_bgwriter_walwriter.v5.patch (application/octet-stream)
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index 91e6f6ea18..36414973e2 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -60,12 +60,6 @@
  */
 int			BgWriterDelay = 200;
 
-/*
- * Multiplier to apply to BgWriterDelay when we decide to hibernate.
- * (Perhaps this needs to be configurable?)
- */
-#define HIBERNATE_FACTOR			50
-
 /*
  * Interval in which standby snapshots are logged into the WAL stream, in
  * milliseconds.
@@ -335,7 +329,7 @@ BackgroundWriterMain(void)
 			/* Sleep ... */
 			(void) WaitLatch(MyLatch,
 							 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
-							 BgWriterDelay * HIBERNATE_FACTOR,
+							 HIBERNATE_DELAY_MS,
 							 WAIT_EVENT_BGWRITER_HIBERNATE);
 			/* Reset the notification request in case we timed out */
 			StrategyNotifyBgWriter(-1);
diff --git a/src/backend/postmaster/walwriter.c b/src/backend/postmaster/walwriter.c
index beb46dcb55..08ab783c1f 100644
--- a/src/backend/postmaster/walwriter.c
+++ b/src/backend/postmaster/walwriter.c
@@ -70,14 +70,6 @@
 int			WalWriterDelay = 200;
 int			WalWriterFlushAfter = 128;
 
-/*
- * Number of do-nothing loops before lengthening the delay time, and the
- * multiplier to apply to WalWriterDelay when we do decide to hibernate.
- * (Perhaps these need to be configurable?)
- */
-#define LOOPS_UNTIL_HIBERNATE		50
-#define HIBERNATE_FACTOR			25
-
 /* Prototypes for private functions */
 static void HandleWalWriterInterrupts(void);
 
@@ -225,8 +217,6 @@ WalWriterMain(void)
 	 */
 	for (;;)
 	{
-		long		cur_timeout;
-
 		/*
 		 * Advertise whether we might hibernate in this cycle.  We do this
 		 * before resetting the latch to ensure that any async commits will
@@ -266,14 +256,18 @@ WalWriterMain(void)
 		 * sleep time so as to reduce the server's idle power consumption.
 		 */
 		if (left_till_hibernate > 0)
-			cur_timeout = WalWriterDelay;	/* in ms */
-		else
-			cur_timeout = WalWriterDelay * HIBERNATE_FACTOR;
+		{
+			(void) WaitLatch(MyLatch,
+							 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+							 WalWriterDelay,   /* in ms */
+							 WAIT_EVENT_WAL_WRITER_MAIN);
+			continue;
+		}
 
 		(void) WaitLatch(MyLatch,
 						 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
-						 cur_timeout,
-						 WAIT_EVENT_WAL_WRITER_MAIN);
+						 HIBERNATE_DELAY_MS,
+						 WAIT_EVENT_WAL_WRITER_HIBERNATE);
 	}
 }
 
diff --git a/src/backend/utils/activity/wait_event.c b/src/backend/utils/activity/wait_event.c
index 92f24a6c9b..b4eaba737f 100644
--- a/src/backend/utils/activity/wait_event.c
+++ b/src/backend/utils/activity/wait_event.c
@@ -245,6 +245,9 @@ pgstat_get_wait_activity(WaitEventActivity w)
 		case WAIT_EVENT_WAL_WRITER_MAIN:
 			event_name = "WalWriterMain";
 			break;
+		case WAIT_EVENT_WAL_WRITER_HIBERNATE:
+			event_name = "WalWriterHibernate";
+			break;
 			/* no default case, so that compiler will warn */
 	}
 
diff --git a/src/include/storage/latch.h b/src/include/storage/latch.h
index 68ab740f16..f0a86ede3a 100644
--- a/src/include/storage/latch.h
+++ b/src/include/storage/latch.h
@@ -140,6 +140,11 @@ typedef struct Latch
 							 WL_SOCKET_CONNECTED | \
 							 WL_SOCKET_CLOSED)
 
+/* Hibernation */
+#define HIBERNATE_DELAY_SEC		60
+#define HIBERNATE_DELAY_MS		(1000L * HIBERNATE_DELAY_SEC)
+#define LOOPS_UNTIL_HIBERNATE	50
+
 typedef struct WaitEvent
 {
 	int			pos;			/* position in the event data structure */
diff --git a/src/include/utils/wait_event.h b/src/include/utils/wait_event.h
index 6f2d5612e0..d8a77eabd1 100644
--- a/src/include/utils/wait_event.h
+++ b/src/include/utils/wait_event.h
@@ -46,7 +46,8 @@ typedef enum
 	WAIT_EVENT_SYSLOGGER_MAIN,
 	WAIT_EVENT_WAL_RECEIVER_MAIN,
 	WAIT_EVENT_WAL_SENDER_MAIN,
-	WAIT_EVENT_WAL_WRITER_MAIN
+	WAIT_EVENT_WAL_WRITER_MAIN,
+	WAIT_EVENT_WAL_WRITER_HIBERNATE
 } WaitEventActivity;
 
 /* ----------
#63Thomas Munro
thomas.munro@gmail.com
In reply to: Simon Riggs (#62)
Re: Reducing power consumption on idle servers

On Wed, Nov 30, 2022 at 1:32 AM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:

Re-attaching patch for bgwriter and walwriter, so it is clear this is
not yet committed.

I'm just curious, and not suggesting that 60s wakeups are a problem
for the polar ice caps, but why even time out at all? Are the latch
protocols involved not reliable enough? At a guess from a quick
glance, the walwriter's is but maybe the bgwriter could miss a wakeup
as it races against StrategyGetBuffer(), which means you might stay
asleep until the *next* buffer allocation, but that's already true I
think, and a 60s timeout is not much of a defence.

#64Simon Riggs
simon.riggs@enterprisedb.com
In reply to: Thomas Munro (#63)
Re: Reducing power consumption on idle servers

On Wed, 30 Nov 2022 at 03:50, Thomas Munro <thomas.munro@gmail.com> wrote:

On Wed, Nov 30, 2022 at 1:32 AM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:

Re-attaching patch for bgwriter and walwriter, so it is clear this is
not yet committed.

I'm just curious, and not suggesting that 60s wakeups are a problem
for the polar ice caps, but why even time out at all? Are the latch
protocols involved not reliable enough? At a guess from a quick
glance, the walwriter's is but maybe the bgwriter could miss a wakeup
as it races against StrategyGetBuffer(), which means you might stay
asleep until the *next* buffer allocation, but that's already true I
think, and a 60s timeout is not much of a defence.

That sounds reasonable.

It does sound like we agree that the existing behavior of waking up
every 10s or 5s is not good. I hope you will act to improve that.

The approach taken in this patch, and others of mine, has been to
offer a minimal change that achieves the objective of lengthy
hibernation to save power.

Removing the timeout entirely may not work in other circumstances I
have not explored. Doing that requires someone to check that it
actually works, and for others to believe that check has occurred. For
me, that is too time-consuming to happen in this dev cycle, and time
is part of the objective, since perfect designs with unreleased code
have no utility.

<Simon enters lengthy hibernation>

--
Simon Riggs http://www.EnterpriseDB.com/

#65Thomas Munro
thomas.munro@gmail.com
In reply to: Simon Riggs (#64)
Re: Reducing power consumption on idle servers

On Wed, Nov 30, 2022 at 7:40 PM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:

On Wed, 30 Nov 2022 at 03:50, Thomas Munro <thomas.munro@gmail.com> wrote:

I'm just curious, and not suggesting that 60s wakeups are a problem
for the polar ice caps, but why even time out at all? Are the latch
protocols involved not reliable enough? At a guess from a quick
glance, the walwriter's is but maybe the bgwriter could miss a wakeup
as it races against StrategyGetBuffer(), which means you might stay
asleep until the *next* buffer allocation, but that's already true I
think, and a 60s timeout is not much of a defence.

That sounds reasonable.

It does sound like we agree that the existing behavior of waking up
every 10s or 5s is not good. I hope you will act to improve that.

The approach taken in this patch, and others of mine, has been to
offer a minimal change that achieves the objective of lengthy
hibernation to save power.

Removing the timeout entirely may not work in other circumstances I
have not explored. Doing that requires someone to check it actually
works, and for others to believe that check has occurred. For me, that
is too time consuming to actually happen in this dev cycle, and time
is part of the objective since perfect designs yet with unreleased
code have no utility.

<Simon enters lengthy hibernation>

<Thomas returns from aestivation>

Yeah, I definitely want to fix it. I just worry that 60s is so long
that it also needs that analysis work to be done to explain that it's
OK that we're a bit sloppy on noticing when to wake up, at which point
you might as well go to infinity.

#66Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Munro (#65)
Re: Reducing power consumption on idle servers

Thomas Munro <thomas.munro@gmail.com> writes:

Yeah, I definitely want to fix it. I just worry that 60s is so long
that it also needs that analysis work to be done to explain that it's
OK that we're a bit sloppy on noticing when to wake up, at which point
you might as well go to infinity.

Yeah. The perfectionist in me wants to say that there should be
explicit wakeups for every event of interest, in which case there's
no need for a timeout. The engineer in me says "but what about bugs?".
Better a slow reaction than never reacting at all. OTOH, then you
have to have a discussion about whether 60s (or any other
ice-cap-friendly value) is an acceptable response time even in the
presence of bugs.

It's kind of moot until we've reached the point where we can
credibly claim to have explicit wakeups for every event of
interest. I don't think we're very close to that today, and
I do think we should try to get closer. There may come a point
of diminishing returns though.

regards, tom lane

#67Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Tom Lane (#66)
Re: Reducing power consumption on idle servers

On Wed, Jan 25, 2023 at 2:10 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Thomas Munro <thomas.munro@gmail.com> writes:

Yeah, I definitely want to fix it. I just worry that 60s is so long
that it also needs that analysis work to be done to explain that it's
OK that we're a bit sloppy on noticing when to wake up, at which point
you might as well go to infinity.

Yeah. The perfectionist in me wants to say that there should be
explicit wakeups for every event of interest, in which case there's
no need for a timeout. The engineer in me says "but what about bugs?".
Better a slow reaction than never reacting at all. OTOH, then you
have to have a discussion about whether 60s (or any other
ice-cap-friendly value) is an acceptable response time even in the
presence of bugs.

It's kind of moot until we've reached the point where we can
credibly claim to have explicit wakeups for every event of
interest. I don't think we're very close to that today, and
I do think we should try to get closer. There may come a point
of diminishing returns though.

IIUC, we're discussing here whether or not to get rid of hibernate
loops, IOW sleep-wakeup-do-work-if-there-is-any-sleep loops, and
instead rely on other processes' wakeup signals to reduce the overall
power consumption, am I right?

I'm trying to understand this a bit - can the signals (especially,
SIGURG that we use to set latches to wake up processes) ever get lost
on the way before reaching the target process? If yes, how? How
frequently can it happen? Is there any history of reported issues in
postgres because a signal got lost?

I'm reading about pending signals and queuing of signals with
sigqueue() (in Linux); can any of these guarantee that signals sent
never get lost?

FWIW, a recent commit cd4329d that removed promote_trigger_file also
removed hibernation and made the process wait forever, relying on the
latch being set. If we're worried that signals can get lost on the
way, then this also needs to be fixed. And I also see a lot of
WaitLatch() calls that wait forever, relying on others to wake them up.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#68Thomas Munro
thomas.munro@gmail.com
In reply to: Bharath Rupireddy (#67)
Re: Reducing power consumption on idle servers

On Fri, Jan 27, 2023 at 7:37 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Wed, Jan 25, 2023 at 2:10 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

It's kind of moot until we've reached the point where we can
credibly claim to have explicit wakeups for every event of
interest. I don't think we're very close to that today, and
I do think we should try to get closer. There may come a point
of diminishing returns though.

IIUC, we're discussing here whether or not to get rid of hibernate
loops, IOW, sleep-wakeup-doworkifthereisany-sleep loops and rely on
other processes' wakeup signals to reduce the overall power
consumption, am I right?

I'm trying to understand this a bit - can the signals (especially,
SIGURG that we use to set latches to wake up processes) ever get lost
on the way before reaching the target process? If yes, how? How
frequently can it happen? Is there any history of reported issues in
postgres because a signal got lost?

I'm reading about Pending Signals and queuing of signals with
sigqueue() (in linux), can any of these guarantee that signals sent
never get lost?

Signals don't get lost, at least with the current definition of
so-called "reliable" signals in the POSIX standard. (There was a
previous system of signals in ancient UNIX that was harder to program
with; we still see echoes of that today. For example, the C standard
has basic signals, and Windows implements those, but they're
borderline useless: before a signal handler runs it is also
de-registered, so many interesting handlers would have to re-register
it, but signals can't be atomically blocked at the same time, and
there is a window of time where the default behaviour applies, which
(from our modern perspective) is clearly insane, as there is literally
no way to close that race and avoid some fatal default behaviour.)
BTW, sigqueue() signals (signals that are queued up without being
consolidated, and carry a user-controlled value with them) are
"realtime signals" AKA "signal queuing", which I guess we'll never be
able to use because at least Apple didn't implement them. The
"reliable signals" we use are the plain kind that don't carry any
payload and are collapsed into a single "pending" bit, so that if two
people do kill(SIGFOO) around the same time your handler might only
run once (if you registered one and it's not blocked), but it will
definitely run > 0 times (or be delivered via various other means such
as sigwait(), etc.).

Then there is the question of races as you go into the wait primitive.
I mean in between our reading latch->is_set and entering the kernel,
which we defend against using various techniques which you can read
about at the top of latch.c; the short version is that if that
happens, those system calls will immediately return.
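As a self-contained illustration of that defence, here is a sketch of the self-pipe idea that latch.c documents, in plain POSIX C. The my_latch_* names are made up for this example; the real API is SetLatch()/ResetLatch()/WaitLatch(). The setter sets a flag and then writes a wakeup byte; the waiter re-checks the flag before sleeping, so a set that races with the check still wakes it immediately.

```c
/* Sketch of the race-closing pattern described at the top of latch.c;
 * hypothetical my_latch_* names, not PostgreSQL's real latch API. */
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <stdbool.h>
#include <unistd.h>

typedef struct
{
	volatile sig_atomic_t is_set;
	int			pipefd[2];		/* [0] = read end, [1] = write end */
} MyLatch;

static void
my_latch_init(MyLatch *l)
{
	l->is_set = 0;
	if (pipe(l->pipefd) != 0)
		_exit(1);
	/* nonblocking read end so reset can drain without stalling */
	fcntl(l->pipefd[0], F_SETFL, O_NONBLOCK);
}

static void
my_latch_set(MyLatch *l)
{
	l->is_set = 1;				/* flag first ... */
	(void) write(l->pipefd[1], "x", 1);		/* ... then the wakeup byte */
}

static void
my_latch_reset(MyLatch *l)
{
	char		buf[16];

	l->is_set = 0;
	while (read(l->pipefd[0], buf, sizeof(buf)) > 0)
		;						/* drain stale wakeup bytes */
}

/* Returns true if the latch was set, false on timeout. */
static bool
my_latch_wait(MyLatch *l, int timeout_ms)
{
	struct pollfd pfd;

	pfd.fd = l->pipefd[0];
	pfd.events = POLLIN;
	for (;;)
	{
		if (l->is_set)			/* re-check before sleeping: closes the race */
			return true;
		if (poll(&pfd, 1, timeout_ms) == 0)
			return false;		/* timed out with no wakeup */
		/* wakeup byte arrived; loop around and observe the flag */
	}
}
```

The real implementation handles signal handlers, EINTR, postmaster death, and shared-memory latches on top of this, but the flag-then-wake and check-then-sleep ordering is the part that makes a timeout unnecessary for correctness.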

The uncertainty discussed here comes from the comment beginning "There
is a race condition here, ..." in bgwriter.c, referring to the
protocol implemented by StrategyNotifyBgWriter(). I haven't really
looked into it. I *guess* it's probably approximately OK if the
bgwriter misses a wakeup from time to time, because there's another
chance to wake it up as soon as someone needs another buffer, and if
you don't need any more buffers soon your system is probably idle; but
really "needing another buffer" isn't a great proxy for "more buffers
are being made dirty" at all! Someone would need to think about it
and construct a complete argument for why it's OK to go to sleep for
60 seconds, or infinity, with that sloppiness in place.

There's also the walwriter to look into; from memory it was a little
less fuzzy but I haven't looked recently.

#69Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Thomas Munro (#68)
1 attachment(s)
Re: Reducing power consumption on idle servers

On Fri, Jan 27, 2023 at 12:41 PM Thomas Munro <thomas.munro@gmail.com> wrote:

There's also the walwriter to look into; from memory it was a little
less fuzzy but I haven't looked recently.

Thanks. I tried doing away with the walwriter hibernation and instead
made it wait indefinitely until an event occurs. Once the walwriter
finds no work for some time, it enters wait-forever mode and is woken
up when someone inserts WAL. Please see the attached patch for more
details. Thoughts?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v1-0001-Reduce-power-consumption-by-WAL-writer.patch (application/octet-stream)
From da223287c12d0077951a00f2be7eff9cdc6ae880 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Tue, 31 Jan 2023 08:23:51 +0000
Subject: [PATCH v1] Reduce power consumption by WAL writer

---
 src/backend/access/transam/xlog.c       | 55 +++++++++++++++++++++++++
 src/backend/access/transam/xloginsert.c |  6 +++
 src/backend/postmaster/walwriter.c      | 20 +++++----
 src/include/access/xlog.h               |  1 +
 4 files changed, 75 insertions(+), 7 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index fb4c860bde..918f3959d3 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2393,6 +2393,61 @@ XLogSetAsyncXactLSN(XLogRecPtr asyncXactLSN)
 		SetLatch(ProcGlobal->walwriterLatch);
 }
 
+/*
+ * Wake up WAL writer process to write and flush newly generated WAL.
+ *
+ * This function would fit better in walwriter.c, but it reads LogwrtResult,
+ * a static variable in xlog.c, so we choose to keep it here for the sake of
+ * simplicity.
+ */
+void
+WakeupWALWriter(void)
+{
+	XLogRecPtr WriteRqstPtr = XactLastRecEnd;
+
+	/* back off to last completed page boundary */
+	WriteRqstPtr -= WriteRqstPtr % XLOG_BLCKSZ;
+
+	/*
+	 * Let's not nudge WAL writer if somebody (this backend itself in
+	 * XLogInsertRecord() or other backend or WAL writer itself in the ongoing
+	 * run) already flushed that far. This avoids unnecessary WAL writer
+	 * wakeups.
+	 *
+	 * It's possible that LogwrtResult is older, in which case we nudge the
+	 * WAL writer and it may do nothing. That is better than making this
+	 * function costly by acquiring info_lck to read LogwrtResult freshly.
+	 */
+	if (WriteRqstPtr <= LogwrtResult.Flush)
+		return;
+
+	/*
+	 * XXX: While more frequent nudges/SetLatch() calls reduce the amount of
+	 * WAL that this backend needs to write/flush itself, they might burden
+	 * the WAL writer too much.
+	 *
+	 * To reduce the number of nudges, we can do all or some of the following
+	 * things:
+	 *
+	 * 1) Nudge only when WAL writer is sleeping (use WalWriterSleeping).
+	 *
+	 * 2) Nudge only after wal_writer_delay has elapsed since the last nudge.
+	 *
+	 * 3) Nudge only after this backend has generated a configurable amount of
+	 *    WAL. Introduce a new GUC for this purpose wal_writer_wakeup_after
+	 *    (Amount of WAL written out by a backend that wakes up WAL writer to
+	 *    help write and flush WAL). Or use existing GUC
+	 *    wal_writer_flush_after.
+	 */
+
+	/*
+	 * Nudge WAL writer. Even if WAL writer is woken up, it does its work only
+	 * when necessary at its own pace, see XLogBackgroundFlush() for details.
+	 */
+	if (ProcGlobal->walwriterLatch)
+		SetLatch(ProcGlobal->walwriterLatch);
+}
+
 /*
  * Record the LSN up to which we can remove WAL because it's not required by
  * any replication slot.
diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c
index 008612e032..3d7dfec3ba 100644
--- a/src/backend/access/transam/xloginsert.c
+++ b/src/backend/access/transam/xloginsert.c
@@ -503,6 +503,12 @@ XLogInsert(RmgrId rmid, uint8 info)
 
 	XLogResetInsertion();
 
+	/*
+	 * XXX: The effects of calling WakeupWALWriter() in XLogInsert() need to
+	 * be measured, say with a high-write workload.
+	 */
+	WakeupWALWriter();
+
 	return EndPos;
 }
 
diff --git a/src/backend/postmaster/walwriter.c b/src/backend/postmaster/walwriter.c
index 3113e8fbdd..9e30679cdc 100644
--- a/src/backend/postmaster/walwriter.c
+++ b/src/backend/postmaster/walwriter.c
@@ -71,12 +71,10 @@ int			WalWriterDelay = 200;
 int			WalWriterFlushAfter = 128;
 
 /*
- * Number of do-nothing loops before lengthening the delay time, and the
- * multiplier to apply to WalWriterDelay when we do decide to hibernate.
+ * Number of do-nothing loops before lengthening the delay time.
  * (Perhaps these need to be configurable?)
  */
 #define LOOPS_UNTIL_HIBERNATE		50
-#define HIBERNATE_FACTOR			25
 
 /* Prototypes for private functions */
 static void HandleWalWriterInterrupts(void);
@@ -226,6 +224,7 @@ WalWriterMain(void)
 	for (;;)
 	{
 		long		cur_timeout;
+		int			wakeEvents;
 
 		/*
 		 * Advertise whether we might hibernate in this cycle.  We do this
@@ -262,16 +261,23 @@ WalWriterMain(void)
 
 		/*
 		 * Sleep until we are signaled or WalWriterDelay has elapsed.  If we
-		 * haven't done anything useful for quite some time, lengthen the
-		 * sleep time so as to reduce the server's idle power consumption.
+		 * haven't done anything useful for quite some time, sleep until an
+		 * event occurs so as to reduce the server's idle power consumption.
 		 */
 		if (left_till_hibernate > 0)
+		{
 			cur_timeout = WalWriterDelay;	/* in ms */
+			wakeEvents = WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH;
+		}
 		else
-			cur_timeout = WalWriterDelay * HIBERNATE_FACTOR;
+		{
+			Assert(hibernating == true);
+			cur_timeout = -1L;
+			wakeEvents = WL_LATCH_SET | WL_EXIT_ON_PM_DEATH;
+		}
 
 		(void) WaitLatch(MyLatch,
-						 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+						 wakeEvents,
 						 cur_timeout,
 						 WAIT_EVENT_WAL_WRITER_MAIN);
 	}
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index cfe5409738..f30decfb7a 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -207,6 +207,7 @@ extern int	XLogFileOpen(XLogSegNo segno, TimeLineID tli);
 extern void CheckXLogRemoved(XLogSegNo segno, TimeLineID tli);
 extern XLogSegNo XLogGetLastRemovedSegno(void);
 extern void XLogSetAsyncXactLSN(XLogRecPtr asyncXactLSN);
+extern void WakeupWALWriter(void);
 extern void XLogSetReplicationSlotMinimumLSN(XLogRecPtr lsn);
 
 extern void xlog_redo(struct XLogReaderState *record);
-- 
2.34.1