BGWriter latch, power saving

Started by Peter Geogheganabout 14 years ago10 messages
#1Peter Geoghegan
peter@2ndquadrant.com
1 attachment(s)

As part of the ongoing effort to reduce wake-ups when idle/power
consumption, the attached patch modifies the background writer to
hibernate in ten second bursts once the bgwriter laps the clock sweep.
It's fairly well commented, so a description of how it works here
would probably be redundant. The most controversial aspect of this
patch is the fact that it adds a SetLatch() call to MarkBufferDirty(),
though I went to some lengths to ensure that that call will very
probably find the latch already set when it actually matters, so it
only checks a single flag.

Thoughts? I can produce benchmarks, if that helps, but I think it's
fairly unlikely that the patch introduces a measurable performance
regression.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

Attachments:

bgwriter_latch.v1.patchtext/x-patch; charset=US-ASCII; name=bgwriter_latch.v1.patchDownload
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
new file mode 100644
index e3ae92d..0fcaf01
*** a/src/backend/bootstrap/bootstrap.c
--- b/src/backend/bootstrap/bootstrap.c
*************** AuxiliaryProcessMain(int argc, char *arg
*** 416,421 ****
--- 416,422 ----
  
  		case BgWriterProcess:
  			/* don't set signals, bgwriter has its own agenda */
+ 			ProcGlobal->bgwriterLatch = &MyProc->procLatch; /* Expose */
  			BackgroundWriterMain();
  			proc_exit(1);		/* should never return */
  
diff --git a/src/backend/port/unix_latch.c b/src/backend/port/unix_latch.c
new file mode 100644
index fc1a579..a5248ba
*** a/src/backend/port/unix_latch.c
--- b/src/backend/port/unix_latch.c
***************
*** 50,55 ****
--- 50,56 ----
  #include "miscadmin.h"
  #include "postmaster/postmaster.h"
  #include "storage/latch.h"
+ #include "storage/proc.h"
  #include "storage/shmem.h"
  
  /* Are we currently in WaitLatch? The signal handler would like to know. */
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
new file mode 100644
index 1f8d2d6..ad8a056
*** a/src/backend/postmaster/bgwriter.c
--- b/src/backend/postmaster/bgwriter.c
***************
*** 51,56 ****
--- 51,58 ----
  #include "storage/ipc.h"
  #include "storage/lwlock.h"
  #include "storage/pmsignal.h"
+ #include "storage/proc.h"
+ #include "storage/procsignal.h"
  #include "storage/shmem.h"
  #include "storage/smgr.h"
  #include "storage/spin.h"
*************** static volatile sig_atomic_t shutdown_re
*** 73,78 ****
--- 75,82 ----
  /*
   * Private state
   */
+ #define BGWRITER_HIBERNATE_MS			10000
+ 
  static bool am_bg_writer = false;
  
  /* Prototypes for private functions */
*************** BackgroundWriterMain(void)
*** 123,129 ****
  	pqsignal(SIGQUIT, bg_quickdie);		/* hard crash time */
  	pqsignal(SIGALRM, SIG_IGN);
  	pqsignal(SIGPIPE, SIG_IGN);
! 	pqsignal(SIGUSR1, SIG_IGN);			/* reserve for ProcSignal */
  	pqsignal(SIGUSR2, SIG_IGN);
  
  	/*
--- 127,133 ----
  	pqsignal(SIGQUIT, bg_quickdie);		/* hard crash time */
  	pqsignal(SIGALRM, SIG_IGN);
  	pqsignal(SIGPIPE, SIG_IGN);
! 	pqsignal(SIGUSR1, procsignal_sigusr1_handler);
  	pqsignal(SIGUSR2, SIG_IGN);
  
  	/*
*************** BackgroundWriterMain(void)
*** 238,278 ****
  	for (;;)
  	{
  		/*
! 		 * Emergency bailout if postmaster has died.  This is to avoid the
! 		 * necessity for manual cleanup of all postmaster children.
  		 */
! 		if (!PostmasterIsAlive())
! 			exit(1);
! 
! 		if (got_SIGHUP)
  		{
! 			got_SIGHUP = false;
! 			ProcessConfigFile(PGC_SIGHUP);
! 			/* update global shmem state for sync rep */
  		}
! 		if (shutdown_requested)
  		{
  			/*
! 			 * From here on, elog(ERROR) should end with exit(1), not send
! 			 * control back to the sigsetjmp block above
  			 */
! 			ExitOnAnyError = true;
! 			/* Normal exit from the bgwriter is here */
! 			proc_exit(0);		/* done */
! 		}
  
! 		/*
! 		 * Do one cycle of dirty-buffer writing.
! 		 */
! 		BgBufferSync();
  
! 		/* Nap for the configured time. */
! 		BgWriterNap();
  	}
  }
  
  /*
   * BgWriterNap -- Nap for the configured time or until a signal is received.
   */
  static void
  BgWriterNap(void)
--- 242,337 ----
  	for (;;)
  	{
  		/*
! 		 * Do one cycle of dirty-buffer writing, potentially hibernating if
! 		 * there have been no buffers to write.
  		 */
! 		if (!BgBufferSync())
  		{
! 			/* Clock sweep was not "lapped" - nap for the configured time. */
! 			BgWriterNap();
  		}
! 		else
  		{
+ 			int res = 0;
  			/*
! 			 * Initiate hibernation by "arming" the process latch. This usage
! 			 * differs from the standard pattern for latches outlined in
! 			 * latch.h, where the latch is generally reset immediately after
! 			 * WaitLatch returns, in the next iteration of an infinite loop.
! 			 *
! 			 * It should only be possible to *really* set the latch from client
! 			 * backends while the bgwriter is idle, and not during productive
! 			 * cycles where buffers are written, or shortly thereafter. It's
! 			 * important that the SetLatch() call within the buffer manager
! 			 * usually inexpensively returns having found that the latch is
! 			 * already set. For write-heavy workloads, it's quite possible that
! 			 * those calls will never actually get the opportunity to really
! 			 * set the latch.
! 			 *
! 			 * By only giving backends the opportunity to really set the
! 			 * latch at this point via ResetLatch(), we avoid the overhead of
! 			 * sending a signal perhaps as much as once per bgwriter cycle, or
! 			 * maybe even more frequently than that due to the asynchronous
! 			 * nature of signals.
  			 */
! 			ResetLatch(&MyProc->procLatch);
  
! 			/*
! 			 * Another nap and sync, in case some buffers were dirtied since
! 			 * the immediately prior BgBufferSync() call and we quite
! 			 * naturally didn't hear about it because the SetLatch calls
! 			 * found the latch already set (i.e. it was "disarmed").
! 			 */
! 			BgWriterNap();
  
! 			/*
! 			 * When syncing, take the opportunity to check one last time if
! 			 * we're still "lapping" the clock sweep, and only proceed if we are.
! 			 */
! 			if (BgBufferSync())
! 				/*
! 				 * Set a timeout so that we can still update BufferAlloc stats
! 				 * reasonably promptly.
! 				 */
! 				res = WaitLatch(&MyProc->procLatch,
! 						WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
! 						BGWRITER_HIBERNATE_MS);
! 			else
! 				/*
! 				 * Changed our mind, set own proc latch, thereby often avoiding
! 				 * signaling from a backend.
! 				 */
! 				SetLatch(&MyProc->procLatch);
! 
! 			/*
! 			 * Ensure that no less than BgWriterDelay ms has passed between
! 			 * BgBufferSyncs - WaitLatch() might have returned instantaneously.
! 			 *
! 			 * Napping here rather than immediately prior to the WaitLatch()
! 			 * call ensures that a sudden burst of dirtied buffers won't result
! 			 * in the initial background writer cycle ineffectually cleaning
! 			 * only a few pages dirtied earlier in the burst, rather than
! 			 * cleaning a number that is somewhere closer to optimal for the
! 			 * cycle, as modeled by the background writing strategy. We'll also
! 			 * want to nap if we changed our minds, before starting a new cycle.
! 			 *
! 			 * We wake on a buffer being dirtied. It's possible that some
! 			 * useful work will become available for the bgwriter to do without
! 			 * a buffer actually being dirtied, as when a dirty buffer's usage
! 			 * count is decremented to zero or it's unpinned. This corner case
! 			 * is judged as too marginal to justify adding additional SetLatch()
! 			 * calls in very hot code paths, cheap though those calls may be.
! 			 */
! 			if (!(res & WL_TIMEOUT))
! 				BgWriterNap();
! 		}
  	}
  }
  
  /*
   * BgWriterNap -- Nap for the configured time or until a signal is received.
+  *
+  * Respond to a signal handler having set a flag for us.
   */
  static void
  BgWriterNap(void)
*************** BgWriterNap(void)
*** 285,290 ****
--- 344,373 ----
  	pgstat_send_bgwriter();
  
  	/*
+ 	 * Emergency bailout if postmaster has died.  This is to avoid the
+ 	 * necessity for manual cleanup of all postmaster children.
+ 	 */
+ 	if (!PostmasterIsAlive())
+ 		exit(1);
+ 
+ 	if (got_SIGHUP)
+ 	{
+ 		got_SIGHUP = false;
+ 		ProcessConfigFile(PGC_SIGHUP);
+ 		/* update global shmem state for sync rep */
+ 	}
+ 	if (shutdown_requested)
+ 	{
+ 		/*
+ 		 * From here on, elog(ERROR) should end with exit(1), not send
+ 		 * control back to the sigsetjmp block above
+ 		 */
+ 		ExitOnAnyError = true;
+ 		/* Normal exit from the bgwriter is here */
+ 		proc_exit(0);		/* done */
+ 	}
+ 
+ 	/*
  	 * Nap for the configured time, or sleep for 10 seconds if there is no
  	 * bgwriter activity configured.
  	 *
*************** BgWriterNap(void)
*** 293,302 ****
  	 * sleep into 1-second increments, and check for interrupts after each
  	 * nap.
  	 */
! 	if (bgwriter_lru_maxpages > 0)
! 		udelay = BgWriterDelay * 1000L;
! 	else
! 		udelay = 10000000L;		/* Ten seconds */
  
  	while (udelay > 999999L)
  	{
--- 376,382 ----
  	 * sleep into 1-second increments, and check for interrupts after each
  	 * nap.
  	 */
! 	udelay = BgWriterDelay * 1000L;
  
  	while (udelay > 999999L)
  	{
*************** bg_quickdie(SIGNAL_ARGS)
*** 351,362 ****
--- 431,454 ----
  static void
  BgSigHupHandler(SIGNAL_ARGS)
  {
+ 	int save_errno = errno;
+ 
  	got_SIGHUP = true;
+ 	if (MyProc)
+ 		SetLatch(&MyProc->procLatch);
+ 
+ 	errno = save_errno;
  }
  
  /* SIGTERM: set flag to shutdown and exit */
  static void
  ReqShutdownHandler(SIGNAL_ARGS)
  {
+ 	int save_errno = errno;
+ 
  	shutdown_requested = true;
+ 	if (MyProc)
+ 		SetLatch(&MyProc->procLatch);
+ 
+ 	errno = save_errno;
  }
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
new file mode 100644
index 91cc001..f38f76a
*** a/src/backend/storage/buffer/bufmgr.c
--- b/src/backend/storage/buffer/bufmgr.c
*************** MarkBufferDirty(Buffer buffer)
*** 968,973 ****
--- 968,976 ----
  	Assert(PrivateRefCount[buffer - 1] > 0);
  	/* unfortunately we can't check if the lock is held exclusively */
  	Assert(LWLockHeldByMe(bufHdr->content_lock));
+ 	/* The bgwriter may need to be woken. */
+ 	if (ProcGlobal->bgwriterLatch)
+ 		SetLatch(ProcGlobal->bgwriterLatch);
  
  	LockBufHdr(bufHdr);
  
*************** BufferSync(int flags)
*** 1307,1314 ****
   * BgBufferSync -- Write out some dirty buffers in the pool.
   *
   * This is called periodically by the background writer process.
   */
! void
  BgBufferSync(void)
  {
  	/* info obtained from freelist.c */
--- 1310,1321 ----
   * BgBufferSync -- Write out some dirty buffers in the pool.
   *
   * This is called periodically by the background writer process.
+  *
+  * Returns a value indicating if the clocksweep has been "lapped", or if the
+  * bgwriter has been effectively disabled due to finding bgwriter_lru_maxpages
+  * at 0.
   */
! bool
  BgBufferSync(void)
  {
  	/* info obtained from freelist.c */
*************** BgBufferSync(void)
*** 1365,1371 ****
  	if (bgwriter_lru_maxpages <= 0)
  	{
  		saved_info_valid = false;
! 		return;
  	}
  
  	/*
--- 1372,1378 ----
  	if (bgwriter_lru_maxpages <= 0)
  	{
  		saved_info_valid = false;
! 		return true;
  	}
  
  	/*
*************** BgBufferSync(void)
*** 1584,1589 ****
--- 1591,1598 ----
  			 recent_alloc, strategy_delta, scans_per_alloc, smoothed_density);
  #endif
  	}
+ 
+ 	return bufs_to_lap == 0;
  }
  
  /*
diff --git a/src/include/postmaster/bgwriter.h b/src/include/postmaster/bgwriter.h
new file mode 100644
index 6cc4b62..970faa2
*** a/src/include/postmaster/bgwriter.h
--- b/src/include/postmaster/bgwriter.h
***************
*** 13,18 ****
--- 13,19 ----
  #define _BGWRITER_H
  
  #include "storage/block.h"
+ #include "storage/latch.h"
  #include "storage/relfilenode.h"
  
  
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
new file mode 100644
index a03c068..de1bbd0
*** a/src/include/storage/bufmgr.h
--- b/src/include/storage/bufmgr.h
*************** extern bool HoldingBufferPinThatDelaysRe
*** 213,219 ****
  extern void AbortBufferIO(void);
  
  extern void BufmgrCommit(void);
! extern void BgBufferSync(void);
  
  extern void AtProcExit_LocalBuffers(void);
  
--- 213,219 ----
  extern void AbortBufferIO(void);
  
  extern void BufmgrCommit(void);
! extern bool BgBufferSync(void);
  
  extern void AtProcExit_LocalBuffers(void);
  
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
new file mode 100644
index 358d1a4..4520f27
*** a/src/include/storage/proc.h
--- b/src/include/storage/proc.h
*************** typedef struct PROC_HDR
*** 188,193 ****
--- 188,195 ----
  	PGPROC	   *freeProcs;
  	/* Head of list of autovacuum's free PGPROC structures */
  	PGPROC	   *autovacFreeProcs;
+ 	/* BGWriter process latch */
+ 	Latch *bgwriterLatch;
  	/* Current shared estimate of appropriate spins_per_delay value */
  	int			spins_per_delay;
  	/* The proc of the Startup process, since not in ProcArray */
#2Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Peter Geoghegan (#1)
Re: BGWriter latch, power saving

On 04.01.2012 07:58, Peter Geoghegan wrote:

As part of the ongoing effort to reduce wake-ups when idle/power
consumption, the attached patch modifies the background writer to
hibernate in ten second bursts once the bgwriter laps the clock sweep.
It's fairly well commented, so a description of how it works here
would probably be redundant. The most controversial aspect of this
patch is the fact that it adds a SetLatch() call to MarkBufferDirty(),
though I went to some lengths to ensure that that call will very
probably find the latch already set when it actually matters, so it
only checks a single flag.

I think SetBufferCommitInfoNeedsSave() needs the same treatment as
MarkBufferDirty(). And it would probably be good to only set the latch
if the buffer wasn't dirty already. Setting a latch that's already set
is fast, but surely it's even faster to not even try.

Thoughts? I can produce benchmarks, if that helps, but I think it's
fairly unlikely that the patch introduces a measurable performance
regression.

Yeah, I'd like to see a micro-benchmark of a worst-case scenario. I'm a
bit worried about the impact on systems with a lot of CPUs. If you have
a lot of CPUs writing to the same cache line that contains the latch's
flag, that might get expensive.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#3Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#2)
Re: BGWriter latch, power saving

On Wed, Jan 4, 2012 at 7:24 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

Setting a latch that's already set is fast,
but surely it's even faster to not even try.

Agreed. I think we should SetLatch() at the first point a backend
writes a dirty buffer because the bgwriter has been inactive.

Continually waking the bgwriter makes it harder for it to return to sleep.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

#4Peter Geoghegan
peter@2ndquadrant.com
In reply to: Heikki Linnakangas (#2)
1 attachment(s)
Re: BGWriter latch, power saving

On 4 January 2012 07:24, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

I think SetBufferCommitInfoNeedsSave() needs the same treatment as
MarkBufferDirty(). And it would probably be good to only set the latch if
the buffer wasn't dirty already. Setting a latch that's already set is fast,
but surely it's even faster to not even try.

That seems reasonable. Revised patch is attached.

Yeah, I'd like to see a micro-benchmark of a worst-case scenario. I'm a bit
worried about the impact on systems with a lot of CPUs. If you have a lot of
CPUs writing to the same cache line that contains the latch's flag, that
might get expensive.

Also reasonable, but I don't think that I'll get around to it until
after the final commitfest deadline.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

Attachments:

bgwriter_latch.v2.patchtext/x-patch; charset=US-ASCII; name=bgwriter_latch.v2.patchDownload
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
new file mode 100644
index e3ae92d..0fcaf01
*** a/src/backend/bootstrap/bootstrap.c
--- b/src/backend/bootstrap/bootstrap.c
*************** AuxiliaryProcessMain(int argc, char *arg
*** 416,421 ****
--- 416,422 ----
  
  		case BgWriterProcess:
  			/* don't set signals, bgwriter has its own agenda */
+ 			ProcGlobal->bgwriterLatch = &MyProc->procLatch; /* Expose */
  			BackgroundWriterMain();
  			proc_exit(1);		/* should never return */
  
diff --git a/src/backend/port/unix_latch.c b/src/backend/port/unix_latch.c
new file mode 100644
index fc1a579..a5248ba
*** a/src/backend/port/unix_latch.c
--- b/src/backend/port/unix_latch.c
***************
*** 50,55 ****
--- 50,56 ----
  #include "miscadmin.h"
  #include "postmaster/postmaster.h"
  #include "storage/latch.h"
+ #include "storage/proc.h"
  #include "storage/shmem.h"
  
  /* Are we currently in WaitLatch? The signal handler would like to know. */
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
new file mode 100644
index 1f8d2d6..59e5180
*** a/src/backend/postmaster/bgwriter.c
--- b/src/backend/postmaster/bgwriter.c
***************
*** 51,56 ****
--- 51,58 ----
  #include "storage/ipc.h"
  #include "storage/lwlock.h"
  #include "storage/pmsignal.h"
+ #include "storage/proc.h"
+ #include "storage/procsignal.h"
  #include "storage/shmem.h"
  #include "storage/smgr.h"
  #include "storage/spin.h"
*************** static volatile sig_atomic_t shutdown_re
*** 73,78 ****
--- 75,82 ----
  /*
   * Private state
   */
+ #define BGWRITER_HIBERNATE_MS			10000
+ 
  static bool am_bg_writer = false;
  
  /* Prototypes for private functions */
*************** BackgroundWriterMain(void)
*** 123,129 ****
  	pqsignal(SIGQUIT, bg_quickdie);		/* hard crash time */
  	pqsignal(SIGALRM, SIG_IGN);
  	pqsignal(SIGPIPE, SIG_IGN);
! 	pqsignal(SIGUSR1, SIG_IGN);			/* reserve for ProcSignal */
  	pqsignal(SIGUSR2, SIG_IGN);
  
  	/*
--- 127,133 ----
  	pqsignal(SIGQUIT, bg_quickdie);		/* hard crash time */
  	pqsignal(SIGALRM, SIG_IGN);
  	pqsignal(SIGPIPE, SIG_IGN);
! 	pqsignal(SIGUSR1, procsignal_sigusr1_handler);
  	pqsignal(SIGUSR2, SIG_IGN);
  
  	/*
*************** BackgroundWriterMain(void)
*** 238,278 ****
  	for (;;)
  	{
  		/*
! 		 * Emergency bailout if postmaster has died.  This is to avoid the
! 		 * necessity for manual cleanup of all postmaster children.
  		 */
! 		if (!PostmasterIsAlive())
! 			exit(1);
! 
! 		if (got_SIGHUP)
  		{
! 			got_SIGHUP = false;
! 			ProcessConfigFile(PGC_SIGHUP);
! 			/* update global shmem state for sync rep */
  		}
! 		if (shutdown_requested)
  		{
  			/*
! 			 * From here on, elog(ERROR) should end with exit(1), not send
! 			 * control back to the sigsetjmp block above
  			 */
! 			ExitOnAnyError = true;
! 			/* Normal exit from the bgwriter is here */
! 			proc_exit(0);		/* done */
! 		}
  
! 		/*
! 		 * Do one cycle of dirty-buffer writing.
! 		 */
! 		BgBufferSync();
  
! 		/* Nap for the configured time. */
! 		BgWriterNap();
  	}
  }
  
  /*
   * BgWriterNap -- Nap for the configured time or until a signal is received.
   */
  static void
  BgWriterNap(void)
--- 242,337 ----
  	for (;;)
  	{
  		/*
! 		 * Do one cycle of dirty-buffer writing, potentially hibernating if
! 		 * there have been no buffers to write.
  		 */
! 		if (!BgBufferSync())
  		{
! 			/* Clock sweep was not "lapped" - nap for the configured time. */
! 			BgWriterNap();
  		}
! 		else
  		{
+ 			int res = 0;
  			/*
! 			 * Initiate hibernation by "arming" the process latch. This usage
! 			 * differs from the standard pattern for latches outlined in
! 			 * latch.h, where the latch is generally reset immediately after
! 			 * WaitLatch returns, in the next iteration of an infinite loop.
! 			 *
! 			 * It should only be possible to *really* set the latch from client
! 			 * backends while the bgwriter is idle, and not during productive
! 			 * cycles where buffers are written, or shortly thereafter. It's
! 			 * important that the SetLatch() call within the buffer manager
! 			 * usually inexpensively returns having found that the latch is
! 			 * already set. For write-heavy workloads, it's quite possible that
! 			 * those calls will never actually get the opportunity to really
! 			 * set the latch.
! 			 *
! 			 * By only giving backends the opportunity to really set the
! 			 * latch at this point via ResetLatch(), we avoid the overhead of
! 			 * sending a signal perhaps as much as once per bgwriter cycle, or
! 			 * maybe even more frequently than that due to the asynchronous
! 			 * nature of signals.
  			 */
! 			ResetLatch(&MyProc->procLatch);
  
! 			/*
! 			 * Another nap and sync, in case some buffers were dirtied since
! 			 * the immediately prior BgBufferSync() call and we quite
! 			 * naturally didn't hear about it because the SetLatch calls
! 			 * found the latch already set (i.e. it was "disarmed").
! 			 */
! 			BgWriterNap();
  
! 			/*
! 			 * When syncing, take the opportunity to check one last time if
! 			 * we're still "lapping" the clock sweep, and only proceed if we are.
! 			 */
! 			if (BgBufferSync())
! 				/*
! 				 * Set a timeout so that we can still update BufferAlloc stats
! 				 * reasonably promptly.
! 				 */
! 				res = WaitLatch(&MyProc->procLatch,
! 						WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
! 						BGWRITER_HIBERNATE_MS);
! 			else
! 				/*
! 				 * Changed our mind, set own proc latch, thereby often avoiding
! 				 * signaling from a backend.
! 				 */
! 				SetLatch(&MyProc->procLatch);
! 
! 			/*
! 			 * Ensure that no less than BgWriterDelay ms has passed between
! 			 * BgBufferSyncs - WaitLatch() might have returned instantaneously.
! 			 *
! 			 * Napping here rather than immediately prior to the WaitLatch()
! 			 * call ensures that a sudden burst of dirtied buffers won't result
! 			 * in the initial background writer cycle ineffectually cleaning
! 			 * only a few pages dirtied earlier in the burst, rather than
! 			 * cleaning a number that is somewhere closer to optimal for the
! 			 * cycle, as modeled by the background writing strategy. We'll also
! 			 * want to nap if we changed our minds, before starting a new cycle.
! 			 *
! 			 * We wake on a buffer being dirtied. It's possible that some
! 			 * useful work will become available for the bgwriter to do without
! 			 * a buffer actually being dirtied, as when a dirty buffer's usage
! 			 * count is decremented to zero or it's unpinned. This corner case
! 			 * is judged as too marginal to justify adding additional SetLatch()
! 			 * calls in very hot code paths, cheap though those calls may be.
! 			 */
! 			if (!(res & WL_TIMEOUT))
! 				BgWriterNap();
! 		}
  	}
  }
  
  /*
   * BgWriterNap -- Nap for the configured time or until a signal is received.
+  *
+  * Respond to a signal handler having set a flag for us.
   */
  static void
  BgWriterNap(void)
*************** BgWriterNap(void)
*** 285,302 ****
  	pgstat_send_bgwriter();
  
  	/*
! 	 * Nap for the configured time, or sleep for 10 seconds if there is no
! 	 * bgwriter activity configured.
  	 *
  	 * On some platforms, signals won't interrupt the sleep.  To ensure we
  	 * respond reasonably promptly when someone signals us, break down the
  	 * sleep into 1-second increments, and check for interrupts after each
  	 * nap.
  	 */
! 	if (bgwriter_lru_maxpages > 0)
! 		udelay = BgWriterDelay * 1000L;
! 	else
! 		udelay = 10000000L;		/* Ten seconds */
  
  	while (udelay > 999999L)
  	{
--- 344,381 ----
  	pgstat_send_bgwriter();
  
  	/*
! 	 * Emergency bailout if postmaster has died.  This is to avoid the
! 	 * necessity for manual cleanup of all postmaster children.
! 	 */
! 	if (!PostmasterIsAlive())
! 		exit(1);
! 
! 	if (got_SIGHUP)
! 	{
! 		got_SIGHUP = false;
! 		ProcessConfigFile(PGC_SIGHUP);
! 		/* update global shmem state for sync rep */
! 	}
! 	if (shutdown_requested)
! 	{
! 		/*
! 		 * From here on, elog(ERROR) should end with exit(1), not send
! 		 * control back to the sigsetjmp block above
! 		 */
! 		ExitOnAnyError = true;
! 		/* Normal exit from the bgwriter is here */
! 		proc_exit(0);		/* done */
! 	}
! 
! 	/*
! 	 * Nap for the configured time.
  	 *
  	 * On some platforms, signals won't interrupt the sleep.  To ensure we
  	 * respond reasonably promptly when someone signals us, break down the
  	 * sleep into 1-second increments, and check for interrupts after each
  	 * nap.
  	 */
! 	udelay = BgWriterDelay * 1000L;
  
  	while (udelay > 999999L)
  	{
*************** bg_quickdie(SIGNAL_ARGS)
*** 351,362 ****
--- 430,453 ----
  static void
  BgSigHupHandler(SIGNAL_ARGS)
  {
+ 	int save_errno = errno;
+ 
  	got_SIGHUP = true;
+ 	if (MyProc)
+ 		SetLatch(&MyProc->procLatch);
+ 
+ 	errno = save_errno;
  }
  
  /* SIGTERM: set flag to shutdown and exit */
  static void
  ReqShutdownHandler(SIGNAL_ARGS)
  {
+ 	int save_errno = errno;
+ 
  	shutdown_requested = true;
+ 	if (MyProc)
+ 		SetLatch(&MyProc->procLatch);
+ 
+ 	errno = save_errno;
  }
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
new file mode 100644
index 91cc001..15ddf83
*** a/src/backend/storage/buffer/bufmgr.c
--- b/src/backend/storage/buffer/bufmgr.c
*************** MarkBufferDirty(Buffer buffer)
*** 981,986 ****
--- 981,989 ----
  		VacuumPageDirty++;
  		if (VacuumCostActive)
  			VacuumCostBalance += VacuumCostPageDirty;
+ 		/* The bgwriter may need to be woken. */
+ 		if (ProcGlobal->bgwriterLatch)
+ 			SetLatch(ProcGlobal->bgwriterLatch);
  	}
  
  	bufHdr->flags |= (BM_DIRTY | BM_JUST_DIRTIED);
*************** BufferSync(int flags)
*** 1307,1314 ****
   * BgBufferSync -- Write out some dirty buffers in the pool.
   *
   * This is called periodically by the background writer process.
   */
! void
  BgBufferSync(void)
  {
  	/* info obtained from freelist.c */
--- 1310,1321 ----
   * BgBufferSync -- Write out some dirty buffers in the pool.
   *
   * This is called periodically by the background writer process.
+  *
+  * Returns a value indicating if the clocksweep has been "lapped", or if the
+  * bgwriter has been effectively disabled due to finding bgwriter_lru_maxpages
+  * at 0.
   */
! bool
  BgBufferSync(void)
  {
  	/* info obtained from freelist.c */
*************** BgBufferSync(void)
*** 1365,1371 ****
  	if (bgwriter_lru_maxpages <= 0)
  	{
  		saved_info_valid = false;
! 		return;
  	}
  
  	/*
--- 1372,1378 ----
  	if (bgwriter_lru_maxpages <= 0)
  	{
  		saved_info_valid = false;
! 		return true;
  	}
  
  	/*
*************** BgBufferSync(void)
*** 1584,1589 ****
--- 1591,1598 ----
  			 recent_alloc, strategy_delta, scans_per_alloc, smoothed_density);
  #endif
  	}
+ 
+ 	return bufs_to_lap == 0;
  }
  
  /*
*************** SetBufferCommitInfoNeedsSave(Buffer buff
*** 2348,2353 ****
--- 2357,2365 ----
  			VacuumPageDirty++;
  			if (VacuumCostActive)
  				VacuumCostBalance += VacuumCostPageDirty;
+ 			/* The bgwriter may need to be woken. */
+ 			if (ProcGlobal->bgwriterLatch)
+ 				SetLatch(ProcGlobal->bgwriterLatch);
  		}
  		bufHdr->flags |= (BM_DIRTY | BM_JUST_DIRTIED);
  		UnlockBufHdr(bufHdr);
diff --git a/src/include/postmaster/bgwriter.h b/src/include/postmaster/bgwriter.h
new file mode 100644
index 6cc4b62..970faa2
*** a/src/include/postmaster/bgwriter.h
--- b/src/include/postmaster/bgwriter.h
***************
*** 13,18 ****
--- 13,19 ----
  #define _BGWRITER_H
  
  #include "storage/block.h"
+ #include "storage/latch.h"
  #include "storage/relfilenode.h"
  
  
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
new file mode 100644
index a03c068..de1bbd0
*** a/src/include/storage/bufmgr.h
--- b/src/include/storage/bufmgr.h
*************** extern bool HoldingBufferPinThatDelaysRe
*** 213,219 ****
  extern void AbortBufferIO(void);
  
  extern void BufmgrCommit(void);
! extern void BgBufferSync(void);
  
  extern void AtProcExit_LocalBuffers(void);
  
--- 213,219 ----
  extern void AbortBufferIO(void);
  
  extern void BufmgrCommit(void);
! extern bool BgBufferSync(void);
  
  extern void AtProcExit_LocalBuffers(void);
  
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
new file mode 100644
index 358d1a4..4520f27
*** a/src/include/storage/proc.h
--- b/src/include/storage/proc.h
*************** typedef struct PROC_HDR
*** 188,193 ****
--- 188,195 ----
  	PGPROC	   *freeProcs;
  	/* Head of list of autovacuum's free PGPROC structures */
  	PGPROC	   *autovacFreeProcs;
+ 	/* BGWriter process latch */
+ 	Latch *bgwriterLatch;
  	/* Current shared estimate of appropriate spins_per_delay value */
  	int			spins_per_delay;
  	/* The proc of the Startup process, since not in ProcArray */
#5Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Peter Geoghegan (#4)
1 attachment(s)
Re: BGWriter latch, power saving

On 04.01.2012 17:05, Peter Geoghegan wrote:

On 4 January 2012 07:24, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

I think SetBufferCommitInfoNeedsSave() needs the same treatment as
MarkBufferDirty(). And it would probably be good to only set the latch if
the buffer wasn't dirty already. Setting a latch that's already set is fast,
but surely it's even faster to not even try.

That seems reasonable. Revised patch is attached.

Thanks! It occurs to me that it's still a bad idea to call SetLatch()
while holding the buffer header spinlock. At least when it's trivial to
just move the call after releasing the lock. See attached.

Could you do the sleeping/hibernating logic all in BgWriterNap()?

Yeah, I'd like to see a micro-benchmark of a worst-case scenario. I'm a bit
worried about the impact on systems with a lot of CPUs. If you have a lot of
CPUs writing to the same cache line that contains the latch's flag, that
might get expensive.

Also reasonable, but I don't think that I'll get around to it until
after the final commitfest deadline.

That's still pending. And I admit I haven't tested my updated version
besides "make check".

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachments:

bgwriter-latch-v3.patchtext/x-diff; name=bgwriter-latch-v3.patchDownload
*** a/src/backend/postmaster/bgwriter.c
--- b/src/backend/postmaster/bgwriter.c
***************
*** 51,56 ****
--- 51,58 ----
  #include "storage/ipc.h"
  #include "storage/lwlock.h"
  #include "storage/pmsignal.h"
+ #include "storage/proc.h"
+ #include "storage/procsignal.h"
  #include "storage/shmem.h"
  #include "storage/smgr.h"
  #include "storage/spin.h"
***************
*** 73,83 **** static volatile sig_atomic_t shutdown_requested = false;
  /*
   * Private state
   */
  static bool am_bg_writer = false;
  
  /* Prototypes for private functions */
  
! static void BgWriterNap(void);
  
  /* Signal handlers */
  
--- 75,87 ----
  /*
   * Private state
   */
+ #define BGWRITER_HIBERNATE_MS			10000
+ 
  static bool am_bg_writer = false;
  
  /* Prototypes for private functions */
  
! static void BgWriterNap(bool hibernating);
  
  /* Signal handlers */
  
***************
*** 97,102 **** BackgroundWriterMain(void)
--- 101,107 ----
  {
  	sigjmp_buf	local_sigjmp_buf;
  	MemoryContext bgwriter_context;
+ 	bool		hibernating;
  
  	am_bg_writer = true;
  
***************
*** 123,129 **** BackgroundWriterMain(void)
  	pqsignal(SIGQUIT, bg_quickdie);		/* hard crash time */
  	pqsignal(SIGALRM, SIG_IGN);
  	pqsignal(SIGPIPE, SIG_IGN);
! 	pqsignal(SIGUSR1, SIG_IGN);			/* reserve for ProcSignal */
  	pqsignal(SIGUSR2, SIG_IGN);
  
  	/*
--- 128,134 ----
  	pqsignal(SIGQUIT, bg_quickdie);		/* hard crash time */
  	pqsignal(SIGALRM, SIG_IGN);
  	pqsignal(SIGPIPE, SIG_IGN);
! 	pqsignal(SIGUSR1, procsignal_sigusr1_handler);
  	pqsignal(SIGUSR2, SIG_IGN);
  
  	/*
***************
*** 139,144 **** BackgroundWriterMain(void)
--- 144,155 ----
  	sigdelset(&BlockSig, SIGQUIT);
  
  	/*
+ 	 * Advertise our latch that backends can use to wake us up while we're
+ 	 * sleeping.
+ 	 */
+ 	ProcGlobal->bgwriterLatch = &MyProc->procLatch;
+ 
+ 	/*
  	 * Create a resource owner to keep track of our resources (currently only
  	 * buffer pins).
  	 */
***************
*** 235,242 **** BackgroundWriterMain(void)
--- 246,256 ----
  	/*
  	 * Loop forever
  	 */
+ 	hibernating = false;
  	for (;;)
  	{
+ 		bool lapped;
+ 
  		/*
  		 * Emergency bailout if postmaster has died.  This is to avoid the
  		 * necessity for manual cleanup of all postmaster children.
***************
*** 264,281 **** BackgroundWriterMain(void)
  		/*
  		 * Do one cycle of dirty-buffer writing.
  		 */
! 		BgBufferSync();
  
! 		/* Nap for the configured time. */
! 		BgWriterNap();
  	}
  }
  
  /*
   * BgWriterNap -- Nap for the configured time or until a signal is received.
   */
  static void
! BgWriterNap(void)
  {
  	long		udelay;
  
--- 278,333 ----
  		/*
  		 * Do one cycle of dirty-buffer writing.
  		 */
! 		lapped = BgBufferSync();
! 
! 		if (lapped && !hibernating)
! 		{
! 			/*
! 			 * Initiate hibernation by "arming" the process latch. This usage
! 			 * differs from the standard pattern for latches outlined in
! 			 * latch.h, where the latch is generally reset immediately after
! 			 * WaitLatch returns, in the next iteration of an infinite loop.
! 			 *
! 			 * It should only be possible to *really* set the latch from client
! 			 * backends while the bgwriter is idle, and not during productive
! 			 * cycles where buffers are written, or shortly thereafter. It's
! 			 * important that the SetLatch() call within the buffer manager
! 			 * usually inexpensively returns having found that the latch is
! 			 * already set. For write-heavy workloads, it's quite possible that
! 			 * those calls will never actually get the opportunity to really
! 			 * set the latch.
! 			 *
! 			 * By only giving backends the opportunity to really set the
! 			 * latch at this point via ResetLatch(), we avoid the overhead of
! 			 * sending a signal perhaps as much as once per bgwriter cycle, or
! 			 * maybe even more frequently than that due to the asynchronous
! 			 * nature of signals.
! 			 */
! 			ResetLatch(&MyProc->procLatch);
! 			hibernating = true;
! 		}
! 		else if (!lapped && hibernating)
! 		{
! 			/*
! 			 * Woken up from hibernation. Set own proc latch just in case it's
! 			 * not set yet (usually we wake up from hibernation because a
! 			 * backend already set the latch).
! 			 */
! 			SetLatch(&MyProc->procLatch);
! 			hibernating = false;
! 		}
  
! 		BgWriterNap(hibernating);
  	}
  }
  
  /*
   * BgWriterNap -- Nap for the configured time or until a signal is received.
+  *
+  * Respond to a signal handler having set a flag for us.
   */
  static void
! BgWriterNap(bool hibernating)
  {
  	long		udelay;
  
***************
*** 284,302 **** BgWriterNap(void)
  	 */
  	pgstat_send_bgwriter();
  
  	/*
! 	 * Nap for the configured time, or sleep for 10 seconds if there is no
! 	 * bgwriter activity configured.
  	 *
  	 * On some platforms, signals won't interrupt the sleep.  To ensure we
  	 * respond reasonably promptly when someone signals us, break down the
  	 * sleep into 1-second increments, and check for interrupts after each
  	 * nap.
  	 */
! 	if (bgwriter_lru_maxpages > 0)
! 		udelay = BgWriterDelay * 1000L;
! 	else
! 		udelay = 10000000L;		/* Ten seconds */
  
  	while (udelay > 999999L)
  	{
--- 336,383 ----
  	 */
  	pgstat_send_bgwriter();
  
+ 	if (hibernating)
+ 	{
+ 		/*
+ 		 * Set a timeout so that we can still update BufferAlloc stats
+ 		 * reasonably promptly.
+ 		 */
+ 		int res = WaitLatch(&MyProc->procLatch,
+ 							WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+ 							BGWRITER_HIBERNATE_MS);
+ 
+ 		/*
+ 		 * Ensure that no less than BgWriterDelay ms has passed between
+ 		 * BgBufferSyncs - WaitLatch() might have returned instantaneously.
+ 		 *
+ 		 * Napping here rather than immediately prior to the WaitLatch()
+ 		 * call ensures that a sudden burst of dirtied buffers won't result
+ 		 * in the initial background writer cycle ineffectually cleaning
+ 		 * only a few pages dirtied earlier in the burst, rather than
+ 		 * cleaning a number that is somewhere closer to optimal for the
+ 		 * cycle, as modeled by the background writing strategy. We'll also
+ 		 * want to nap if we changed our minds, before starting a new cycle.
+ 		 *
+ 		 * We wake on a buffer being dirtied. It's possible that some
+ 		 * useful work will become available for the bgwriter to do without
+ 		 * a buffer actually being dirtied, as when a dirty buffer's usage
+ 		 * count is decremented to zero or it's unpinned. This corner case
+ 		 * is judged as too marginal to justify adding additional SetLatch()
+ 		 * calls in very hot code paths, cheap though those calls may be.
+ 		 */
+ 		if (res & (WL_TIMEOUT | WL_POSTMASTER_DEATH))
+ 			return;
+ 	}
+ 
  	/*
! 	 * Nap for the configured time.
  	 *
  	 * On some platforms, signals won't interrupt the sleep.  To ensure we
  	 * respond reasonably promptly when someone signals us, break down the
  	 * sleep into 1-second increments, and check for interrupts after each
  	 * nap.
  	 */
! 	udelay = BgWriterDelay * 1000L;
  
  	while (udelay > 999999L)
  	{
***************
*** 351,362 **** bg_quickdie(SIGNAL_ARGS)
--- 432,455 ----
  static void
  BgSigHupHandler(SIGNAL_ARGS)
  {
+ 	int save_errno = errno;
+ 
  	got_SIGHUP = true;
+ 	if (MyProc)
+ 		SetLatch(&MyProc->procLatch);
+ 
+ 	errno = save_errno;
  }
  
  /* SIGTERM: set flag to shutdown and exit */
  static void
  ReqShutdownHandler(SIGNAL_ARGS)
  {
+ 	int save_errno = errno;
+ 
  	shutdown_requested = true;
+ 	if (MyProc)
+ 		SetLatch(&MyProc->procLatch);
+ 
+ 	errno = save_errno;
  }
*** a/src/backend/storage/buffer/bufmgr.c
--- b/src/backend/storage/buffer/bufmgr.c
***************
*** 953,958 **** void
--- 953,959 ----
  MarkBufferDirty(Buffer buffer)
  {
  	volatile BufferDesc *bufHdr;
+ 	bool	dirtied = false;
  
  	if (!BufferIsValid(buffer))
  		elog(ERROR, "bad buffer ID: %d", buffer);
***************
*** 973,991 **** MarkBufferDirty(Buffer buffer)
  
  	Assert(bufHdr->refcount > 0);
  
  	/*
! 	 * If the buffer was not dirty already, do vacuum accounting.
  	 */
! 	if (!(bufHdr->flags & BM_DIRTY))
  	{
  		VacuumPageDirty++;
  		if (VacuumCostActive)
  			VacuumCostBalance += VacuumCostPageDirty;
  	}
- 
- 	bufHdr->flags |= (BM_DIRTY | BM_JUST_DIRTIED);
- 
- 	UnlockBufHdr(bufHdr);
  }
  
  /*
--- 974,998 ----
  
  	Assert(bufHdr->refcount > 0);
  
+ 	if (!(bufHdr->flags & BM_DIRTY))
+ 		dirtied = true;
+ 
+ 	bufHdr->flags |= (BM_DIRTY | BM_JUST_DIRTIED);
+ 
+ 	UnlockBufHdr(bufHdr);
+ 
  	/*
! 	 * If the buffer was not dirty already, do vacuum accounting, and
! 	 * nudge bgwriter.
  	 */
! 	if (dirtied)
  	{
  		VacuumPageDirty++;
  		if (VacuumCostActive)
  			VacuumCostBalance += VacuumCostPageDirty;
+ 		if (ProcGlobal->bgwriterLatch)
+ 			SetLatch(ProcGlobal->bgwriterLatch);
  	}
  }
  
  /*
***************
*** 1307,1314 **** BufferSync(int flags)
   * BgBufferSync -- Write out some dirty buffers in the pool.
   *
   * This is called periodically by the background writer process.
   */
! void
  BgBufferSync(void)
  {
  	/* info obtained from freelist.c */
--- 1314,1325 ----
   * BgBufferSync -- Write out some dirty buffers in the pool.
   *
   * This is called periodically by the background writer process.
+  *
+  * Returns a value indicating if the clocksweep has been "lapped", or if the
+  * bgwriter has been effectively disabled due to finding bgwriter_lru_maxpages
+  * at 0.
   */
! bool
  BgBufferSync(void)
  {
  	/* info obtained from freelist.c */
***************
*** 1365,1371 **** BgBufferSync(void)
  	if (bgwriter_lru_maxpages <= 0)
  	{
  		saved_info_valid = false;
! 		return;
  	}
  
  	/*
--- 1376,1382 ----
  	if (bgwriter_lru_maxpages <= 0)
  	{
  		saved_info_valid = false;
! 		return true;
  	}
  
  	/*
***************
*** 1584,1589 **** BgBufferSync(void)
--- 1595,1602 ----
  			 recent_alloc, strategy_delta, scans_per_alloc, smoothed_density);
  #endif
  	}
+ 
+ 	return bufs_to_lap == 0;
  }
  
  /*
***************
*** 2348,2353 **** SetBufferCommitInfoNeedsSave(Buffer buffer)
--- 2361,2369 ----
  			VacuumPageDirty++;
  			if (VacuumCostActive)
  				VacuumCostBalance += VacuumCostPageDirty;
+ 			/* The bgwriter may need to be woken. */
+ 			if (ProcGlobal->bgwriterLatch)
+ 				SetLatch(ProcGlobal->bgwriterLatch);
  		}
  		bufHdr->flags |= (BM_DIRTY | BM_JUST_DIRTIED);
  		UnlockBufHdr(bufHdr);
*** a/src/include/storage/bufmgr.h
--- b/src/include/storage/bufmgr.h
***************
*** 213,219 **** extern bool HoldingBufferPinThatDelaysRecovery(void);
  extern void AbortBufferIO(void);
  
  extern void BufmgrCommit(void);
! extern void BgBufferSync(void);
  
  extern void AtProcExit_LocalBuffers(void);
  
--- 213,219 ----
  extern void AbortBufferIO(void);
  
  extern void BufmgrCommit(void);
! extern bool BgBufferSync(void);
  
  extern void AtProcExit_LocalBuffers(void);
  
*** a/src/include/storage/proc.h
--- b/src/include/storage/proc.h
***************
*** 188,193 **** typedef struct PROC_HDR
--- 188,195 ----
  	PGPROC	   *freeProcs;
  	/* Head of list of autovacuum's free PGPROC structures */
  	PGPROC	   *autovacFreeProcs;
+ 	/* BGWriter process latch */
+ 	Latch	   *bgwriterLatch;
  	/* Current shared estimate of appropriate spins_per_delay value */
  	int			spins_per_delay;
  	/* The proc of the Startup process, since not in ProcArray */
#6Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Heikki Linnakangas (#5)
Re: BGWriter latch, power saving

On 17.01.2012 12:16, Heikki Linnakangas wrote:

On 04.01.2012 17:05, Peter Geoghegan wrote:

On 4 January 2012 07:24, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

I think SetBufferCommitInfoNeedsSave() needs the same treatment as
MarkBufferDirty(). And it would probably be good to only set the
latch if
the buffer wasn't dirty already. Setting a latch that's already set
is fast,
but surely it's even faster to not even try.

That seems reasonable. Revised patch is attached.

Thanks! It occurs to me that it's still a bad idea to call SetLatch()
while holding the buffer header spinlock. At least when it's trivial to
just move the call after releasing the lock. See attached.

Could you do the sleeping/hibernating logic all in BgWriterNap()?

(sorry, forgot to update the above question before sending..)

In the patch I sent, I did rearrange the sleeping logic. I think it's
more readable the way it is now.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#7Peter Geoghegan
peter@2ndquadrant.com
In reply to: Heikki Linnakangas (#6)
Re: BGWriter latch, power saving

On 17 January 2012 11:24, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

In the patch I sent, I did rearrange the sleeping logic. I think it's more
readable the way it is now.

I have no objection to either your refinement of the sleeping logic,
nor that you moved some things in both the existing code and my patch
so that they occur when no spinlock is held.

Should I proceed with a benchmark on V3, so that we can get this
committed? I imagine that a long pgbench-tools run is appropriate,
(after all, it was used to justify the re-write of the BGWriter for
8.3) at various scale factors, from smallish to quite large.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

#8Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Peter Geoghegan (#7)
1 attachment(s)
Re: BGWriter latch, power saving

On 17.01.2012 14:38, Peter Geoghegan wrote:

On 17 January 2012 11:24, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

In the patch I sent, I did rearrange the sleeping logic. I think it's more
readable the way it is now.

I have no objection to either your refinement of the sleeping logic,
nor that you moved some things in both the existing code and my patch
so that they occur when no spinlock is held.

Ok, committed with some further cleanup.

Do you think the docs need to be updated for this, and if so, where? The
only place I found in the docs that speak about how the bgwriter works
is in config.sgml, where bgwriter_delay is described. Want to suggest an
update to that?

Should I proceed with a benchmark on V3, so that we can get this
committed? I imagine that a long pgbench-tools run is appropriate,
(after all, it was used to justify the re-write of the BGWriter for
8.3) at various scale factors, from smallish to quite large.

I did some testing on this, with a highly artificial test case that
dirties pages in shared_buffers as fast as possible. I tried to make it
a worst-case scenario, see attached script. I tested this with a 32-core
HP Itanium box, and on my 2-core laptop, and didn't see any measurable
slowdown from this patch. So I think we're good.

If setting the latch would become a contention issue, there would be a
pretty easy solution: only try to do it every 10 or 100 dirtied pages,
for example. A few dirty pages in the buffer cache don't mean anything,
as long as we kick the bgwriter in a fairly timely fashion when a larger
burst of activity begins.

BTW, do you have some sort of a test setup for these power-saving
patches, to actually measure the effect on number of interrupts or
electricity use? Fewer wakeups should be a good thing, but it would be
nice to see some figures to get an idea of how much progress we've done
and what still needs to be done.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachments:

dirtypages.shapplication/x-sh; name=dirtypages.shDownload
#9Peter Geoghegan
peter@2ndquadrant.com
In reply to: Heikki Linnakangas (#8)
1 attachment(s)
Re: BGWriter latch, power saving

On 26 January 2012 16:48, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

Ok, committed with some further cleanup.

Do you think the docs need to be updated for this, and if so, where? The
only place I found in the docs that speak about how the bgwriter works is in
config.sgml, where bgwriter_delay is described. Want to suggest an update to
that?

This new behaviour might warrant a mention, as in the attached doc-patch.

I did some testing on this, with a highly artificial test case that dirties
pages in shared_buffers as fast as possible. I tried to make it a worst-case
scenario, see attached script. I tested this with a 32-core HP Itanium box,
and on my 2-core laptop, and didn't see any measurable slowdown from this
patch. So I think we're good.

Cool. I was pretty confident that it would be completely impossible to
show a regression under any circumstances, but I'm glad that you
tested it on a large server like that.

BTW, do you have some sort of a test setup for these power-saving patches,
to actually measure the effect on number of interrupts or electricity use?
Fewer wakeups should be a good thing, but it would be nice to see some
figures to get an idea of how much progress we've done and what still needs
to be done.

The only thing I've been using is Intel's powertop utility. It is
pretty easy to see what's happening, and what you'll see is exactly
what you'd expect (So by process, I could see that the bgwriter had
exactly 5 wake-ups per second with bgwriter_delay at 200). It is
useful to sleep quite pro-actively, as CPUs will enter idle C-states
and move to lower P-states quite quickly (some will be more familiar
with the commercial names for P-states, such as Intel's speedstep
technology). I might have been less conservative about the
circumstances that cause the bgwriter to go to sleep, but I was
conscious of the need to get something into this release.

It's difficult to know what a useful workload should be to show the
benefits of this work, apart from the obvious one, which is to show
Postgres when completely idle. I think we started at 11.5 wake-ups per
second, and I think that's down to about 6.5 now - the WAL Writer
still accounts for 5 of those, so that's why I feel it's particularly
important to get it done too, though obviously that's something that
should defer to however we end up implementing group commit.

I intend to blog about it in the next few days, and I'll present a
careful analysis of the benefits of this work there. Look out for it
on planet.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

Attachments:

bgwriter_docs.patchtext/x-patch; charset=US-ASCII; name=bgwriter_docs.patchDownload
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
new file mode 100644
index 309b6a5..0e4dd97
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
*************** SET ENABLE_SEQSCAN TO OFF;
*** 1318,1334 ****
         </indexterm>
         <listitem>
          <para>
!          Specifies the delay between activity rounds for the
!          background writer.  In each round the writer issues writes
!          for some number of dirty buffers (controllable by the
!          following parameters).  It then sleeps for <varname>bgwriter_delay</>
!          milliseconds, and repeats.  The default value is 200 milliseconds
!          (<literal>200ms</>). Note that on many systems, the effective
!          resolution of sleep delays is 10 milliseconds; setting
!          <varname>bgwriter_delay</> to a value that is not a multiple of
!          10 might have the same results as setting it to the next higher
!          multiple of 10.  This parameter can only be set in the
!          <filename>postgresql.conf</> file or on the server command line.
          </para>
         </listitem>
        </varlistentry>
--- 1318,1335 ----
         </indexterm>
         <listitem>
          <para>
! 		 Specifies the delay between activity rounds for the background writer.
! 		 In each round the writer issues writes for some number of dirty buffers
! 		 (controllable by the following parameters).  It then sleeps for
! 		 <varname>bgwriter_delay</> milliseconds, and repeats, though it will
! 		 hibernate when it has looked through every buffer in
! 		 <varname>shared_buffers</varname> without finding a dirty buffer.  The
! 		 default value is 200 milliseconds (<literal>200ms</>). Note that on
! 		 many systems, the effective resolution of sleep delays is 10
! 		 milliseconds; setting <varname>bgwriter_delay</> to a value that is not
! 		 a multiple of 10 might have the same results as setting it to the next
! 		 higher multiple of 10.  This parameter can only be set in the
! 		 <filename>postgresql.conf</> file or on the server command line.
          </para>
         </listitem>
        </varlistentry>
#10Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Peter Geoghegan (#9)
Re: BGWriter latch, power saving

On 26.01.2012 21:37, Peter Geoghegan wrote:

On 26 January 2012 16:48, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

Ok, committed with some further cleanup.

Do you think the docs need to be updated for this, and if so,
where? The only place I found in the docs that speak about how the
bgwriter works is in config.sgml, where bgwriter_delay is
described. Want to suggest an update to that?

This new behaviour might warrant a mention, as in the attached
doc-patch.

Hmm, the word "hibernate" might confuse people in this context. I
committed this as:

It then sleeps for <varname>bgwriter_delay</> milliseconds, and
repeats. When there are no dirty buffers in the buffer pool, though,
it goes into a longer sleep regardless of <varname>bgwriter_delay</>.

Thanks!

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com