Reducing bgwriter wakeups

Started by Simon Riggs almost 14 years ago; 7 messages
#1 Simon Riggs
simon@2ndQuadrant.com

Recent changes for power reduction mean that we now issue a wakeup
call to the bgwriter every time we set a hint bit.

However cheap that is, it's still overkill.

My proposal is that we wake up the bgwriter whenever a backend is
forced to write a dirty buffer, a job the bgwriter should have been
doing.

This significantly reduces the number of wakeup calls and allows the
bgwriter to stay asleep even when very light traffic happens, which is
good because the bgwriter is often the last process to sleep.

Seems useful to have an explicit discussion on this point, especially
in view of recent performance results.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

reducing_bgwriter_wakeups.v1.patch (text/x-diff; charset=US-ASCII)
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 1adb6d3..310cd95 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -654,6 +654,10 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 				FlushBuffer(buf, NULL);
 				LWLockRelease(buf->content_lock);
 
+				/* The bgwriter may need to be woken. */
+				if (ProcGlobal->bgwriterLatch)
+					SetLatch(ProcGlobal->bgwriterLatch);
+
 				TRACE_POSTGRESQL_BUFFER_WRITE_DIRTY_DONE(forkNum, blockNum,
 											   smgr->smgr_rnode.node.spcNode,
 												smgr->smgr_rnode.node.dbNode,
@@ -2368,9 +2372,6 @@ SetBufferCommitInfoNeedsSave(Buffer buffer)
 			VacuumPageDirty++;
 			if (VacuumCostActive)
 				VacuumCostBalance += VacuumCostPageDirty;
-			/* The bgwriter may need to be woken. */
-			if (ProcGlobal->bgwriterLatch)
-				SetLatch(ProcGlobal->bgwriterLatch);
 		}
 	}
 }
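
To make the trade-off concrete, here is a minimal sketch (a toy model, not PostgreSQL code) that counts how often the bgwriter latch would be set under each policy. The `BufEvent` and `WakePolicy` enums and the `count_wakeups` function are hypothetical names invented for illustration; only `SetLatch` and `ProcGlobal->bgwriterLatch` appear in the patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event kinds seen by a backend; not PostgreSQL types. */
typedef enum {
    EV_HINT_BIT_DIRTY,   /* setting a hint bit dirtied a buffer */
    EV_BACKEND_WRITE     /* a backend had to flush a dirty buffer itself */
} BufEvent;

/* The two wakeup policies under discussion (names are assumptions). */
typedef enum {
    WAKE_ON_HINT_BIT,       /* wakeup site removed by the patch */
    WAKE_ON_BACKEND_WRITE   /* wakeup site added by the patch */
} WakePolicy;

/* Count how many times the simulated bgwriter latch would be set. */
size_t count_wakeups(const BufEvent *events, size_t n, WakePolicy policy)
{
    size_t wakeups = 0;

    assert(events != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        bool wake = (policy == WAKE_ON_HINT_BIT)
            ? (events[i] == EV_HINT_BIT_DIRTY)
            : (events[i] == EV_BACKEND_WRITE);
        if (wake)
            wakeups++;
    }
    return wakeups;
}
```

For a trace of three hint-bit dirtyings followed by one forced backend write, the first policy gives three wakeups and the second gives one. The model says nothing about how long dirty buffers wait before being written, which is the concern raised in the replies.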
#2 Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#1)
Re: Reducing bgwriter wakeups

On Sun, Feb 19, 2012 at 1:53 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

> Recent changes for power reduction mean that we now issue a wakeup
> call to the bgwriter every time we set a hint bit.
>
> However cheap that is, it's still overkill.
>
> My proposal is that we wake up the bgwriter whenever a backend is
> forced to write a dirty buffer, a job the bgwriter should have been
> doing.
>
> This significantly reduces the number of wakeup calls and allows the
> bgwriter to stay asleep even when very light traffic happens, which is
> good because the bgwriter is often the last process to sleep.
>
> Seems useful to have an explicit discussion on this point, especially
> in view of recent performance results.

I don't see what this has to do with recent performance results, so
please elaborate. Off-hand, I don't see any point in getting cheap.
It seems far more important to me that the background writer become
active when needed than that we save some trivial amount of power by
waiting longer before activating it. If we're concerned about saving
power, then IMHO what we should be worried about is that the wal
writer is still waking up 5x/s.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#3 Simon Riggs
simon@2ndQuadrant.com
In reply to: Robert Haas (#2)
Re: Reducing bgwriter wakeups

On Sun, Feb 19, 2012 at 8:15 PM, Robert Haas <robertmhaas@gmail.com> wrote:

> On Sun, Feb 19, 2012 at 1:53 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>
>> Recent changes for power reduction mean that we now issue a wakeup
>> call to the bgwriter every time we set a hint bit.
>>
>> However cheap that is, it's still overkill.
>>
>> My proposal is that we wake up the bgwriter whenever a backend is
>> forced to write a dirty buffer, a job the bgwriter should have been
>> doing.
>>
>> This significantly reduces the number of wakeup calls and allows the
>> bgwriter to stay asleep even when very light traffic happens, which is
>> good because the bgwriter is often the last process to sleep.
>>
>> Seems useful to have an explicit discussion on this point, especially
>> in view of recent performance results.
>
> I don't see what this has to do with recent performance results, so
> please elaborate. Off-hand, I don't see any point in getting cheap.
> It seems far more important to me that the background writer become
> active when needed than that we save some trivial amount of power by
> waiting longer before activating it.

Then you misunderstand, since I am advocating waking it when needed.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

#4 Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#3)
Re: Reducing bgwriter wakeups

On Sun, Feb 19, 2012 at 4:11 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

> On Sun, Feb 19, 2012 at 8:15 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
>> On Sun, Feb 19, 2012 at 1:53 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>
>>> Recent changes for power reduction mean that we now issue a wakeup
>>> call to the bgwriter every time we set a hint bit.
>>>
>>> However cheap that is, it's still overkill.
>>>
>>> My proposal is that we wake up the bgwriter whenever a backend is
>>> forced to write a dirty buffer, a job the bgwriter should have been
>>> doing.
>>>
>>> This significantly reduces the number of wakeup calls and allows the
>>> bgwriter to stay asleep even when very light traffic happens, which is
>>> good because the bgwriter is often the last process to sleep.
>>>
>>> Seems useful to have an explicit discussion on this point, especially
>>> in view of recent performance results.
>>
>> I don't see what this has to do with recent performance results, so
>> please elaborate. Off-hand, I don't see any point in getting cheap.
>> It seems far more important to me that the background writer become
>> active when needed than that we save some trivial amount of power by
>> waiting longer before activating it.
>
> Then you misunderstand, since I am advocating waking it when needed.

Well, I guess that depends on when it's actually needed. You haven't
presented any evidence one way or the other.

I mean, let's suppose that a sudden spike of activity hits a
previously-idle system. If we wait until all of shared_buffers is
dirty before waking up the background writer, it seems possible that
the background writer is going to have a hard time catching up. If we
wake it immediately, we don't have that problem.

Also, in general, I think that it's not a good idea to let dirty data
sit in shared_buffers forever. I'm unhappy about the change this
release cycle to skip checkpoints if we've written less than a full
WAL segment, and this seems like another step in that direction. It's
exposing us to needless risk of data loss. In 9.1, if you process a
transaction and, an hour later, the disk where pg_xlog is written
melts into a heap of molten slag, your transaction will be there, even
if you end up having to run pg_resetxlog. In 9.2, it may well be that
xlog contains the only record of that transaction, and you're hosed.
The more work we do to postpone writing the data until the absolutely
last possible moment, the more likely it is that it won't be on disk
when we need it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#5 Jeff Janes
jeff.janes@gmail.com
In reply to: Robert Haas (#4)
Re: Reducing bgwriter wakeups

On Sun, Feb 19, 2012 at 2:18 PM, Robert Haas <robertmhaas@gmail.com> wrote:

> Also, in general, I think that it's not a good idea to let dirty data
> sit in shared_buffers forever. I'm unhappy about the change this
> release cycle to skip checkpoints if we've written less than a full
> WAL segment, and this seems like another step in that direction. It's
> exposing us to needless risk of data loss. In 9.1, if you process a
> transaction and, an hour later, the disk where pg_xlog is written
> melts into a heap of molten slag, your transaction will be there, even
> if you end up having to run pg_resetxlog.

Would the log really have been archived in 9.1? I don't think
checkpoint_timeout caused a log switch, just a checkpoint which could
happily be in the same file as the previous checkpoint.

> In 9.2, it may well be that
> xlog contains the only record of that transaction, and you're hosed.
> The more work we do to postpone writing the data until the absolutely
> last possible moment, the more likely it is that it won't be on disk
> when we need it.

Isn't that what archive_timeout is for?

Should archive_timeout default to something like 5 min, rather than 0?

Cheers,

Jeff

#6 Robert Haas
robertmhaas@gmail.com
In reply to: Jeff Janes (#5)
Re: Reducing bgwriter wakeups

On Sun, Feb 19, 2012 at 5:56 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

> Would the log really have been archived in 9.1? I don't think
> checkpoint_timeout caused a log switch, just a checkpoint which could
> happily be in the same file as the previous checkpoint.

The log segment doesn't need to get archived - it's sufficient that
the dirty buffers get written to disk.

>> In 9.2, it may well be that
>> xlog contains the only record of that transaction, and you're hosed.
>> The more work we do to postpone writing the data until the absolutely
>> last possible moment, the more likely it is that it won't be on disk
>> when we need it.
>
> Isn't that what archive_timeout is for?
>
> Should archive_timeout default to something like 5 min, rather than 0?

I dunno. I think people who are doing replication are probably mostly
using streaming replication these days, in which case archive_timeout
won't matter one way or the other. But if you're not doing
replication, your only hope of recovering from a trashed pg_xlog is
that PostgreSQL wrote the buffers and (in the case of an OS crash) the
OS wrote them to disk.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#7 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Robert Haas (#4)
Re: Reducing bgwriter wakeups

On 20.02.2012 00:18, Robert Haas wrote:

> On Sun, Feb 19, 2012 at 4:11 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>
>> On Sun, Feb 19, 2012 at 8:15 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>
>>> On Sun, Feb 19, 2012 at 1:53 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>>
>>>> Recent changes for power reduction mean that we now issue a wakeup
>>>> call to the bgwriter every time we set a hint bit.
>>>>
>>>> However cheap that is, it's still overkill.
>>>>
>>>> My proposal is that we wake up the bgwriter whenever a backend is
>>>> forced to write a dirty buffer, a job the bgwriter should have been
>>>> doing.
>>>>
>>>> This significantly reduces the number of wakeup calls and allows the
>>>> bgwriter to stay asleep even when very light traffic happens, which is
>>>> good because the bgwriter is often the last process to sleep.

That seems like swinging the pendulum too much in the other direction,
as others have noted. A simple thing you could do, however, is to only
wake up bgwriter every 10 dirtied pages in the backend or something like
that. That would reduce the wakeups by a factor of 10. Would that be
useful? It's not actually clear to me what the problem you're trying to
solve is.

>>>> Seems useful to have an explicit discussion on this point, especially
>>>> in view of recent performance results.
>>>
>>> I don't see what this has to do with recent performance results, so
>>> please elaborate. Off-hand, I don't see any point in getting cheap.
>>> It seems far more important to me that the background writer become
>>> active when needed than that we save some trivial amount of power by
>>> waiting longer before activating it.
>>
>> Then you misunderstand, since I am advocating waking it when needed.
>
> Well, I guess that depends on when it's actually needed. You haven't
> presented any evidence one way or the other.
>
> I mean, let's suppose that a sudden spike of activity hits a
> previously-idle system. If we wait until all of shared_buffers is
> dirty before waking up the background writer, it seems possible that
> the background writer is going to have a hard time catching up. If we
> wake it immediately, we don't have that problem.

Well, as long as the OS has some clean buffers, as it presumably does if
the system has been idle for a while, bgwriter will catch up very
quickly by simply dumping a large number of dirty pages to the OS. Also,
as the code stands, bgwriter still wakes up every 10 seconds even when
no-one signals it, which makes this much less likely to happen.
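
The effect of that timeout can be sketched with a small simulation of a sleeper that wakes either when signalled or when the timeout expires, whichever comes first. This is a toy model under stated assumptions, not the bgwriter's actual loop: `count_bgwriter_wakeups` and `HIBERNATE_TIMEOUT_S` are hypothetical names, times are whole seconds, and the 10-second value is the figure quoted above.

```c
#include <assert.h>
#include <stddef.h>

#define HIBERNATE_TIMEOUT_S 10   /* the 10-second figure quoted above */

/*
 * Simulate a process that sleeps until it is signalled or until the
 * timeout expires, whichever comes first.  signal_times holds the
 * seconds at which a backend signals it, sorted in ascending order.
 * Returns the number of wakeups that occur in (0, horizon_s].
 */
size_t count_bgwriter_wakeups(const long *signal_times, size_t n,
                              long horizon_s)
{
    size_t wakeups = 0;
    size_t next = 0;   /* index of the next unconsumed signal */
    long now = 0;

    assert(signal_times != NULL || n == 0);
    for (;;) {
        long wake_at = now + HIBERNATE_TIMEOUT_S;

        /* Signals that arrived while we were awake are already handled. */
        while (next < n && signal_times[next] <= now)
            next++;

        /* A signal before the deadline cuts the sleep short. */
        if (next < n && signal_times[next] < wake_at)
            wake_at = signal_times[next++];

        if (wake_at > horizon_s)
            break;
        wakeups++;
        now = wake_at;
    }
    return wakeups;
}
```

With no signals at all, the model still wakes six times in a 60-second window, so in this model dirty data waits at most one timeout period before the process gets a chance to look at it.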

Nevertheless, I also feel that it would be better for bgwriter to be a
bit more proactive than that.

> Also, in general, I think that it's not a good idea to let dirty data
> sit in shared_buffers forever. I'm unhappy about the change this
> release cycle to skip checkpoints if we've written less than a full
> WAL segment, and this seems like another step in that direction. It's
> exposing us to needless risk of data loss. In 9.1, if you process a
> transaction and, an hour later, the disk where pg_xlog is written
> melts into a heap of molten slag, your transaction will be there, even
> if you end up having to run pg_resetxlog. In 9.2, it may well be that
> xlog contains the only record of that transaction, and you're hosed.
> The more work we do to postpone writing the data until the absolutely
> last possible moment, the more likely it is that it won't be on disk
> when we need it.

True. (but as noted above, bgwriter still wakes up every 10 seconds so
this isn't really an issue at the moment)

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com