ACM Paper relevant to our buffer algorithm
Incidentally, I found this paper from ACM SIGMETRICS 1992 covering more or less
precisely the same algorithm we're using for our clock sweep. I haven't quite
digested it yet myself, so I'm not sure what its conclusions about weights tell
us to do with our buffer usage counter.
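For reference, the clock-sweep-with-usage-counter mechanism under discussion can be sketched roughly like this (a toy model; the names and the usage-count cap are illustrative, not the actual backend symbols):

```python
MAX_USAGE_COUNT = 5  # illustrative cap on the per-buffer counter

class Buffer:
    def __init__(self, page):
        self.page = page
        self.usage_count = 0
        self.pinned = False

def touch(buf):
    """On every access, bump the buffer's usage counter up to the cap."""
    buf.usage_count = min(buf.usage_count + 1, MAX_USAGE_COUNT)

def clock_sweep(buffers, hand):
    """Advance the clock hand, decrementing counters on unpinned buffers,
    until one with usage_count == 0 is found; that buffer is the victim."""
    while True:
        buf = buffers[hand]
        hand = (hand + 1) % len(buffers)
        if buf.pinned:
            continue
        if buf.usage_count == 0:
            return buf, hand
        buf.usage_count -= 1
```

Presumably the paper's question about weights maps onto how much each access should raise the counter and how quickly the sweep should decay it.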
I put a copy up for download since, even though copying is permitted, I don't
know how to find a public link. It's kind of big, so please download it and
read it locally rather than reading it from my machine:
http://stark.xeocode.com/~stark/p35-nicola.pdf.gz
Regarding copyright, the paper sez:
Permission to copy without fee all or part of this material is
granted provided that the copies are not made or distributed for
direct commercial advantage, the ACM copyright notice and the
title of the publication and its date appear, and notice is given
that copying is by permission of the Association for Computing
Machinery. To copy otherwise, or to republish, requires a fee
and/or specific permission.
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Here are some more recent papers that also give good insight into research
in this area:
http://www.cs.usask.ca/~wew036/comprehensive.pdf
http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-05-3.pdf
--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
Greg Smith wrote:
Here are some more recent papers that also give good insight into
research in this area:
http://www.cs.usask.ca/~wew036/comprehensive.pdf
http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-05-3.pdf
Nice papers.
What I'd like to see is a paper on precleaning strategies. I tried to
google for one but couldn't find any. That would be very relevant to the
bgwriter tuning discussions.
I'm still struggling to understand why and how bgwriter increases
performance. Under what circumstances, what workload?
The only benefit I can see is that it moves the write() of a page out of
the critical path. But as long as the OS cache can absorb the write, it
should be very cheap compared to doing real I/O. Apparently the workload
that benefits most is an OLTP workload where response times are
critical, on a database that doesn't fit in shared_buffers, but fits in
OS cache.
Am I missing something? What kind of a test would show the most benefit?
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Wed, Jul 04, 2007 at 11:09:19AM +0100, Heikki Linnakangas wrote:
The only benefit I can see is that it moves the write() of a page out of
the critical path. But as long as the OS cache can absorb the write, it
should be very cheap compared to doing real I/O. Apparently the workload
that benefits most is an OLTP workload where response times are
critical, on a database that doesn't fit in shared_buffers, but fits in
OS cache.
I thought the point was to make checkpoints cheaper. Sure, the OS can
probably absorb the write() but you execute a fsync() shortly after so
you're going to block on that. The point being that by executing the
writes earlier you can get some of the writing done in the background
prior to the fsync.
So it would be targeting people with lots of dirty shared buffers
where a checkpoint is going to eat your I/O bandwidth. An fsync will
make the OS write everything ASAP.
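That argument can be put in back-of-the-envelope form: the checkpoint's fsync has to flush whatever is still dirty in the OS cache, so every page written (and flushed by the OS) ahead of time comes off that bill. A toy model, with made-up rates purely for illustration:

```python
def checkpoint_stall(dirty_pages, prewritten_pages, flush_rate=1000):
    """Seconds the checkpoint fsync blocks, assuming it must flush every
    dirty page not already written out, at flush_rate pages/second.
    All numbers here are made up for illustration."""
    remaining = max(dirty_pages - prewritten_pages, 0)
    return remaining / flush_rate

# With no pre-writing, the checkpoint pays for the full flush:
full_stall = checkpoint_stall(10_000, 0)       # 10.0 seconds
# Pre-writing most pages in the background shrinks the stall:
small_stall = checkpoint_stall(10_000, 8_000)  # 2.0 seconds
```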
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
From each according to his ability. To each according to his ability to litigate.
Martijn van Oosterhout wrote:
On Wed, Jul 04, 2007 at 11:09:19AM +0100, Heikki Linnakangas wrote:
The only benefit I can see is that it moves the write() of a page out of
the critical path. But as long as the OS cache can absorb the write, it
should be very cheap compared to doing real I/O. Apparently the workload
that benefits most is an OLTP workload where response times are
critical, on a database that doesn't fit in shared_buffers, but fits in
OS cache.
I thought the point was to make checkpoints cheaper. Sure, the OS can
probably absorb the write() but you execute a fsync() shortly after so
you're going to block on that. The point being that by executing the
writes earlier you can get some of the writing done in the background
prior to the fsync.
That was the purpose of the bgwriter "all"-scan. It was removed as part
of the load distributed checkpoints patch, as it's not needed anymore.
What's the purpose of the lru-scan?
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki@enterprisedb.com> writes:
Greg Smith wrote:
Here are some more recent papers that also give good insight into research in
this area:
http://www.cs.usask.ca/~wew036/comprehensive.pdf
http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-05-3.pdf
Nice papers.
What I'd like to see is a paper on precleaning strategies. I tried to google
for one but couldn't find any. That would be very relevant to the bgwriter
tuning discussions.
I saw some listed but they were for VM managers so they may not be perfectly
adaptable.
I'm still struggling to understand why and how bgwriter increases performance.
Under what circumstances, what workload?
The only benefit I can see is that it moves the write() of a page out of the
critical path. But as long as the OS cache can absorb the write, it should be
very cheap compared to doing real I/O.
Well, you can't keep writing indefinitely faster than the I/O subsystem can
execute the writes. Eventually the kernel will block your write until a kernel
buffer becomes free; i.e., it throttles your writes to the actual write
bandwidth available.
I think most systems are limited by the latency of reads, though. Reads can't
be re-ordered as well as writes can; they happen synchronously as far as an
individual backend is concerned.
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Gregory Stark <gsstark@mit.edu> writes:
Heikki Linnakangas <heikki@enterprisedb.com> writes:
I'm still struggling to understand why and how bgwriter increases performance.
Under what circumstances, what workload?
The only benefit I can see is that it moves the write() of a page out of the
critical path. But as long as the OS cache can absorb the write, it should be
very cheap compared to doing real I/O.
Well, you can't keep writing indefinitely faster than the I/O subsystem can
execute the writes. Eventually the kernel will block your write until a kernel
buffer becomes free; i.e., it throttles your writes to the actual write
bandwidth available.
Right. Also, a buffer write isn't "merely" a kernel call --- for
instance, you might have to flush some more WAL before you can execute
the write, and there are cases where you'd have to fsync the write
yourself (ie, if you can't pass it off to the bgwriter). The more of
that we can take out of foreground query paths, the better.
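The WAL point can be made concrete: before a dirty buffer may hit disk, WAL must be flushed at least up to the LSN of the last record that modified the page, and whoever writes the buffer pays for that flush. A rough sketch of the rule (illustrative names, not the actual C symbols; flush_up_to stands in for something like XLogFlush):

```python
class WAL:
    def __init__(self):
        self.flushed_lsn = 0

    def flush_up_to(self, lsn):
        """Stands in for a blocking WAL write + fsync up to the given LSN."""
        self.flushed_lsn = max(self.flushed_lsn, lsn)

def write_dirty_buffer(page_lsn, wal, write_page):
    """WAL-before-data: the page may not be written out before the WAL
    record that last modified it, so flush WAL first if it lags behind."""
    if wal.flushed_lsn < page_lsn:
        wal.flush_up_to(page_lsn)  # extra I/O paid by whoever writes the page
    write_page()                   # the actual write() of the page

wal = WAL()
written = []
write_dirty_buffer(42, wal, lambda: written.append(42))
```

When the bgwriter performs this instead of a backend, both the WAL flush and the write() move out of the query's critical path, which is the benefit Tom describes.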
regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes:
Gregory Stark <gsstark@mit.edu> writes:
Heikki Linnakangas <heikki@enterprisedb.com> writes:
I'm still struggling to understand why and how bgwriter increases performance.
Under what circumstances, what workload?
The only benefit I can see is that it moves the write() of a page out of the
critical path. But as long as the OS cache can absorb the write, it should be
very cheap compared to doing real I/O.
Well, you can't keep writing indefinitely faster than the I/O subsystem can
execute the writes. Eventually the kernel will block your write until a kernel
buffer becomes free; i.e., it throttles your writes to the actual write
bandwidth available.
Right. Also, a buffer write isn't "merely" a kernel call --- for
instance, you might have to flush some more WAL before you can execute
the write, and there are cases where you'd have to fsync the write
yourself (ie, if you can't pass it off to the bgwriter). The more of
that we can take out of foreground query paths, the better.
So it sounds like a good place to start, when trying to benchmark something
where bgwriter helps, might be a setup which starts with a very large table
(like 1-10M rows) and where each transaction deletes a large number of random
tuples (~1k rows), possibly waiting briefly before committing to give the
dirty pages a chance to be needed before commit flushes the WAL.
That way each transaction dirties a large number of pages, and the next
transaction is likely to need a fresh page and find one which has been dirtied
and not had its WAL record flushed. Deletes mean not much WAL will be
generated, so WAL won't be a bottleneck and you won't get checkpoints due to
checkpoint_segments being reached.
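A driver for that workload could generate transactions along these lines (a sketch only; the table and column names are invented, and actually running the statements against a server is left to whatever client library you prefer):

```python
import random

def delete_batch_txn(rng, table_rows=1_000_000, batch=1000):
    """One transaction of the proposed workload: delete ~batch random
    tuples from a large table. Each DELETE dirties many heap pages while
    emitting relatively little WAL."""
    ids = sorted(rng.sample(range(table_rows), batch))
    return [
        "BEGIN;",
        "DELETE FROM big_table WHERE id IN (%s);" % ", ".join(map(str, ids)),
        # Optionally sleep here before COMMIT, so other backends get a
        # chance to need the dirtied pages before the commit's WAL flush.
        "COMMIT;",
    ]

rng = random.Random(0)
txn = delete_batch_txn(rng, batch=5)
```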
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com