Spread checkpoint sync

Started by Greg Smith, over 15 years ago. 125 messages. pgsql-hackers.
#1Greg Smith
gsmith@gregsmith.com

Final patch in this series for today spreads out the individual
checkpoint fsync calls over time, and was written by myself and Simon
Riggs. Patch is based against a system that's already had the two
patches I sent over earlier today applied, rather than HEAD, as both are
useful for measuring how well this one works. You can grab a tree with
all three from my Github repo, via the "checkpoint" branch:
https://github.com/greg2ndQuadrant/postgres/tree/checkpoint

This is a work in progress. While I've seen this reduce checkpoint
spike latency significantly on a large system, I don't have any
referenceable performance numbers I can share yet. There are also a
couple of problems I know about, and I'm sure others I haven't thought
of yet. The first known issue is that it delays manual or other
"forced" checkpoints, which is not necessarily wrong if you really are
serious about spreading syncs out, but it is certainly surprising when
you run into it. I notice this most when running createdb on a busy
system. There's no real reason for this to happen; the code passes down
the fact that it's a forced checkpoint, but just doesn't act on it yet.

The second issue is that the delay between sync calls is currently
hard-coded, at 3 seconds. I believe the right path here is to consider
the current checkpoint_completion_target to still be valid, then work
back from there. That raises the question of what percentage of the
time writes should now be compressed into relative to that, to leave
some time to spread the sync calls. If we're willing to say "writes
finish in first 1/2 of target, syncs execute in second 1/2", then I
could implement that here. Maybe that ratio needs to be another
tunable. Still thinking about that part, and it's certainly open to
community debate. The thing to realize that complicates the design is
that the actual sync execution may take a considerable period of time.
That's much more likely to happen for a sync than for an individual
write (the operation the current spread checkpoint paces out), because
writes are usually just absorbed by the OS cache. In the spread sync
case, it's easy for one slow sync to make the rest fire in quick
succession, to make up for lost time.

There's some history behind this design that impacts review. Circa 8.3
development in 2007, I had experimented with putting some delay between
each of the fsync calls that the background writer executes during a
checkpoint. It didn't help smooth things out at all at the time. It
turns out that's mainly because all my tests were on Linux using ext3.
On that filesystem, fsync is not very granular. It's quite likely it
will push out data you haven't asked to sync yet, which means one giant
sync is almost impossible to avoid no matter how you space the fsync
calls. If you try to review this on ext3, I expect you'll find a big
spike early in each checkpoint (where it flushes just about everything
out) and then quick response for the later files involved.

The system this patch originated to help fix was running XFS. There,
I've confirmed that problem doesn't exist, that individual syncs only
seem to push out the data related to one file. The same should be true
on ext4, but I haven't tested that myself. Not sure how granular the
fsync calls are on Solaris, FreeBSD, Darwin, etc. yet. Note that it's
still possible to get hung on one sync call for a while, even on XFS.
The worst case seems to be if you've created a new 1GB database table
chunk and fully populated it since the last checkpoint, on a system
that's just cached the whole thing so far.

One change that turned out to be necessary rather than optional--to get
good performance from the system under tuning--was to make regular
background writer activity, including fsync absorb checks, happen during
these sync pauses. The existing code ran the checkpoint sync work in a
pretty tight loop, which as I alluded to in an earlier patch today can
lead to the backends competing with the background writer to get their
sync calls executed. This squashes that problem if the background
writer is set up properly.

What does properly mean? Well, it can't do that cleanup if the
background writer is sleeping. This whole area was refactored. The
current sync absorb code uses the constant WRITES_PER_ABSORB to make
decisions. This new version replaces that hard-coded value with
something that scales to the system size. It now skips this work
until the number of pending absorb requests has reached 10% of the
number possible to store (BgWriterShmem->max_requests, which is set to
the size of shared_buffers in 8K pages, AKA NBuffers). This may
actually postpone this work for too long on systems with large
shared_buffers settings; that's one area I'm still investigating.

As far as concerns about this 10% setting not doing enough work, which
is something I do see, you can always increase how often absorbing
happens by decreasing bgwriter_delay now--giving other benefits too.
For example, if you run the fsync-stress-v2.sh script I included with
the last patch I sent, you'll discover the spread sync version of the
server leaves just as many unabsorbed writes behind as the old code
did. Those are happening because of periods the background writer is
sleeping. They drop as you decrease the delay; here's a table showing
some values I tested here, with all three patches installed:

bgwriter_delay    buffers_backend_sync
200 ms            90
 50 ms            28
 25 ms             3

There's a bunch of performance related review work that needs to be done
here, in addition to the usual code review for the patch. My hope is
that I can get enough of that done to validate this does what it's
supposed to on public hardware that a later version of this patch is
considered for the next CommitFest. It's a little more raw than I'd
like still, but the idea has been tested enough here that I believe it's
fundamentally sound and valuable.

--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us

Attachments:

sync-spread-v2.patch (text/x-patch, +93 -35)
#2Robert Haas
robertmhaas@gmail.com
In reply to: Greg Smith (#1)
Re: Spread checkpoint sync

On Sun, Nov 14, 2010 at 6:48 PM, Greg Smith <greg@2ndquadrant.com> wrote:

The second issue is that the delay between sync calls is currently
hard-coded, at 3 seconds.  I believe the right path here is to consider the
current checkpoint_completion_target to still be valid, then work back from
there.  That raises the question of what percentage of the time writes
should now be compressed into relative to that, to leave some time to spread
the sync calls. If we're willing to say "writes finish in first 1/2 of
target, syncs execute in second 1/2", that I could implement that here.
 Maybe that ratio needs to be another tunable.  Still thinking about that
part, and it's certainly open to community debate.  The thing to realize
that complicates the design is that the actual sync execution may take a
considerable period of time.  It's much more likely for that to happen than
in the case of an individual write, as the current spread checkpoint does,
because those are usually cached.  In the spread sync case, it's easy for
one slow sync to make the rest turn into ones that fire in quick succession,
to make up for lost time.

I think the behavior of file systems and operating systems is highly
relevant here. We seem to have a theory that allowing a delay between
the write and the fsync should give the OS a chance to start writing
the data out, but do we have any evidence indicating whether and under
what circumstances that actually occurs? For example, if we knew that
it's important to wait at least 30 s but waiting 60 s is no better,
that would be useful information.

Another question I have is about how we're actually going to know when
any given fsync can be performed. For any given segment, there are a
certain number of pages A that are already dirty at the start of the
checkpoint. Then there are a certain number of additional pages B
that are going to be written out during the checkpoint. If it so
happens that B = 0, we can call fsync() at the beginning of the
checkpoint without losing anything (in fact, we gain something: any
pages dirtied by cleaning scans or backend writes during the
checkpoint won't need to hit the disk; and if the filesystem dumps
more of its cache than necessary on fsync, we may as well take that
hit before dirtying a bunch more stuff). But if B > 0, then we
shouldn't attempt the fsync() until we've written them all; otherwise
we'll end
up having to fsync() that segment twice.

Doing all the writes and then all the fsyncs meets this requirement
trivially, but I'm not so sure that's a good idea. For example, given
files F1 ... Fn with dirty pages needing checkpoint writes, we could
do the following: first, do any pending fsyncs for files not among F1
.. Fn; then, write all pages for F1 and fsync, write all pages for F2
and fsync, write all pages for F3 and fsync, etc. This might seem
dumb because we're not really giving the OS a chance to write anything
out before we fsync, but think about the ext3 case where the whole
filesystem cache gets flushed anyway. It's much better to dump the
cache at the beginning of the checkpoint and then again after every
file than it is to spew many GB of dirty stuff into the cache and then
drop the hammer.

I'm just brainstorming here; feel free to tell me I'm all wet.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#3Jeff Janes
jeff.janes@gmail.com
In reply to: Robert Haas (#2)
Re: Spread checkpoint sync

On Mon, Nov 15, 2010 at 6:15 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Sun, Nov 14, 2010 at 6:48 PM, Greg Smith <greg@2ndquadrant.com> wrote:

The second issue is that the delay between sync calls is currently
hard-coded, at 3 seconds.  I believe the right path here is to consider the
current checkpoint_completion_target to still be valid, then work back from
there.  That raises the question of what percentage of the time writes
should now be compressed into relative to that, to leave some time to spread
the sync calls.  If we're willing to say "writes finish in first 1/2 of
target, syncs execute in second 1/2", that I could implement that here.
 Maybe that ratio needs to be another tunable.  Still thinking about that
part, and it's certainly open to community debate.

I would speculate that the answer is likely to be nearly binary. The
best option would either be to do the writes as fast as possible and
spread out the fsyncs, or spread out the writes and do the fsyncs as
fast as possible. Depending on the system set up.

 The thing to realize
that complicates the design is that the actual sync execution may take a
considerable period of time.  It's much more likely for that to happen than
in the case of an individual write, as the current spread checkpoint does,
because those are usually cached.  In the spread sync case, it's easy for
one slow sync to make the rest turn into ones that fire in quick succession,
to make up for lost time.

I think the behavior of file systems and operating systems is highly
relevant here.  We seem to have a theory that allowing a delay between
the write and the fsync should give the OS a chance to start writing
the data out,

I thought that the theory was that doing too many fsyncs in short order
can lead to some kind of starvation of other IO.

If the theory is that we want to wait between writes and fsyncs, then
the current behavior is probably the best. Spreading out the writes
and then doing all the syncs at the end gives the longest delay
between an average write and the sync of the file it went to. Or,
spread the writes out over 150 seconds, sleep for 140 seconds, then do
the fsyncs. But I don't think that that is the theory.

but do we have any evidence indicating whether and under
what circumstances that actually occurs?  For example, if we knew that
it's important to wait at least 30 s but waiting 60 s is no better,
that would be useful information.

Another question I have is about how we're actually going to know when
any given fsync can be performed.  For any given segment, there are a
certain number of pages A that are already dirty at the start of the
checkpoint.

Dirty in the shared pool, or dirty in the OS cache?

Then there are a certain number of additional pages B
that are going to be written out during the checkpoint.  If it so
happens that B = 0, we can call fsync() at the beginning of the
checkpoint without losing anything (in fact, we gain something: any
pages dirtied by cleaning scans or backend writes during the
checkpoint won't need to hit the disk;

Aren't those pages written out by cleaning scans and backend writes
while the checkpoint is occurring exactly what you defined to be page
set B, and then to be zero?

and if the filesystem dumps
more of its cache than necessary on fsync, we may as well take that
hit before dirtying a bunch more stuff).  But if B > 0, then we
shouldn't attempt the fsync() until we've written them all; otherwise
we'll end
up having to fsync() that segment twice.

Doing all the writes and then all the fsyncs meets this requirement
trivially, but I'm not so sure that's a good idea.  For example, given
files F1 ... Fn with dirty pages needing checkpoint writes, we could
do the following: first, do any pending fsyncs for files not among F1
.. Fn; then, write all pages for F1 and fsync, write all pages for F2
and fsync, write all pages for F3 and fsync, etc.  This might seem
dumb because we're not really giving the OS a chance to write anything
out before we fsync, but think about the ext3 case where the whole
filesystem cache gets flushed anyway.  It's much better to dump the
cache at the beginning of the checkpoint and then again after every
file than it is to spew many GB of dirty stuff into the cache and then
drop the hammer.

But the kernel has knobs to prevent that from happening.
dirty_background_ratio, dirty_ratio, dirty_background_bytes (on newer
kernels), dirty_expire_centisecs. Don't these knobs work? Also, ext3
is supposed to do a journal commit every 5 seconds under default mount
conditions.

Cheers,

Jeff

#4Robert Haas
robertmhaas@gmail.com
In reply to: Jeff Janes (#3)
Re: Spread checkpoint sync

On Sat, Nov 20, 2010 at 6:21 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

 The thing to realize
that complicates the design is that the actual sync execution may take a
considerable period of time.  It's much more likely for that to happen than
in the case of an individual write, as the current spread checkpoint does,
because those are usually cached.  In the spread sync case, it's easy for
one slow sync to make the rest turn into ones that fire in quick succession,
to make up for lost time.

I think the behavior of file systems and operating systems is highly
relevant here.  We seem to have a theory that allowing a delay between
the write and the fsync should give the OS a chance to start writing
the data out,

I thought that the theory was that doing too many fsync in short order
can lead to some kind of starvation of other IO.

If the theory is that we want to wait between writes and fsyncs, then
the current behavior is probably the best, Spreading out the writes
and then doing all the syncs at the end gives the best delay time
between an average write and the sync of that written to file.  Or,
spread the writes out over 150 seconds, sleep for 140 seconds, then do
the fsyncs.  But I don't think that that is the theory.

Well, I've heard Bruce and, I think, possibly also Greg talk about
wanting to wait after doing the writes in the hopes that the kernel
will start to flush the dirty pages, but I'm wondering whether it
wouldn't be better to just give up on that and do: small batch of
writes - fsync those writes - another small batch of writes - fsync
that batch - etc.

but do we have any evidence indicating whether and under
what circumstances that actually occurs?  For example, if we knew that
it's important to wait at least 30 s but waiting 60 s is no better,
that would be useful information.

Another question I have is about how we're actually going to know when
any given fsync can be performed.  For any given segment, there are a
certain number of pages A that are already dirty at the start of the
checkpoint.

Dirty in the shared pool, or dirty in the OS cache?

OS cache, sorry.

Then there are a certain number of additional pages B
that are going to be written out during the checkpoint.  If it so
happens that B = 0, we can call fsync() at the beginning of the
checkpoint without losing anything (in fact, we gain something: any
pages dirtied by cleaning scans or backend writes during the
checkpoint won't need to hit the disk;

Aren't those pages written out by cleaning scans and backend writes
while the checkpoint is occurring exactly what you defined to be page
set B, and then to be zero?

No, sorry, I'm referring to cases where all the dirty pages in a
segment have been written out to the OS but we have not yet issued the
necessary fsync.

and if the filesystem dumps
more of its cache than necessary on fsync, we may as well take that
hit before dirtying a bunch more stuff).  But if B > 0, then we
shouldn't attempt the fsync() until we've written them all; otherwise
we'll end
up having to fsync() that segment twice.

Doing all the writes and then all the fsyncs meets this requirement
trivially, but I'm not so sure that's a good idea.  For example, given
files F1 ... Fn with dirty pages needing checkpoint writes, we could
do the following: first, do any pending fsyncs for files not among F1
.. Fn; then, write all pages for F1 and fsync, write all pages for F2
and fsync, write all pages for F3 and fsync, etc.  This might seem
dumb because we're not really giving the OS a chance to write anything
out before we fsync, but think about the ext3 case where the whole
filesystem cache gets flushed anyway.  It's much better to dump the
cache at the beginning of the checkpoint and then again after every
file than it is to spew many GB of dirty stuff into the cache and then
drop the hammer.

But the kernel has knobs to prevent that from happening.
dirty_background_ratio, dirty_ratio, dirty_background_bytes (on newer
kernels), dirty_expire_centisecs.  Don't these knobs work?  Also, ext3
is supposed to do a journal commit every 5 seconds under default mount
conditions.

I don't know in detail. dirty_expire_centisecs sounds useful; I think
the problem with dirty_background_ratio and dirty_ratio is that the
default ratios are large enough that on systems with a huge pile of
memory, they allow more dirty data to accumulate than can be flushed
without causing an I/O storm. I believe Greg Smith made a comment
along the lines of: memory sizes are growing faster than I/O speeds;
therefore a ratio that is OK for a low-end system with a modest amount
of memory causes problems on a high-end system that has faster I/O but
MUCH more memory.

As a kernel developer, I suspect the tendency is to try to set the
ratio so that you keep enough free memory around to service future
allocation requests. Optimizing for possible future fsyncs is
probably not the top priority...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#5Jeff Janes
jeff.janes@gmail.com
In reply to: Robert Haas (#4)
Re: Spread checkpoint sync

On Sat, Nov 20, 2010 at 5:17 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Sat, Nov 20, 2010 at 6:21 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

Doing all the writes and then all the fsyncs meets this requirement
trivially, but I'm not so sure that's a good idea.  For example, given
files F1 ... Fn with dirty pages needing checkpoint writes, we could
do the following: first, do any pending fsyncs for files not among F1
.. Fn; then, write all pages for F1 and fsync, write all pages for F2
and fsync, write all pages for F3 and fsync, etc.  This might seem
dumb because we're not really giving the OS a chance to write anything
out before we fsync, but think about the ext3 case where the whole
filesystem cache gets flushed anyway.  It's much better to dump the
cache at the beginning of the checkpoint and then again after every
file than it is to spew many GB of dirty stuff into the cache and then
drop the hammer.

But the kernel has knobs to prevent that from happening.
dirty_background_ratio, dirty_ratio, dirty_background_bytes (on newer
kernels), dirty_expire_centisecs.  Don't these knobs work?  Also, ext3
is supposed to do a journal commit every 5 seconds under default mount
conditions.

I don't know in detail.  dirty_expire_centisecs sounds useful; I think
the problem with dirty_background_ratio and dirty_ratio is that the
default ratios are large enough that on systems with a huge pile of
memory, they allow more dirty data to accumulate than can be flushed
without causing an I/O storm.

True, but I think that changing these from their defaults is not
considered to be a dark art reserved for kernel hackers, i.e. they are
something that sysadmins are expected to tweak to suit their
workload, just like shmmax and such. And for very large memory
systems, even 1% may be too much to cache (dirty*_ratio can only be
set in integer percent points), so recent kernels introduced
dirty*_bytes parameters. I like these better because they do what
they say. With the dirty*_ratio, I could never figure out what it was
a ratio of, and the results were unpredictable without extensive
experimentation.

I believe Greg Smith made a comment
along the lines of - memory sizes are grow faster than I/O speeds;
therefore a ratio that is OK for a low-end system with a modest amount
of memory causes problems on a high-end system that has faster I/O but
MUCH more memory.

Yes, but how much work do we want to put into redoing the checkpoint
logic so that the sysadmin on a particular OS and configuration and FS
can avoid having to change the kernel parameters away from their
defaults? (Assuming of course I am correctly understanding the
problem, always a dangerous assumption.)

Some experiments I have just done show that dirty_expire_centisecs
does not seem reliable on ext3, but the dirty*_ratio and dirty*_bytes
seem reliable on ext2, ext3, and ext4.

But that may not apply to RAID, I don't have one I can test.

Cheers,

Jeff

#6Greg Smith
gsmith@gregsmith.com
In reply to: Jeff Janes (#5)
Re: Spread checkpoint sync

Jeff Janes wrote:

And for very large memory
systems, even 1% may be too much to cache (dirty*_ratio can only be
set in integer percent points), so recent kernels introduced
dirty*_bytes parameters. I like these better because they do what
they say. With the dirty*_ratio, I could never figure out what it was
a ratio of, and the results were unpredictable without extensive
experimentation.

Right, you can't set dirty_background_ratio low enough to make this
problem go away. Even attempts to set it to 1%, back when that was
the right size for it, seem to be defeated by other mechanisms within
the kernel. Last time I looked at the related source code, it seemed
the "congestion control" logic that kicks in to throttle writes was a
likely suspect. This is why I'm not very optimistic that newer
mechanisms like dirty_background_bytes, added in 2.6.29, will help
here, as they just give a way to set lower values; the same basic
logic is under the hood.

Like Jeff, I've never seen dirty_expire_centisecs help at all, possibly
due to the same congestion mechanism.

Yes, but how much work do we want to put into redoing the checkpoint
logic so that the sysadmin on a particular OS and configuration and FS
can avoid having to change the kernel parameters away from their
defaults? (Assuming of course I am correctly understanding the
problem, always a dangerous assumption.)

I've been trying to make this problem go away using just the kernel
tunables available since 2006. I adjusted them carefully on the server
that ran into this problem so badly that it motivated the submitted
patch, months before this issue got bad. It didn't help. Maybe if they
were running a later kernel that supported dirty_background_bytes that
would have worked better. During the last few years, the only thing
that has consistently helped in every case is the checkpoint spreading
logic that went into 8.3. I no longer expect that the kernel developers
will ever make this problem go away the way checkpoints are written out
right now, whereas the last good PostgreSQL work in this area definitely
helped.

The basic premise of the current checkpoint code is that if you write
all of the buffers out early enough, by the time syncs execute enough of
the data should have gone out that those don't take very long to
process. That was usually true for the last few years, on systems with
a battery-backed cache; the amount of memory cached by the OS was
small relative to the RAID cache size. That's not the case
anymore, and the divergence keeps growing.

The idea that the checkpoint sync code can run in a relatively tight
loop, without stopping to do the normal background writer cleanup work,
is also busted by that observation.

--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books

#7Greg Smith
gsmith@gregsmith.com
In reply to: Robert Haas (#2)
Re: Spread checkpoint sync

Robert Haas wrote:

Doing all the writes and then all the fsyncs meets this requirement
trivially, but I'm not so sure that's a good idea. For example, given
files F1 ... Fn with dirty pages needing checkpoint writes, we could
do the following: first, do any pending fsyncs for files not among F1
.. Fn; then, write all pages for F1 and fsync, write all pages for F2
and fsync, write all pages for F3 and fsync, etc. This might seem
dumb because we're not really giving the OS a chance to write anything
out before we fsync, but think about the ext3 case where the whole
filesystem cache gets flushed anyway.

I'm not horribly interested in optimizing for the ext3 case per se, as I
consider that filesystem fundamentally broken from the perspective of
its ability to deliver low latency here. I wouldn't want a patch that
improved behavior on filesystems with granular fsync to make the ext3
situation worse; that's as much as I'd want the design to lean toward
considering its quirks. Jeff Janes made a case downthread for "why not
make it the admin/OS's job to worry about this?" In cases where there
is a reasonable solution available, in the form of "switch to XFS or
ext4", I'm happy to take that approach.

Let me throw some numbers out to give a better idea of the shape and
magnitude of the problem case I've been working on here. In the
situation that leads to the near hour-long sync phase I've seen,
checkpoints will start with about a 3GB backlog of data in the kernel
write cache to deal with. That's about 4% of RAM, just under the 5%
threshold set by dirty_background_ratio. Whether or not the 256MB write
cache on the controller is also filled is a relatively minor detail I
can't monitor easily. The checkpoint itself? <250MB each time.

This proportion is why I didn't think to follow the alternate path of
worrying about spacing the write and fsync calls out differently. I
shrank shared_buffers down to make the actual checkpoints smaller, which
helped to some degree; that's what got them down to smaller than the
RAID cache size. But the amount of data cached by the operating system
is the real driver of total sync time here. Whether or not you include
all of the writes from the checkpoint itself before you start calling
fsync didn't actually matter very much; in the case I've been chasing,
those are getting cached anyway. The write storm from the fsync calls
themselves forcing things out seems to be the driver on I/O spikes,
which is why I started with spacing those out.

Writes go out at a rate of around 5MB/s, so clearing the 3GB backlog
takes a minimum of 10 minutes of real time. There are about 300 1GB
relation files involved in the case I've been chasing. This is where
the 3 second delay number came from; 300 files, 3 seconds each, 900
seconds = 15 minutes of sync spread. You can turn that math around to
figure out how much delay per relation you can afford while still
keeping checkpoints to a planned end time, which isn't done in the patch
I submitted yet.

Ultimately what I want to do here is some sort of smarter write-behind
sync operation, perhaps with a LRU on relations with pending fsync
requests. The idea would be to sync relations that haven't been touched
in a while even in advance of the checkpoint. I think that's similar to
the general idea Robert is suggesting here, to get some sync calls
flowing before all of the checkpoint writes have happened. I think that
the final sync calls will need to get spread out regardless, and since
doing that requires a fairly small amount of code too that's why we
started with that.

--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books

#8Martijn van Oosterhout
kleptog@svana.org
In reply to: Greg Smith (#7)
Re: Spread checkpoint sync

On Sun, Nov 21, 2010 at 04:54:00PM -0500, Greg Smith wrote:

Ultimately what I want to do here is some sort of smarter write-behind
sync operation, perhaps with a LRU on relations with pending fsync
requests. The idea would be to sync relations that haven't been touched
in a while in advance of the checkpoint even. I think that's similar to
the general idea Robert is suggesting here, to get some sync calls
flowing before all of the checkpoint writes have happened. I think that
the final sync calls will need to get spread out regardless, and since
doing that requires a fairly small amount of code too that's why we
started with that.

For a similar problem we had (kernel buffering too much) we had success
using the fadvise and madvise DONTNEED syscalls to force the data to
exit the cache much sooner than it would otherwise. This was on Linux
and it had the side-effect that the data was deleted from the kernel
cache, which we wanted, but probably isn't appropriate here.

There is also sync_file_range, but that's linux specific, although
close to what you want I think. It would allow you to work with blocks
smaller than 1GB.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/


Patriotism is when love of your own people comes first; nationalism,
when hate for people other than your own comes first.
- Charles de Gaulle

#9Andres Freund
andres@anarazel.de
In reply to: Martijn van Oosterhout (#8)
Re: Spread checkpoint sync

On Sunday 21 November 2010 23:19:30 Martijn van Oosterhout wrote:

For a similar problem we had (kernel buffering too much) we had success
using the fadvise and madvise WONTNEED syscalls to force the data to
exit the cache much sooner than it would otherwise. This was on Linux
and it had the side-effect that the data was deleted from the kernel
cache, which we wanted, but probably isn't appropriate here.

Yep, works fine. Although it has the issue that the data will get read again if
archiving/SR is enabled.

There is also sync_file_range, but that's linux specific, although
close to what you want I think. It would allow you to work with blocks
smaller than 1GB.

Unfortunately that puts the data under quite high write-out pressure inside
the kernel - which is not what you actually want because it limits reordering
and such significantly.

It would be nicer if you could get a mix of both semantics (looking at it,
depending on the approach that seems to be about a 10 line patch to the
kernel). I.e. indicate that you want to write the pages soonish, but don't put
it on the head of the writeout queue.

Andres

#10Josh Berkus
josh@agliodbs.com
In reply to: Jeff Janes (#5)
Re: Spread checkpoint sync

On 11/20/10 6:11 PM, Jeff Janes wrote:

True, but I think that changing these from their defaults is not
considered to be a dark art reserved for kernel hackers, i.e. they are
something that sysadmins are expected to tweak to suit their workload,
just like shmmax and such.

I disagree. Linux kernel hackers know about these kinds of parameters,
and I suppose that Linux performance experts do. But very few
sysadmins, in my experience, have any idea.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

#11Robert Haas
robertmhaas@gmail.com
In reply to: Greg Smith (#7)
Re: Spread checkpoint sync

On Sun, Nov 21, 2010 at 4:54 PM, Greg Smith <greg@2ndquadrant.com> wrote:

Let me throw some numbers out [...]

Interesting.

Ultimately what I want to do here is some sort of smarter write-behind sync
operation, perhaps with a LRU on relations with pending fsync requests.  The
idea would be to sync relations that haven't been touched in a while in
advance of the checkpoint even.  I think that's similar to the general idea
Robert is suggesting here, to get some sync calls flowing before all of the
checkpoint writes have happened.  I think that the final sync calls will
need to get spread out regardless, and since doing that requires a fairly
small amount of code too that's why we started with that.

Doing some kind of background fsync-ing might indeed be sensible, but
I agree that's secondary to trying to spread out the fsyncs during the
checkpoint itself. I guess the question is what we can do there
sensibly without an unreasonable amount of new code.
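To make the scheme under discussion concrete, here is an illustrative sketch (not the patch's actual code; all names are invented): issue the checkpoint-end fsyncs one at a time, draining the backends' pending-fsync request queue and pausing between calls rather than running them in one tight loop:

```python
import os
import time

# Illustrative sketch of spreading the checkpoint-end syncs: one fsync,
# then absorb whatever requests backends queued meanwhile, then pause.
def spread_sync(pending_fds, absorb_queue, delay_s=3.0):
    for fd in pending_fds:
        os.fsync(fd)              # push this file's dirty data to disk
        while absorb_queue:       # absorb queued backend requests so the
            absorb_queue.pop()    # queue can't fill up (placeholder work)
        time.sleep(delay_s)       # the hard-coded 3s pause, per the patch
```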

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#12Cédric Villemain
cedric.villemain.debian@gmail.com
In reply to: Andres Freund (#9)
Re: Spread checkpoint sync

2010/11/21 Andres Freund <andres@anarazel.de>:

On Sunday 21 November 2010 23:19:30 Martijn van Oosterhout wrote:

For a similar problem we had (kernel buffering too much) we had success
using the fadvise and madvise DONTNEED syscalls to force the data to
exit the cache much sooner than it would otherwise. This was on Linux
and it had the side-effect that the data was deleted from the kernel
cache, which we wanted, but probably isn't appropriate here.

Yep, works fine. Although it has the issue that the data will get read again if
archiving/SR is enabled.

mmhh. The current code does call DONTNEED or WILLNEED for WAL,
depending on whether archiving is off or on.

This matters *only* once the data is written (fsync, fdatasync); before
that it should not have an effect.

There is also sync_file_range, but that's linux specific, although
close to what you want I think. It would allow you to work with blocks
smaller than 1GB.

Unfortunately that puts the data under quite high write-out pressure inside
the kernel - which is not what you actually want because it limits reordering
and such significantly.

It would be nicer if you could get a mix of both semantics (looking at it,
depending on the approach that seems to be about a 10 line patch to the
kernel). I.e. indicate that you want to write the pages soonish, but don't put
it on the head of the writeout queue.

Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

#13Ron Mayer
rm_pg@cheapcomplexdevices.com
In reply to: Josh Berkus (#10)
Re: Spread checkpoint sync

Josh Berkus wrote:

On 11/20/10 6:11 PM, Jeff Janes wrote:

True, but I think that changing these from their defaults is not
considered to be a dark art reserved for kernel hackers, i.e. they are
something that sysadmins are expected to tweak to suit their workload,
just like shmmax and such.

I disagree. Linux kernel hackers know about these kinds of parameters,
and I suppose that Linux performance experts do. But very few
sysadmins, in my experience, have any idea.

To me, a lot of this conversation feels parallel to the
arguments that occasionally come up debating writing directly
to raw disks, bypassing the filesystems altogether.

Might smoother checkpoints be better solved by talking
to the OS vendors & virtual-memory-tuning-knob authors
to work with them on exposing the ideal knobs, rather than
saying that our only tool is a hammer (fsync) so the problem
must be handled as a nail?

Hypothetically - what would the ideal knobs be?

Something like madvise DONTNEED but that leaves pages
in the OS's cache after writing them?

#14Greg Smith
gsmith@gregsmith.com
In reply to: Ron Mayer (#13)
Re: Spread checkpoint sync

Ron Mayer wrote:

Might smoother checkpoints be better solved by talking
to the OS vendors & virtual-memory-tuning-knob authors
to work with them on exposing the ideal knobs, rather than
saying that our only tool is a hammer (fsync) so the problem
must be handled as a nail?

Maybe, but it's hard to argue that the current implementation--just
doing all of the sync calls as fast as possible, one after the other--isn't
going to produce worst-case behavior in a lot of situations. Given that
it's not a huge amount of code to do better, I'd rather do some work in
that direction, instead of presuming the kernel authors will ever make
this go away. Spreading the writes out as part of the checkpoint rework
in 8.3 worked better than any kernel changes I've tested since then, and
I'm not real optimistic about this getting resolved at the system level.
So long as the database changes aren't antagonistic toward kernel
improvements, I'd prefer to have some options here that become effective
as soon as the database code is done.

I've attached an updated version of the initial sync spreading patch
here, one that applies cleanly on top of HEAD and over top of the sync
instrumentation patch too. The conflict that made that hard before is
gone now.

Having the pg_stat_bgwriter.buffers_backend_fsync patch available all
the time now has made me reconsider how important one potential bit of
refactoring here would be. I managed to catch one of the situations
where really popular relations were being heavily updated in a way that
was competing with the checkpoint on my test system (which I can happily
share the logs of), with the instrumentation patch applied but not the
spread sync one:

LOG: checkpoint starting: xlog
DEBUG: could not forward fsync request because request queue is full
CONTEXT: writing block 7747 of relation base/16424/16442
DEBUG: could not forward fsync request because request queue is full
CONTEXT: writing block 42688 of relation base/16424/16437
DEBUG: could not forward fsync request because request queue is full
CONTEXT: writing block 9723 of relation base/16424/16442
DEBUG: could not forward fsync request because request queue is full
CONTEXT: writing block 58117 of relation base/16424/16437
DEBUG: could not forward fsync request because request queue is full
CONTEXT: writing block 165128 of relation base/16424/16437
[330 of these total, all referring to the same two relations]

DEBUG: checkpoint sync: number=1 file=base/16424/16448_fsm time=10132.830000 msec
DEBUG: checkpoint sync: number=2 file=base/16424/11645 time=0.001000 msec
DEBUG: checkpoint sync: number=3 file=base/16424/16437 time=7.796000 msec
DEBUG: checkpoint sync: number=4 file=base/16424/16448 time=4.679000 msec
DEBUG: checkpoint sync: number=5 file=base/16424/11607 time=0.001000 msec
DEBUG: checkpoint sync: number=6 file=base/16424/16437.1 time=3.101000 msec
DEBUG: checkpoint sync: number=7 file=base/16424/16442 time=4.172000 msec
DEBUG: checkpoint sync: number=8 file=base/16424/16428_vm time=0.001000 msec
DEBUG: checkpoint sync: number=9 file=base/16424/16437_fsm time=0.001000 msec
DEBUG: checkpoint sync: number=10 file=base/16424/16428 time=0.001000 msec
DEBUG: checkpoint sync: number=11 file=base/16424/16425 time=0.000000 msec
DEBUG: checkpoint sync: number=12 file=base/16424/16437_vm time=0.001000 msec
DEBUG: checkpoint sync: number=13 file=base/16424/16425_vm time=0.001000 msec
LOG: checkpoint complete: wrote 3032 buffers (74.0%); 0 transaction log
file(s) added, 0 removed, 0 recycled; write=1.742 s, sync=10.153 s,
total=37.654 s; sync files=13, longest=10.132 s, average=0.779 s

Note here how the checkpoint was hung on trying to get 16448_fsm written
out, but the backends were issuing constant competing fsync calls to
these other relations. This is very similar to the production case this
patch was written to address, which I hadn't been able to share a good
example of yet. That's essentially what it looks like, except with the
contention going on for minutes instead of seconds.

One of the ideas Simon and I had been considering at one point was
adding some better de-duplication logic to the fsync absorb code, which
I'm reminded by the pattern here might be helpful independently of other
improvements.
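The de-duplication idea could be as simple as collapsing repeated requests for the same relation when they are absorbed, so the 330 queued entries for two relations in the log above become just two pending syncs. A hedged sketch (the tuple keys and function name are purely illustrative, not PostgreSQL's RelFileNode handling):

```python
# Sketch of fsync-request de-duplication: drain the shared request queue
# into a set, which drops duplicate entries for free.
def absorb_requests(queue, pending):
    while queue:
        pending.add(queue.pop())

queue = [("base/16424", 16437)] * 300 + [("base/16424", 16442)] * 30
pending = set()
absorb_requests(queue, pending)
# 330 requests collapse to the two distinct relations
```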

--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books

Attachments:

sync-spread-v3.patch (text/x-patch, +93/-35)
#15Josh Berkus
josh@agliodbs.com
In reply to: Greg Smith (#14)
Re: Spread checkpoint sync

Maybe, but it's hard to argue that the current implementation--just
doing all of the sync calls as fast as possible, one after the other--isn't
going to produce worst-case behavior in a lot of situations. Given that
it's not a huge amount of code to do better, I'd rather do some work in
that direction, instead of presuming the kernel authors will ever make
this go away. Spreading the writes out as part of the checkpoint rework
in 8.3 worked better than any kernel changes I've tested since then, and
I'm not real optimistic about this getting resolved at the system level.
So long as the database changes aren't antagonistic toward kernel
improvements, I'd prefer to have some options here that become effective
as soon as the database code is done.

Besides, even if kernel/FS authors did improve things, the improvements
would not be available on production platforms for years. And, for that
matter, while Linux and BSD are pretty responsive to our feedback,
Apple, Microsoft and Oracle are most definitely not.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

#16Jeff Janes
jeff.janes@gmail.com
In reply to: Greg Smith (#1)
Re: Spread checkpoint sync

On Sun, Nov 14, 2010 at 3:48 PM, Greg Smith <greg@2ndquadrant.com> wrote:

...

One change that turned out be necessary rather than optional--to get good
performance from the system under tuning--was to make regular background
writer activity, including fsync absorb checks, happen during these sync
pauses.  The existing code ran the checkpoint sync work in a pretty tight
loop, which as I alluded to in an earlier patch today can lead to the
backends competing with the background writer to get their sync calls
executed.  This squashes that problem if the background writer is set up
properly.

Have you tested out this "absorb during syncing phase" code without
the sleep between the syncs?
I.e. so that it is still a tight loop, but the loop alternates between
sync and absorb, with no intentional pause?

I wonder if all the improvement you see might not be due entirely to
the absorb between syncs, and none or very little from
the sleep itself.

I ask because I don't have a mental model of how the pause can help.
Given that this dirty data has been hanging around for many minutes
already, what is a 3 second pause going to heal?

The healing power of clearing out the absorb queue seems much more obvious.

Cheers,

Jeff

#17Greg Smith
gsmith@gregsmith.com
In reply to: Jeff Janes (#16)
Re: Spread checkpoint sync

Jeff Janes wrote:

Have you tested out this "absorb during syncing phase" code without
the sleep between the syncs?
I.e. so that it is still a tight loop, but the loop alternates between
sync and absorb, with no intentional pause?

Yes; that's how it was developed. It helped to have just the extra
absorb work without the pauses, but that alone wasn't enough to really
improve things on the server where we ran into this problem badly.

I ask because I don't have a mental model of how the pause can help.
Given that this dirty data has been hanging around for many minutes
already, what is a 3 second pause going to heal?

The difference is that once an fsync call is made, dirty data is much
more likely to be forced out. It's the one thing that bypasses all
other ways the kernel might try to avoid writing the data--both the
dirty ratio guidelines and the congestion control logic--and forces
those writes to happen as soon as they can be scheduled. If you graph
the amount of data shown "Dirty:" by /proc/meminfo over time, once the
sync calls start happening it's like a descending staircase pattern,
dropping a little bit as each sync fires.
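A quick way to watch that staircase for yourself is to sample the Dirty: line of /proc/meminfo while a checkpoint runs; a small helper (the function is just an illustration, and is Linux-only):

```python
def dirty_kb(meminfo_path="/proc/meminfo"):
    """Return the kernel's current Dirty: figure in kB (Linux only)."""
    with open(meminfo_path) as f:
        for line in f:
            if line.startswith("Dirty:"):
                return int(line.split()[1])
    return None

# Sampling this once a second during the sync phase and plotting the
# values shows the descending-staircase pattern described above;
# equivalently: while true; do grep ^Dirty: /proc/meminfo; sleep 1; done
```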

--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books

#18Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Greg Smith (#17)
Re: Spread checkpoint sync

On 01.12.2010 06:25, Greg Smith wrote:

Jeff Janes wrote:

I ask because I don't have a mental model of how the pause can help.
Given that this dirty data has been hanging around for many minutes
already, what is a 3 second pause going to heal?

The difference is that once an fsync call is made, dirty data is much
more likely to be forced out. It's the one thing that bypasses all other
ways the kernel might try to avoid writing the data--both the dirty
ratio guidelines and the congestion control logic--and forces those
writes to happen as soon as they can be scheduled. If you graph the
amount of data shown "Dirty:" by /proc/meminfo over time, once the sync
calls start happening it's like a descending staircase pattern, dropping
a little bit as each sync fires.

Do you have any idea how to autotune the delay between fsyncs?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#19Greg Smith
gsmith@gregsmith.com
In reply to: Heikki Linnakangas (#18)
Re: Spread checkpoint sync

Heikki Linnakangas wrote:

Do you have any idea how to autotune the delay between fsyncs?

I'm thinking to start by counting the number of relations that need them
at the beginning of the checkpoint. Then use the same basic math that
drives the spread writes, where you assess whether you're on schedule or
not based on segment/time progress relative to how many have been sync'd
out of that total. At a high level I think that idea translates over
almost directly into the existing write spread code. Was hoping for a
sanity check from you in particular about whether that seems reasonable
or not before diving into the coding.
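The math being proposed might look roughly like this sketch (names invented; not the actual checkpointer code): count the files needing fsync at checkpoint start, then pace through them against the target window the way the spread-write code paces buffer writes, sleeping whenever we're ahead of schedule:

```python
import time

# Pace fsyncs against a target window: file i is "due" once a fraction
# i/total of target_s has elapsed; if we get there early, sleep.
def paced_sync(files, sync_one, target_s):
    start = time.monotonic()
    total = len(files)
    for i, f in enumerate(files):
        due = start + target_s * (i / total)
        now = time.monotonic()
        if now < due:
            time.sleep(due - now)   # ahead of schedule: wait
        sync_one(f)                 # then issue the next sync
```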

--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books

#20Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Greg Smith (#19)
Re: Spread checkpoint sync

On 01.12.2010 23:30, Greg Smith wrote:

Heikki Linnakangas wrote:

Do you have any idea how to autotune the delay between fsyncs?

I'm thinking to start by counting the number of relations that need them
at the beginning of the checkpoint. Then use the same basic math that
drives the spread writes, where you assess whether you're on schedule or
not based on segment/time progress relative to how many have been sync'd
out of that total. At a high level I think that idea translates over
almost directly into the existing write spread code. Was hoping for a
sanity check from you in particular about whether that seems reasonable
or not before diving into the coding.

Sounds reasonable to me. fsync()s are a lot less uniform than write()s,
though. If you fsync() a file with one dirty page in it, it's going to
return very quickly, but a 1GB file will take a while. That could be
problematic if you have a thousand small files and a couple of big ones,
as you would want to reserve more time for the big ones. I'm not sure
what to do about it, maybe it's not a problem in practice.
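One conceivable answer to the non-uniformity is to pace the schedule by cumulative file size rather than file count, so a 1GB segment gets proportionally more of the target window than a one-page file. A hypothetical sketch (in practice the sizes would come from stat()):

```python
# Compute per-file sync deadlines weighted by file size: each file's
# deadline advances by its share of the target window.
def size_weighted_deadlines(sizes, target_s):
    total = sum(sizes)
    elapsed = 0.0
    deadlines = []
    for s in sizes:
        elapsed += target_s * (s / total)
        deadlines.append(elapsed)   # sync this file by `elapsed` seconds in
    return deadlines

# e.g. sizes [1, 1, 8] over a 10s window -> roughly [1.0, 2.0, 10.0]
```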

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#21Bruce Momjian
bruce@momjian.us
In reply to: Greg Smith (#17)
#22Josh Berkus
josh@agliodbs.com
In reply to: Bruce Momjian (#21)
#23Robert Haas
robertmhaas@gmail.com
In reply to: Bruce Momjian (#21)
#24Greg Smith
gsmith@gregsmith.com
In reply to: Bruce Momjian (#21)
#25Greg Smith
gsmith@gregsmith.com
In reply to: Heikki Linnakangas (#20)
#26Rob Wultsch
wultsch@gmail.com
In reply to: Greg Smith (#25)
#27Greg Smith
gsmith@gregsmith.com
In reply to: Rob Wultsch (#26)
#28Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Greg Smith (#27)
#29Greg Smith
gsmith@gregsmith.com
In reply to: Alvaro Herrera (#28)
#30Simon Riggs
simon@2ndQuadrant.com
In reply to: Alvaro Herrera (#28)
#31Robert Haas
robertmhaas@gmail.com
In reply to: Greg Smith (#14)
#32Greg Smith
gsmith@gregsmith.com
In reply to: Robert Haas (#31)
#33Robert Haas
robertmhaas@gmail.com
In reply to: Greg Smith (#32)
#34Simon Riggs
simon@2ndQuadrant.com
In reply to: Greg Smith (#32)
#35Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#34)
#36Greg Smith
gsmith@gregsmith.com
In reply to: Robert Haas (#33)
#37Robert Haas
robertmhaas@gmail.com
In reply to: Greg Smith (#36)
#38Greg Smith
gsmith@gregsmith.com
In reply to: Robert Haas (#37)
#39Simon Riggs
simon@2ndQuadrant.com
In reply to: Robert Haas (#35)
#40Robert Haas
robertmhaas@gmail.com
In reply to: Greg Smith (#38)
#41Greg Smith
gsmith@gregsmith.com
In reply to: Robert Haas (#40)
#42Marti Raudsepp
marti@juffo.org
In reply to: Robert Haas (#33)
#43Robert Haas
robertmhaas@gmail.com
In reply to: Greg Smith (#41)
#44Simone Aiken
saiken@ulfheim.net
In reply to: Marti Raudsepp (#42)
#45Greg Smith
gsmith@gregsmith.com
In reply to: Robert Haas (#43)
#46Nicolas Barbier
nicolas.barbier@gmail.com
In reply to: Simone Aiken (#44)
#47Tom Lane
tgl@sss.pgh.pa.us
In reply to: Nicolas Barbier (#46)
#48Simone Aiken
saiken@ulfheim.net
In reply to: Tom Lane (#47)
#49Jeff Janes
jeff.janes@gmail.com
In reply to: Robert Haas (#31)
#50Robert Haas
robertmhaas@gmail.com
In reply to: Jeff Janes (#49)
#51Greg Smith
gsmith@gregsmith.com
In reply to: Robert Haas (#50)
#52Robert Haas
robertmhaas@gmail.com
In reply to: Greg Smith (#51)
#53Bruce Momjian
bruce@momjian.us
In reply to: Greg Smith (#45)
#54Jeff Janes
jeff.janes@gmail.com
In reply to: Greg Smith (#51)
#55Greg Smith
gsmith@gregsmith.com
In reply to: Jeff Janes (#54)
#56Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Robert Haas (#35)
#57Robert Haas
robertmhaas@gmail.com
In reply to: Jim Nasby (#56)
#58Greg Smith
gsmith@gregsmith.com
In reply to: Jim Nasby (#56)
#59Greg Smith
gsmith@gregsmith.com
In reply to: Bruce Momjian (#53)
#60Simone Aiken
saiken@ulfheim.net
In reply to: Simone Aiken (#44)
#61Cédric Villemain
cedric.villemain.debian@gmail.com
In reply to: Greg Smith (#59)
#62Greg Smith
gsmith@gregsmith.com
In reply to: Robert Haas (#33)
#63Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Simone Aiken (#44)
#64Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Simone Aiken (#44)
#65Simone Aiken
saiken@ulfheim.net
In reply to: Alvaro Herrera (#63)
#66Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#63)
#67Simone Aiken
saiken@ulfheim.net
In reply to: Robert Haas (#66)
#68Josh Berkus
josh@agliodbs.com
In reply to: Greg Smith (#55)
#69Bruce Momjian
bruce@momjian.us
In reply to: Robert Haas (#66)
#70Robert Haas
robertmhaas@gmail.com
In reply to: Simone Aiken (#67)
#71Simone Aiken
saiken@quietlyCompetent.com
In reply to: Robert Haas (#70)
#72Simone Aiken
saiken@quietlyCompetent.com
In reply to: Bruce Momjian (#69)
#73Robert Haas
robertmhaas@gmail.com
In reply to: Simone Aiken (#71)
#74Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#73)
#75Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#73)
#76Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#75)
#77Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#76)
#78Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#77)
#79Simone Aiken
saiken@ulfheim.net
In reply to: Tom Lane (#75)
#80Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#78)
#81Robert Haas
robertmhaas@gmail.com
In reply to: Simone Aiken (#79)
#82Simone Aiken
saiken@quietlyCompetent.com
In reply to: Robert Haas (#78)
#83Robert Haas
robertmhaas@gmail.com
In reply to: Simone Aiken (#82)
#84Bruce Momjian
bruce@momjian.us
In reply to: Robert Haas (#83)
#85Greg Smith
gsmith@gregsmith.com
In reply to: Greg Smith (#51)
#86Robert Haas
robertmhaas@gmail.com
In reply to: Greg Smith (#85)
#87Greg Smith
gsmith@gregsmith.com
In reply to: Robert Haas (#86)
#88Greg Smith
gsmith@gregsmith.com
In reply to: Robert Haas (#86)
#89Robert Haas
robertmhaas@gmail.com
In reply to: Greg Smith (#88)
#90Robert Haas
robertmhaas@gmail.com
In reply to: Greg Smith (#14)
#91Itagaki Takahiro
itagaki.takahiro@gmail.com
In reply to: Robert Haas (#90)
#92Robert Haas
robertmhaas@gmail.com
In reply to: Itagaki Takahiro (#91)
#93Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Robert Haas (#92)
#94Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#93)
#95Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#94)
#96Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#95)
#97Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#96)
#98Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#90)
#99Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#98)
#100Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#97)
#101Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#100)
#102Bruce Momjian
bruce@momjian.us
In reply to: Robert Haas (#101)
#103Greg Smith
gsmith@gregsmith.com
In reply to: Tom Lane (#96)
#104Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#101)
#105Greg Smith
gsmith@gregsmith.com
In reply to: Tom Lane (#98)
#106Greg Smith
gsmith@gregsmith.com
In reply to: Greg Smith (#105)
#107Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#104)
#108Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Robert Haas (#107)
#109Bruce Momjian
bruce@momjian.us
In reply to: Robert Haas (#107)
#110Bruce Momjian
bruce@momjian.us
In reply to: Greg Smith (#106)
#111Bruce Momjian
bruce@momjian.us
In reply to: Kevin Grittner (#108)
#112Robert Haas
robertmhaas@gmail.com
In reply to: Kevin Grittner (#108)
#113Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#109)
#114Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#113)
#115Michael Banck
michael.banck@credativ.de
In reply to: Greg Smith (#32)
#116Greg Smith
gsmith@gregsmith.com
In reply to: Michael Banck (#115)
#117Greg Smith
gsmith@gregsmith.com
In reply to: Greg Smith (#106)
#118Robert Haas
robertmhaas@gmail.com
In reply to: Greg Smith (#117)
#119Greg Smith
gsmith@gregsmith.com
In reply to: Robert Haas (#90)
#120Cédric Villemain
cedric.villemain.debian@gmail.com
In reply to: Greg Smith (#119)
#121Greg Smith
gsmith@gregsmith.com
In reply to: Cédric Villemain (#120)
#122Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Greg Smith (#121)
#123Greg Smith
gsmith@gregsmith.com
In reply to: Kevin Grittner (#122)
#124Greg Smith
gsmith@gregsmith.com
In reply to: Greg Smith (#119)
#125Robert Haas
robertmhaas@gmail.com
In reply to: Greg Smith (#124)