CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!
Kevin didn't send out an official gavel-banging announcement of the
end of CommitFest 2009-07 (possibly because I neglected until today to
give him privileges to actually change it in the web application), but
I'd just like to take a minute to thank him publicly for his efforts.
We started this CommitFest with something like 60 patches, which is
definitely on the larger side for a CommitFest, and Kevin did a great
job staying on top of what was going on with all of them and, I felt,
really helped keep us on track. At the same time, I felt he did this
with a very light touch that made the whole thing go very smoothly.
So -- thanks, Kevin!
I also appreciate the efforts of all those who reviewed. Good reviews
are really critical to keep the burden from building up on committers,
and I appreciate the efforts of everyone who contributed, in many
cases probably on their own time. I'm particularly grateful to the
people who were vigilant about spelling, grammar, coding style,
whitespace, and other nitpicky little issues that are not much fun,
but which at least for me are a major time sink if they're still
lingering when it comes time to do the actual commit.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
Robert Haas <robertmhaas@gmail.com> wrote:
I'd just like to take a minute to thank him publicly for his
efforts. We started this CommitFest with something like 60
patches, which is definitely on the larger side for a CommitFest,
and Kevin did a great job staying on top of what was going on with
all of them and, I felt, really helped keep us on track. At the
same time, I felt he did this with a very light touch that made
the whole thing go very smoothly. So -- thanks, Kevin!
You're welcome. It was educational for me. I don't think I want to
try to handle two in a row, and I think your style is better suited
than mine to the final CF for a release, but I might be able to take
on the 2010-11 CF if people want that.
My hand was not always so light behind the scenes, though -- I sent
or received about 100 off-list emails to try to keep things moving.
Hopefully nobody was too offended by my nagging. :-)
Oh, and thanks for putting together the CF web application. Without
that, I couldn't have done half as well as I did.
I also appreciate the efforts of all those who reviewed.
Yes, I'll second that! I've always been impressed with the
PostgreSQL community, and managing this CF gave me new insights and
appreciation for the intelligence, professionalism, and community
spirit of its members -- authors, reviewers, and committers.
-Kevin
Kevin Grittner wrote:
I don't think I want to try to handle two in a row, and I think your style is better suited
than mine to the final CF for a release, but I might be able to take on the 2010-11 CF if people want that
Ha, you just put yourself right back on the hook with that comment, and
Robert does seem like the right guy for CF-4 @ 2011-01. Leaving the
question of what's going to happen with CF-2 next month.
I think the crucial thing with the 2010-09 CF is that we have to get
serious progress made sorting out all the sync rep ideas before/during
that one. The review Yeb did and subsequent discussion was really
helpful, but the scope on that needs to actually get nailed down to
*something* concrete if it's going to get built early enough in the 9.1
release to be properly reviewed and tested for more than one round.
Parts of the design and scope still feel like they're expanding to me,
and I think having someone heavily involved in the next CF who is
willing to push on nailing down that particular area is pretty
important. Will volunteer myself if I can stay on schedule to make it
past the major time commitment sink I've had so far this year by then.
--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com www.2ndQuadrant.us
On Wed, Aug 18, 2010 at 7:46 PM, Greg Smith <greg@2ndquadrant.com> wrote:
Kevin Grittner wrote:
I don't think I want to try to handle two in a row, and I think your style
is better suited
than mine to the final CF for a release, but I might be able to take on
the 2010-11 CF if people want thatHa, you just put yourself right back on the hook with that comment, and
Robert does seem like the right guy for CF-4 @ 2011-01. Leaving the
question of what's going to happen with CF-2 next month.
My reputation precedes me, apparently. Although I appreciate everyone
so far being willing to avoid mentioning exactly what that reputation
might be. :-)
I think the crucial thing with the 2010-09 CF is that we have to get serious
progress made sorting out all the sync rep ideas before/during that one.
The review Yeb did and subsequent discussion was really helpful, but the
scope on that needs to actually get nailed down to *something* concrete if
it's going to get built early enough in the 9.1 release to be properly
reviewed and tested for more than one round. Parts of the design and scope
still feel like they're expanding to me, and I think having someone heavily
involved in the next CF who is willing to push on nailing down that
particular area is pretty important. Will volunteer myself if I can stay on
schedule to make it past the major time commitment sink I've had so far this
year by then.
Sitting on Sync Rep is a job and a half by itself, without adding all
the other CF work on top of it. Maybe we should try to find two
vi^Holunteers: a CommitFest Manager (CFM) and a Major Feature
Babysitter (MBS). At any rate, we should definitely NOT wait another
month to start thinking about Sync Rep again. I haven't actually
looked at any of the Sync Rep code AT ALL but IIRC Heikki expressed
the view that the biggest thing standing in the way of a halfway
decent Sync Rep implementation was a number of polling loops that
needed to be replaced with something that wouldn't introduce
up-to-100ms delays. And so far we haven't seen a patch for that.
Somebody write one. And then let's get it reviewed and committed RSN.
It may seem like we're early in the release cycle yet, but for a
feature of this magnitude we are not. We committed way too much big
stuff at the very end of the last release cycle; Hot Standby was still
being cleaned up in May after commit in November. We'll be lucky to
commit sync rep that early.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
On 19/08/10 04:46, Robert Haas wrote:
At any rate, we should definitely NOT wait another
month to start thinking about Sync Rep again.
Agreed. EnterpriseDB is interested in having that feature, so I'm on the
hook to spend time on it regardless of commitfests.
I haven't actually
looked at any of the Sync Rep code AT ALL but IIRC Heikki expressed
the view that the biggest thing standing in the way of a halfway
decent Sync Rep implementation was a number of polling loops that
needed to be replaced with something that wouldn't introduce
up-to-100ms delays.
Well, that's the only uncontroversial thing about it that doesn't
require any fighting over the UI or desired behavior. That's why I've
focused on that first, and also because it's useful regardless of
synchronous replication. But once that's done, we'll have to nail down
how synchronous replication is supposed to behave, and how to configure it.
And so far we haven't seen a patch for that.
Somebody write one. And then let's get it reviewed and committed RSN.
Fujii is on vacation, but I've started working on it. The two issues
with Fujii's latest patch are that it would not respond promptly on
platforms where signals don't interrupt sleep, and it suffers the
classic race condition that pselect() was invented for. I'm going to
replace pg_usleep() with select(), and use the so called "self-pipe
trick" to get over the race condition. I have that written up but I want
to do some testing and cleanup before posting the patch.
It may seem like we're early in the release cycle yet, but for a
feature of this magnitude we are not. We committed way too much big
stuff at the very end of the last release cycle; Hot Standby was still
being cleaned up in May after commit in November. We'll be lucky to
commit sync rep that early.
Agreed. We need to decide the scope and minimum set of features real
soon to get something concrete finished.
BTW, on what platforms signals don't interrupt sleep? Although that
issue has been discussed many times before, I couldn't find any
reference to a real platform in the archives.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
BTW, on what platforms signals don't interrupt sleep? Although that
issue has been discussed many times before, I couldn't find any
reference to a real platform in the archives.
I've got one in captivity (my old HPUX box). Happy to test whatever you
come up with.
Considering that pg_usleep is implemented with select, I'm not following
what you mean by "replace pg_usleep() with select()"?
regards, tom lane
Excerpts from Heikki Linnakangas's message of jue ago 19 02:02:34 -0400 2010:
On 19/08/10 04:46, Robert Haas wrote:
And so far we haven't seen a patch for that.
Somebody write one. And then let's get it reviewed and committed RSN.Fujii is on vacation, but I've started working on it. The two issues
with Fujii's latest patch are that it would not respond promptly on
platforms where signals don't interrupt sleep, and it suffers the
classic race condition that pselect() was invented for. I'm going to
replace pg_usleep() with select(), and use the so called "self-pipe
trick" to get over the race condition. I have that written up but I want
to do some testing and cleanup before posting the patch.
Hmm, IIRC the self-pipe trick doesn't work on Windows, mainly because
select() doesn't handle pipes, only sockets. You may need some extra
hack to make it work there.
--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Thu, Aug 19, 2010 at 17:08, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
Excerpts from Heikki Linnakangas's message of jue ago 19 02:02:34 -0400 2010:
On 19/08/10 04:46, Robert Haas wrote:
And so far we haven't seen a patch for that.
Somebody write one. And then let's get it reviewed and committed RSN.Fujii is on vacation, but I've started working on it. The two issues
with Fujii's latest patch are that it would not respond promptly on
platforms where signals don't interrupt sleep, and it suffers the
classic race condition that pselect() was invented for. I'm going to
replace pg_usleep() with select(), and use the so called "self-pipe
trick" to get over the race condition. I have that written up but I want
to do some testing and cleanup before posting the patch.Hmm, IIRC the self-pipe trick doesn't work on Windows, mainly because
select() doesn't handle pipes, only sockets. You may need some extra
hack to make it work there.
We have a pipe implementation using sockets that is used on Windows
for just this reason, IIRC.
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
On 19/08/10 18:08, Alvaro Herrera wrote:
Excerpts from Heikki Linnakangas's message of jue ago 19 02:02:34 -0400 2010:
On 19/08/10 04:46, Robert Haas wrote:
And so far we haven't seen a patch for that.
Somebody write one. And then let's get it reviewed and committed RSN.Fujii is on vacation, but I've started working on it. The two issues
with Fujii's latest patch are that it would not respond promptly on
platforms where signals don't interrupt sleep, and it suffers the
classic race condition that pselect() was invented for. I'm going to
replace pg_usleep() with select(), and use the so called "self-pipe
trick" to get over the race condition. I have that written up but I want
to do some testing and cleanup before posting the patch.Hmm, IIRC the self-pipe trick doesn't work on Windows, mainly because
select() doesn't handle pipes, only sockets. You may need some extra
hack to make it work there.
That's fine, we can do the naive set-a-flag implementation on Windows
because on Windows our "signals" are only delivered at
CHECK_FOR_INTERRUPTS(), so we don't have the race condition to begin with.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On 19/08/10 16:38, Tom Lane wrote:
Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> writes:
BTW, on what platforms signals don't interrupt sleep? Although that
issue has been discussed many times before, I couldn't find any
reference to a real platform in the archives.I've got one in captivity (my old HPUX box). Happy to test whatever you
come up with.Considering that pg_usleep is implemented with select, I'm not following
what you mean by "replace pg_usleep() with select()"?
Instead of using pg_usleep(), call select() directly, waiting not only
for the timeout, but also for data to arrive on the "self-pipe". The
signal handler writes a byte to the self-pipe, waking up the select().
That way the select() is interupted by the signal arriving, even if
signals per se don't interrupt it. And it closes the race condition
involved with setting a flag in the signal handler and checking that in
the main loop.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
On 19/08/10 16:38, Tom Lane wrote:
Considering that pg_usleep is implemented with select, I'm not following
what you mean by "replace pg_usleep() with select()"?
Instead of using pg_usleep(), call select() directly, waiting not only
for the timeout, but also for data to arrive on the "self-pipe". The
signal handler writes a byte to the self-pipe, waking up the select().
That way the select() is interupted by the signal arriving, even if
signals per se don't interrupt it. And it closes the race condition
involved with setting a flag in the signal handler and checking that in
the main loop.
Hmm, but couldn't you still do that inside pg_usleep? Signal handlers
that do that couldn't know if they were interrupting a sleep per se,
so this would have to be a backend-wide convention.
regards, tom lane
On 19/08/10 19:57, Tom Lane wrote:
Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> writes:
On 19/08/10 16:38, Tom Lane wrote:
Considering that pg_usleep is implemented with select, I'm not following
what you mean by "replace pg_usleep() with select()"?Instead of using pg_usleep(), call select() directly, waiting not only
for the timeout, but also for data to arrive on the "self-pipe". The
signal handler writes a byte to the self-pipe, waking up the select().
That way the select() is interupted by the signal arriving, even if
signals per se don't interrupt it. And it closes the race condition
involved with setting a flag in the signal handler and checking that in
the main loop.Hmm, but couldn't you still do that inside pg_usleep? Signal handlers
that do that couldn't know if they were interrupting a sleep per se,
so this would have to be a backend-wide convention.
I don't understand, do what inside pg_usleep()?
We only need to respond quickly to one signal, the one that tells
walsender "there's some new WAL that you should send". We can rely on
polling for all the other signals like SIGHUP for config reload or
shutdown request, like we do today.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
On 19/08/10 19:57, Tom Lane wrote:
Hmm, but couldn't you still do that inside pg_usleep? Signal handlers
that do that couldn't know if they were interrupting a sleep per se,
so this would have to be a backend-wide convention.
I don't understand, do what inside pg_usleep()?
I was envisioning still using pg_usleep, and having this interaction
between signal handlers and the delay be private to pg_usleep, rather
than putting such ugly code into forty-nine different places. There
are a *lot* of places where we have loops that break down delays
into at-most-one-second pg_usleep calls, and if we're going to have a
hack like this we should fix them all, not just two or three that SR
cares about.
But I'm still not seeing how this self-pipe hack avoids a race
condition. If the signal handler is sending a byte whenever it
executes, then you could have bytes already stacked up in the pipe
from previous interrupts that didn't happen to come while inside
pg_usleep. If you clear those before sleeping, you have a race
condition, and if you don't, then you fail to sleep the intended
amount of time even though there was no interrupt this time.
regards, tom lane
On 19/08/10 20:18, Tom Lane wrote:
Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> writes:
On 19/08/10 19:57, Tom Lane wrote:
Hmm, but couldn't you still do that inside pg_usleep? Signal handlers
that do that couldn't know if they were interrupting a sleep per se,
so this would have to be a backend-wide convention.I don't understand, do what inside pg_usleep()?
I was envisioning still using pg_usleep, and having this interaction
between signal handlers and the delay be private to pg_usleep, rather
than putting such ugly code into forty-nine different places. There
are a *lot* of places where we have loops that break down delays
into at-most-one-second pg_usleep calls, and if we're going to have a
hack like this we should fix them all, not just two or three that SR
cares about.
Hmm, will need to think about a suitable API for that. The nice thing
would be that we could implement it using pselect() where available.
(And reliable - the Linux select() man page says that glibc's pselect()
is emulated using select(), and suffers from the very same race
condition pselect() was invented to solve. How awful is that!?)
But I'm still not seeing how this self-pipe hack avoids a race
condition. If the signal handler is sending a byte whenever it
executes, then you could have bytes already stacked up in the pipe
from previous interrupts that didn't happen to come while inside
pg_usleep. If you clear those before sleeping, you have a race
condition, and if you don't, then you fail to sleep the intended
amount of time even though there was no interrupt this time.
You clear the pipe after waking up. Before sending all the pending WAL
and deciding that you're fully caught up again:
for(;;)
{
1. select()
2. clear pipe
3. send any pending WAL
}
There's more information on the self-pipe trick at e.g
http://lwn.net/Articles/177897/
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
On 19/08/10 20:18, Tom Lane wrote:
But I'm still not seeing how this self-pipe hack avoids a race
condition. If the signal handler is sending a byte whenever it
executes, then you could have bytes already stacked up in the pipe
from previous interrupts that didn't happen to come while inside
pg_usleep. If you clear those before sleeping, you have a race
condition, and if you don't, then you fail to sleep the intended
amount of time even though there was no interrupt this time.
You clear the pipe after waking up.
Hmm ... that doesn't answer my second objection about failing to sleep
the expected amount of time, but on reflection I guess that can't be
done: we have to be able to cope with interrupts occurring just before
the sleep actually begins, and there's no way to define "just before"
except by reference to the calling code having done whatever it might
need to do before/after sleeping.
Hmm, will need to think about a suitable API for that.
Offhand I'd suggest something like
SetSleepInterrupt() -- called by signal handlers, writes pipe
ClearSleepInterrupt() -- called by sleep-and-do-something loops, clears pipe
pg_usleep() itself remains the same, but it is now guaranteed to return
immediately if SetSleepInterrupt is called, or has been called since the
last ClearSleepInterrupt.
The nice thing
would be that we could implement it using pselect() where available.
(And reliable - the Linux select() man page says that glibc's pselect()
is emulated using select(), and suffers from the very same race
condition pselect() was invented to solve. How awful is that!?)
Ick. So how would you tell if it's trustworthy?
regards, tom lane
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On Thu, Aug 19, 2010 at 08:36:09PM +0300, Heikki Linnakangas wrote:
[...]
Hmm, will need to think about a suitable API for that. The nice thing would
be that we could implement it using pselect() where available. (And
reliable - the Linux select() man page says that glibc's pselect() is
emulated using select(), and suffers from the very same race condition
pselect() was invented to solve. How awful is that!?)
It is indeed. It seems, though, that from Linux kernel 2.6.16 and glibc
2.4 on, things look better [1]<http://lwn.net/Articles/176911/>. As a reference, Debian stable (not known
to adventure too far into the present ;-) is libc 2.7 on kernel 2.6.26.
Of course, "enterprise" GNU/Linux distros are said to be even more
conservative...
[1]: <http://lwn.net/Articles/176911/>
Regards
- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
iD8DBQFMbiauBcgs9XrR2kYRAhiwAJ41f29jSIy409epTH0eJRXW17oByACeIkRo
CRg2BCw8tn3PkdnNR1i/MCY=
=GVMT
-----END PGP SIGNATURE-----
On 19/08/10 20:59, Tom Lane wrote:
Offhand I'd suggest something like
SetSleepInterrupt() -- called by signal handlers, writes pipe
ClearSleepInterrupt() -- called by sleep-and-do-something loops, clears pipepg_usleep() itself remains the same, but it is now guaranteed to return
immediately if SetSleepInterrupt is called, or has been called since the
last ClearSleepInterrupt.
Hmm, we have pg_usleep() calls in some fairly low-level functions, like
mdunlink() and s_lock(). If someone has called SetSleepInterrupt(), we
don't want those pg_usleep()s to return immediately. And pg_usleep() is
used in some client code too. I think we need a separate sleep function
for this.
Another idea is to not use unix signals at all, but ProcSendSignal() and
ProcWaitForSignal(). We would not need the signal handler at all.
Walsender would use ProcWaitForSignal() instead of pg_usleep() and
backends that want to wake it up would use ProcSendSignal(). The problem
is that there is currently no way to specify a timeout, but I presume
the underlying semaphore operations have that capability, and we could
expose it.
Actually ProcSendSignal()/ProcWaitForSignal() won't work as is, because
walsender doesn't have a PGPROC entry, but you could easily build a
similar mechanism, using PGSemaphoreLock/Unlock like
ProcSendSignal()/WaitForSignal() does.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
Hmm, we have pg_usleep() calls in some fairly low-level functions, like
mdunlink() and s_lock(). If someone has called SetSleepInterrupt(), we
don't want those pg_usleep()s to return immediately. And pg_usleep() is
used in some client code too. I think we need a separate sleep function
for this.
Well, we'd need some careful thought about which sleeps need what, but I
don't necessarily have an objection to a separate interruptable sleep
function.
Another idea is to not use unix signals at all, but ProcSendSignal() and
ProcWaitForSignal(). We would not need the signal handler at all.
Walsender would use ProcWaitForSignal() instead of pg_usleep() and
backends that want to wake it up would use ProcSendSignal().
You keep on proposing solutions that only work for walsender :-(.
Most of the other places where we have pg_usleep actually do want
a timed sleep, I believe. It's also unclear that we can always expect
ProcSendSignal to be usable --- for example, stuff like SIGHUP would
be sent by processes that might not be connected to shared memory.
The problem
is that there is currently no way to specify a timeout, but I presume
the underlying semaphore operations have that capability, and we could
expose it.
They don't, or at least the semop-based implementation doesn't.
regards, tom lane
On 20/08/10 16:24, Tom Lane wrote:
Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> writes:
Hmm, we have pg_usleep() calls in some fairly low-level functions, like
mdunlink() and s_lock(). If someone has called SetSleepInterrupt(), we
don't want those pg_usleep()s to return immediately. And pg_usleep() is
used in some client code too. I think we need a separate sleep function
for this.Well, we'd need some careful thought about which sleeps need what, but I
don't necessarily have an objection to a separate interruptable sleep
function.
If we have to, we could also support multiple interrupts with multiple
self-pipes, so that you can choose at pg_usleep() which ones to wake up on.
Another idea is to not use unix signals at all, but ProcSendSignal() and
ProcWaitForSignal(). We would not need the signal handler at all.
Walsender would use ProcWaitForSignal() instead of pg_usleep() and
backends that want to wake it up would use ProcSendSignal().You keep on proposing solutions that only work for walsender :-(.
Well yes, the other places where we use pg_usleep() are not really a
problem as is. If as a side-effect we can make them respond more quickly
to signals, with small changes, that's good, but walsender is the only
one that's performance critical.
That said, a select() based solution is my current favorite.
Most of the other places where we have pg_usleep actually do want
a timed sleep, I believe. It's also unclear that we can always expect
ProcSendSignal to be usable --- for example, stuff like SIGHUP would
be sent by processes that might not be connected to shared memory.The problem
is that there is currently no way to specify a timeout, but I presume
the underlying semaphore operations have that capability, and we could
expose it.They don't, or at least the semop-based implementation doesn't.
There's semtimedop(). I don't know how portable it is, but it seems to
exist at least on Linux, Solaris, HPUX and AIX. On what platforms do we
use sysv semaphores?
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
[ It's way past time to change the thread title ]
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
On 20/08/10 16:24, Tom Lane wrote:
You keep on proposing solutions that only work for walsender :-(.
Well yes, the other places where we use pg_usleep() are not really a
problem as is.
Well, yes they are. They cause unnecessary process wakeups and thereby
consume cycles even when the database is idle. See for example a
longstanding complaint here:
https://bugzilla.redhat.com/show_bug.cgi?id=252129
If we're going to go to the trouble of having a mechanism like this,
I'd like it to fix that problem so I can close out that bug.
There's semtimedop(). I don't know how portable it is, but it seems to
exist at least on Linux, Solaris, HPUX and AIX.
It's not on my HPUX, and I don't see it in the Single Unix Spec.
On what platforms do we use sysv semaphores?
AFAIK, everything except Windows and extremely old versions of OS X.
regards, tom lane