condition variables

Started by Robert Haas over 9 years ago · 28 messages · pgsql-hackers
#1Robert Haas
robertmhaas@gmail.com

Hi,

Some of my EnterpriseDB colleagues and I have been working on various
parallel query projects, all of which have been previously disclosed
here:

https://wiki.postgresql.org/wiki/EnterpriseDB_database_server_roadmap

One issue we've encountered is that it's not very easy for one process
in a group of cooperating parallel processes to wait for another
process in that same group. One idea is to have one process grab an
LWLock and other processes try to acquire it, but that actually
doesn't work very well. A pretty obvious problem is that it holds off
interrupts for the entire time that you are holding the lock, which is
pretty undesirable. A more subtle problem is that it's easy to
conceive of situations where the LWLock paradigm is just a very poor
fit for what you actually want to do. For example, suppose you have a
computation which proceeds in two phases: each backend that finishes
phase 1 must wait until all backends finish phase 1, and once all have
finished, all can begin phase 2. You could handle this case by having
an LWLock which everyone holds during phase 1 in shared mode, and then
everyone must briefly acquire it in exclusive mode before starting
phase 2, but that's an awful hack. It also has race conditions: what
if someone finishes phase 1 before everyone has started phase 1? And
what if there are 10 phases instead of 2?

Another approach to the problem is to use a latch wait loop. That
almost works. Interrupts can be serviced, and you can recheck shared
memory to see whether the condition for proceeding is satisfied after
each iteration of the loop. There's only one problem: when you do
something that might cause the condition to be satisfied for other
waiting backends, you need to set their latch - but you don't have an
easy way to know exactly which processes are waiting, so how do you
call SetLatch? I originally thought of adding a function like
SetAllLatches(ParallelContext *) and maybe that can work, but then I
had what I think is a better idea, which is to introduce a notion of
condition variables. Condition variables, of course, are a standard
synchronization primitive:

https://en.wikipedia.org/wiki/Monitor_(synchronization)#Condition_variables_2

Basically, a condition variable has three operations: you can wait for
the condition variable; you can signal the condition variable to wake
up one waiter; or you can broadcast on the condition variable to wake
up all waiters. Atomically with entering the wait, you must be able
to check whether the condition is satisfied. So, in my
implementation, a condition variable wait loop looks like this:

for (;;)
{
    ConditionVariablePrepareToSleep(cv);
    if (condition for which we are waiting is satisfied)
        break;
    ConditionVariableSleep();
}
ConditionVariableCancelSleep();

To wake up one waiter, another backend can call
ConditionVariableSignal(cv); to wake up all waiters,
ConditionVariableBroadcast(cv).

I am cautiously optimistic that this design will serve a wide variety
of needs for parallel query development - basically anything that
needs to wait for another process to reach a certain point in the
computation that can be detected through changes in shared memory
state. The attached patch condition-variable-v1.patch implements this
API. I originally open-coded the wait queue for this, but I've just
finished rebasing it on top of Thomas Munro's proclist stuff, so
before applying this patch you need the one from here:

/messages/by-id/CAEepm=0Vvr9zgwHt67RwuTfwMEby1GiGptBk3xFPDbbgEtZgMg@mail.gmail.com

At some point while hacking on this I realized that we could actually
replace the io_in_progress locks with condition variables; the
attached patch buffer-io-cv-v1.patch does this (it must be applied on
top of the proclist patch from the above email and also on top of
condition-variable-v1.patch). Using condition variables here seems to
have a couple of advantages. First, it means that a backend waiting
for buffer I/O to complete is interruptible. Second, it fixes a
long-running bit of nastiness in AbortBufferIO: right now, if a
backend that is doing buffer I/O aborts, the abort causes it to
release all of its LWLocks, including the buffer I/O lock. Everyone
waiting for that buffer busy-loops until the aborting process gets
around to reacquiring the lock and updating the buffer state in
AbortBufferIO. But if we replace the io_in_progress locks with
condition variables, then that doesn't happen any more. Nobody is
"holding" the condition variable, so it doesn't get "released" when
the process doing I/O aborts. Instead, they just keep sleeping until
the aborting process reaches AbortBufferIO, and then it broadcasts on
the condition variable and wakes everybody up, which seems a good deal
nicer.

I'm very curious to know whether other people like this abstraction
and whether they think it will be useful for things they want to do
with parallel query (or otherwise). Comments welcome. Review
appreciated. Other suggestions for how to handle this are cool, too.

Credit: These patches were written by me; an earlier version of the
condition-variable-v1.patch was reviewed and tested by Rahila Syed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

condition-variable-v1.patch (application/x-download, +257 -2)
buffer-io-cv-v1.patch (application/x-download, +38 -65)
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#1)
Re: condition variables

Robert Haas <robertmhaas@gmail.com> writes:

I had what I think is a better idea, which is to introduce a notion of
condition variables.

Interesting proposal.

... Using condition variables here seems to
have a couple of advantages. First, it means that a backend waiting
for buffer I/O to complete is interruptible. Second, it fixes a
long-running bit of nastiness in AbortBufferIO: right now, if a
backend that is doing buffer I/O aborts, the abort causes it to
release all of its LWLocks, including the buffer I/O lock. Everyone
waiting for that buffer busy-loops until the aborting process gets
around to reacquiring the lock and updating the buffer state in
AbortBufferIO. But if we replace the io_in_progress locks with
condition variables, then that doesn't happen any more. Nobody is
"holding" the condition variable, so it doesn't get "released" when
the process doing I/O aborts. Instead, they just keep sleeping until
the aborting process reaches AbortBufferIO, and then it broadcasts on
the condition variable and wakes everybody up, which seems a good deal
nicer.

Hmm. I fear the only reason you see an advantage there is that you don't
(yet) have any general-purpose mechanism for an aborting transaction to
satisfy its responsibilities vis-a-vis waiters on condition variables.
Instead, this wins specifically because you stuck some bespoke logic into
AbortBufferIO. OK ... but that sounds like we're going to end up with
every single condition variable that ever exists in the system needing to
be catered for separately and explicitly during transaction abort cleanup.
Which does not sound promising from a reliability standpoint. On the
other hand, I don't know what the equivalent rule to "release all LWLocks
during abort" might look like for condition variables, so I don't know
if it's even possible to avoid that.

I encourage you to pursue this, because indeed LWLocks aren't always
an ideal solution, but I think it requires some careful thought about
what transaction aborts will do with condition variables.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Peter Geoghegan
pg@heroku.com
In reply to: Robert Haas (#1)
Re: condition variables

On Thu, Aug 11, 2016 at 2:47 PM, Robert Haas <robertmhaas@gmail.com> wrote:

Another approach to the problem is to use a latch wait loop. That
almost works. Interrupts can be serviced, and you can recheck shared
memory to see whether the condition for proceeding is satisfied after
each iteration of the loop. There's only one problem: when you do
something that might cause the condition to be satisfied for other
waiting backends, you need to set their latch - but you don't have an
easy way to know exactly which processes are waiting, so how do you
call SetLatch?

That's what I ended up doing with parallel CREATE INDEX. It worked,
but it would be nice to have a general purpose facility for waiting
for conditions to change.

https://en.wikipedia.org/wiki/Monitor_(synchronization)#Condition_variables_2

Basically, a condition variable has three operations: you can wait for
the condition variable; you can signal the condition variable to wake
up one waiter; or you can broadcast on the condition variable to wake
up all waiters. Atomically with entering the wait, you must be able
to check whether the condition is satisfied. So, in my
implementation, a condition variable wait loop looks like this:

for (;;)
{
    ConditionVariablePrepareToSleep(cv);
    if (condition for which we are waiting is satisfied)
        break;
    ConditionVariableSleep();
}
ConditionVariableCancelSleep();

To wake up one waiter, another backend can call
ConditionVariableSignal(cv); to wake up all waiters,
ConditionVariableBroadcast(cv).

This seems convenient.

I notice that you acquire a spinlock within the implementation of
condition variables. Is it worth any effort to consolidate the number
of spinlock acquisitions? In other words, maybe the most common idioms
should be baked into the ConditionVariable interface, which could save
callers from having to use their own mutex variable.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Thomas Munro
thomas.munro@gmail.com
In reply to: Robert Haas (#1)
Re: condition variables

On Fri, Aug 12, 2016 at 9:47 AM, Robert Haas <robertmhaas@gmail.com> wrote:

https://en.wikipedia.org/wiki/Monitor_(synchronization)#Condition_variables_2

Basically, a condition variable has three operations: you can wait for
the condition variable; you can signal the condition variable to wake
up one waiter; or you can broadcast on the condition variable to wake
up all waiters. Atomically with entering the wait, you must be able
to check whether the condition is satisfied. So, in my
implementation, a condition variable wait loop looks like this:

for (;;)
{
    ConditionVariablePrepareToSleep(cv);
    if (condition for which we are waiting is satisfied)
        break;
    ConditionVariableSleep();
}
ConditionVariableCancelSleep();

To wake up one waiter, another backend can call
ConditionVariableSignal(cv); to wake up all waiters,
ConditionVariableBroadcast(cv).

It is interesting to compare this interface with Wikipedia's
description, POSIX's pthread_cond_t and C++'s std::condition_variable.

In those interfaces, the wait operation takes a mutex which must
already be held by the caller. It unlocks the mutex and begins
waiting atomically. Then when it returns, the mutex is automatically
reacquired. This approach avoids race conditions as long as the
shared state change you are awaiting is protected by that mutex. If
you check that state before waiting and while still holding the lock,
you can be sure not to miss any change signals, and then when it
returns you can check the state again and be sure that no one can be
concurrently changing it.

In contrast, this proposal leaves it up to client code to get that
right, similarly to the way you need to do things in a certain order
when waiting for state changes with latches. You could say that it's
more error prone: I think there have been a few cases of incorrectly
coded latch/state-change wait loops in the past. On the other hand,
it places no requirements on the synchronisation mechanism the client
code uses for the related shared state. pthread_cond_wait requires
you to pass in a pointer to the related pthread_mutex_t, whereas with
this proposal client code is free to use atomic ops, lwlocks,
spinlocks or any other mutual exclusion mechanism to coordinate state
changes and deal with cache coherency.

Then there is the question of what happens when the backend that is
supposed to be doing the signalling dies or aborts, which Tom Lane
referred to in his reply. In those other libraries there is no such
concern: it's understood that these are low level thread
synchronisation primitives and if you're waiting for something that
never happens, you'll be waiting forever. I don't know what the
answer is in general for Postgres condition variables, but...

The thing that I personally am working on currently that is very
closely related and could use this has a more specific set of
circumstances: I want "join points" AKA barriers. Something like
pthread_barrier_t. (I'm saying "join point" rather than "barrier" to
avoid confusion with compiler and memory barriers, barrier.h etc.)
Join points let you wait for all workers in a known set to reach a
given point, possibly with a phase number or at least sense (one bit
phase counter) to detect synchronisation bugs. They also select one
worker arbitrarily to receive a different return value when releasing
workers from a join point, for cases where a particular phase of
parallel work needs to be done by exactly one worker while the others
sit on the bench: for example initialisation, cleanup or merging (cf.
PTHREAD_BARRIER_SERIAL_THREAD). Clearly a join point could be not
much more than a condition variable and some state tracking arrivals
and departures, but I think that this higher level synchronisation
primitive might have an advantage over raw condition variables in the
abort case: it can know the total set of workers that it's waiting for,
if they are somehow registered with it first, and registration can
include arranging for cleanup hooks to do the right thing. It's
already a requirement for a join point to know which workers exist (or
at least how many). The deal would then be that when you call
joinpoint_join(&some_joinpoint, phase), it will return only when all
peers have joined or detached, where the latter happens automatically
if they abort or die. Not at all sure of the details yet... but I
suspect join points are useful for a bunch of things like parallel
sort, parallel hash join (my project), and anything else involving
phases or some form of "fork/join" parallelism.

Or perhaps that type of thinking about error handling should be pushed
down to the condition variable. How would that look: all potential
signallers would have to register to deliver a goodbye signal in their
abort and shmem exit paths? Then what happens if you die before
registering? I think even if you find a way to do that I'd still need
to do similar extra work on top for my join points concept, because
although I do need waiters to be poked at the time worker aborts or
dies, one goodbye prod isn't enough: I'd also need to adjust the join
point's set of workers, or put it into error state.

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#2)
Re: condition variables

On Thu, Aug 11, 2016 at 6:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

But if we replace the io_in_progress locks with
condition variables, then that doesn't happen any more. Nobody is
"holding" the condition variable, so it doesn't get "released" when
the process doing I/O aborts. Instead, they just keep sleeping until
the aborting process reaches AbortBufferIO, and then it broadcasts on
the condition variable and wakes everybody up, which seems a good deal
nicer.

Hmm. I fear the only reason you see an advantage there is that you don't
(yet) have any general-purpose mechanism for an aborting transaction to
satisfy its responsibilities vis-a-vis waiters on condition variables.

In the specific case of parallel query, this problem doesn't really
arise: if an error happens, it will be propagated back to the leader
(unless it occurs in the leader as an initial matter) and the leader
will then kill off all of the other workers. Since waiting for a
condition variable doesn't block interrupts, the workers will all stop
waiting and die like the obedient lemmings they are.

Instead, this wins specifically because you stuck some bespoke logic into
AbortBufferIO. OK ... but that sounds like we're going to end up with
every single condition variable that ever exists in the system needing to
be catered for separately and explicitly during transaction abort cleanup.
Which does not sound promising from a reliability standpoint. On the
other hand, I don't know what the equivalent rule to "release all LWLocks
during abort" might look like for condition variables, so I don't know
if it's even possible to avoid that.

I don't think it is possible to avoid that. Certainly the fact that
LWLocks are automatically released is in most scenarios a big
advantage, but for content locks it isn't. I note that I didn't so
much insert bespoke logic there as replace the existing bespoke logic
with less-ugly bespoke logic.

One way to conceptualize a condition variable is that it is the wait
queue for an LWLock without the lock itself. The "lock modes" are
defined by the interaction between the condition checked by waiters
before sleeping and the shared memory updates performed before
signalling or broadcasting. This separation of concerns allows for
enormous flexibility - you could use CVs to implement an LWLock-like
construct with an arbitrary set of lock modes and an arbitrary
conflict matrix, for example. But I think that in most cases it will
be best to leave mutual exclusion to LWLocks, and use CVs when what we
need is not mutual exclusion but rather waiting for an operation
performed by some other backend to complete (successfully or
otherwise). The IO-in-progress example is just such a case: the
current spinning behavior arises from the fact that the buffer's
I/O-in-progress flag does not get cleared until AFTER processes have
been dropped off the wait queue. The normal pattern with a CV will be
(1) perform a shared memory update that will, if seen by other
processes, confirm that the condition has been met and then (2)
broadcast on the CV. There's no way for transaction abort to know
what (1) looks like, even if somehow were made aware of the CV so that
it could do (2).

I encourage you to pursue this, because indeed LWLocks aren't always
an ideal solution, but I think it requires some careful thought about
what transaction aborts will do with condition variables.

Thanks. As stated above, I believe the right answer is in fact
"nothing", because this facility is too low-level to permit a
categorical answer.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#3)
Re: condition variables

On Thu, Aug 11, 2016 at 6:37 PM, Peter Geoghegan <pg@heroku.com> wrote:

I notice that you acquire a spinlock within the implementation of
condition variables. Is it worth any effort to consolidate the number
of spinlock acquisitions? In other words, maybe the most common idioms
should be baked into the ConditionVariable interface, which could save
callers from having to use their own mutex variable.

One thing to keep in mind is that spinlocks are extremely fast as long
as you don't have too many processes contending for them. With
parallel groups (or I/O-in-progress wait queues) of single digit
number of processes, I doubt that consolidating the spinlock
acquisitions will produce any measurable benefit. If we get to the
point of having parallel groups containing scores of processes, that
could change.

Also, right now we don't have enough users of the CV interface to know
what the most common idioms will be. We could speculate, but if we
do, I'll start off the speculation by guessing that there will be a lot
of diversity, and not too much that keeps getting repeated. If that
proves to be wrong, of course, we can always go back and change it
later. We're not writing this on stone tablets.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Robert Haas
robertmhaas@gmail.com
In reply to: Thomas Munro (#4)
Re: condition variables

On Thu, Aug 11, 2016 at 8:44 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

In contrast, this proposal leaves it up to client code to get that
right, similarly to the way you need to do things in a certain order
when waiting for state changes with latches. You could say that it's
more error prone: I think there have been a few cases of incorrectly
coded latch/state-change wait loops in the past. On the other hand,
it places no requirements on the synchronisation mechanism the client
code uses for the related shared state. pthread_cond_wait requires
you to pass in a pointer to the related pthread_mutex_t, whereas with
this proposal client code is free to use atomic ops, lwlocks,
spinlocks or any other mutual exclusion mechanism to coordinate state
changes and deal with cache coherency.

I think you have accurately stated the pros and cons of this approach.
On the whole I think it's a good trade-off. In particular, not being
locked into a specific synchronization method seems like a good idea
from here. If you had to pick just one, it would be hard to decide
between spinlocks and LWLocks, and "atomics" isn't a sufficiently
specific thing to code to.

Then there is the question of what happens when the backend that is
supposed to be doing the signalling dies or aborts, which Tom Lane
referred to in his reply. In those other libraries there is no such
concern: it's understood that these are low level thread
synchronisation primitives and if you're waiting for something that
never happens, you'll be waiting forever. I don't know what the
answer is in general for Postgres condition variables, but...

As I noted in my reply to Tom, for parallel query, we're going to kill
all the workers at the same time, so the problem doesn't arise. When
we use this mechanism outside that context, we just have to do it
correctly. I don't think that's especially hard, but could somebody
mess it up? Sure.

The thing that I personally am working on currently that is very
closely related and could use this has a more specific set of
circumstances: I want "join points" AKA barriers. Something like
pthread_barrier_t. (I'm saying "join point" rather than "barrier" to
avoid confusion with compiler and memory barriers, barrier.h etc.)
Join points let you wait for all workers in a known set to reach a
given point, possibly with a phase number or at least sense (one bit
phase counter) to detect synchronisation bugs. They also select one
worker arbitrarily to receive a different return value when releasing
workers from a join point, for cases where a particular phase of
parallel work needs to be done by exactly one worker while the others
sit on the bench: for example initialisation, cleanup or merging (cf.
PTHREAD_BARRIER_SERIAL_THREAD). Clearly a join point could be not
much more than a condition variable and some state tracking arrivals
and departures, but I think that this higher level synchronisation
primitive might have an advantage over raw condition variables in the
abort case: it can know the total set of workers that it's waiting for,
if they are somehow registered with it first, and registration can
include arranging for cleanup hooks to do the right thing. It's
already a requirement for a join point to know which workers exist (or
at least how many). The deal would then be that when you call
joinpoint_join(&some_joinpoint, phase), it will return only when all
peers have joined or detached, where the latter happens automatically
if they abort or die. Not at all sure of the details yet... but I
suspect join points are useful for a bunch of things like parallel
sort, parallel hash join (my project), and anything else involving
phases or some form of "fork/join" parallelism.

If I'm right that the abort/die case doesn't really need any special
handling here, then I think this gets a lot simpler.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Thomas Munro
thomas.munro@gmail.com
In reply to: Robert Haas (#1)
Re: condition variables

On Fri, Aug 12, 2016 at 9:47 AM, Robert Haas <robertmhaas@gmail.com> wrote:

[condition-variable-v1.patch]

Don't you need to set proc->cvSleeping = false in ConditionVariableSignal?

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#8)
Re: condition variables

On Sun, Aug 14, 2016 at 9:04 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Fri, Aug 12, 2016 at 9:47 AM, Robert Haas <robertmhaas@gmail.com> wrote:

[condition-variable-v1.patch]

Don't you need to set proc->cvSleeping = false in ConditionVariableSignal?

I poked at this a bit... OK, a lot... and have some feedback:

1. As above, we need to clear cvSleeping before setting the latch.

2. The following schedule corrupts the waitlist by trying to delete
something from it that isn't in it:

P1: ConditionVariablePrepareToSleep: push self onto waitlist
P2: ConditionVariableSignal: pop P1 from waitlist
P1: <check user condition, decide to break without ever sleeping>
P3: ConditionVariablePrepareToSleep: push self onto waitlist
P1: ConditionVariableCancelSleep: delete self from waitlist (!)

Without P3 coming along you probably wouldn't notice because the
waitlist will be happily cleared and look valid, but P3's entry gets
lost and then it hangs forever.

One solution is to teach ConditionVariableCancelSleep to check if
we're still actually there first. That can be done in O(1) time by
clearing nodes' next and prev pointers when deleting, so that we can
rely on that in a new proclist_contains() function. See attached.

3. The following schedule corrupts the waitlist by trying to insert
something into it that is already in it:

P1: ConditionVariablePrepareToSleep: push self onto waitlist
P1: <check user condition, decide to sleep>
P1: ConditionVariableSleep
P1: ConditionVariablePrepareToSleep: push self onto waitlist (!)

Nodes before and after self's pre-existing position can be forgotten
when self's node is pushed to the front of the list. That can be
fixed by making ConditionVariablePrepareToSleep also check if we're
already in the list.

4. The waitlist is handled LIFO-ly. Would it be better for the first
guy to start waiting to be woken up first, like we do for locks? The
Pthreads equivalent says that it depends on "scheduling policy". I
don't know if it matters much, just an observation.

5. The new proclist function you added is the first to work in terms
of PGPROC* rather than procno. Maybe the whole interface should work
with either PGPROC pointers or procnos? No strong view.

Please find attached a -v2 of your patch which includes suggestions
1-3 above. Like the -v1, it applies on top of
lwlocks-in-dsm-v3.patch. Also, I have attached a v2->v3 diff to show
just my proposed changes.

--
Thomas Munro
http://www.enterprisedb.com

Attachments:

diff.patch (application/octet-stream, +44 -3)
condition-variable-v2.patch (application/octet-stream, +299 -2)
#10Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#9)
Re: condition variables

On Mon, Aug 15, 2016 at 5:58 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

Also, I have attached a v2->v3 diff ...

Ugh. I meant a v1->v2 diff.

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Robert Haas
robertmhaas@gmail.com
In reply to: Thomas Munro (#9)
Re: condition variables

On Mon, Aug 15, 2016 at 1:58 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Sun, Aug 14, 2016 at 9:04 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Fri, Aug 12, 2016 at 9:47 AM, Robert Haas <robertmhaas@gmail.com> wrote:

[condition-variable-v1.patch]

Don't you need to set proc->cvSleeping = false in ConditionVariableSignal?

I poked at this a bit... OK, a lot... and have some feedback:

1. As above, we need to clear cvSleeping before setting the latch.

Right, OK.

2. The following schedule corrupts the waitlist by trying to delete
something from it that isn't in it:

P1: ConditionVariablePrepareToSleep: push self onto waitlist
P2: ConditionVariableSignal: pop P1 from waitlist
P1: <check user condition, decide to break without ever sleeping>
P3: ConditionVariablePrepareToSleep: push self onto waitlist
P1: ConditionVariableCancelSleep: delete self from waitlist (!)

Without P3 coming along you probably wouldn't notice because the
waitlist will be happily cleared and look valid, but P3's entry gets
lost and then it hangs forever.

One solution is to teach ConditionVariableCancelSleep to check if
we're still actually there first. That can be done in O(1) time by
clearing nodes' next and prev pointers when deleting, so that we can
rely on that in a new proclist_contains() function. See attached.

How about instead using cvSleeping to test this? Suppose we make a
rule that cvSleeping can be changed from false to true only by the
process whose PGPROC it is, and thus no lock is needed, but changing
it from true to false always requires the spinlock.

3. The following schedule corrupts the waitlist by trying to insert
something into it that is already in it:

P1: ConditionVariablePrepareToSleep: push self onto waitlist
P1: <check user condition, decide to sleep>
P1: ConditionVariableSleep
P1: ConditionVariablePrepareToSleep: push self onto waitlist (!)

Nodes before and after self's pre-existing position can be forgotten
when self's node is pushed to the front of the list. That can be
fixed by making ConditionVariablePrepareToSleep also check if we're
already in the list.

OK.

4. The waitlist is handled LIFO-ly. Would it be better for the first
guy to start waiting to be woken up first, like we do for locks? The
Pthreads equivalent says that it depends on "scheduling policy". I
don't know if it matters much, just an observation.

I don't know whether this matters. It's possible that FIFO is a
better policy; I don't really care.

5. The new proclist function you added is the first to work in terms
of PGPROC* rather than procno. Maybe the whole interface should work
with either PGPROC pointers or procnos? No strong view.

Hmm, maybe so. But wouldn't any caller translate to a PGPROC * straight off?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#6)
Re: condition variables

On 2016-08-11 21:27:45 -0400, Robert Haas wrote:

On Thu, Aug 11, 2016 at 6:37 PM, Peter Geoghegan <pg@heroku.com> wrote:

I notice that you acquire a spinlock within the implementation of
condition variables. Is it worth any effort to consolidate the number
of spinlock acquisitions? In other words, maybe the most common idioms
should be baked into the ConditionVariable interface, which could save
callers from having to use their own mutex variable.

One thing to keep in mind is that spinlocks are extremely fast as long
as you don't have too many processes contending for them.

That's one of the conditions. The other is that the system as a whole is
not overcommitted, because otherwise the chance of a process being
descheduled while holding a spinlock increases.

With
parallel groups (or I/O-in-progress wait queues) of single digit
number of processes, I doubt that consolidating the spinlock
acquisitions will produce any measurable benefit. If we get to the
point of having parallel groups containing scores of processes, that
could change.

And since we have no measures yet to manage systemwide load under
parallelism, I think the issue is a bit more general than the quoted
paragraph suggests.

But I also think we shouldn't yet worry about it. It seems likely that
the actual critical bottleneck is elsewhere for now.

Andres Freund

#13Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#11)
Re: condition variables

On Mon, Aug 15, 2016 at 10:35 PM, Robert Haas <robertmhaas@gmail.com> wrote:

Don't you need to set proc->cvSleeping = false in ConditionVariableSignal?

I poked at this a bit... OK, a lot... and have some feedback:

1. As above, we need to clear cvSleeping before setting the latch.

Right, OK.

I have independently faced this problem while using your patch and for
now I have updated my local copy. If possible, please send an updated
patch as this patch could be used for development of various
parallelism projects.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#14Robert Haas
robertmhaas@gmail.com
In reply to: Amit Kapila (#13)
Re: condition variables

On Mon, Sep 5, 2016 at 3:17 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Aug 15, 2016 at 10:35 PM, Robert Haas <robertmhaas@gmail.com> wrote:

Don't you need to set proc->cvSleeping = false in ConditionVariableSignal?

I poked at this a bit... OK, a lot... and have some feedback:

1. As above, we need to clear cvSleeping before setting the latch.

Right, OK.

I have independently faced this problem while using your patch and for
now I have updated my local copy. If possible, please send an updated
patch as this patch could be used for development of various
parallelism projects.

Thomas already posted an updated patch in the same message where he
reported the problem.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#15Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#14)
Re: condition variables

On Tue, Sep 6, 2016 at 5:29 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Sep 5, 2016 at 3:17 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Aug 15, 2016 at 10:35 PM, Robert Haas <robertmhaas@gmail.com> wrote:

Don't you need to set proc->cvSleeping = false in ConditionVariableSignal?

I poked at this a bit... OK, a lot... and have some feedback:

1. As above, we need to clear cvSleeping before setting the latch.

Right, OK.

I have independently faced this problem while using your patch and for
now I have updated my local copy. If possible, please send an updated
patch as this patch could be used for development of various
parallelism projects.

Thomas already posted an updated patch in the same message where he
reported the problem.

Oops, I missed that, will use the same.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


#16Peter Geoghegan
pg@heroku.com
In reply to: Robert Haas (#1)
Re: condition variables

On Thu, Aug 11, 2016 at 2:47 PM, Robert Haas <robertmhaas@gmail.com> wrote:

Another approach to the problem is to use a latch wait loop. That
almost works. Interrupts can be serviced, and you can recheck shared
memory to see whether the condition for proceeding is satisfied after
each iteration of the loop. There's only one problem: when you do
something that might cause the condition to be satisfied for other
waiting backends, you need to set their latch - but you don't have an
easy way to know exactly which processes are waiting, so how do you
call SetLatch? I originally thought of adding a function like
SetAllLatches(ParallelContext *) and maybe that can work, but then I
had what I think is a better idea, which is to introduce a notion of
condition variables.
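
For comparison, the same wait-loop shape exists in POSIX threads, where pthread_cond_broadcast does the job of "set every waiter's latch" without the waker needing to know who is waiting. A minimal, self-contained illustration of the two-phase example from upthread (purely illustrative, not PostgreSQL code):

```c
#include <pthread.h>
#include <stddef.h>

#define NWORKERS 4

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static int phase1_done = 0;
static int phase2_done = 0;

static void *
worker(void *arg)
{
    (void) arg;

    pthread_mutex_lock(&lock);

    /* Finish phase 1 and wake everyone who might now be able to go on;
     * the waker does not need to know which threads are waiting. */
    phase1_done++;
    pthread_cond_broadcast(&cv);

    /* Wait loop: recheck the shared condition after every wakeup. */
    while (phase1_done < NWORKERS)
        pthread_cond_wait(&cv, &lock);

    /* All workers have finished phase 1; begin phase 2. */
    phase2_done++;
    pthread_mutex_unlock(&lock);
    return NULL;
}

int
run_demo(void)
{
    pthread_t threads[NWORKERS];

    for (int i = 0; i < NWORKERS; i++)
        pthread_create(&threads[i], NULL, worker, NULL);
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(threads[i], NULL);

    /* Every worker reached phase 2 only after all finished phase 1. */
    return phase2_done;
}
```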

I don't see a CF entry for this. Are you planning to work on this
again soon, Robert?

I have an eye on this patch due to my work on parallel CREATE INDEX.
It would be nice to have some rough idea of when you intend to commit
this.

--
Peter Geoghegan

#17Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#16)
Re: condition variables

On Tue, Sep 13, 2016 at 10:55 PM, Peter Geoghegan <pg@heroku.com> wrote:

On Thu, Aug 11, 2016 at 2:47 PM, Robert Haas <robertmhaas@gmail.com> wrote:

Another approach to the problem is to use a latch wait loop. That
almost works. Interrupts can be serviced, and you can recheck shared
memory to see whether the condition for proceeding is satisfied after
each iteration of the loop. There's only one problem: when you do
something that might cause the condition to be satisfied for other
waiting backends, you need to set their latch - but you don't have an
easy way to know exactly which processes are waiting, so how do you
call SetLatch? I originally thought of adding a function like
SetAllLatches(ParallelContext *) and maybe that can work, but then I
had what I think is a better idea, which is to introduce a notion of
condition variables.

I don't see a CF entry for this. Are you planning to work on this
again soon, Robert?

I have an eye on this patch due to my work on parallel CREATE INDEX.
It would be nice to have some rough idea of when you intend to commit
this.

I basically figured I would commit it when and if it became clear that
it'd get good use in some other patch which was on the road to being
committed. I don't think it needs much work, just the assurance that
it will get some use.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#18Peter Geoghegan
pg@heroku.com
In reply to: Robert Haas (#17)
Re: condition variables

Maybe I can leave it up to you to determine if that applies in the context
of my parallel create index patch. You are the obvious candidate to review
that patch anyway, of course.

--
Peter Geoghegan

#19Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#9)
Re: condition variables

On Mon, Aug 15, 2016 at 5:58 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

Please find attached a -v2 of your patch which includes suggestions
1-3 above.

Here's a rebased patch. ConditionVariableSleep now takes
wait_event_info. Anyone using this in patches for core probably needs
to add enumerators to the WaitEventXXX enums in pgstat.h to describe
their wait points.
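
For anyone wiring this into a patch, the expected shape is roughly the following (a sketch: WAIT_EVENT_MY_FEATURE is a hypothetical enumerator you would add to pgstat.h per the note above, and ConditionVariableCancelSleep is assumed as the cleanup call):

```c
/* Sketch of the intended call pattern; WAIT_EVENT_MY_FEATURE is made up. */
ConditionVariablePrepareToSleep(&shared->cv);
while (!condition_is_satisfied(shared))     /* recheck after each wakeup */
    ConditionVariableSleep(&shared->cv, WAIT_EVENT_MY_FEATURE);
ConditionVariableCancelSleep();
```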

--
Thomas Munro
http://www.enterprisedb.com

Attachments:

condition-variable-v3.patch (application/octet-stream, +303 −2)

#20Peter Geoghegan
pg@heroku.com
In reply to: Thomas Munro (#19)
Re: condition variables

On Tue, Oct 4, 2016 at 12:12 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

Here's a rebased patch. ConditionVariableSleep now takes
wait_event_info. Anyone using this in patches for core probably needs
to add enumerators to the WaitEventXXX enums in pgstat.h to describe
their wait points.

I think that there are at least 2 patches from EDB people that are
already dependent on this one. I myself could easily adopt parallel
CREATE INDEX to use it, too. Why the continued delay in committing it?
I think it's fairly clear that this mechanism is widely useful.

--
Peter Geoghegan

#21Robert Haas
robertmhaas@gmail.com
In reply to: Thomas Munro (#19)
#22Thomas Munro
thomas.munro@gmail.com
In reply to: Robert Haas (#21)
#23Robert Haas
robertmhaas@gmail.com
In reply to: Thomas Munro (#22)
#24Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#1)
#25Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Robert Haas (#24)
#26Thomas Munro
thomas.munro@gmail.com
In reply to: Kyotaro Horiguchi (#25)
#27Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Thomas Munro (#26)
#28Robert Haas
robertmhaas@gmail.com
In reply to: Thomas Munro (#26)