Ideas for improving Concurrency Tests

Started by Amit Kapila almost 13 years ago, 3 messages
#1 Amit Kapila
amit.kapila@huawei.com

Ideas for improving Concurrency testing

1. Synchronization points in server code - To have better control for
concurrency testing, define synchronization points in server code which can
be used as follows:

heap_truncate(..)
{
    ....

    SYNC_POINT(procid, "before_heap_open");
    rel = heap_open(rid, AccessExclusiveLock);
    relations = lappend(relations, rel);
}

exec_simple_query(..)
{
    ...

    finish_xact_command();

    SYNC_POINT(procid, "after_finish_xact_command");

    /*
     * If there were no parsetrees, return EmptyQueryResponse message.
     */
    if (!parsetree_list)
        NullCommand(dest);
    ...
}

When execution reaches a sync point, it can either
    emit a signal, or
    wait for a signal.

Signal
    A value of a shared memory variable that is interpreted by the
    different sync points based on its value.

Emit a signal
    Assign the value (the signal) to the shared memory variable
    ("set a flag") and broadcast a global condition to wake those
    waiting for a signal.

Wait for a signal
    Loop, waiting on the global condition, until the global value
    matches the awaited signal.
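
A minimal sketch of the emit/wait behaviour described above, written as a standalone pthreads program rather than server code. All names in it (sync_point_emit, sync_point_wait_for, the two session functions, run_sync_point_demo) are hypothetical, and a process-local mutex and condition variable stand in for the shared memory variable and the global condition. A real backend implementation would need shared memory and its own wakeup mechanism.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical stand-ins for the shared memory variable and global condition. */
static pthread_mutex_t sync_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  sync_cond = PTHREAD_COND_INITIALIZER;
static char            sync_signal[64] = "";

static int rows_in_tbl = 0;             /* toy table contents */
static int rows_seen_by_truncate = -1;  /* row count TRUNCATE found */

/* Emit: set the flag, then wake everyone waiting on the global condition. */
static void
sync_point_emit(const char *sig_name)
{
    pthread_mutex_lock(&sync_lock);
    strncpy(sync_signal, sig_name, sizeof(sync_signal) - 1);
    sync_signal[sizeof(sync_signal) - 1] = '\0';
    pthread_cond_broadcast(&sync_cond);
    pthread_mutex_unlock(&sync_lock);
}

/* Wait: loop on the condition until the global value matches. */
static void
sync_point_wait_for(const char *sig_name)
{
    pthread_mutex_lock(&sync_lock);
    while (strcmp(sync_signal, sig_name) != 0)
        pthread_cond_wait(&sync_cond, &sync_lock);
    pthread_mutex_unlock(&sync_lock);
}

/* Plays the TRUNCATE session: 'before_heap_open WAIT_FOR commit'. */
static void *
truncate_session(void *arg)
{
    (void) arg;
    sync_point_wait_for("commit");
    rows_seen_by_truncate = rows_in_tbl;
    rows_in_tbl = 0;
    return NULL;
}

/* Plays the INSERT session: 'after_finish_xact_command SIGNAL commit'. */
static void *
insert_session(void *arg)
{
    (void) arg;
    rows_in_tbl = 1;                /* the insert "commits" */
    sync_point_emit("commit");
    return NULL;
}

/* Runs both sessions; returns the row count the truncate observed. */
int
run_sync_point_demo(void)
{
    pthread_t s1, s2;
    int       rc;

    sync_signal[0] = '\0';
    rows_in_tbl = 0;
    rows_seen_by_truncate = -1;

    rc = pthread_create(&s1, NULL, truncate_session, NULL);
    assert(rc == 0);
    rc = pthread_create(&s2, NULL, insert_session, NULL);
    assert(rc == 0);
    pthread_join(s1, NULL);
    pthread_join(s2, NULL);
    return rows_seen_by_truncate;
}
```

Whichever thread the scheduler starts first, run_sync_point_demo() returns 1: the truncate side cannot pass its sync point until the insert side has emitted 'commit'.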

To activate synchronization points, appropriate actions can be set.
For example:

SET SYNC_POINT = 'before_heap_open WAIT_FOR commit';
SET SYNC_POINT = 'after_finish_xact_command SIGNAL commit';

The above commands activate the synchronization points named
'before_heap_open' and 'after_finish_xact_command'.

session "s1"
step s11 {SET SYNC_POINT = 'before_heap_open WAIT_FOR commit';}
step s12 {Truncate tbl;}
session "s2"
step s21 {SET SYNC_POINT = 'after_finish_xact_command SIGNAL commit';}
step s22 {Insert into tbl values(1);}

The first activation requests the synchronization point to wait for
another backend to emit the signal 'commit', and the second activation
requests the synchronization point to emit the signal 'commit' when the
process's execution runs through it.

With the test defined above, Truncate waits for the Insert to finish.

2. Enhance Isolation Framework - Currently, at most one step can be waiting
at a time. Enhance the concurrency test framework (isolation tester) so that
multiple sessions can wait and then be released serially.

This might help in generating complex deadlock scenarios.
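
A rough sketch of the "several sessions wait, then get released serially" behaviour, again as a standalone pthreads program and not isolation tester code. The names (session_main, run_serial_release, NUM_SESSIONS) are hypothetical, and the sessions here block on a condition variable, whereas real sessions would be blocked on locks inside the server. It assumes release_order is a permutation of 0..NUM_SESSIONS-1.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_SESSIONS 3

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static int waiting;                         /* sessions that reached their wait */
static int released_id;                     /* session currently allowed to run */
static int finished;                        /* sessions that have completed */
static int completion_order[NUM_SESSIONS];  /* ids in the order they completed */

/* Each session blocks until the controller releases its id. */
static void *
session_main(void *arg)
{
    int id = *(int *) arg;

    pthread_mutex_lock(&lock);
    waiting++;
    pthread_cond_broadcast(&cond);          /* tell controller we are waiting */
    while (released_id != id)
        pthread_cond_wait(&cond, &lock);
    completion_order[finished++] = id;      /* the "step" completes here */
    pthread_cond_broadcast(&cond);          /* tell controller we are done */
    pthread_mutex_unlock(&lock);
    return NULL;
}

/*
 * Controller: wait until every session is blocked, then release them one
 * at a time in release_order. Copies the observed completion order into
 * observed[] and returns the number of sessions that completed.
 */
int
run_serial_release(const int release_order[NUM_SESSIONS],
                   int observed[NUM_SESSIONS])
{
    pthread_t threads[NUM_SESSIONS];
    int       ids[NUM_SESSIONS];
    int       i, rc;

    waiting = 0;
    released_id = -1;
    finished = 0;

    for (i = 0; i < NUM_SESSIONS; i++)
    {
        ids[i] = i;
        rc = pthread_create(&threads[i], NULL, session_main, &ids[i]);
        assert(rc == 0);
    }

    pthread_mutex_lock(&lock);
    while (waiting < NUM_SESSIONS)          /* all sessions now wait at once */
        pthread_cond_wait(&cond, &lock);
    for (i = 0; i < NUM_SESSIONS; i++)
    {
        released_id = release_order[i];
        pthread_cond_broadcast(&cond);
        while (finished <= i)               /* serialize: one release at a time */
            pthread_cond_wait(&cond, &lock);
    }
    pthread_mutex_unlock(&lock);

    for (i = 0; i < NUM_SESSIONS; i++)
        pthread_join(threads[i], NULL);
    for (i = 0; i < NUM_SESSIONS; i++)
        observed[i] = completion_order[i];
    return finished;
}
```

The point of the sketch is only that the test, not the scheduler, decides the order in which the blocked sessions continue.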

Above ideas could be useful to improve concurrency testing and can also be
helpful to generate test cases for some of the complicated bugs for which
there is no direct test.

This work is not a patch for 9.3; I just wanted some initial feedback.

Feedback/Suggestions?

Reference : http://dev.mysql.com/doc/internals/en/debug-sync-facility.html

With Regards,

Amit Kapila.

#2 Greg Stark
stark@mit.edu
In reply to: Amit Kapila (#1)
Re: Ideas for improving Concurrency Tests

On Tue, Mar 26, 2013 at 7:31 AM, Amit Kapila <amit.kapila@huawei.com> wrote:

Above ideas could be useful to improve concurrency testing and can also be
helpful to generate test cases for some of the complicated bugs for which
there is no direct test.

I wonder how much explicit sync points would help with testing though.
It seems like they suffer from the problem that you'll only put sync
points where you actually expect problems and not where you don't
expect them -- which is exactly where problems are likely to occur.

Wouldn't it be more useful to implicitly create sync points whenever
synchronization events like spinlocks being taken occur?

And likewise explicitly listing the timing sequences to test seems
unconvincing. If we could arrange for two threads to execute every
possible interleaving of code by exhaustively trying every combination
that would be far more convincing. Most bugs are likely to hang out in
combinations we don't see in practice -- for instance having a tuple
deleted and a new one inserted in the same slot in the time a
different transaction was context switched out.
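
As a toy illustration of trying every interleaving (not tied to any PostgreSQL code, and with all names hypothetical): two sessions A and B each perform two steps, "read counter" and "write counter + 1", and the sketch enumerates every order in which those four steps can run. The lost-update counter is only a stand-in for the tuple slot case mentioned above; the enumeration is the part being shown.

```c
#include <assert.h>

#define STEPS_PER_SESSION 2   /* step 0: read counter, step 1: write counter + 1 */

typedef struct
{
    long total;          /* interleavings explored */
    long lost_updates;   /* interleavings where the final counter is not 2 */
} InterleavingStats;

/* Replay one schedule such as "ABAB" against a fresh counter. */
static int
replay_schedule(const char *schedule)
{
    int counter = 0;
    int local[2] = {0, 0};       /* value each session read */
    int next_step[2] = {0, 0};   /* next step each session will run */
    const char *p;

    for (p = schedule; *p != '\0'; p++)
    {
        int s;

        assert(*p == 'A' || *p == 'B');
        s = (*p == 'A') ? 0 : 1;
        if (next_step[s] == 0)
            local[s] = counter;          /* read */
        else
            counter = local[s] + 1;      /* write back the incremented value */
        next_step[s]++;
    }
    return counter;
}

/* Recursively build every interleaving of the remaining A and B steps. */
static void
explore(int a_left, int b_left, char *schedule, int pos,
        InterleavingStats *stats)
{
    if (a_left == 0 && b_left == 0)
    {
        schedule[pos] = '\0';
        stats->total++;
        if (replay_schedule(schedule) != 2)
            stats->lost_updates++;
        return;
    }
    if (a_left > 0)
    {
        schedule[pos] = 'A';
        explore(a_left - 1, b_left, schedule, pos + 1, stats);
    }
    if (b_left > 0)
    {
        schedule[pos] = 'B';
        explore(a_left, b_left - 1, schedule, pos + 1, stats);
    }
}

InterleavingStats
explore_all_interleavings(void)
{
    char              schedule[2 * STEPS_PER_SESSION + 1];
    InterleavingStats stats = {0, 0};

    explore(STEPS_PER_SESSION, STEPS_PER_SESSION, schedule, 0, &stats);
    return stats;
}
```

For two steps per session there are 6 interleavings, and 4 of them lose an update; only the two fully serial orders (AABB and BBAA) end with the counter at 2. The number of interleavings grows combinatorially with the number of steps, which is the practical obstacle to doing this for real server code.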

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3 Amit Kapila
amit.kapila@huawei.com
In reply to: Greg Stark (#2)
Re: Ideas for improving Concurrency Tests

On Tuesday, March 26, 2013 9:49 PM Greg Stark wrote:

On Tue, Mar 26, 2013 at 7:31 AM, Amit Kapila <amit.kapila@huawei.com> wrote:

Above ideas could be useful to improve concurrency testing and can also be
helpful to generate test cases for some of the complicated bugs for which
there is no direct test.

I wonder how much explicit sync points would help with testing though.
It seems like they suffer from the problem that you'll only put sync
points where you actually expect problems and not where you don't
expect them -- which is exactly where problems are likely to occur.

We can do it for different kinds of operations. For example:
1. Operations that are carried out in phases:
a. Create Index Concurrently - Some time back, I was going through the
design of Create Index Concurrently and found a problem, which I
reported in the mail below:

/messages/by-id/006801cdb72e$96b62330$c4226990$@kapila@huawei.com
It occurs because we changed the design/implementation of
RelationGetIndexList() to address Drop Index Concurrently.
Such issues are sometimes difficult to catch through normal tests.
However, if we had sync points defined for each phase and its dependent
operations, it would be comparatively easier to catch a change that
breaks them.
This one could have been caught if we had sync points for step-3 and
step-4 as mentioned in that mail.

b. Alter Table - Here also we do the operation in 3 phases, so we can
define sync points between each phase and its dependent operations.

2. Some time back, a defect was fixed in the concurrency between an insert
cleaning a btree page and vacuum.
Commit log:
/messages/by-id/E1Rzvx1-0005nB-1p@gemulon.postgresql.org
Even if such synchronization points are difficult to think of ahead of
time, we can protect them against later breakage by some other change by
having test cases for them.
Such tests would also need sync points.

Wouldn't it be more useful to implicitly create sync points whenever
synchronization events like spinlocks being taken occur?

It would be really useful, but in such cases how will the test case
specify which action (WAIT, SIGNAL or IGNORE) to take at each sync point?
For example:

S-1
Insert into tbl values(1);
S-2
Select * from tbl;

If S-1 and S-2 run in parallel, it is difficult to say whether '1' will
be visible to S-2.

However, if S-2 waits for a signal in GetSnapshotData() before taking
ProcArrayLock, and S-1 sets the signal after releasing ProcArrayLock in
ProcArrayEndTransaction(), S-2 can expect to see the value '1'.

For the above test, how will we make sure that only S-2 waits in
GetSnapshotData(), and not S-1?

Could you elaborate a bit more? Maybe I am not getting your point completely.

And likewise explicitly listing the timing sequences to test seems
unconvincing. If we could arrange for two threads to execute every
possible interleaving of code by exhaustively trying every combination
that would be far more convincing.

I think for this part the main question is how, from a test, we can
synchronize each interleaving piece of code.
Any ideas on how this can be realized?

Most bugs are likely to hang out in
combinations we don't see in practice -- for instance having a tuple
deleted and a new one inserted in the same slot in the time a
different transaction was context switched out.

With Regards,
Amit Kapila.
