[concurrency-interest] Looking for ways to track down the source of a concurrency problem

Peter Kovacs peter.kovacs.1.0rc at gmail.com
Sat Mar 17 06:15:32 EDT 2007


Hi,

We have a concurrency-related data integrity problem and I am not sure
how to best proceed to find the cause.

We have started introducing concurrency into a library which is
essentially designed for single-threaded execution. Our strategy has
been fairly simple:
(a) find a repetitive task performed by a coarsely grained object "R"
(b) establish the degree of concurrency needed (i.e. based on the
number of CPUs): "n"
(c) create "n" instances of "R"
(d) start "n" threads with one "R" instance dedicated to (contained
in) each individual thread.
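
For illustration, steps (a) through (d) might look roughly like the
sketch below. The class R here is a hypothetical stand-in for the
real object, and the names ConfinedWorkers and runAll are invented
for the example; nothing in it is taken from the actual library.

```java
public class ConfinedWorkers {
    // Hypothetical stand-in for the coarse-grained object "R".
    static class R {
        private long processed = 0;   // state touched by exactly one thread

        void doTask(int item) { processed += item; }

        long processed() { return processed; }
    }

    // Starts n threads, each with its own R, and returns the per-instance totals.
    static long[] runAll(int n, int itemsPerWorker) {
        R[] workers = new R[n];
        Thread[] threads = new Thread[n];
        for (int i = 0; i < n; i++) {
            final R r = new R();      // (c) one instance per thread
            workers[i] = r;
            threads[i] = new Thread(() -> {
                for (int k = 1; k <= itemsPerWorker; k++) {
                    r.doTask(k);
                }
            });
            threads[i].start();       // (d) the instance is used by this thread only
        }
        long[] totals = new long[n];
        for (int i = 0; i < n; i++) {
            try {
                threads[i].join();    // join makes the worker's writes visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            totals[i] = workers[i].processed();
        }
        return totals;
    }

    public static void main(String[] args) {
        // (b) degree of concurrency based on the number of CPUs
        int n = Runtime.getRuntime().availableProcessors();
        for (long total : runAll(n, 1000)) {
            System.out.println(total);
        }
    }
}
```

This arrangement is only safe as long as nothing reachable from an R
instance is also reachable from another thread.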

We have been assuming that instances of "R" themselves made use of
objects fully contained in "R" instances (no publication at all).
Since there is no "explicit" multithreading down the containment
hierarchy, proper concurrent use of "R" has been thought to guarantee
thread-safety. Since none of the objects are supposed to publish data
outside the containment hierarchy, we assume that these contained
objects do not have to be thread safe.

Now it appears that somewhere down the containment hierarchy there is
something which is inconsistent with our assumptions. Behind the
scenes, data is somehow published/shared across multiple threads. My
question is: what usage patterns should we look for to track down the
problem?

The only possible problematic usage pattern I have been able to think
of is that some of the contained instances manipulate static fields. I
haven't found any such instances yet, but this is probably something
to start with. Still, there may be other, less obvious ways in which
objects share data across threads in such "containment hierarchy"
situations. Can you point out any other such ways?
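
As a starting point for the static-field hunt, a reflection-based
scan along the following lines could list candidates. This is only a
heuristic sketch: the class name StaticFieldScan, the method
sharedStaticFields and the nested Sample class are all made up for the
example. It reports static fields that are either non-final or hold a
reference type other than String, since a "static final" collection
is still shared mutable state.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StaticFieldScan {
    // Hypothetical class used only to demonstrate the scan.
    static class Sample {
        static int counter;                                        // non-final: reported
        static final Map<String, String> CACHE = new HashMap<>();  // final but mutable: reported
        static final int LIMIT = 10;                               // constant: skipped
        static final String NAME = "sample";                       // immutable: skipped
        int instanceField;                                         // not static: skipped
    }

    // Returns the sorted names of static fields that may carry shared mutable state.
    static List<String> sharedStaticFields(Class<?> type) {
        List<String> suspects = new ArrayList<>();
        for (Field field : type.getDeclaredFields()) {
            int mods = field.getModifiers();
            if (!Modifier.isStatic(mods) || field.isSynthetic()) {
                continue;
            }
            Class<?> fieldType = field.getType();
            boolean immutableValue = fieldType.isPrimitive() || fieldType == String.class;
            if (!Modifier.isFinal(mods) || !immutableValue) {
                suspects.add(field.getName());
            }
        }
        Collections.sort(suspects);
        return suspects;
    }

    public static void main(String[] args) {
        System.out.println(sharedStaticFields(Sample.class));
    }
}
```

The scan only looks at the fields declared by one class, so it would
have to be run over every class reachable from "R", and it will also
flag harmless fields such as static finals of immutable types other
than String.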

Again: to the best of my knowledge, only the top level objects "R"
participate directly in multithreading. (The threads retrieve the
parameters for "R" from one single source concurrently, but this
operation seems to be pretty well protected. The same applies for the
objects consuming the results from "R" instances.)

Interestingly, a couple of months ago we already had a very similar
problem with another top level function (for a different top level
object, let's call it "U"). At that time, the appearance of the
problem was traced back to some changes in one of the contained
objects (an object which was contained a couple levels below "U" --
let's call it "A"). The changes were reverted and the problem
disappeared. (I was not involved in solving that issue, but I suspect
that no thorough examination had been performed to find out the real
cause.)

This time round, after a due-diligence check of the code actually
multithreading "R", I have tried various versions of "A" which is also
contained in "R". It turned out that with some of the earlier versions
of "A", the problem goes away. (We haven't seen the problem with "U"
ever since. But that's a separate function, a separate issue, so...)
After a quick look at the code of "A", I haven't found any
conspicuous "irregularities". (The closest I came to finding
suspicious code was some static initializers in classes instantiated
by "A", but static initializers themselves are, I seem to remember,
thread-safe.)
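
On the static initializer point: the JVM does run class initialization
exactly once and under a lock, but that guarantee covers only the
initialization itself. Whatever object the initializer creates is
still shared by every instance of the class afterwards. A minimal
sketch, with a hypothetical Helper class standing in for something
instantiated by "A":

```java
public class SharedViaStaticInit {
    // Hypothetical helper; not taken from the real code of "A".
    static class Helper {
        private static final StringBuilder SCRATCH;

        // Runs once, safely. The StringBuilder it creates is not thread-safe.
        static {
            SCRATCH = new StringBuilder();
        }

        StringBuilder scratch() { return SCRATCH; }
    }

    // Two threads each create their own Helper; returns true if both see the same scratch object.
    static boolean threadsShareScratch() {
        final StringBuilder[] seen = new StringBuilder[2];
        Thread[] threads = new Thread[2];
        for (int i = 0; i < 2; i++) {
            final int slot = i;
            threads[i] = new Thread(() -> seen[slot] = new Helper().scratch());
            threads[i].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return seen[0] == seen[1];
    }

    public static void main(String[] args) {
        System.out.println(threadsShareScratch());
    }
}
```

So a static initializer in a contained class is worth a second look
whenever the object it sets up is mutated later on.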

Any help appreciated,
Peter

