[concurrency-interest] Looking for ways to track down the source of a concurrency problem
peter.kovacs.1.0rc at gmail.com
Sun Mar 18 07:39:21 EDT 2007
The culprit was easier to find than I had feared. It was a static
"prototype" instance which was not cloned "early enough" before being used.
On 3/17/07, Peter Kovacs <peter.kovacs.1.0rc at gmail.com> wrote:
> We have a concurrency-related data integrity problem and I am not sure
> how to best proceed to find the cause.
> We have started introducing concurrency into a library which is
> essentially designed for single threaded execution. Our strategy has
> been fairly simple:
> (a) find a repetitive task performed by a coarse-grained object "R"
> (b) establish the degree of concurrency needed (i.e. based on the
> number of CPUs): "n"
> (c) create "n" instances of "R"
> (d) start "n" threads with one "R" instance dedicated to (contained
> in) each individual thread.
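
Steps (a) to (d) can be sketched roughly as follows. This is only an
illustration under assumptions: the class R here is a hypothetical
stand-in with a trivial task, ConfinedWorkers and runAll are invented
names, and a fixed thread pool is used in place of hand-started threads:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for the coarse-grained object "R".
final class R {
    private int counter; // unsynchronized state, safe only while confined

    int run(int iterations) {
        for (int i = 0; i < iterations; i++) {
            counter++;
        }
        return counter;
    }
}

final class ConfinedWorkers {
    // Runs n tasks, each with its own R instance that no other task sees.
    static List<Integer> runAll(int n, int iterations) {
        ExecutorService pool = Executors.newFixedThreadPool(n);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                R r = new R(); // (c) one instance per unit of work
                Callable<Integer> task = () -> r.run(iterations);
                futures.add(pool.submit(task)); // (d) hand it to one thread
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : futures) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // (b) degree of concurrency derived from the number of CPUs
        int n = Runtime.getRuntime().availableProcessors();
        System.out.println(runAll(n, 1000));
    }
}
```

The scheme is only as safe as the confinement: it holds as long as
nothing reachable from an R instance is also reachable from another one.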
> We have been assuming that instances of "R" themselves made use of
> objects fully contained in "R" instances (no publication at all).
> Since there is no "explicit" multithreading down the containment
> hierarchy, proper concurrent use of "R" has been thought to guarantee
> thread-safety. Since none of the objects are supposed to publish data
> outside the containment hierarchy, we assume that these contained
> objects do not have to be thread safe.
> Now it appears that somewhere down the containment hierarchy there is
> something which is inconsistent with our assumptions. Behind the
> scenes, data is somehow published/shared across multiple threads. My
> question is: what usage patterns should we look for to find out the
> cause?
> The only possible problematic usage pattern I have been able to think
> of is that some of the contained instances manipulate static fields. I
> haven't found any such instances yet, but this is probably something
> to start with. Still, there may be other hideous ways in which objects
> share data across threads in such "containment hierarchy" situations.
> Please, can you tell me any other such way?
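
One possible way to hunt for such hidden sharing at run time, sketched
here as a debugging aid and not as something the library already has:
let a suspect object remember the first thread that touches it and fail
loudly when a different thread shows up. The class name
ThreadOwnershipCheck and its check() method are hypothetical:

```java
import java.util.concurrent.atomic.AtomicReference;

// Debug helper (hypothetical): binds to the first calling thread and
// rejects calls from any other thread afterwards.
final class ThreadOwnershipCheck {
    private final AtomicReference<Thread> owner = new AtomicReference<>();

    void check() {
        Thread current = Thread.currentThread();
        if (owner.compareAndSet(null, current)) {
            return; // first access: this thread becomes the owner
        }
        Thread first = owner.get();
        if (first != current) {
            throw new IllegalStateException(
                "object confined to " + first.getName()
                + " was accessed from " + current.getName());
        }
    }
}
```

A suspect class would hold one such field per instance and call check()
at the top of its methods. For state held in static fields the check
would fire as soon as a second thread arrives, and the stack trace of
the exception points at the code path that leaked the object.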
> Again: to the best of my knowledge, only the top level objects "R"
> participate directly in multithreading. (The threads retrieve the
> parameters for "R" from one single source concurrently, but this
> operation seems to be pretty well protected. The same applies to the
> objects consuming the results from "R" instances.)
> Interestingly, a couple of months ago we already had a very similar
> problem with another top level function (for a different top level
> object, let's call it "U"). At that time, the appearance of the
> problem was traced back to some changes in one of the contained
> objects (an object which was contained a couple levels below "U" --
> let's call it "A"). The changes were reverted and the problem
> disappeared. (I was not involved in solving that issue, but I suspect
> that no thorough examination had been performed to find out the real
> cause.)
> This time round, after a due-diligence check of the code actually
> multithreading "R", I have tried various versions of "A" which is also
> contained in "R". It turned out that with some of the earlier versions
> of "A", the problem goes away. (We haven't seen the problem with "U"
> ever since. But that's a separate function, a separate issue, so...)
> After fleetingly checking the code of "A", I haven't found any
> conspicuous "irregularities". (The closest I came to finding
> suspicious code was some static initializers in classes instantiated
> by "A", but static initializers themselves are, I seem to remember,
> thread-safe.)
> Any help appreciated,