[concurrency-interest] On A Formal Definition of 'Data-Race'

Gregg Wonderly gregg at cytetech.com
Mon Apr 15 09:29:21 EDT 2013


Atomic views of the changes in values is a different issue than the 
Happens-Before model creates/addresses.  Total program order is different from 
thread order.  If you expect total program order to have a specific behavior, 
and you use more than 1 thread, then you've created a situation that you need to 
manage explicitly.

It's like a traffic light.  The light must be honored for safe traffic flow 
guarantees.  But, if someone ignores the light, safe traffic flow is not 
necessarily denied, but must be negotiated.

Happens before is like the case of no traffic lights.

Gregg Wonderly

On 4/13/2013 3:04 PM, Brian Goetz wrote:
> Consider this similar-looking programing:
>
> Object lock = new Object();
> int shared = 0;
>
> Thread 1: synchronized (lock) { local = shared; }
> Thread 2: synchronized (lock) ( shared = 10; }
> Thread 3: synchronized (lock) { local = shared; }
>
> Is what you're saying "in either my racy program or Brian's non-racy program,
> the effect will be the same -- threads 1 and 3 will each see either 0 or 10, and
> can't predict which because they do not control the timing of thread scheduling
> and lock acquisition", and from there, going to "so why do we call one racy and
> not the other?"
>
> In a properly synchronized program, you only see changes to shared variables at
> synchronization points, whereas in a racy program, they happen whenever, and the
> order in which they appear to happen may be different from the perspective of
> different threads.
>
> Or, maybe you just don't like the happens-before model because its different
> from what you're used to?  HB seems no less well-defined and tractable than the
> other models you seem to like.  The rules are simple:
>
>   - Synchronization actions (lock acquire/release, volatile read/write) are
> totally ordered
>   - If two actions A and B are in the same thread, and A precedes B in the
> program order, then A happens-before B
>   - Writes of volatile variables happen-before subsequent reads of that same
> variable (we can say "subsequent" because of the first rule above)
>   - Releasing a lock happens-before subsequent acquisitions of that lock
>
> There are a few other rules (having to do with starting/joining with threads,
> finalizers, etc) but they rarely come into play.
>
> On 4/12/2013 2:55 PM, thurstonn wrote:
>> In thinking about whether code is thread-safe or not, one can attempt to
>> identify whether it 'contains a data-race'.  If not, you're good.  Else you
>> need to add an explicit happens-before relationship.
>>
>> Which begs the question: what exactly constitutes a 'data-race'?  And here
>> I'm interested in something a little more formal than the famed judicial
>> judgement of obscenity (I know it when I see it)
>>
>> If you do a web search, you unfortunately get quite a few divergent
>> definitions, many of which seem to be inconsistent.
>> IIRC, the official JMM defines a data-race as any two conflicting operations
>> from two or more threads on shared data (where at least one of the two
>> operations is a write).
>>
>> Brian Goetz (in his excellent  article
>> <http://www.ibm.com/developerworks/library/j-jtp03304/>  ) defines data-race
>> thusly:
>>
>> "A program is said to have a data race, and therefore not be a "properly
>> synchronized" program, when there is a variable that is read by more than
>> one thread, written by at least one thread, and the write and the reads are
>> not ordered by a happens-before relationship."
>>
>> But this would mark the following code as a data-race
>>
>> int shared = 0
>>
>> Thread 1                  Thread 2                 Thread 3
>> local = this.shared      this.shared = 10       local = this.shared
>>
>> This clearly meets his definition, yet I do not consider this a 'data-race'.
>>
>> I've always relied on traditional database concurrency control theory (I
>> still find the treatise by Bernstein, Hadzilacos, and Goodman to be the
>> best), which has a formal definition of 'serializability', viz. that any
>> transaction log is 'serializable', if and only if, its serialization graph
>> is acyclic.  Why can we not use this as the basis for a formal definition of
>> 'data-race' (excluding the notion of commit and abort of course):
>>
>> "A program is said to have a data-race, if any legal (as prescribed by the
>> MM) execution order produces a serialization graph that is *cyclic*"
>>
>> It has the advantage of a formal, mathematical model and although it is has
>> historically been confined to databases (and transactions), it seems
>> applicable to concurrent execution of any kind?
>>
>> Hoping that I don't get flamed.
>>
>>
>>
>> --
>> View this message in context:
>> http://jsr166-concurrency.10961.n7.nabble.com/On-A-Formal-Definition-of-Data-Race-tp9408.html
>>
>> Sent from the JSR166 Concurrency mailing list archive at Nabble.com.
>> _______________________________________________
>> Concurrency-interest mailing list
>> Concurrency-interest at cs.oswego.edu
>> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
>>
> _______________________________________________
> Concurrency-interest mailing list
> Concurrency-interest at cs.oswego.edu
> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
>
>



More information about the Concurrency-interest mailing list