[concurrency-interest] On A Formal Definition of 'Data-Race'

thurstonn thurston at nomagicsoftware.com
Fri Apr 12 14:55:44 EDT 2013


In thinking about whether code is thread-safe or not, one can attempt to
identify whether it 'contains a data-race'.  If not, you're good.  Else you
need to add an explicit happens-before relationship.

Which begs the question: what exactly constitutes a 'data-race'?  And here
I'm interested in something a little more formal than the famed judicial
judgement of obscenity (I know it when I see it)

If you do a web search, you unfortunately get quite a few divergent
definitions, many of which seem to be inconsistent.  
IIRC, the official JMM defines a data-race as any two conflicting operations
from two or more threads on shared data (where at least one of the two
operations is a write).

Brian Goetz (in his excellent  article
<http://www.ibm.com/developerworks/library/j-jtp03304/>  ) defines data-race
thusly:

"A program is said to have a data race, and therefore not be a "properly
synchronized" program, when there is a variable that is read by more than
one thread, written by at least one thread, and the write and the reads are
not ordered by a happens-before relationship."

But this would mark the following code as a data-race

int shared = 0

Thread 1                  Thread 2                 Thread 3
local = this.shared      this.shared = 10       local = this.shared

This clearly meets his definition, yet I do not consider this a 'data-race'.

I've always relied on traditional database concurrency control theory (I
still find the treatise by Bernstein, Hadzilacos, and Goodman to be the
best), which has a formal definition of 'serializability', viz. that any
transaction log is 'serializable', if and only if, its serialization graph
is acyclic.  Why can we not use this as the basis for a formal definition of
'data-race' (excluding the notion of commit and abort of course):

"A program is said to have a data-race, if any legal (as prescribed by the
MM) execution order produces a serialization graph that is *cyclic*"

It has the advantage of a formal, mathematical model and although it is has
historically been confined to databases (and transactions), it seems
applicable to concurrent execution of any kind?

Hoping that I don't get flamed.



--
View this message in context: http://jsr166-concurrency.10961.n7.nabble.com/On-A-Formal-Definition-of-Data-Race-tp9408.html
Sent from the JSR166 Concurrency mailing list archive at Nabble.com.


More information about the Concurrency-interest mailing list