[concurrency-interest] On A Formal Definition of 'Data-Race'
hans.boehm at hp.com
Fri Apr 12 15:58:29 EDT 2013
Right. The absence of a data race, as defined by the JMM, guarantees that synchronization-free code sequences appear indivisible. No other thread can see an intermediate state. That means that:
- The programmer no longer has to reason about instruction-level interleaving. As far as they are concerned, context switches happen only at synchronization operations, even on a multiprocessor.
- It no longer matters whether your stores are done a byte or a word at a time.
- Sync-free library calls, e.g. copying an array, are no different from single-byte assignments; they appear to happen at once, and you no longer need to reason about intermediate, half-completed states. int assignments behave like long assignments.
- Compilers can optimize in a fairly conventional way within these sync-free regions, and you can't tell the difference.
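These guarantees can be made concrete with a small sketch (the class and names below are my own, not from the thread): because every access to the pair goes through the lock, the program is data-race-free, so the JMM guarantees that the two-field update appears indivisible and the reader can never observe the half-completed state x != y.

```java
class Pair {
    private int x = 0, y = 0;

    // The whole two-field update appears indivisible to other threads.
    synchronized void incrementBoth() { x++; y++; }
    // Invariant x == y can never be seen violated by a synchronized reader.
    synchronized boolean consistent() { return x == y; }
}

public class IndivisibleDemo {
    public static void main(String[] args) throws InterruptedException {
        Pair p = new Pair();
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 100_000; i++) p.incrementBoth();
        });
        final boolean[] sawTear = {false};
        Thread reader = new Thread(() -> {
            for (int i = 0; i < 100_000; i++) {
                if (!p.consistent()) sawTear[0] = true;
            }
        });
        writer.start(); reader.start();
        writer.join(); reader.join();
        System.out.println("observed intermediate state: " + sawTear[0]);
    }
}
```

Remove `synchronized` from either method and the program has a data race on x and y, and none of the guarantees above apply.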
These guarantees clearly cannot hold unless programs like the one below are considered to have a data race.
There are asymptotically efficient (perhaps 10x-100x slowdown) ways to test dynamically whether a data race has occurred, and much more efficient ones with hardware assist. I don't think there are practical ways to statically and fully accurately check non-tiny applications for data races.
From: concurrency-interest-bounces at cs.oswego.edu [mailto:concurrency-interest-bounces at cs.oswego.edu] On Behalf Of Ruslan Cheremin
Sent: Friday, April 12, 2013 12:12 PM
Cc: concurrency-interest at cs.oswego.edu
Subject: Re: [concurrency-interest] On A Formal Definition of 'Data-Race'
As far as I understand, the notion of a data race comes from a technical, not a mathematical, point of view. A data race, defined as above, is exactly the case where it is hard for the _implementation_ to hide the dirty details of the underlying mechanics -- e.g. that stores/loads are not "instant" and may have "stages": they can be pipelined, so they start executing early and actually finish later. Hiding these details is not impossible in theory, but it is just hard to implement. Because of this, it was decided to put the responsibility for "ordering" such instructions on the programmer. There exist algorithms to find and order such races, but, afaik, they are NP-complete with N around the total number of instructions in the program (i.e. such algorithms require whole-program analysis), so they are not practical today.
But this would mark the following code as a data-race:

    int shared = 0;

    Thread 1                 Thread 2                 Thread 3
    --------                 --------                 --------
    local = this.shared      this.shared = 10         local = this.shared
This clearly meets his definition, yet I do not consider this a 'data-race'.
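Written out as Java (the class and thread scaffolding are added for illustration), the disputed program looks like this; the JMM counts each unsynchronized read of `shared` as conflicting with the unsynchronized write, hence a data race, even though every reader simply sees either 0 or 10:

```java
public class DisputedRace {
    int shared = 0;

    public static void main(String[] args) throws InterruptedException {
        DisputedRace d = new DisputedRace();
        Runnable reader = () -> { int local = d.shared; }; // Threads 1 and 3
        Thread t1 = new Thread(reader);
        Thread t2 = new Thread(() -> d.shared = 10);       // Thread 2
        Thread t3 = new Thread(reader);
        t1.start(); t2.start(); t3.start();
        t1.join(); t2.join(); t3.join();
        // join() establishes happens-before, so main reliably sees the write.
        System.out.println(d.shared);   // prints 10
    }
}
```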
I've always relied on traditional database concurrency control theory (I
still find the treatise by Bernstein, Hadzilacos, and Goodman to be the
best), which has a formal definition of 'serializability', viz. that any
transaction log is 'serializable' if and only if its serialization graph
is acyclic. Why can we not use this as the basis for a formal definition of
'data-race' (excluding the notions of commit and abort, of course):
"A program is said to have a data-race, if any legal (as prescribed by the
MM) execution order produces a serialization graph that is *cyclic*"
It has the advantage of a formal, mathematical model, and although it has
historically been confined to databases (and transactions), it seems
applicable to concurrent execution of any kind?
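The serialization-graph test from Bernstein, Hadzilacos, and Goodman can be sketched directly (the log encoding and names below are my own): build the conflict graph of a transaction log -- an edge Ti -> Tj whenever an operation of Ti conflicts with a later operation of Tj -- and check it for a cycle.

```java
import java.util.*;

public class SerializationGraph {
    // An operation in the log: transaction id, variable, read or write.
    record Op(int txn, String var, boolean write) {}

    // Edge a.txn -> b.txn when a precedes b, they touch the same variable,
    // they belong to different transactions, and at least one is a write.
    static Map<Integer, Set<Integer>> conflictGraph(List<Op> log) {
        Map<Integer, Set<Integer>> g = new HashMap<>();
        for (int i = 0; i < log.size(); i++)
            for (int j = i + 1; j < log.size(); j++) {
                Op a = log.get(i), b = log.get(j);
                if (a.txn() != b.txn() && a.var().equals(b.var())
                        && (a.write() || b.write()))
                    g.computeIfAbsent(a.txn(), k -> new HashSet<>()).add(b.txn());
            }
        return g;
    }

    // Depth-first search for a back edge, i.e. a cycle.
    static boolean hasCycle(Map<Integer, Set<Integer>> g) {
        Set<Integer> done = new HashSet<>(), onPath = new HashSet<>();
        for (Integer v : g.keySet())
            if (dfs(v, g, done, onPath)) return true;
        return false;
    }
    static boolean dfs(Integer v, Map<Integer, Set<Integer>> g,
                       Set<Integer> done, Set<Integer> onPath) {
        if (onPath.contains(v)) return true;
        if (!done.add(v)) return false;
        onPath.add(v);
        for (Integer w : g.getOrDefault(v, Set.of()))
            if (dfs(w, g, done, onPath)) return true;
        onPath.remove(v);
        return false;
    }

    public static void main(String[] args) {
        // Serializable log: T1's conflicting op precedes T2's.
        List<Op> ok = List.of(new Op(1, "x", true), new Op(2, "x", false));
        // Non-serializable log: T1 and T2 each read what the other later writes.
        List<Op> cyc = List.of(new Op(1, "x", false), new Op(2, "y", false),
                               new Op(2, "x", true),  new Op(1, "y", true));
        System.out.println(hasCycle(conflictGraph(ok)));   // false
        System.out.println(hasCycle(conflictGraph(cyc)));  // true
    }
}
```

Under the proposed definition, the second log would count as exhibiting a data race, since its serialization graph contains the cycle T1 -> T2 -> T1.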
Hoping that I don't get flamed.