[concurrency-interest] On A Formal Definition of 'Data-Race'

Gregg Wonderly gergg at cox.net
Tue Apr 16 19:53:21 EDT 2013


I didn't mean to say that it was "correct", only that it worked in practice, and on x86 it probably could not be shown to be "wrong" by testing.

I was just trying to point out that this complexity is a much bigger problem than I think current tooling and language support addresses.

I am grateful that the JMM finally spells out exactly what to expect.  I'm frustrated by the fact that this also means the "real rules" never changed (synchronized is important and must be used), but the observable behavior did.  Now we see not just "visibility" issues, but logically "weird" behaviors out of racy code.  From my perspective, I really think the JMM got volatile backwards.  We should have used a different keyword, the reverse of volatile, to say "this variable is intra-thread only."  For me, in all of the networked and multi-threaded applications I write, that would be the much less used keyword.  I rarely need atomicity (I use synchronized or other locks when I need that) nearly as much as I need visibility, and visibility plus "inter-thread" sanity by default would make it much more likely that people get correct code first, and could then focus on making it faster by opting into "intra-thread"-only access.  That would also allow tools to assert that there was no inter-thread access to such a variable, and warn about it.
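
To make that concrete, here is a minimal sketch of the contrast (the @IntraThreadOnly annotation is purely hypothetical, standing in for the "reverse of volatile" keyword described above; it has no effect in real Java):

    // Hypothetical marker for thread-confined state; not part of Java.
    @interface IntraThreadOnly {}

    class Flags {
        // Today: cross-thread visibility must be requested explicitly.
        volatile boolean shuttingDown;

        // Under the proposed inversion, fields would be visible across threads
        // by default, and only performance-critical, thread-confined state
        // would be opted out, letting tools flag any cross-thread access to it.
        @IntraThreadOnly
        int scratchCounter;
    }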

Gregg

On Apr 16, 2013, at 11:58 AM, Nathan Reynolds <nathan.reynolds at oracle.com> wrote:

> Try running this code on JDK 1.4.2 on a processor with a weak memory model.  It will perform exactly as Alex has stated.  If you run it on x86, then it will behave as you want it to.
> 
> That is the beauty of the JMM.  It provides a bit of sanity when trying to write code for different processors and different versions of the JIT (i.e., different JVMs, or even different versions of the same JVM).  It is really difficult to write code in C++ for 4 different processors and 4 different compilers.  One of them is going to bite you if you try anything fancy.  Fortunately, there is some hope in sight for C++.
> 
> Nathan Reynolds | Architect | 602.333.9091
> Oracle PSR Engineering | Server Technology
> On 4/16/2013 9:18 AM, oleksandr otenko wrote:
>> The "pre-1.5" kind of code you quote is not correct in pre-1.5, and in other mainstream languages. So not clear who is unclear about incorrectness of that code and why you insist on supporting it. 
>> 
>> It worked in pre-1.5 by accident.  If you set stopping=true in another thread, it can stay in write buffers indefinitely, because that's how modern CPUs work; nothing says when it must be flushed.  For example, suppose you have stopping=true; while(!stopped); waiting for a thread to report it stopped.  That's the proof of incorrectness even in pre-1.5 terms. 
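>> 
>> As a minimal sketch of that handshake (class and field names are illustrative): with plain fields, neither spin loop is guaranteed to ever observe the other thread's write, so both can hang; declaring both fields volatile restores the guarantee. 
>> 
>>     class Handshake { 
>>         boolean stopping;   // no volatile: reads may see a stale value forever 
>>         boolean stopped; 
>> 
>>         void worker() { 
>>             while (!stopping) { 
>>                 // do work 
>>             } 
>>             stopped = true;     // report that we have stopped 
>>         } 
>> 
>>         void shutdownAndWait() { 
>>             stopping = true;    // may sit in a write buffer indefinitely 
>>             while (!stopped) { 
>>                 // spin, possibly forever 
>>             } 
>>         } 
>>     } 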
>> 
>> Alex 
>> 
>> On 16/04/2013 16:17, Gregg Wonderly wrote: 
>>> The thing I don't believe is completely understood is how much software was written prior to JDK 1.5 that deliberately did not synchronize on reads of "shared" data, because it made code faster.  The particular scenario I've seen in various older code is the "boolean" while-loop control variable set to "true" by a "shutdown" method.  That happened all over the place, because it was so ugly to put a "synchronized" section around a whole loop. 
>>> 
>>> boolean stopping; 
>>> 
>>> public void shutdown() { 
>>>     stopping = true; 
>>> } 
>>> 
>>> 
>>> public void run() { 
>>>     while(!stopping) { 
>>>         // do your work 
>>>     } 
>>> } 
>>> 
>>> As wrong as this is, it worked just fine prior to JDK 1.5.  People didn't seem to want, or see the need, to write a correct version of this, such as: 
>>> 
>>> public synchronized void shutdown() { 
>>>     stopping = true; 
>>> } 
>>> 
>>> public void run() { 
>>>     while( true ) { 
>>>         synchronized( this ) { 
>>>             if( stopping ) 
>>>                 break; 
>>>         } 
>>> 
>>>         ... 
>>>     } 
>>> } 
>>> 
>>> It's that kind of old-code "issue" which I think is leaving around all kinds of open cans of worms.  This was an optimization issue: because nothing marks the read as shared, the compiler was free to hoist it out of the loop and effectively do 
>>> 
>>>     if( stopping )      // 'stopping' is read only once... 
>>>         return; 
>>>     while( true ) {     // ...and never re-read inside the loop 
>>>         ... 
>>>     } 
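>>> 
>>> A minimal sketch of the now-idiomatic fix (not necessarily what the original authors would have chosen): marking the flag volatile both guarantees visibility and forbids hoisting the read out of the loop. 
>>> 
>>>     private volatile boolean stopping; 
>>> 
>>>     public void shutdown() { 
>>>         stopping = true;         // guaranteed visible to the running thread 
>>>     } 
>>> 
>>>     public void run() { 
>>>         while( !stopping ) {     // volatile read, re-checked every iteration 
>>>             // do your work 
>>>         } 
>>>     } 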
>>> 
>>> Visibility issues get "masked" by people using "correctly" or "more correctly" synchronized code elsewhere, which happens to trigger cache flushes or otherwise update cache lines, causing their "wrong" code to appear to be right. 
>>> 
>>> Tools to look for these kinds of problems are starting to mature and become more visible as people get bitten by these things.  But overall, I'm just not convinced there is any hope that developers are really going to be able to deal with these issues in any dependable fashion.  Software stability is going to suffer, and that will keep up a barrier to using Java for some people, when they see bugs that have no explainable cause given the "visible in source" order of execution. 
>>> 
>>> We can get all excited about optimizations and speed of execution.  But it doesn't matter how fast you do the wrong thing; it's still wrong. 
>>> 
>>> Gregg Wonderly 
>>> 
>>> On 4/15/2013 7:12 PM, Vitaly Davidovich wrote: 
>>>> Yes, threading is hard :) There's nothing stopping people from using different 
>>>> designs to minimize the hairy parts (e.g. message passing, immutability, etc.) in 
>>>> Java.  Deliberate data races are an "expert"-only tool, and a last resort at 
>>>> that.  I think this list typically reiterates that point when the subject comes up. 
>>>> 
>>>> On Apr 15, 2013 6:46 PM, "Gregg Wonderly" <gregg at cytetech.com 
>>>> <mailto:gregg at cytetech.com>> wrote: 
>>>> 
>>>>     Happens-before doesn't play a part in program order, only in thread order. 
>>>>       This is the problem that a lot of developers seem to struggle with.  If 
>>>>     you want the history of all possible paths, then the complexity is huge. 
>>>>     Basically, for X unrelated instructions the number of possible histories 
>>>>     grows combinatorially (up to X! orderings).  Because happens-before injects 
>>>>     a "barrier", it can appear to simplify the complexity, but the orderings 
>>>>     within each "block" between happens-before points still explode the same way. 
>>>> 
>>>>     This is why I keep waving my hands here and saying "stop that".  It makes 
>>>>     it extremely difficult for "racy" software to be understood as "racy", 
>>>>     because "reordering" introduces a whole realm of unknown compiler 
>>>>     reorderings changing JLS-specified behaviors that many developers may not 
>>>>     recognize as needing their attention.  I still encounter developers who just 
>>>>     don't understand that the JLS alone doesn't describe "inter-thread" 
>>>>     behaviors; that is the JMM's job.  People expect that they will just see 
>>>>     data races, and not run-time data corruption such as the 
>>>>     Thread.getThread()/.setThread() discussion here revealed. 
>>>> 
>>>>     Many people pick Java for the same reason that James Gosling wrote about 
>>>>     quite some time ago: that it works correctly.  The decoupling of JLS and JMM 
>>>>     "behaviors" causes software to break with "corrupted data" (word tearing 
>>>>     included), and I don't think that is on the radar for as many people writing 
>>>>     Java software as is assumed by the aggressive demand that the JMM be 
>>>>     perfectly understood and adhered to. 
>>>> 
>>>>     Gregg 
>>>> 
>>>>     On 4/15/2013 2:18 PM, oleksandr otenko wrote: 
>>>> 
>>>> 
>>>>         On 15/04/2013 17:55, thurstonn wrote: 
>>>> 
>>>>             oleksandr otenko wrote 
>>>> 
>>>>                 When "data race" means "broken logic", there must be a place 
>>>>                 where that 
>>>>                 logic is defined. 
>>>> 
>>>>             Yes, there is an assumption in my OP that "data-race" ==> "incorrect" 
>>>>             (or "broken logic", in your words).  Vitaly does not necessarily 
>>>>             equate "raciness" with "incorrectness" (and probably Brian doesn't 
>>>>             either), and that's OK with me. 
>>>> 
>>>>             oleksandr otenko wrote 
>>>> 
>>>>                 Literature on linearizability introduces a history of events (not 
>>>>                 the only place where this is done).  If a valid reordering of 
>>>>                 events produces an invalid history, you have a data race.  But you 
>>>>                 need a definition of a valid history. 
>>>> 
>>>>             Yes, what I'm in (a quixotic?) search for is a process that does the 
>>>>             following, given a set of operations that can execute across multiple 
>>>>             threads (like the example in the OP): 
>>>>             - define the set of "execution histories" (your "history of events") 
>>>>               that are possible given the MM in effect and consistent with the 
>>>>               original program's set of operations (of course there will be 
>>>>               multiple such histories); 
>>>>             - each "execution history" defines at least a partial ordering among 
>>>>               conflicting operations (r/w or w/w on the same shared data item); 
>>>>             - analyze each execution history for "correctness". 
>>>>             If every possible history is correct, then you're good; else add 
>>>>             explicit happens-before relations and repeat (a rough sketch of this 
>>>>             loop follows below). 
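>>>> 
>>>>             A rough sketch of the brute-force core of that process (the code and 
>>>>             names are illustrative, and weak-memory reorderings are ignored): 
>>>>             enumerate every interleaving of the per-thread operation lists and 
>>>>             check the resulting history. 
>>>> 
>>>>             import java.util.ArrayList; 
>>>>             import java.util.List; 
>>>>             import java.util.function.Predicate; 
>>>> 
>>>>             class HistoryChecker { 
>>>>                 // True iff every interleaving of the (mutable) per-thread 
>>>>                 // operation lists satisfies the correctness predicate. 
>>>>                 static <T> boolean allHistoriesCorrect( 
>>>>                         List<List<T>> threads, Predicate<List<T>> correct) { 
>>>>                     return explore(threads, new ArrayList<>(), correct); 
>>>>                 } 
>>>> 
>>>>                 private static <T> boolean explore( 
>>>>                         List<List<T>> threads, List<T> history, 
>>>>                         Predicate<List<T>> correct) { 
>>>>                     boolean exhausted = true; 
>>>>                     for (List<T> t : threads) { 
>>>>                         if (t.isEmpty()) continue; 
>>>>                         exhausted = false; 
>>>>                         T op = t.remove(0);          // schedule this thread next 
>>>>                         history.add(op); 
>>>>                         boolean ok = explore(threads, history, correct); 
>>>>                         history.remove(history.size() - 1); 
>>>>                         t.add(0, op);                // backtrack 
>>>>                         if (!ok) return false; 
>>>>                     } 
>>>>                     return exhausted ? correct.test(history) : true; 
>>>>                 } 
>>>>             } 
>>>> 
>>>>             Even for a handful of short threads the number of histories explodes, 
>>>>             which is why practical checkers need aggressive pruning and state 
>>>>             matching. 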
>>>> 
>>>>         This sounds awfully like Java PathFinder. 
>>>> 
>>>>         Not tractable for less-trivial code. (I couldn't test a semaphore-like 
>>>>         primitive 
>>>>         with more than 4 threads). 
>>>> 
>>>> 
>>>>         Alex 
>>>> 
>>>> 
>>>>             oleksandr otenko wrote 
>>>> 
>>>>                 Your argument is that any reordering of instructions in your 
>>>>                 example is a valid history.  But in order to state that, we would 
>>>>                 need to see the rest of the history.  The subsequent use of local 
>>>>                 will determine the validity of the history. 
>>>>                 Alex 
>>>> 
>>>>             Agreed (the example 'program' is simplistic at best). 
>>>> 
>>>>             What my original post described was exactly the kind of process (viz. an 
>>>>             acyclic serialization graph) that I'm in search of, but applied to 
>>>>             database concurrency control.  The problems are very similar (turning 
>>>>             concurrent executions into serial, partially-ordered ones; 
>>>>             operations are 
>>>>             reads/writes of data items), but they are not exactly the same.  I was 
>>>>             wondering if we could use the same techniques (with a serialization 
>>>>             graph 
>>>>             ==> "execution graph") to analyze the "correctness" of, e.g. non-locking 
>>>>             concurrent algorithms/data structures. 
>>>> 
>>>>             Thurston 
>>>> 
>>>> 
>>>> 
>>>>             On 12/04/2013 19:55, thurstonn wrote: 
>>>> 
>>>>                 In thinking about whether code is thread-safe or not, one can 
>>>>                 attempt to 
>>>>                 identify whether it 'contains a data-race'.  If not, you're 
>>>>                 good.  Else 
>>>>                 you 
>>>>                 need to add an explicit happens-before relationship. 
>>>> 
>>>>                 Which begs the question: what exactly constitutes a 'data-race'? 
>>>>                   And here 
>>>>                 I'm interested in something a little more formal than the famed 
>>>>                 judicial 
>>>>                 judgement of obscenity (I know it when I see it) 
>>>> 
>>>>                 If you do a web search, you unfortunately get quite a few divergent 
>>>>                 definitions, many of which seem to be inconsistent. 
>>>>                 IIRC, the official JMM defines a data-race as any two conflicting 
>>>>                 operations 
>>>>                 from two or more threads on shared data (where at least one of 
>>>>                 the two 
>>>>                 operations is a write). 
>>>> 
>>>>                 Brian Goetz (in his excellent article 
>>>>                 <http://www.ibm.com/developerworks/library/j-jtp03304/>) defines 
>>>>                 data-race thusly: 
>>>> 
>>>>                 "A program is said to have a data race, and therefore not be a 
>>>>                 "properly 
>>>>                 synchronized" program, when there is a variable that is read by 
>>>>                 more than 
>>>>                 one thread, written by at least one thread, and the write and 
>>>>                 the reads 
>>>>                 are 
>>>>                 not ordered by a happens-before relationship." 
>>>> 
>>>>                 But this would mark the following code as a data-race 
>>>> 
>>>>                 int shared = 0 
>>>> 
>>>>                 Thread 1                   Thread 2                   Thread 3 
>>>>                 local = this.shared        this.shared = 10           local = this.shared 
>>>> 
>>>>                 This clearly meets his definition, yet I do not consider this a 
>>>>                 'data-race'. 
>>>> 
>>>>                 I've always relied on traditional database concurrency control 
>>>>                 theory (I 
>>>>                 still find the treatise by Bernstein, Hadzilacos, and Goodman to 
>>>>                 be the 
>>>>                 best), which has a formal definition of 'serializability', viz. 
>>>>                 that any 
>>>>                 transaction log is 'serializable' if and only if its 
>>>>                 serialization graph 
>>>>                 is acyclic.  Why can we not use this as the basis for a formal 
>>>>                 definition 
>>>>                 of 
>>>>                 'data-race' (excluding the notion of commit and abort of course): 
>>>> 
>>>>                 "A program is said to have a data-race, if any legal (as 
>>>>                 prescribed by the 
>>>>                 MM) execution order produces a serialization graph that is *cyclic*" 
>>>> 
>>>>                 It has the advantage of a formal, mathematical model and 
>>>>                 although it is 
>>>>                 has 
>>>>                 historically been confined to databases (and transactions), it seems 
>>>>                 applicable to concurrent execution of any kind? 
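>>>> 
>>>>                 As a concrete (if simplified) sketch of what that check could look 
>>>>                 like for a single execution history (the code and names are mine, 
>>>>                 and generating the histories is left out): build the serialization 
>>>>                 graph over the read/write operations and test it for a cycle. 
>>>> 
>>>>                 import java.util.*; 
>>>> 
>>>>                 class SerializationGraph { 
>>>>                     static final class Op { 
>>>>                         final int txn; final boolean write; final String item; 
>>>>                         Op(int txn, boolean write, String item) { 
>>>>                             this.txn = txn; this.write = write; this.item = item; 
>>>>                         } 
>>>>                     } 
>>>> 
>>>>                     // Edge Ti -> Tj whenever an operation of Ti precedes and 
>>>>                     // conflicts with an operation of Tj (same item, different 
>>>>                     // transaction, at least one write). 
>>>>                     static boolean isCyclic(List<Op> history) { 
>>>>                         Map<Integer, Set<Integer>> edges = new HashMap<>(); 
>>>>                         for (int i = 0; i < history.size(); i++) { 
>>>>                             for (int j = i + 1; j < history.size(); j++) { 
>>>>                                 Op p = history.get(i), q = history.get(j); 
>>>>                                 if (p.item.equals(q.item) && p.txn != q.txn 
>>>>                                         && (p.write || q.write)) { 
>>>>                                     edges.computeIfAbsent(p.txn, k -> new HashSet<>()) 
>>>>                                          .add(q.txn); 
>>>>                                 } 
>>>>                             } 
>>>>                         } 
>>>>                         Set<Integer> visited = new HashSet<>(); 
>>>>                         for (Integer n : edges.keySet()) { 
>>>>                             if (dfs(n, edges, visited, new HashSet<>())) return true; 
>>>>                         } 
>>>>                         return false; 
>>>>                     } 
>>>> 
>>>>                     private static boolean dfs(Integer n, Map<Integer, Set<Integer>> edges, 
>>>>                                                Set<Integer> visited, Set<Integer> onPath) { 
>>>>                         if (onPath.contains(n)) return true;   // back edge: cycle found 
>>>>                         if (!visited.add(n)) return false;     // already explored, no cycle via n 
>>>>                         onPath.add(n); 
>>>>                         for (Integer next : edges.getOrDefault(n, Collections.emptySet())) { 
>>>>                             if (dfs(next, edges, visited, onPath)) return true; 
>>>>                         } 
>>>>                         onPath.remove(n); 
>>>>                         return false; 
>>>>                     } 
>>>>                 } 
>>>> 
>>>>                 For the two-transaction history r1[x] w2[x] w1[x], this yields the 
>>>>                 edges T1 -> T2 and T2 -> T1, a cycle, so under the proposed 
>>>>                 definition that execution is not serializable, i.e. racy. 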
>>>> 
>>>>                 Hoping that I don't get flamed. 
>>>> 
> 
> _______________________________________________
> Concurrency-interest mailing list
> Concurrency-interest at cs.oswego.edu
> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
