[concurrency-interest] On A Formal Definition of 'Data-Race'

Nathan Reynolds nathan.reynolds at oracle.com
Tue Apr 16 11:16:50 EDT 2013


> the point is that "experts" make mistakes as well (there's a great
> academic paper (that I can't find the link to at the moment) that
> describes some putatively thread-safe program that ran continuously for
> 2+ years before it failed)

One piece of our code had a home-grown lock.  The code was 10 years old 
and had been left alone, so it wasn't suffering from split-brain 
development.  However, every so often we would hear about the C++ server 
crashing in production while doing an operation related to that lock.

I added lock hygiene checks to all of our lock implementations (e.g. 
detecting double constructor calls, double deletes, use before 
constructor, use after destructor, destructor called while the lock is 
busy, recursive acquires, unexpected releases).  After doing so, I found 
a short-cut through the home-grown lock which would allow multiple 
threads to enter the critical region concurrently.
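
To give a rough idea of what such checks can look like, here is a minimal 
sketch.  It is not the original code: CheckedLock, acquire(), release(), 
held_by_caller() and count_caught_violations() are hypothetical names, 
the wrapper sits on std::mutex rather than a home-grown lock, and it only 
covers three of the checks (recursive acquire, unexpected release, 
destructor called while the lock is busy).

```cpp
#include <atomic>
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <mutex>
#include <stdexcept>
#include <thread>

// Hypothetical hygiene wrapper (illustrative only, not the original lock).
class CheckedLock {
public:
    CheckedLock() = default;
    CheckedLock(const CheckedLock&) = delete;
    CheckedLock& operator=(const CheckedLock&) = delete;

    ~CheckedLock() {
        // Destructor called while the lock is busy. A destructor should not
        // throw, so report the violation and abort.
        if (owner_.load() != std::thread::id()) {
            std::fputs("CheckedLock destroyed while held\n", stderr);
            std::abort();
        }
    }

    void acquire() {
        const std::thread::id self = std::this_thread::get_id();
        // Recursive acquire: only this thread can have stored its own id.
        if (owner_.load() == self)
            throw std::logic_error("recursive acquire");
        mutex_.lock();
        owner_.store(self);
    }

    void release() {
        // Unexpected release: the calling thread does not hold the lock.
        if (owner_.load() != std::this_thread::get_id())
            throw std::logic_error("unexpected release");
        owner_.store(std::thread::id());
        mutex_.unlock();
    }

    bool held_by_caller() const {
        return owner_.load() == std::this_thread::get_id();
    }

private:
    std::mutex mutex_;
    // A default-constructed thread::id means "no owner".
    std::atomic<std::thread::id> owner_{std::thread::id()};
};

// Deliberately misuses the lock and returns how many violations were caught.
int count_caught_violations() {
    CheckedLock lock;
    int caught = 0;
    try { lock.release(); } catch (const std::logic_error&) { ++caught; }  // never acquired
    lock.acquire();
    assert(lock.held_by_caller());
    try { lock.acquire(); } catch (const std::logic_error&) { ++caught; }  // recursive
    lock.release();  // release before destruction, or the destructor aborts
    return caught;
}
```

The sketch throws on misuse so that the failure carries a message and a 
stack; a production build might log and continue instead.  Which of those 
is appropriate is a design choice this sketch does not try to settle.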

This isn't quite the "ran continuously for 2+ years" case, but 
concurrency issues are difficult to figure out.  I am not surprised that 
a program can run that long without crashing.  We never reproduced the 
issue, but after the fix the reports of the server crashing went away.  
There are probably a lot more concurrency issues lurking about.  
Fortunately, with lock hygiene we are more likely to catch them with 
enough information to fix them.

Nathan Reynolds 
<http://psr.us.oracle.com/wiki/index.php/User:Nathan_Reynolds> | 
Architect | 602.333.9091
Oracle PSR Engineering <http://psr.us.oracle.com/> | Server Technology

On 4/16/2013 5:01 AM, thurstonn wrote:
> Yes, concurrency is hard.
> So is database concurrency control. But there is a formal methodology for
> analyzing it (even if it is NP-complete)
>
> It seems to me that the lack of something similar for analyzing
> multi-threaded code on SMP systems is a real failure of computer science.  I
> mean, we have a memory model (MM).
> Even if you accept the "leave it to the experts" prescription, the point is
> that "experts" make mistakes as well (there's a great academic paper (that I
> can't find the link to at the moment) that describes some putatively
> thread-safe program that ran continuously for 2+ years before it failed)
>
> The question: "How do you know this program is thread-safe?"
> Pause.
> "I thought *really* hard about it"
>
> I can't be the only one who finds that deeply unsatisfying
