[concurrency-interest] On A Formal Definition of 'Data-Race'
nathan.reynolds at oracle.com
Tue Apr 16 11:16:50 EDT 2013
> the point is that "experts" make mistakes as well (there's a great
> academic paper (that I can't find the link to at the moment) that
> describes some putatively thread-safe program that ran continuously for
> 2+ years before it failed)
One piece of code in our C++ server had a home-grown lock. The code was
10 years old and had been left alone, so it wasn't suffering from
split-brain development. However, every so often we would hear about the
server crashing in production while doing an operation related to that
lock.
I added lock hygiene checks to all of our lock implementations (e.g.
double constructor calls, double deletes, use before construction, use
after destruction, destructor called while the lock is busy, recursive
acquires, unexpected releases). After doing so, I found a shortcut
through the home-grown lock that allowed multiple threads to enter the
critical region concurrently.
This isn't quite the "ran continuously for 2+ years" case, but it shows
how hard concurrency issues are to track down. I am not surprised that
a program can run that long without crashing. We never reproduced the
issue, but the reports of the server crashing went away. There are
probably a lot more concurrency issues lurking about. Fortunately, with
lock hygiene, we are more likely to catch them with enough information
to fix them.
Architect | 602.333.9091
Oracle PSR Engineering <http://psr.us.oracle.com/> | Server Technology
On 4/16/2013 5:01 AM, thurstonn wrote:
> Yes, concurrency is hard.
> So is database concurrency control. But there is a formal methodology for
> analyzing it (even if it is NP-complete)
> It seems to me that the lack of something similar for analyzing
> multi-threaded code on SMP systems is a real failure of computer science. I
> mean we have a MM.
> Even if you accept the "leave it to the experts" prescription, the point is
> that "experts" make mistakes as well (there's a great academic paper (that I
> can't find the link to at the moment) that describes some putatively
> thread-safe program that ran continuously for 2+ years before it failed)
> "How do you know this program is thread-safe?"
> "I thought *really* hard about it."
> I can't be the only one who finds that deeply unsatisfying.