[concurrency-interest] ReentrantReadWriteLock, multiple CPUs, lock not released caused deadlock [JDK 1.6]

Howard Lewis Ship hlship at gmail.com
Tue Oct 14 10:56:04 EDT 2008


I'm at my wits' end with respect to a hard-to-reproduce deadlock bug in
the Tapestry 5 code base.

Tapestry 5 has a couple of places where it wants to "serialize" access
to internal data structures. For context, the serialized access is
used to periodically check to see if underlying files (class files,
templates, etc.) have changed, and to clear caches and
even create new class loaders as necessary. Serializing access to run
the checks (and react to changes) ensures that behavior of a Tapestry
application is consistent, even under load, even when files are
changed, even in production.

The problem my users are seeing is that, in true multi-CPU (or at
least, multi-core) scenarios, a deadlock related to
ReentrantReadWriteLock is occurring.

I wrote a wrapper class around ReentrantReadWriteLock for this
purpose; most code acquires the read lock and does its work.
Periodically, one thread will acquire the write lock (to serialize
access) and do the extra work of checking file time stamps and
clearing caches as necessary.
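
As a rough illustration of the arrangement just described, a minimal
sketch might look like the following. The class and method names
(SimpleBarrier, withRead, withWrite) are hypothetical and this is not
the actual ConcurrentBarrier source; it only shows the general shape of
wrapping a ReentrantReadWriteLock so that most callers share the read
lock and one caller occasionally takes the write lock.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical wrapper for illustration; not the Tapestry ConcurrentBarrier. */
public class SimpleBarrier {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    /** Runs the task while holding the shared read lock. */
    public void withRead(Runnable task) {
        Lock read = lock.readLock();
        read.lock();
        try {
            task.run();
        } finally {
            read.unlock(); // released even if the task throws
        }
    }

    /** Runs the task while holding the exclusive write lock. */
    public void withWrite(Runnable task) {
        Lock write = lock.writeLock();
        write.lock();
        try {
            task.run();
        } finally {
            write.unlock(); // released even if the task throws
        }
    }
}
```

Note that ReentrantReadWriteLock does not support upgrading: a thread
that calls withWrite() from inside withRead() in this sketch would block
forever, so a real wrapper has to release the read lock first.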

We use tryLock(), with a timeout, so that on a very busy system, we
don't wait a very long time for the write lock, but instead defer the
file update checks for a quieter time.
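
A minimal sketch of that timed write-lock attempt, assuming a
hypothetical wrapper (TimedWriteBarrier and tryWithWrite are
illustrative names, not the ConcurrentBarrier API). The point it shows
is that unlock() is only called when tryLock() reported success, and
that a false return means the caller should defer the work.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical wrapper for illustration; not the Tapestry ConcurrentBarrier. */
public class TimedWriteBarrier {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    /** Runs the task while holding the shared read lock. */
    public void withRead(Runnable task) {
        Lock read = lock.readLock();
        read.lock();
        try {
            task.run();
        } finally {
            read.unlock();
        }
    }

    /**
     * Tries to run the task under the write lock, waiting at most the
     * given timeout. Returns false (task not run) if the lock could not
     * be acquired in time or the thread was interrupted while waiting.
     */
    public boolean tryWithWrite(Runnable task, long timeout, TimeUnit unit) {
        Lock write = lock.writeLock();
        boolean acquired;
        try {
            acquired = write.tryLock(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return false;
        }
        if (!acquired) {
            return false; // busy system: defer the checks for a quieter time
        }
        try {
            task.run();
            return true;
        } finally {
            write.unlock(); // only reached when the lock was really acquired
        }
    }

    /** Diagnostic helper: true if any thread currently holds the write lock. */
    public boolean isWriteLocked() {
        return lock.isWriteLocked();
    }
}
```

On a correctly behaving JDK, isWriteLocked() should report false after a
timed-out attempt; checking that after a timeout is one way to test
whether the lock has been left held.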

What's happening is that, rarely, we're getting a deadlock over the
internal ReentrantReadWriteLock$NonfairSync object inside the lock.

My theory is that the write lock is actually being acquired even though
tryLock() times out, and the code continues on without releasing the
write lock, leading to a deadlock on the read lock. Users have noted
this occurs with slower (or virtual) machines, where a timeout is more
likely.

This seems to match
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6571733, a bug that's
marked as a duplicate of a fixed bug.

Here's a link to the source:

http://tapestry.apache.org/tapestry5/apidocs/src-html/org/apache/tapestry5/ioc/internal/util/ConcurrentBarrier.html

Note that the code is designed for JDK 1.5, so there's some extra
business with a ThreadLocal to track whether the current thread has
the read lock or not. All that business about synchronizing the
ThreadLocal (joy!) is to work around another JDK 1.5 bug related to
ThreadLocals (http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=5025230).
And people ask why the world needs frameworks!

Any guidance on how to further clarify this issue would be of great use.

-- 
Howard M. Lewis Ship

Creator Apache Tapestry and Apache HiveMind

