[concurrency-interest] ReentrantReadWriteLock in inconsistent state

davidcholmes at aapt.net.au davidcholmes at aapt.net.au
Thu Aug 2 12:47:28 EDT 2012


Phil,

Be wary of code that might swallow exceptions and so hide the fact
that a StackOverflowError occurred.
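
As a rough illustration of how that can happen, here is a minimal sketch. The class and method names (SwallowedErrorDemo, runQuietly, recurse) are made up for this example and are not taken from Phil's code or from any library. It shows a catch-all wrapper that lets the caller carry on after a StackOverflowError with nothing written to the logs.

```java
class SwallowedErrorDemo {
    // Last Throwable caught by runQuietly; kept only so the demo can show what was hidden.
    static volatile Throwable lastSwallowed;

    // Unbounded, non-tail recursion: ends in a StackOverflowError.
    static int recurse(int depth) {
        return recurse(depth + 1) + 1;
    }

    // Hypothetical "defensive" wrapper of the kind found in task runners.
    // It reports that something failed, but never logs what it was.
    static boolean runQuietly(Runnable task) {
        try {
            task.run();
            return true;
        } catch (Throwable t) { // also catches Errors such as StackOverflowError
            lastSwallowed = t;  // real code often does not even keep this much
            return false;
        }
    }

    public static void main(String[] args) {
        boolean ok = runQuietly(new Runnable() {
            public void run() {
                recurse(0);
            }
        });
        System.out.println("completed normally: " + ok);
        System.out.println("hidden throwable: " + lastSwallowed.getClass().getName());
    }
}
```

If a wrapper like this sits anywhere on the call path into unlock(), the absence of a stack trace in the logs does not rule out a stack overflow.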

Also, it would be useful to get a SIGQUIT-based stack dump, which does
occur at a safepoint. Though the fact that it hangs suggests to me that
the existing trace is accurate.
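
If sending SIGQUIT is awkward in your environment, a similar report can be produced from inside the JVM. The following is only a sketch under that assumption: OwnableSynchronizerDump and dump() are names invented here, while ThreadMXBean.dumpAllThreads, ThreadInfo.getLockedSynchronizers and ThreadInfo.getLockOwnerName are standard java.lang.management calls. It is a complement to a SIGQUIT dump, not a replacement for one.

```java
import java.lang.management.LockInfo;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class OwnableSynchronizerDump {
    // Builds a report similar in spirit to the "Locked ownable synchronizers"
    // section printed by jstack -l.
    static String dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Only ask for ownable synchronizers if this JVM can report them.
        boolean syncs = mx.isSynchronizerUsageSupported();
        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : mx.dumpAllThreads(false, syncs)) {
            if (info == null) {
                continue; // defensive: skip threads that have already gone away
            }
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState());
            LockInfo waitingOn = info.getLockInfo();
            if (waitingOn != null) {
                // An owner of "null" here matches the symptom described in this thread.
                out.append(" waiting on ").append(waitingOn)
                   .append(" owned by ").append(info.getLockOwnerName());
            }
            out.append('\n');
            for (LockInfo held : info.getLockedSynchronizers()) {
                out.append("    holds ").append(held).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.writeLock().lock();
        try {
            System.out.print(dump());
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```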

These things are extremely difficult to diagnose without reproducible
test cases, which are very hard to produce for these kinds of problems.
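
To make the window from my earlier message (quoted below) concrete, here is a toy model. It is not the JDK code. ToyWriteLock and its betweenSteps fault-injection hook are invented for illustration, and an ordinary exception thrown on the releasing thread stands in for a real asynchronous exception. It only shows the shape of the resulting state, not how the exception got there.

```java
import java.util.concurrent.atomic.AtomicInteger;

class ToyWriteLock {
    // 0 = free, 1 = held. Stands in for the synchronizer's state field.
    private final AtomicInteger state = new AtomicInteger(0);
    // Stands in for exclusiveOwnerThread.
    private volatile Thread owner;

    boolean tryLock() {
        if (state.compareAndSet(0, 1)) {
            owner = Thread.currentThread();
            return true;
        }
        return false;
    }

    // Releases in the same order as the quoted snippet: clear the owner first,
    // then publish the new state. betweenSteps is a hypothetical hook marking
    // where an asynchronous exception could land; pass null for a normal release.
    void unlock(Runnable betweenSteps) {
        owner = null;
        if (betweenSteps != null) {
            betweenSteps.run();
        }
        state.set(0);
    }

    boolean isLocked() {
        return state.get() != 0;
    }

    Thread getOwner() {
        return owner;
    }

    public static void main(String[] args) {
        ToyWriteLock lock = new ToyWriteLock();
        lock.tryLock();
        try {
            lock.unlock(new Runnable() {
                public void run() {
                    throw new IllegalStateException("simulated async exception");
                }
            });
        } catch (IllegalStateException expected) {
            // The releasing thread unwinds here and never reaches state.set(0).
        }
        System.out.println("locked=" + lock.isLocked() + " owner=" + lock.getOwner());
        System.out.println("tryLock=" + lock.tryLock());
    }
}
```

After the interrupted release the toy lock is still marked as held but has no owner, and every later tryLock() fails. That is the same shape as the state seen in the heap dump.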

David

Quoting Phil Harvey <phil at philharveyonline.com>:

> Thanks for the advice guys.
>
> I've checked our code and can confirm we make no Thread.stop() or
> Thread.stop(Throwable) calls.
>
> Also, we would have seen the stack trace of a StackOverflowError in our
> logs.
>
> So I still have no idea what caused this problem. I can only assume it's a
> Java bug. Or am I jumping to conclusions prematurely?
>
> Phil
>  On Aug 2, 2012 5:14 AM, "Stanimir Simeonoff" <stanimir at riflexo.com> wrote:
>
>> David,
>> I am quite positive it's Thread.stop, as setState is inlined. I've seen
>> that case due to Thread.stop quite a few times too.
>> It is possible to avoid the disaster via some awkward steps, such as
>> waiting for the thread to sleep, examining its stack trace, calling
>> Thread.suspend, checking again, and only then calling stop().
>> Alternatively, peppering the code with stop points during class loading
>> is an option, but a hard one.
>>
>> That has made me wonder whether HotSpot could avoid adding safepoints in
>> java.util.concurrent.locks classes, or at least have safepoints skip the
>> check for Thread.stop outside park(). That is, the only safepoint would
>> be park(); as a side effect it could have minor performance benefits.
>>
>> I know Thread.stop is deprecated, but there is still enough middleware
>> that makes use of it.
>>
>> Stanimir
>>
>> On Thu, Aug 2, 2012 at 5:56 AM, <davidcholmes at aapt.net.au> wrote:
>>
>>> Phil,
>>>
>>> A RRWL that has no owner but can not be locked is definitely a problem.
>>> If this is not 6822370 then the other possibilities are async-exceptions
>>> occurring in the release code:
>>>
>>>             if (free)
>>>                 setExclusiveOwnerThread(null);
>>>             <=== async exception here
>>>             setState(nextc);
>>>
>>> Two possible sources of the async exception:
>>>
>>> a) Use of Thread.stop elsewhere
>>> b) StackOverflowError was triggered trying to call setState
>>>
>>> David Holmes
>>> ------------
>>>
>>> Quoting Phil Harvey <phil at philharveyonline.com>:
>>>
>>>> Hi,
>>>>
>>>> Yes, we had looked at that bug but assumed we were not experiencing it
>>>> here because we are using Java 1.6.0_25, and it was reported fixed in
>>>> 1.6.0_18.
>>>>
>>>> Do you agree that the unusual state of the ReentrantReadWriteLock
>>>> suggests we've hit a bug?
>>>>
>>>> Phil
>>>> On Aug 1, 2012 3:05 PM, "Ariel Weisberg" <ariel at weisberg.ws> wrote:
>>>>
>>>>>  Hi,
>>>>>
>>>>>  I remember that. That was fixed in Oracle JDK 1.6.0_18. It hasn't been
>>>>> reproducing for us since 1.6.0_18, but I am not sure if we are using
>>>>> ReentrantLock in the same way anymore.
>>>>>
>>>>>  The reproducer we used was
>>>>> https://github.com/VoltDB/voltdb/tree/master/tools/lbd_lock_test
>>>>>  If I remember correctly it prints '.' as it goes and when it hangs it
>>>>> stops printing dots.
>>>>>
>>>>>  Regards,
>>>>>  Ariel
>>>>>
>>>>>  On Wed, Aug 1, 2012, at 09:27 AM, Viktor Klang wrote:
>>>>>
>>>>>  Hi Phil,
>>>>>
>>>>> Related to this?
>>>>> http://bugs.sun.com/view_bug.do?bug_id=6822370
>>>>>
>>>>>  Cheers,
>>>>>  √
>>>>>
>>>>>  On Wed, Aug 1, 2012 at 3:20 PM, Phil Harvey
>>>>> <phil at philharveyonline.com> wrote:
>>>>>
>>>>>  We had a deadlock-like failure of our application recently.
>>>>>
>>>>> I initially reported it on the BDB JE forum
>>>>> (https://forums.oracle.com/forums/thread.jspa?messageID=10480988)
>>>>> but further analysis of the heap and thread dumps has exposed a problem
>>>>> that looks like a Java locking bug. I'm hoping you can offer advice on
>>>>> whether this is the case.
>>>>>
>>>>> We're using Oracle JVM 1.6.0_25-b06, running on Linux version
>>>>> 2.6.18-194.32.1.el5.
>>>>>
>>>>> We are launching Java as follows: java -server -XX:+UseConcMarkSweepGC
>>>>> -XX:+HeapDumpOnOutOfMemoryError -Xmx1024m ...
>>>>>
>>>>> Several consecutive thread dumps showed that Thread t@41101 was blocked
>>>>> indefinitely in ReentrantReadWriteLock.writeLock().lock().
>>>>>
>>>>> We know from code inspection that nothing ever takes a read lock on this
>>>>> ReentrantReadWriteLock, so we started trying to find out what holds its
>>>>> write lock.
>>>>>
>>>>> The output of "jstack -l" should list which thread holds this exclusive
>>>>> lock in the "locked ownable synchronizers" section but does not.
>>>>>
>>>>> Our first theory was that the owning thread might have terminated.
>>>>>
>>>>> We wrote a simple test program to explore this. We found from heap dump
>>>>> analysis that even if the owning thread terminates, the lock itself
>>>>> still refers to it via the
>>>>> ReentrantReadWriteLock.WriteLock.sync.exclusiveOwnerThread field.
>>>>> Looking in the java.util.concurrent source code, it seems that this
>>>>> field only gets null'ed when the lock is released.
>>>>>
>>>>> However, looking in the heap dump taken following our "deadlock", we
>>>>> were surprised to find that the lock in question has a null
>>>>> sync.exclusiveOwnerThread field.
>>>>>
>>>>> Surely a write lock should be in one of two states (except possibly for
>>>>> a tiny instant when its state is being non-atomically switched):
>>>>>
>>>>> 1) The lock is available, and sync.exclusiveOwnerThread is null
>>>>> 2) The lock is unavailable, and sync.exclusiveOwnerThread is populated
>>>>>
>>>>> But our lock was indefinitely in this state:
>>>>>
>>>>> 3) The lock is unavailable and sync.exclusiveOwnerThread is null
>>>>>
>>>>> Does anyone know whether this represents a bug? If not, can you explain
>>>>> what it means for a lock to be in this counterintuitive state?
>>>>>
>>>>> Thanks, Phil
>>>>>
>>>>> _______________________________________________
>>>>> Concurrency-interest mailing list
>>>>> Concurrency-interest at cs.oswego.edu
>>>>> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
>>>>>
>>>>> --
>>>>> Viktor Klang
>>>>>
>>>>> Akka Tech Lead
>>>>> Typesafe <http://www.typesafe.com/> - The software stack for
>>>>> applications
>>>>> that scale
>>>>>
>>>>> Twitter: @viktorklang
>>>>
>>>
>>>
>>
>>
>
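
For anyone who wants to watch for the three states Phil lists in his original message on a live lock rather than in a heap dump, one possible approach is sketched below. It relies only on the public isWriteLocked() and the protected getOwner() of ReentrantReadWriteLock. The subclass name InspectableRWLock and the describeWriteLock() method are made up for this sketch, and it assumes you are able to substitute the subclass for the lock in question.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

class InspectableRWLock extends ReentrantReadWriteLock {
    private static final long serialVersionUID = 1L;

    // Classifies the write lock into the three states from the original report.
    // The two reads are not atomic with respect to each other, so a single
    // "unavailable with no owner" result can be a transient artifact of a
    // concurrent release; only a result that persists across samples is suspicious.
    String describeWriteLock() {
        boolean locked = isWriteLocked();
        Thread owner = getOwner(); // protected in ReentrantReadWriteLock
        if (!locked) {
            return "available";
        }
        if (owner != null) {
            return "held by " + owner.getName();
        }
        return "unavailable with no owner";
    }

    public static void main(String[] args) {
        InspectableRWLock lock = new InspectableRWLock();
        System.out.println(lock.describeWriteLock());
        lock.writeLock().lock();
        try {
            System.out.println(lock.describeWriteLock());
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Logging describeWriteLock() periodically from a watchdog thread would give an earlier signal than waiting for a hang, at the cost of the sampling caveat noted in the comment.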





