[concurrency-interest] ReentrantReadWriteLock in inconsistent state

√iktor Ҡlang viktor.klang at gmail.com
Thu Aug 2 06:16:59 EDT 2012


On Thu, Aug 2, 2012 at 12:04 PM, Dr Heinz M. Kabutz <
heinz at javaspecialists.eu> wrote:

> Which again leads me to think that it would be immensely useful to have at
> least some form of testing that a given JVM / OS / processor stack does not
> violate the JMM.  I know how impossible or difficult such a test suite
> would be to put together, but I believe we will see more bugs like this in
> future, as programmers write more risky code and the HotSpot profiler tries
> to eke the last ounce of performance out of the hardware.  How else do we
> know that the JVM is not broken?


+1 from someone who needs to maintain a high-performance concurrency
library for a variety of JVMs and OSes.

Cheers,
√


>
>
> Regards
>
> Heinz
> --
> Dr Heinz M. Kabutz (PhD CompSci)
> Author of "The Java(tm) Specialists' Newsletter"
> Sun Java Champion
> IEEE Certified Software Development Professional
> http://www.javaspecialists.eu
> Tel: +30 69 75 595 262
> Skype: kabutz
>
>
> On 8/2/12 12:18 PM, Doug Lea wrote:
>
>> On 08/02/12 04:15, Phil Harvey wrote:
>>
>>> Thanks for the advice guys.
>>>
>>> I've checked our code and can confirm we make no Thread.stop() or
>>> Thread.stop(Throwable) calls.
>>>
>>> Also, we would have seen the stack trace of a StackOverflowError in our
>>> logs.
>>>
>>> So I still have no idea what caused this problem. I can only assume it's
>>> a Java bug. Or am I jumping to conclusions prematurely?
>>>
>>
>> Everyone (including me) who has looked at this agrees that the
>> situation you describe "cannot" happen at the Java level.
>> So it could be a VM, OS, or processor bug. But until there
>> is a self-contained test case, I don't think much can be
>> done to further diagnose.
>>
>> -Doug
>>
>>
>>
>>> Phil
>>>
>>> On Aug 2, 2012 5:14 AM, "Stanimir Simeonoff" <stanimir at riflexo.com> wrote:
>>>
>>>     David,
>>>     I am quite positive it's Thread.stop, as setState is inlined. I've seen
>>>     that case due to Thread.stop quite a few times too.
>>>     It is possible to avoid the disaster via some awkward steps, such as
>>>     waiting for sleep mode or examining the stack trace, followed by
>>>     Thread.suspend, checking again, then stop(). Alternatively, peppering
>>>     the code with stop points during class loading is an option, but a hard
>>>     one.
>>>
>>>     That has made me wonder if HotSpot can prevent adding safepoints in
>>>     java.util.concurrent.locks classes, or at least have the safepoint skip
>>>     checking for Thread.stop outside park(). That is, the only safepoint
>>>     would be park(); as a side effect it could have minor performance
>>>     benefits.
>>>
>>>     I know Thread.stop is deprecated, but there is still enough middleware
>>>     that makes use of it.
>>>
>>>     Stanimir
>>>
>>>     On Thu, Aug 2, 2012 at 5:56 AM, <davidcholmes at aapt.net.au> wrote:
>>>
>>>         Phil,
>>>
>>>         A RRWL that has no owner but cannot be locked is definitely a
>>>         problem. If this is not 6822370 then the other possibilities are
>>>         async exceptions occurring in the release code:
>>>
>>>                      if (free)
>>>                          setExclusiveOwnerThread(null);
>>>                      <=== async exception here
>>>                      setState(nextc);
>>>
>>>         Two possible sources of the async exception:
>>>
>>>         a) Use of Thread.stop elsewhere
>>>         b) A StackOverflowError was triggered trying to call setState
>>>         David Holmes
>>>         ------------
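
A minimal single-threaded sketch of the window described above. It is a toy model, not the JDK code: `ToySync`, its `owner` and `state` fields, and the `betweenWrites` hook are hypothetical names standing in for `exclusiveOwnerThread`, the synchronizer state, and the point where an asynchronous exception might land. It only illustrates how an exception between the two writes could leave a lock with no owner that still cannot be acquired.

```java
final class ReleaseWindowDemo {

    /** Toy exclusive lock state; deliberately NOT thread-safe and NOT the JDK implementation. */
    static final class ToySync {
        private Thread owner; // stands in for exclusiveOwnerThread
        private int state;    // hold count; 0 means the lock is free

        boolean tryAcquire() {
            Thread current = Thread.currentThread();
            if (state == 0) {          // free: take ownership
                owner = current;
                state = 1;
                return true;
            }
            if (owner == current) {    // reentrant acquire by the owner
                state++;
                return true;
            }
            return false;              // held (or looks held) by someone else
        }

        /** Releases one hold; betweenWrites runs in the gap between the two writes. */
        void release(Runnable betweenWrites) {
            int nextc = state - 1;
            boolean free = (nextc == 0);
            if (free) {
                owner = null;
            }
            betweenWrites.run();       // simulated async exception window
            state = nextc;
        }

        Thread owner() { return owner; }
        int state() { return state; }
    }

    public static void main(String[] args) {
        ToySync sync = new ToySync();
        sync.tryAcquire();
        try {
            sync.release(() -> { throw new IllegalStateException("simulated async exception"); });
        } catch (IllegalStateException expected) {
            // the owner was cleared, but the state write never happened
        }
        System.out.println("owner=" + sync.owner() + " state=" + sync.state()); // owner=null state=1
        System.out.println("can acquire again? " + sync.tryAcquire());          // can acquire again? false
    }
}
```

The hook is a plain `Runnable` so the failure is deterministic. A real asynchronous exception would arrive at an arbitrary point, which is what makes the actual case so hard to reproduce.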
>>>
>>>         Quoting Phil Harvey <phil at philharveyonline.com>:
>>>
>>>             Hi,
>>>
>>>             Yes, we had looked at that bug but assumed we were not
>>>             experiencing it here because we are using Java 1.6.0_25, and it
>>>             was reported fixed in 1.6.0_18.
>>>
>>>             Do you agree that the unusual state of the ReentrantReadWriteLock
>>>             suggests we've hit a bug?
>>>
>>>             Phil
>>>             On Aug 1, 2012 3:05 PM, "Ariel Weisberg" <ariel at weisberg.ws> wrote:
>>>
>>>                   Hi,
>>>
>>>                   I remember that. That was fixed in Oracle JDK 1.6.0_18. It
>>>                   hasn't been reproducing for us since 1.6.0_18, but I am not
>>>                   sure if we are using ReentrantLock in the same way anymore.
>>>
>>>                   The reproducer we used was
>>>                   https://github.com/VoltDB/voltdb/tree/master/tools/lbd_lock_test
>>>                   If I remember correctly it prints '.' as it goes and when it
>>>                   hangs it stops printing dots.
>>>
>>>                   Regards,
>>>                   Ariel
>>>
>>>                   On Wed, Aug 1, 2012, at 09:27 AM, √iktor Ҡlang wrote:
>>>
>>>                   Hi Phil,
>>>
>>>                 Related to this?
>>>                 http://bugs.sun.com/view_bug.do?bug_id=6822370
>>>
>>>                   Cheers,
>>>                   √
>>>
>>>                   On Wed, Aug 1, 2012 at 3:20 PM, Phil Harvey
>>>                 <phil at philharveyonline.com> wrote:
>>>
>>>                 We had a deadlock-like failure of our application recently.
>>>
>>>                 I initially reported it on the BDB JE forum
>>>                 (https://forums.oracle.com/forums/thread.jspa?messageID=10480988)
>>>                 but further analysis of the heap and thread dumps has exposed
>>>                 a problem that looks like a Java locking bug. I'm hoping you
>>>                 can offer advice on whether this is the case.
>>>
>>>                 We're using Oracle JVM 1.6.0_25-b06, running on Linux version
>>>                 2.6.18-194.32.1.el5.
>>>
>>>                 We are launching Java as follows: java -server
>>>                 -XX:+UseConcMarkSweepGC -XX:+HeapDumpOnOutOfMemoryError
>>>                 -Xmx1024m ...
>>>
>>>                 Several consecutive thread dumps showed that Thread t at 41101
>>>                 was blocked indefinitely in
>>>                 ReentrantReadWriteLock.writeLock().lock().
>>>
>>>                 We know from code inspection that nothing ever takes a read
>>>                 lock on this ReentrantReadWriteLock, so we started trying to
>>>                 find out what has got its write lock.
>>>
>>>                 The output of "jstack -l" should list which thread holds this
>>>                 exclusive lock in the "locked ownable synchronizers" section
>>>                 but does not.
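
For reference, the ownable-synchronizer information that `jstack -l` prints should also be reachable in-process through `java.lang.management`. A small sketch, assuming a JVM where `ThreadMXBean.isSynchronizerUsageSupported()` returns true; the names `OwnableSynchronizerDump` and `heldByCurrentThread` are made up for illustration and do not come from the thread.

```java
import java.lang.management.LockInfo;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class OwnableSynchronizerDump {

    /** Returns the ownable synchronizers the JVM reports as held by the calling thread. */
    static LockInfo[] heldByCurrentThread() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isSynchronizerUsageSupported()) {
            return new LockInfo[0]; // this JVM does not track ownable synchronizers
        }
        long[] ids = { Thread.currentThread().getId() };
        // lockedMonitors = false, lockedSynchronizers = true
        ThreadInfo[] infos = mx.getThreadInfo(ids, false, true);
        return infos[0] == null ? new LockInfo[0] : infos[0].getLockedSynchronizers();
    }

    public static void main(String[] args) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.writeLock().lock();
        try {
            // While the write lock is held, its internal Sync object should be listed.
            for (LockInfo info : heldByCurrentThread()) {
                System.out.println("holding " + info.getClassName()
                        + "@" + Integer.toHexString(info.getIdentityHashCode()));
            }
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

The JVM derives this list from the synchronizer's exclusive owner field, so a lock whose owner field is null would presumably be missing from it even while its state says it is held. That would match the empty jstack section described above.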
>>>
>>>                 Our first theory was that the owning thread might have
>>>                 terminated.
>>>
>>>                 We wrote a simple test program to explore this. We found from
>>>                 heap dump analysis that even if the owning thread terminates,
>>>                 the lock itself still refers to it via the
>>>                 ReentrantReadWriteLock.WriteLock.sync.exclusiveOwnerThread
>>>                 field. Looking in the java.util.concurrent source code, it
>>>                 seems that this field only gets null'ed when the lock is
>>>                 released.
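
A test along the lines described above might look like the following sketch. This is not Phil's actual program, which is not shown in the thread; `OwnerProbe`, `InspectableLock`, and `lockAndAbandon` are hypothetical names, and the owner is read through the protected `ReentrantReadWriteLock.getOwner()` method rather than from a heap dump.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class OwnerProbe {

    /** Subclass that exposes the protected getOwner() for inspection. */
    static final class InspectableLock extends ReentrantReadWriteLock {
        Thread owner() {
            return getOwner();
        }
    }

    /** Takes the write lock on a helper thread that then exits without unlocking. */
    static InspectableLock lockAndAbandon() throws InterruptedException {
        InspectableLock lock = new InspectableLock();
        Thread abandoner = new Thread(() -> lock.writeLock().lock(), "abandoner");
        abandoner.start();
        abandoner.join(); // the helper thread has terminated once join() returns
        return lock;
    }

    public static void main(String[] args) throws InterruptedException {
        InspectableLock lock = lockAndAbandon();
        Thread owner = lock.owner();
        System.out.println("write locked:      " + lock.isWriteLocked());
        System.out.println("owner:             " + owner);
        System.out.println("owner alive:       " + (owner != null && owner.isAlive()));
        System.out.println("tryLock from main: " + lock.writeLock().tryLock());
    }
}
```

With a lock abandoned this way, the owner reference should still point at the dead thread and `tryLock()` from another thread should fail. That is the "unavailable and populated" state, not the "unavailable and null" state found in the heap dump.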
>>>
>>>                 However, looking in the heap dump taken following our
>>>                 "deadlock", we were surprised to find that the lock in
>>>                 question has a null sync.exclusiveOwnerThread field.
>>>
>>>                 Surely a write lock should be in one of two states (except
>>>                 possibly for a tiny instant when its state is being
>>>                 non-atomically switched):
>>>
>>>                 1) The lock is available, and sync.exclusiveOwnerThread is null
>>>                 2) The lock is unavailable, and sync.exclusiveOwnerThread is
>>>                    populated
>>>
>>>                 But our lock was indefinitely in this state:
>>>
>>>                 3) The lock is unavailable and sync.exclusiveOwnerThread is null
>>>
>>>                 Does anyone know whether this represents a bug? If not, can
>>>                 you explain what it means for a lock to be in this
>>>                 counterintuitive state?
>>>
>>>                 Thanks, Phil
>>>
>>>                 _________________________________________________
>>>                 Concurrency-interest mailing list
>>>                 Concurrency-interest at cs.oswego.edu
>>>                 http://cs.oswego.edu/mailman/listinfo/concurrency-interest
>>>
>>>
>>>
>>>
>>>
>>>                 --
>>>                 Viktor Klang
>>>
>>>                 Akka Tech Lead
>>>                 Typesafe <http://www.typesafe.com/> - The software
>>> stack for
>>>                 applications
>>>                 that scale
>>>
>>>                 Twitter: @viktorklang



-- 
Viktor Klang

Akka Tech Lead
Typesafe <http://www.typesafe.com/> - The software stack for applications
that scale

Twitter: @viktorklang