[concurrency-interest] LinkedBlockingDeque deadlock?

Martin Buchholz martinrb at google.com
Mon Jul 13 18:37:58 EDT 2009


I did some stack trace eyeballing and did a mini-audit of the
LinkedBlockingDeque code, with a view to finding possible bugs,
and came up empty.  Maybe it's a deep bug in hotspot?

Ariel, it would be good if you could get a reproducible test case soonish,
while someone on the planet has the motivation and familiarity to fix it.
In another month I may disavow all knowledge of j.u.c.*Blocking*.

Martin


On Wed, Jul 8, 2009 at 15:57, Ariel Weisberg <ariel at weisberg.ws> wrote:

> Hi,
>
> > The poll()ing thread is blocked waiting for the internal lock, but there's
> > no indication of any thread owning that lock. You're using an OpenJDK 6
> > build ... can you try JDK7?
>
> I got a chance to do that today. I downloaded JDK 7 from
>
> http://www.java.net/download/jdk7/binaries/jdk-7-ea-bin-b63-linux-x64-02_jul_2009.bin
> and was able to reproduce the problem. I have attached the stack trace
> from running the 1.7 version. It is the same situation as before except
> there are 9 execution sites running on each host. There are no threads
> that are missing or that have been restarted. Foo Network thread
> (selector thread) and Network Thread - 0 are waiting on
> 0x00002aaab43d3b28. I also ran with JDK 7 and 6 and LinkedBlockingQueue
> and was not able to recreate the problem using that structure.
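
The question of which thread, if any, owns the lock can also be asked from inside the JVM rather than read off a stack trace. The following is a minimal sketch using the standard java.lang.management API; the class and method names (LockOwnerReport, waitingThreads, deadlockedThreadIds) are hypothetical and not part of the application discussed here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the class and method names are illustrative.
class LockOwnerReport {

    /** One line per live thread that is blocked or waiting on a lock or condition. */
    static List<String> waitingThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        ThreadInfo[] infos = mx.dumpAllThreads(
                mx.isObjectMonitorUsageSupported(),
                mx.isSynchronizerUsageSupported());
        List<String> lines = new ArrayList<String>();
        for (ThreadInfo info : infos) {
            if (info == null || info.getLockName() == null) {
                continue;                               // not waiting on anything
            }
            String owner = info.getLockOwnerName();     // null when no owner is recorded
            lines.add("\"" + info.getThreadName() + "\" waits on " + info.getLockName()
                    + ", owner: " + (owner == null ? "<none reported>" : owner));
        }
        return lines;
    }

    /** Thread ids in a lock cycle, or an empty array when the JVM finds none. */
    static long[] deadlockedThreadIds() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.isSynchronizerUsageSupported()
                ? mx.findDeadlockedThreads()            // monitors and j.u.c. locks
                : mx.findMonitorDeadlockedThreads();    // monitors only
        return ids == null ? new long[0] : ids;
    }

    public static void main(String[] args) {
        for (String line : waitingThreads()) {
            System.out.println(line);
        }
        System.out.println("deadlocked threads: " + deadlockedThreadIds().length);
    }
}
```

A hang in which several threads wait on the same lock and no owner is reported would not necessarily show up as a cycle in deadlockedThreadIds(), so the per-thread lines are likely the more useful output for a case like this one.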
>
> > I don't recall anything similar to this, but I don't know what version
> > that OpenJDK6 build relates to.
>
> The cluster is running on CentOS 5.3.
> >[aweisberg at 3f ~]$ rpm -qi java-1.6.0-openjdk-1.6.0.0-0.30.b09.el5
> >Name        : java-1.6.0-openjdk           Relocations: (not relocatable)
> >Version     : 1.6.0.0                           Vendor: CentOS
> >Release     : 0.30.b09.el5                  Build Date: Tue 07 Apr 2009 07:24:52 PM EDT
> >Install Date: Thu 11 Jun 2009 03:27:46 PM EDT      Build Host: builder10.centos.org
> >Group       : Development/Languages         Source RPM: java-1.6.0-openjdk-1.6.0.0-0.30.b09.el5.src.rpm
> >Size        : 76336266                         License: GPLv2 with exceptions
> >Signature   : DSA/SHA1, Wed 08 Apr 2009 07:55:13 AM EDT, Key ID a8a447dce8562897
> >URL         : http://icedtea.classpath.org/
> >Summary     : OpenJDK Runtime Environment
> >Description :
> >The OpenJDK runtime environment.
>
> > Make sure you haven't missed any exceptions occurring in other threads.
>
> No threads are missing from the application (terminated threads are
> not replaced), and a try/catch block around the run loop of each thread
> prints the error and rethrows. It is still possible that an exception
> was swallowed somewhere.
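
One way to rule out a silently dying thread is a process-wide uncaught-exception handler, which also sees Errors that a catch (Exception e) block would miss. The following is a minimal sketch, assuming nothing about the application's own threads; ThreadDeathLog and DEATHS are hypothetical names.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical helper; names are illustrative.
class ThreadDeathLog {

    /** Records "threadName: throwable" for every thread that dies from an uncaught Throwable. */
    static final Queue<String> DEATHS = new ConcurrentLinkedQueue<String>();

    /** Installs a default handler that logs and remembers every uncaught Throwable. */
    static void install() {
        Thread.setDefaultUncaughtExceptionHandler((thread, error) -> {
            DEATHS.add(thread.getName() + ": " + error);
            error.printStackTrace();
        });
    }

    public static void main(String[] args) throws InterruptedException {
        install();
        Thread worker = new Thread(() -> {
            throw new IllegalStateException("simulated failure in run loop");
        }, "ExecutionSite-0");
        worker.start();
        worker.join();
        System.out.println(DEATHS);
    }
}
```

The handler only fires for a Throwable that escapes run(), so an exception caught and dropped deeper in the call stack would still go unnoticed.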
>
> >A small reproducible test case from you would be useful.
> I am working on that. I wrote a test case that mimics the application's
> use of the LBD, but I have not succeeded in reproducing the problem in
> the test case. The app has a single thread (network selector) that polls
> the LBD and several threads (ExecutionSites, and network threads that
> return results from remote ExecutionSites) that offer results into the
> queue. About 120k items will go into/out of the deque each second. In
> the actual app the problem is reproducible but inconsistent. If I run on
> my dual core laptop I can't reproduce it, and it is less likely to occur
> with a small cluster, but with 6 nodes (~560k transactions/sec) the
> problem will usually appear. Sometimes the cluster will run for several
> minutes without issue and other times it will deadlock immediately.
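
The usage pattern described above (many threads offering, one thread calling non-blocking poll()) can be sketched as a small stand-alone harness. This is only an illustration under stated assumptions: the class name DequeStress, the thread counts, item counts, and timeout are all hypothetical, and there is no claim that it reproduces the hang.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical harness: class, method, and thread names are illustrative,
// not taken from the application discussed in this thread.
class DequeStress {

    /**
     * Starts `producers` threads that each offer `itemsPerProducer` items,
     * plus one thread that drains the deque with non-blocking poll().
     * Returns the number of items drained before `timeoutMillis` elapsed.
     */
    static long run(int producers, final int itemsPerProducer, long timeoutMillis) {
        final LinkedBlockingDeque<Integer> deque = new LinkedBlockingDeque<Integer>();
        final CountDownLatch start = new CountDownLatch(1);
        final long expected = (long) producers * itemsPerProducer;
        final AtomicLong consumed = new AtomicLong();

        for (int p = 0; p < producers; p++) {
            Thread producer = new Thread(() -> {
                try {
                    start.await();                      // release all producers together
                    for (int i = 0; i < itemsPerProducer; i++) {
                        deque.offer(i);                 // default capacity is Integer.MAX_VALUE
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "ExecutionSite-" + p);
            producer.setDaemon(true);
            producer.start();
        }

        Thread consumer = new Thread(() -> {
            while (consumed.get() < expected) {
                if (deque.poll() != null) {             // plain poll(), as in the report
                    consumed.incrementAndGet();
                } else {
                    Thread.yield();
                }
            }
        }, "Network Selector");
        consumer.setDaemon(true);
        consumer.start();

        start.countDown();
        try {
            consumer.join(timeoutMillis);               // watchdog: give up after the timeout
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return consumed.get();
    }

    public static void main(String[] args) {
        long expected = 8L * 1000000;
        long drained = run(8, 1000000, 60000);
        System.out.println(drained == expected
                ? "drained all " + drained + " items"
                : "stalled after " + drained + " of " + expected + " items; take a thread dump");
    }
}
```

The consumer runs on its own thread so that the caller can act as a watchdog: if poll() never returns, run() still comes back after the timeout with a short count instead of hanging with it.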
>
> Thanks,
>
> Ariel
>
> On Wed, 08 Jul 2009 05:14 +1000, "Martin Buchholz"
> <martinrb at google.com> wrote:
> >[+core-libs-dev]
> >
> >Doug Lea and I are (slowly) working on a new version of LinkedBlockingDeque.
> >I was not aware of a deadlock but can vaguely imagine how it might happen.
> >A small reproducible test case from you would be useful.
> >
> >Unfinished work in progress can be found here:
> >http://cr.openjdk.java.net/~martin/webrevs/openjdk7/BlockingQueue/
> >
> >Martin
>
> On Wed, 08 Jul 2009 05:14 +1000, "David Holmes"
> <davidcholmes at aapt.net.au> wrote:
> >
> > Ariel,
> >
> > > The poll()ing thread is blocked waiting for the internal lock, but there's
> > > no indication of any thread owning that lock. You're using an OpenJDK 6
> > > build ... can you try JDK7?
> >
> > > I don't recall anything similar to this, but I don't know what version
> > > that OpenJDK6 build relates to.
> >
> > Make sure you haven't missed any exceptions occurring in other threads.
> >
> > David Holmes
> >
> > > -----Original Message-----
> > > From: concurrency-interest-bounces at cs.oswego.edu
> > > [mailto:concurrency-interest-bounces at cs.oswego.edu]On Behalf Of Ariel
> > > Weisberg
> > > Sent: Wednesday, 8 July 2009 8:31 AM
> > > To: concurrency-interest at cs.oswego.edu
> > > Subject: [concurrency-interest] LinkedBlockingDeque deadlock?
> > >
> > >
> > > Hi all,
> > >
> > > I did a search on LinkedBlockingDeque and didn't find anything similar
> > > to what I am seeing. Attached is the stack trace from an application
> > > that is deadlocked with three threads waiting for 0x00002aaab3e91080
> > > (threads "ExecutionSite: 26", "ExecutionSite:27", and "Network
> > > Selector"). The execution sites are attempting to offer results to the
> > > deque and the network thread is trying to poll for them using the
> > > non-blocking version of poll. I am seeing the network thread never
> > > return from poll (straight poll()). Do my eyes deceive me?
> > >
> > > Thanks,
> > >
> > > Ariel Weisberg
> > >
> >
>