[concurrency-interest] ThreadPoolExecutorTest occasionally fails with a broken barrier!?

David Holmes dcholmes at optusnet.com.au
Wed Feb 14 17:43:24 EST 2007


I think Peter has hit the nail on the head. If this test is repeated then
the state of the executor threads is unknown when the next round of
submissions occurs. It could be that none of the pool threads have gotten
back to polling the queue, in which case the main thread will execute the
barrier await() itself (via the caller-runs policy) and time out.

Oliver: to debug this sort of thing you need to see why the barrier is
breaking. We suspect timeouts but you need to confirm this. Then you need to
see which iteration (both loop-level and repeated-test level) is failing.
That should show that it doesn't fail on the first test.
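
A minimal sketch of that kind of instrumentation, assuming the test uses the
timed await(). The class and method names (BarrierDiagnosis,
awaitAndClassify) and the round/iteration parameters are invented for
illustration and are not taken from the test under discussion:

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class BarrierDiagnosis {
    /**
     * Awaits the barrier and reports how the wait ended, tagged with the
     * repeated-test round, the loop iteration and the calling thread.
     * (Hypothetical helper, not part of any library.)
     */
    static String awaitAndClassify(CyclicBarrier barrier, long timeoutMs,
                                   int round, int iteration) {
        String where = " round=" + round + " iteration=" + iteration
                + " thread=" + Thread.currentThread().getName();
        try {
            barrier.await(timeoutMs, TimeUnit.MILLISECONDS);
            return "TRIPPED" + where;
        } catch (TimeoutException e) {
            // This thread's own wait timed out; doing so breaks the barrier.
            return "TIMEOUT" + where;
        } catch (BrokenBarrierException e) {
            // Some other party timed out or was interrupted, or the barrier
            // was already broken when we arrived.
            return "BROKEN" + where;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "INTERRUPTED" + where;
        }
    }

    public static void main(String[] args) {
        CyclicBarrier barrier = new CyclicBarrier(2);
        // Only one party ever arrives, so the first wait reports TIMEOUT.
        System.out.println(awaitAndClassify(barrier, 100, 0, 0));
        // The barrier is now broken, so the next wait reports BROKEN.
        System.out.println(awaitAndClassify(barrier, 100, 0, 1));
    }
}
```

Logging the result of every await() this way distinguishes the thread that
actually timed out from the threads that merely saw the broken barrier, and
shows which round and iteration it happened in.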

As to what happens after a barrier is "opened":

"The barrier is called cyclic because it can be re-used after the waiting
threads are released. "

Threads just start waiting again until the required number of parties
arrives.
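
A small sketch of that re-use, under the assumption that nothing times out
or is interrupted. BarrierReuse and countTrips are names made up for this
example, and it uses lambdas, so it needs a later Java release than the one
current at the time of this thread:

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

final class BarrierReuse {
    /**
     * Starts `parties` threads that each await the same barrier `rounds`
     * times, and returns how many times the barrier tripped.
     * (Hypothetical helper for illustration.)
     */
    static int countTrips(int parties, int rounds) throws InterruptedException {
        AtomicInteger trips = new AtomicInteger();
        // The barrier action runs once per trip, in the last thread to arrive.
        CyclicBarrier barrier = new CyclicBarrier(parties, trips::incrementAndGet);
        Thread[] workers = new Thread[parties];
        for (int i = 0; i < parties; i++) {
            workers[i] = new Thread(() -> {
                try {
                    for (int r = 0; r < rounds; r++) {
                        // After each trip the same barrier is simply awaited again.
                        barrier.await();
                    }
                } catch (InterruptedException | BrokenBarrierException e) {
                    throw new IllegalStateException(e);
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        return trips.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(countTrips(3, 4)); // prints 4
    }
}
```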

You say without the count you "deadlocked" - was that with a timeout
supplied to await()? Without the count you would run into the same problem
within the loop as you now do across tests. After the barrier opens it takes
time for the pool threads to complete their tasks and go back to wait for
the next task. If the main thread completes first then it will execute one
of the next tasks itself and so block in await() until the timeout expires.
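
The caller-runs effect can be shown in isolation with a sketch like the
following. It assumes a SynchronousQueue hand-off and a pool of two threads
purely to keep the example small and deterministic; the actual test's queue
type is not stated in this thread, and CallerRunsDemo and
thirdTaskRanInCaller are invented names:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class CallerRunsDemo {
    /**
     * Keeps both pool threads busy, submits one more task, and reports
     * whether that task ran in the submitting thread.
     * (Hypothetical helper for illustration.)
     */
    static boolean thirdTaskRanInCaller() throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 2, 0L, TimeUnit.SECONDS,
                new SynchronousQueue<Runnable>(),
                new ThreadPoolExecutor.CallerRunsPolicy());
        CountDownLatch release = new CountDownLatch(1);
        Thread caller = Thread.currentThread();
        boolean[] ranInCaller = {false};
        try {
            Runnable blocker = () -> {
                try {
                    release.await(); // hold the pool thread until released
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            };
            pool.execute(blocker); // starts pool thread 1
            pool.execute(blocker); // starts pool thread 2
            // No pool thread is polling the queue and the pool is at its
            // maximum size, so this task is rejected and the caller-runs
            // policy executes it in the submitting thread.
            pool.execute(() -> ranInCaller[0] = (Thread.currentThread() == caller));
            return ranInCaller[0];
        } finally {
            release.countDown();
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(thirdTaskRanInCaller()); // prints true
    }
}
```

If the task run by the caller were a barrier await(), the submitting thread
would be stuck there and unable to hand out the remaining tasks, which is
the situation described above.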

BTW: what were you actually trying to test?

Cheers,
David Holmes


> -----Original Message-----
> From: concurrency-interest-bounces at cs.oswego.edu
> [mailto:concurrency-interest-bounces at cs.oswego.edu]On Behalf Of Peter
> Jones
> Sent: Thursday, 15 February 2007 4:44 AM
> To: Oliver Pfeiffer
> Cc: concurrency-interest at cs.oswego.edu
> Subject: Re: [concurrency-interest] ThreadPoolExecutorTest occasionally
> fails with a broken barrier!?
>
>
> On Wed, Feb 14, 2007 at 05:46:15PM +0100, Oliver Pfeiffer wrote:
>
> > the pool has a maximum of 16 threads and the test-loop repeats 256 times
> > (16*16) but the barrier is set to 17 (16+1). Thus only the first 16 pool
> > threads and the main thread itself (callers run policy) hit the barrier.
> > After the main thread trips the barrier they should continue all at once
> > to finish the remaining loops (without further barrier checking due to
> > the counter check).
> >
> > The test should perform as follows:
> >
> > T.01-T.16 arrive barrier and wait
> > T.main arrives barrier and trips it
> > barrier trips -> T.01-T.16 and T.main continue
> > T.01-T.16 only count the latch down without hitting the barrier (counter
> > check)
> >
> > Thus the major question is still: Why does it occasionally fail? :)
>
> You're reusing the same thread pool for each repeat of the test,
> right?  (It appears to be statically initialized once for the test
> class.)  I suspect that occasionally one (or more) of the pooled tasks
> from a previous test run have not fully completed as far as the thread
> pool is concerned, so during the next test run, the main loop gets
> blocked on an iteration earlier than the 17th, and then a barrier await
> times out because the main loop cannot submit the 17th task, which
> breaks the barrier.
>
> -- Peter


