[concurrency-interest] ForkJoinPool not designed for nested Java 8 streams.parallel().forEach( ... )

Christian Fries email at christian-fries.de
Tue May 6 04:09:44 EDT 2014

Dear All.

Thank you for your replies.

At Stackoverflow people immediately reacted to the use of the semaphore as an obvious problem in my code. They are correct! However, I have the impression that the problem in the FJP is there and not related to this. So I created an example without that semaphore, you find it at:
and I appended it below. Given that I would rephrase the problem as an unexpected performance issue.

Let me describe the setup:

We have a nested stream.parallel().forEach(). The inner loop is independent (stateless, no interference, etc. - except of the use of a common pool) and consumes 1 second in total in the worst case, namely if processed sequential. Half of the tasks of the outer loop consume 10 seconds prior that loop. Half consume 10 seconds after that loop. We have a boolean which allows to switch the inner loop from prallel() to sequential(). Hence every thread consumes 11 seconds (worst case) in total. Now: submitting 24 outer-loop-tasks to a pool of 8 we would expect 24/8 * 11 = 33 seconds at best (on an 8 core or better machine).

The result is:
- With inner loop sequential:	33 seconds.
- With inner loop parallel:		>80 seconds (I had 93 seconds).

Can you confirm this behavior on your machine? Mine is a Mid 2012 MBP 2.6 i7.

Darwin Vesper.local 13.1.0 Darwin Kernel Version 13.1.0: Wed Apr  2 23:52:02 PDT 2014; root:xnu-2422.92.1~2/RELEASE_X86_64 x86_64
java version "1.8.0_05"
Java(TM) SE Runtime Environment (build 1.8.0_05-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.5-b02, mixed mode)

@Alexey: I believe the problem is induced by the awaitJoin called on the wrong queue due to that test I mentioned. This introduced a coupling where inner tasks wait on outer task. I have a workaround where you can nest parallel loops to the same cp and the problem goes away (wrap the inner loop in its own thread).


 — code starts here — 

public class NestedParallelForEachTest {

	// The program uses 33 sec with this boolean to false and around 80+ with this boolean true:
	final boolean isInnerStreamParallel		= true;

	// Setup: Inner loop task 0.01 sec in worse case. Outer loop task: 10 sec + inner loop. This setup: (100 * 0.01 sec + 10 sec) * 20 = 22 sec.
	final int		numberOfTasksInOuterLoop = 24;				// In real applications this can be a large number (e.g. > 1000).
	final int		numberOfTasksInInnerLoop = 100;				// In real applications this can be a large number (e.g. > 1000).
	final int		concurrentExecutionsLimitForStreams	= 8;	// java.util.concurrent.ForkJoinPool.common.parallelism
	public static void main(String[] args) {
		(new NestedParallelForEachTest()).testNestedLoops();

	public void testNestedLoops() {

		long start = System.currentTimeMillis();

		System.out.println("java.util.concurrent.ForkJoinPool.common.parallelism = " + System.getProperty("java.util.concurrent.ForkJoinPool.common.parallelism"));

		// Outer loop
		IntStream.range(0,numberOfTasksInOuterLoop).parallel().forEach(i -> {

			try {
				if(i < 10) Thread.sleep(10 * 1000);

				System.out.println(i + "\t" + Thread.currentThread());

				if(isInnerStreamParallel) {
					// Inner loop as parallel: worst case (sequential) it takes 10 * numberOfTasksInInnerLoop millis
					IntStream.range(0,numberOfTasksInInnerLoop).parallel().forEach(j -> {
						try { Thread.sleep(10); } catch (Exception e) { e.printStackTrace(); }
				else {
					// Inner loop as sequential
					IntStream.range(0,numberOfTasksInInnerLoop).sequential().forEach(j -> {
						try { Thread.sleep(10); } catch (Exception e) { e.printStackTrace(); }

				if(i >= 10) Thread.sleep(10 * 1000);
			} catch (Exception e) { e.printStackTrace(); }


		long end = System.currentTimeMillis();

		System.out.println("Done in " + (end-start)/1000 + " sec.");

Am 06.05.2014 um 01:09 schrieb Doug Lea <dl at cs.oswego.edu>:

> On 05/05/2014 03:58 PM, Christian Fries wrote:
>> I am new to this list, so please excuse, if this is not the place for the
>> following topic (I did a bug report to Oracle and also posted this to
>> Stackoverflow at http://stackoverflow.com/questions/23442183 - and it was
>> suggested (at SO) that I post this here too).
> Sorry that you and a bunch of people at stackOverflow spent so much time
> on this because you did not did not notice the java.util.stream documentation
> (http://docs.oracle.com/javase/8/docs/api/java/util/stream/package-summary.html)
> about the use of independent functions/actions (especially not those including
> synchronization). We should do something about this.
> Thanks to David and Aleksey for explaining that the problem has nothing
> to do with nested parallelism (except that in this case it triggered a
> problem that otherwise happened not to occur).
> I'm wondering if we should retrofit ManagedBlocker mechanics to all
> j.u.c blocking synchronizers to avoid future problem reports. The
> results of using streams/FJ in such cases will not always be very
> sensible, but at least they won't starve. This would cover pretty
> much all possibly user synchronization except for manual wait/notify
> monitor constructions.
> (And if we did this for IO as well, we'd have an implicit answer to
> Joel Richards's queries...)
> -Doug
> _______________________________________________
> Concurrency-interest mailing list
> Concurrency-interest at cs.oswego.edu
> http://cs.oswego.edu/mailman/listinfo/concurrency-interest

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://cs.oswego.edu/pipermail/concurrency-interest/attachments/20140506/3f51e7c1/attachment-0001.html>

More information about the Concurrency-interest mailing list