[concurrency-interest] Unexpected Scalability results in Java Fork-Join (Java 8)

steadria Steven.Adriaensen at vub.ac.be
Tue Jul 28 03:40:41 EDT 2015

Dear all,

Recently, I was running some scalability experiments using Java 
Fork-Join. Here, I used the non-default ForkJoinPool constructor 
`ForkJoinPool(int parallelism)`, passing the desired parallelism level 
(# workers = P) as argument.
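
For readers without the attachment, here is a minimal sketch of the pool setup only. It assumes nothing beyond the constructor named above; the class name `PoolSetup` and the helper `newPool` are hypothetical and are not taken from the attached file.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;

// Hypothetical helper class, for illustration only.
final class PoolSetup {
    // Creates a pool whose target parallelism is p (the constructor argument).
    static ForkJoinPool newPool(int p) {
        return new ForkJoinPool(p);
    }

    public static void main(String[] args) throws InterruptedException {
        for (int p : new int[] {1, 2, 4, 8}) {
            ForkJoinPool pool = newPool(p);
            // getParallelism() reports the target level, not the live thread count.
            System.out.println("requested P=" + p
                    + ", getParallelism()=" + pool.getParallelism());
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.SECONDS);
        }
    }
}
```

Note that `getParallelism()` only echoes the requested level; it says nothing about how many threads the pool actually starts.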

Specifically, using the attached code, I got the following results on a 
processor with 4 physical and 8 logical cores, using Java 8:

T1: 11730
T2: 2381 (speedup: 4.93)
T4: 2463 (speedup: 4.76)
T8: 2418 (speedup: 4.85)

With Java 7 (jre1.7.0), I get:

T1: 11938
T2: 11843 (speedup: 1.01)
T4: 5133 (speedup: 2.33)
T8: 2607 (speedup: 4.58)

(where TP is the execution time in ms, using parallelism level P)
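
The speedup figures are simply T1 / TP. A small sketch that recomputes them from the Java 7 timings listed above (the class and method names are hypothetical):

```java
import java.util.Locale;

// Hypothetical helper class, for illustration only.
final class Speedup {
    // Speedup at parallelism level P: T1 / TP, both in the same time unit.
    static double speedup(double t1, double tp) {
        return t1 / tp;
    }

    public static void main(String[] args) {
        double t1 = 11938;                    // Java 7, P = 1, in ms
        int[] levels = {2, 4, 8};
        double[] times = {11843, 5133, 2607}; // Java 7, P = 2, 4, 8, in ms
        for (int i = 0; i < levels.length; i++) {
            System.out.printf(Locale.ROOT, "T%d: speedup %.2f%n",
                    levels[i], speedup(t1, times[i]));
        }
    }
}
```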

Both results surprise me, but I can understand the latter: the join 
blocks one worker (the one executing the loop), because it fails to 
recognize that, while waiting, it could process other pending dummy 
tasks from its local queue. The former, however, puzzles me.

In further experiments on a 64-core SMP machine (jdk1.8.0_45), using 
the JMH benchmarking tool (1 fork, 50 iterations plus 50 warmup 
iterations), I got the results below (TP in s/op):

T1: 23.831

   23.831 ±(99.9%) 0.116 s/op [Average]
   (min, avg, max) = (23.449, 23.831, 24.522), stdev = 0.234
   CI (99.9%): [23.715, 23.947] (assumes normal distribution)

T2: 2.927 (speedup: 8.14)

   2.927 ±(99.9%) 0.091 s/op [Average]
   (min, avg, max) = (2.655, 2.927, 3.405), stdev = 0.184
   CI (99.9%): [2.836, 3.018] (assumes normal distribution)

T64: 1.550 (speedup: 15.37)

   1.550 ±(99.9%) 0.027 s/op [Average]
   (min, avg, max) = (1.460, 1.550, 1.786), stdev = 0.054
   CI (99.9%): [1.523, 1.577] (assumes normal distribution)

My current theory:

One explanation would be that, in Java 8, the worker executing the 
parallel loop does not go idle but instead finds other work to perform. 
Furthermore, I suspect there might be a 'bug' in this mechanism that 
causes more workers to be active (i.e. consuming resources) than the 
desired level of parallelism (P) passed as constructor argument, which 
would explain the super-linear speedup observed.
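
One way to probe this theory is to count how many distinct threads actually execute dummy tasks and compare that number with P. The sketch below is only an assumption about how such a check could look; it is not the attached code, and `WorkerCount`, `busyWork` and `distinctWorkers` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.RecursiveAction;

// Hypothetical diagnostic, for illustration only.
final class WorkerCount {
    static volatile long sink; // written so the dummy loop has a visible effect

    // Small CPU-bound loop standing in for a dummy task's work.
    static void busyWork() {
        long x = 0;
        for (int i = 0; i < 200_000; i++) x += i ^ (x >>> 3);
        sink = x;
    }

    // Forks `tasks` dummy tasks on a pool with parallelism p, joins them in
    // fork order, and returns how many distinct threads ran a dummy task.
    static int distinctWorkers(final int p, final int tasks) {
        final Set<Thread> seen = ConcurrentHashMap.newKeySet();
        ForkJoinPool pool = new ForkJoinPool(p);
        try {
            pool.invoke(new RecursiveAction() {
                @Override protected void compute() {
                    List<ForkJoinTask<?>> forked = new ArrayList<>();
                    for (int i = 0; i < tasks; i++) {
                        forked.add(new RecursiveAction() {
                            @Override protected void compute() {
                                seen.add(Thread.currentThread());
                                busyWork();
                            }
                        }.fork());
                    }
                    // Join in the same order the tasks were forked.
                    for (ForkJoinTask<?> t : forked) t.join();
                }
            });
        } finally {
            pool.shutdown();
        }
        return seen.size();
    }

    public static void main(String[] args) {
        for (int p : new int[] {1, 2, 4}) {
            System.out.println("P=" + p + ", distinct threads that ran dummy tasks: "
                    + distinctWorkers(p, 1000));
        }
    }
}
```

If the count exceeds P, that would support the suspicion; if it never does, the extra speedup must come from somewhere else.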

I was wondering whether any of you has a better or different 
explanation. Clearly, the use of the Java FJ framework in the attached 
code is not 100% kosher; however, to my knowledge it doesn't violate any 
of the framework's preconditions either. Note that the scalability 
results are 'as expected' when the dummy tasks are joined in reverse order.
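
To make the two join orders concrete, here is a minimal sketch of the pattern: a single task forks a number of dummy tasks in a loop and then joins them either in fork order or in reverse order. It is an assumption about the general shape, not a copy of MinimalExample.java; the names `JoinOrder`, `Dummy` and `run`, and the task sizes, are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical reproduction sketch, for illustration only.
final class JoinOrder {
    // Dummy task: sums 0..n-1 in a plain loop.
    static final class Dummy extends RecursiveTask<Long> {
        final int n;
        Dummy(int n) { this.n = n; }
        @Override protected Long compute() {
            long s = 0;
            for (int i = 0; i < n; i++) s += i;
            return s;
        }
    }

    // Forks `count` dummy tasks from inside the pool, then joins them in fork
    // order (oldest first) or in reverse order (most recently forked first).
    static long run(final int p, final int count, final int n, final boolean reverse) {
        ForkJoinPool pool = new ForkJoinPool(p);
        try {
            return pool.invoke(new RecursiveTask<Long>() {
                @Override protected Long compute() {
                    Deque<Dummy> forked = new ArrayDeque<>();
                    for (int i = 0; i < count; i++) {
                        Dummy d = new Dummy(n);
                        d.fork();
                        forked.addLast(d);
                    }
                    long total = 0;
                    while (!forked.isEmpty()) {
                        Dummy d = reverse ? forked.pollLast() : forked.pollFirst();
                        total += d.join();
                    }
                    return total;
                }
            });
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        for (boolean reverse : new boolean[] {false, true}) {
            long start = System.nanoTime();
            long total = run(2, 2000, 1_000_000, reverse);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println((reverse ? "reverse order" : "fork order")
                    + ": total=" + total + ", " + ms + " ms");
        }
    }
}
```

Both orders compute the same total; only the timing behaviour is expected to differ between them.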

I really appreciate any help you can provide,

Steven Adriaensen
PhD Student
Vrije Universiteit Brussel
Brussels, Belgium
-------------- next part --------------
A non-text attachment was scrubbed...
Name: MinimalExample.java
Type: text/x-c
Size: 1049 bytes
Desc: not available
URL: <http://cs.oswego.edu/pipermail/concurrency-interest/attachments/20150728/c728c9d7/attachment.bin>
