[concurrency-interest] Performance regression in newest (6/20/2013) jsr166 update

Ron Pressler ron.pressler at gmail.com
Sat Jun 22 20:42:32 EDT 2013


A couple of months ago I described the behavior detailed in the email
exchange below. The newest update slows the following benchmark down by
about 25%:

import java.util.concurrent.TimeUnit;
import jsr166e.ForkJoinPool;
import jsr166e.ForkJoinTask;
import jsr166e.RecursiveAction;

public class FJBenchmark {
    static final int PARALLELISM = 4;
    static final int COUNT = 1000000;
    // asyncMode = true: forked tasks that are never joined run in FIFO order
    static ForkJoinPool fjPool = new ForkJoinPool(PARALLELISM,
            ForkJoinPool.defaultForkJoinWorkerThreadFactory, null, true);

    public static void main(String[] args) throws Exception {
        for (int i = 0; i < 10; i++)
            run(COUNT);
    }

    static void run(int count) throws Exception {
        // No-op task forked at the very end of the chain; waiting on it
        // tells the caller that the whole chain has run.
        RecursiveAction lastTask = new RecursiveAction() {
            protected void compute() {
            }
        };
        final long start = System.nanoTime();
        fjPool.submit(new MyTask(count, lastTask));
        lastTask.get();
        System.out.println("count: " + count + " time: " +
                TimeUnit.MILLISECONDS.convert(System.nanoTime() - start,
                        TimeUnit.NANOSECONDS));
    }

    // Forks exactly one successor and terminates, so only one task is
    // active at any time.
    static class MyTask extends RecursiveAction {
        final int count;
        final ForkJoinTask<?> lastTask;

        public MyTask(int count, ForkJoinTask<?> lastTask) {
            this.count = count;
            this.lastTask = lastTask;
        }

        protected void compute() {
            if (count > 0)
                new MyTask(count - 1, lastTask).fork();
            else
                lastTask.fork();
        }
    }
}

---------- Forwarded message ----------
From: Doug Lea <dl at cs.oswego.edu>
Date: Mon, Apr 8, 2013 at 8:22 PM
Subject: Re: [concurrency-interest] jsr166e.ForkJoinPool performance issue
To: concurrency-interest at cs.oswego.edu

On 04/08/13 13:08, Ron Pressler wrote:

> Now, we're trying to use FJ for scheduling a message-passing component,
> and I've noticed a performance issue that the code below demonstrates.
> This is a degenerate case where only one FJTask is active at a time -- a
> single task is submitted, which forks a single task and terminates. The
> new task subsequently forks another one and so on.
>
> When the parallelism level is set to 1, a run completes in about 35ms on
> my machine. When it is set to 4 (I'm running on a 4-core i7, 8 virtual
> cores), that duration rises above 120ms. The NetBeans profiler shows the
> hot-spot to be ForkJoinPool.WorkQueue.push(), and, in particular, its call
> to ForkJoinPool.signalWork. The NetBeans profiler provides
> far-from-definite proof, but further testing has shown that this may, in
> fact, be true. In our code, when some computation takes place in the
> tasks, NetBeans attributes almost 40% of CPU time to WorkQueue.push().
>
> It seems that this should actually be a simple case, with only one
> non-idle Worker continually executing each new task as it forks. However,
> it seems like in this case (and unlike with our other FJ uses), the pool
> is sensitive to the parallelism level, which could be a problem.

The underlying issue is that the pushing task does not know that
the single worker is already available, so activates another.
(It can take a few dozen nanoseconds for workers to rescan before
idling.) So it is not so much parallelism-level as intrinsic raciness.
This turns out to be a common issue when processing small Streams
in upcoming jdk8 support, so I've been working to improve it.
Stay tuned...
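
For anyone who wants to reproduce the parallelism-level comparison described
in the quoted message, here is a minimal sketch. It assumes the JDK's own
java.util.concurrent classes in place of the jsr166e ones, so it compiles
without the extra jar. The names ParallelismComparison, ChainTask, runChain
and timeChain are hypothetical and do not appear in the original benchmark.
Timings depend heavily on the machine and the JDK version, so no expected
numbers are given.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.RecursiveAction;
import java.util.concurrent.TimeUnit;

final class ParallelismComparison {

    /** One link in the chain: forks its successor and returns immediately. */
    static final class ChainTask extends RecursiveAction {
        final int remaining;
        final ForkJoinTask<?> lastTask;

        ChainTask(int remaining, ForkJoinTask<?> lastTask) {
            this.remaining = remaining;
            this.lastTask = lastTask;
        }

        @Override
        protected void compute() {
            if (remaining > 0)
                new ChainTask(remaining - 1, lastTask).fork();
            else
                lastTask.fork(); // end of chain: release the waiting caller
        }
    }

    /** Submits a chain of count tasks and blocks until the final no-op task has run. */
    static ForkJoinTask<?> runChain(ForkJoinPool pool, int count) throws Exception {
        RecursiveAction lastTask = new RecursiveAction() {
            @Override
            protected void compute() {
                // intentionally empty: only used as a completion signal
            }
        };
        pool.submit(new ChainTask(count, lastTask));
        lastTask.get();
        return lastTask;
    }

    /** Times one chain on a fresh async-mode pool; returns elapsed milliseconds. */
    static long timeChain(int parallelism, int count) throws Exception {
        ForkJoinPool pool = new ForkJoinPool(parallelism,
                ForkJoinPool.defaultForkJoinWorkerThreadFactory, null, true);
        try {
            long start = System.nanoTime();
            runChain(pool, count);
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int count = 1_000_000;
        for (int parallelism : new int[] {1, 4}) {
            for (int i = 0; i < 5; i++)
                timeChain(parallelism, count); // warm-up runs, results discarded
            System.out.println("parallelism " + parallelism + ": "
                    + timeChain(parallelism, count) + " ms");
        }
    }
}
```

Each measurement uses a fresh pool so that worker threads left over from one
parallelism level cannot affect the next. The warm-up runs are there only to
let the JIT compile the hot paths before the timed run.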

