[concurrency-interest] Improved FJ thread throttling

Doug Lea dl at cs.oswego.edu
Tue Jul 8 13:00:38 EDT 2014

ForkJoin extensions and adaptations for JDK8 (streams etc) included
overly coarse-grained thread throttling. This was on the to-do list
for a while. A new update addresses this.

Context: By design, ForkJoinPool relies on only a single parameter,
target parallelism. It takes all responsibility for ensuring that the
"right" number of threads are running at any given time for
data-parallel and async applications.  It is impossible to even define
what the "right number" is, so it is impossible for us or anyone else
to get this exactly right.  (This is different from, for example,
setting up N services using a newFixedThreadPool(N), where the only
right answer is N.)  One approach to dealing with this would be to
introduce a zillion controls that would be even harder to use and
prone to even more policy inconsistency and context-dependence
problems than seen with ThreadPoolExecutor. This would be a throwback
to the days when every efficient parallel program had to be custom
built. Some people think that people should still write parallel
programs this way (please feel free to do so.)  FJ instead implements
portable algorithms and internal policies that are rarely optimal for
any given platform and application but often close to optimal.  As FJ
is used for increasingly diverse purposes, getting thread throttling
approximately "right" in all cases gets more challenging. But we like

Aside: The situation is very similar to that for ConcurrentHashMap,
which also accepts only one optional parameter (capacity). If you
have special requirements, you may be able to create a custom map that
outperforms CHM. But over time, CHM evolves to benefit from diverse
usage experiences, so customization becomes less likely to be

More background: Any given parallel computation (including one just
using FJ for asyncs) might, for good performance, need fewer than the
target threads, the same number, or, if some dependent computations
block waiting for others, possibly more threads to compensate for the
blocked ones. (See below about blocking for other reasons.)  The
"more" case is now less common than before, but you can't ignore it
without risk of locking up computations.  And in some cases of mixed
parallel/clustered systems, this could lead to distributed deadlock.
(Note: The number of spare threads needed has little to do with the
target parallelism level; it depends instead on the form of the
parallel computation DAG.)  So, even though creating more than a dozen spare
threads is rare, FJ itself imposed only a ceiling (32K threads) that
is so high that programs typically die for other reasons before
reaching it; and documents only an intentionally vague implementation
note that "This implementation rejects submitted tasks (that is, by
throwing RejectedExecutionException) only when the pool is shut down
or internal resources have been exhausted."
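
As a rough illustration of the two points above (joins that depend on
other tasks, and rejection after shutdown), here is a minimal sketch.
The class and method names (JoinDemo, RangeSum, sum,
rejectsAfterShutdown) are made up for this example; only the
java.util.concurrent calls are real API. It does not try to show
spare-thread creation, which depends on the pool version and timing.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.concurrent.RejectedExecutionException;

class JoinDemo {
    // Hypothetical task: sums the integers in [lo, hi) by recursive splitting.
    static final class RangeSum extends RecursiveTask<Long> {
        final int lo, hi;
        RangeSum(int lo, int hi) { this.lo = lo; this.hi = hi; }

        @Override protected Long compute() {
            if (hi - lo <= 64) {              // small enough: sum sequentially
                long s = 0;
                for (int i = lo; i < hi; i++) s += i;
                return s;
            }
            int mid = (lo + hi) >>> 1;
            RangeSum left = new RangeSum(lo, mid);
            left.fork();                      // left half may run on another worker
            long right = new RangeSum(mid, hi).compute();
            return right + left.join();       // dependent join: waits for (or helps with) left
        }
    }

    // The only sizing parameter supplied is the target parallelism.
    static long sum(int parallelism, int n) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            return pool.invoke(new RangeSum(0, n));
        } finally {
            pool.shutdown();
        }
    }

    // Submitting to a pool that has been shut down is rejected.
    static boolean rejectsAfterShutdown() {
        ForkJoinPool pool = new ForkJoinPool(2);
        pool.shutdown();
        try {
            pool.submit(new RangeSum(0, 10));
            return false;
        } catch (RejectedExecutionException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(sum(4, 1000));           // sum of 0..999
        System.out.println(rejectsAfterShutdown());
    }
}
```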

One disadvantage of this policy is that the ceiling is so high that
programming mistakes (for example those with infinitely nested joins)
or intentional abuses are usually not caught in a very nice way. For
example, on Unix-based systems, people might encounter "No more
processes" just trying to kill the program.  Especially when
implementing the JDK8 Common Pool, we should have dropped this limit
from being a thousand times larger than expected under normal use down
to a value that far exceeds that needed in any practical program, but
still gives JVMs a chance to recover. So the update includes an
absolute ceiling of 256 more threads than the target parallelism (or
the original total of 32K, whichever is lower.)  This will not impact
any current practical programs except those that by chance never ran
long enough to hit higher limits.  The value 256 is somewhat
arbitrary. It's the highest value at which any multicore JVM is
expected to always have enough resources to recover.  By choosing
a conservatively high value, there is good justification for including
this in a JDK8 update and additionally being more aggressive about
killing off spare threads.  To further limit behavior, we still also
allow users to supply ThreadFactories that throw exceptions after
hitting some maximum, but we still don't particularly recommend this. Most
implementations of external limits are arbitrarily imprecise in part
because they cannot tell when threads are really gone: decrementing a
count does not necessarily mean that the thread has stopped or its
resources have been recovered.  (The internal bounds have some of the
same problems, but handle them conservatively.)
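
For concreteness, an external limit of the kind described above might
look roughly like the following sketch. CappedFactoryDemo,
CappedFactory, and peakThreads are hypothetical names invented here;
this is not code from the update. Note that it has exactly the
imprecision mentioned: the count is decremented in onTermination,
which does not mean the thread's resources are gone yet. How the pool
reacts when the factory throws may also differ across versions.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinWorkerThread;
import java.util.concurrent.atomic.AtomicInteger;

class CappedFactoryDemo {
    // Hypothetical factory that refuses to create more than max live workers.
    static final class CappedFactory
            implements ForkJoinPool.ForkJoinWorkerThreadFactory {
        final int max;
        final AtomicInteger live = new AtomicInteger();
        final AtomicInteger peak = new AtomicInteger();

        CappedFactory(int max) { this.max = max; }

        @Override public ForkJoinWorkerThread newThread(ForkJoinPool pool) {
            int n = live.incrementAndGet();
            if (n > max) {
                live.decrementAndGet();
                throw new IllegalStateException("thread cap reached: " + max);
            }
            peak.accumulateAndGet(n, Math::max);   // record high-water mark
            return new ForkJoinWorkerThread(pool) {
                @Override protected void onTermination(Throwable ex) {
                    // Imprecise: the thread is exiting, not necessarily gone.
                    live.decrementAndGet();
                }
            };
        }
    }

    // Runs one trivial task and reports the most workers ever counted as live.
    static int peakThreads(int parallelism, int cap) {
        CappedFactory factory = new CappedFactory(cap);
        ForkJoinPool pool = new ForkJoinPool(parallelism, factory, null, false);
        try {
            Callable<Integer> work = () -> 6 * 7;
            pool.submit(work).join();
            return factory.peak.get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("peak threads: " + peakThreads(2, 8));
    }
}
```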

But the main story with this update is improved internal tracking that
is usually much closer to running the right number of threads than
before.  (This version also includes more and better internal
documentation and refactorings that take advantage of JVM improvements
that have occurred since JDK6, for example being much better than
before at compiling 64-bit logical operations without needing to code
by splitting into 32-bit parts.)

The only version available is in our jsr166 main (JDK8/JDK9 only)
repository, with the aim of having some of you try it out before
considering integration into OpenJDK.  To use it, you can either use
the jar at http://gee.cs.oswego.edu/dl/concurrent/dist/jsr166.jar and
run with -Xbootclasspath/p:jsr166.jar; or copy into an OpenJDK and
build files:


Also, a few notes about blocking threads in any context (in FJ, other
Executors, even the JVM itself).  Whenever there is some bound on
thread construction, and threads start blocking, eventually programs
will freeze or throw exceptions.  We can't/won't forbid all blocking
because it is often harmlessly transient.  On the other hand, most
programs deal with saturation effects of long-term blocking about as
well as they deal with other resource failures (out of memory etc),
which is not very well. But coping mechanisms do always exist.  FJ
provides a thread-vs-memory tradeoff hook via ManagedBlocker: If a
task blocks but you want to ensure liveness for processing other tasks,
use a ManagedBlocker. If you are content to let other work pile up
unless/until blocked threads resume, don't use it.  This is not always
an easy decision to make, but cannot be automated because the number
of ways/reasons that tasks may block is unbounded.  Note:
ThreadPoolExecutor cannot use this approach, so instead supports
RejectedExecutionHandlers for use in similar situations.
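
A minimal sketch of the ManagedBlocker hook, closely following the
queue example in the ForkJoinPool.ManagedBlocker javadoc. The wrapper
names (ManagedBlockerDemo, managedTake, demo) are made up for
illustration. Here the producer is the calling thread, so the sketch
completes whether or not the pool decides to add a spare worker while
the consumer is blocked.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class ManagedBlockerDemo {
    // Wraps a blocking queue take so the pool knows the worker may block.
    static final class QueueTaker<E> implements ForkJoinPool.ManagedBlocker {
        final BlockingQueue<E> queue;
        volatile E item;

        QueueTaker(BlockingQueue<E> queue) { this.queue = queue; }

        @Override public boolean block() throws InterruptedException {
            if (item == null) item = queue.take();   // may block
            return true;                             // no further blocking needed
        }

        @Override public boolean isReleasable() {
            // Non-blocking check: already have an item, or one is available now.
            return item != null || (item = queue.poll()) != null;
        }
    }

    // Takes from q; inside a ForkJoinPool worker, the pool may compensate
    // for the blocked thread.
    static <E> E managedTake(BlockingQueue<E> q) throws InterruptedException {
        QueueTaker<E> taker = new QueueTaker<>(q);
        ForkJoinPool.managedBlock(taker);
        return taker.item;
    }

    static String demo() {
        BlockingQueue<String> q = new LinkedBlockingQueue<>();
        ForkJoinPool pool = new ForkJoinPool(2);
        try {
            Callable<String> consumer = () -> managedTake(q);
            ForkJoinTask<String> pending = pool.submit(consumer);
            q.put("hello");                          // produced by the caller
            return pending.get(10, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Calling queue.take() directly in the task would also work here, but
then the pool has no way to know the worker is blocked, which is the
"let other work pile up" side of the tradeoff.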

