[concurrency-interest] ForkJoin updates

Doug Lea dl at cs.oswego.edu
Wed Jan 25 20:06:18 EST 2012


As promised for a while now, some updates to ForkJoin
are available from the usual places linked from
http://gee.cs.oswego.edu/dl/concurrency-interest/index.html
-- both the java.util.concurrent and jsr166y versions.
I suppose these are targeted for JDK8, but there is no
reason to place these updates only in the jsr166e package.
There are a few very minor further planned improvements, but
it should be ready to use.

Highlights:

1. Substantially better throughput when lots of clients
submit lots of tasks. (I've measured up to 60X speedups
on microbenchmarks.) The idea is to treat external submitters
much like workers -- using randomized queuing and
stealing. (This required a big internal refactoring to
disassociate work queues and workers.) This also greatly
improves throughput when all tasks are async and submitted
to the pool rather than forked, which becomes a reasonable
way to structure actor frameworks, as well as many plain
services that you might otherwise use ThreadPoolExecutor for.

These improvements also lead to a less hostile stance about
submitting possibly-blocking tasks. An added paragraph in
the ForkJoinTask documentation provides some guidance
(basically: we like them if they are small (even if numerous)
and don't have dependencies).
http://gee.cs.oswego.edu/dl/jsr166/dist/docs/java/util/concurrent/ForkJoinTask.html
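
A minimal sketch of the usage pattern in item 1: several client
threads each submit many small, independent async tasks straight
to one pool rather than forking them from inside a worker. The
class name ManySubmitters, the method run, and the counts are
hypothetical, picked only for illustration. It shows the shape of
the workload, not the speedup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical example class; not part of the ForkJoin updates.
final class ManySubmitters {
    static long run(int clients, final int tasksPerClient)
            throws InterruptedException {
        final ForkJoinPool pool = new ForkJoinPool();
        final AtomicLong total = new AtomicLong();
        // Small, non-blocking task with no dependencies on other tasks.
        final Runnable small = () -> total.incrementAndGet();

        List<Thread> threads = new ArrayList<>();
        for (int c = 0; c < clients; c++) {
            Thread t = new Thread(() -> {
                List<Future<?>> pending = new ArrayList<>();
                for (int i = 0; i < tasksPerClient; i++)
                    pending.add(pool.submit(small)); // external submission
                for (Future<?> f : pending) {
                    try {
                        f.get(); // wait for this client's own tasks
                    } catch (Exception e) {
                        throw new RuntimeException(e);
                    }
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads)
            t.join();
        pool.shutdown();
        return total.get();
    }
}
```

Calling ManySubmitters.run(4, 1000) runs 4000 tiny tasks submitted
from four external threads and returns the number that ran.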

2. More cases are now handled by letting threads help others rather
than generating compensation threads, including most cases of
naive backward joins. (I'm not sure whether it is a bug or a
feature that some such cases are now only about twice as
slow as structuring joins correctly.)
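
To make "backward joins" concrete, here is a small sketch that
assumes the term means joining forked tasks oldest-first instead
of innermost-first, which is how the ForkJoinTask documentation
describes the preferred order. The class Fib, its backward flag,
and the cutoff of 10 are hypothetical. Both orders compute the
same result; only the join order differs.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example class; not part of the ForkJoin updates.
final class Fib extends RecursiveTask<Long> {
    final int n;
    final boolean backward; // true: join in the order forked

    Fib(int n, boolean backward) {
        this.n = n;
        this.backward = backward;
    }

    static long seq(int n) {
        return n < 2 ? n : seq(n - 1) + seq(n - 2);
    }

    @Override
    protected Long compute() {
        if (n <= 10)
            return seq(n); // small problems run sequentially
        Fib a = new Fib(n - 1, backward);
        Fib b = new Fib(n - 2, backward);
        a.fork();
        b.fork();
        if (backward) {
            long x = a.join(); // oldest fork joined first
            return x + b.join();
        } else {
            long y = b.join(); // most recent fork joined first
            return y + a.join();
        }
    }

    static long run(int n, boolean backward) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new Fib(n, backward));
        } finally {
            pool.shutdown();
        }
    }
}
```

Timing Fib.run(n, true) against Fib.run(n, false) on your own
machine is one way to see how much the join order still costs.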

3. One small API addition: Explicit support for task marking.
It was cruel to tell people that they could use FJ for things
like graph traversal but not have a simple way to mark tasks
so they won't be revisited while processing a graph (among a few
other common use cases). Because they weren't supported initially,
marking methods need crummy names that won't conflict with
existing usages: markForkJoinTask and isMarkedForkJoinTask.
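
A rough sketch of marking during graph traversal. Caveat: the
methods named above may not exist under those names in the JDK
you compile against, so this sketch substitutes
compareAndSetForkJoinTaskTag, which is in java.util.concurrent
as of JDK 8, as the marking primitive. That substitution is an
assumption made so the code compiles on a stock JDK, and the
names MarkedTraversal, Node, tryMark, and traverse are
hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example class; not part of the ForkJoin updates.
final class MarkedTraversal {
    static final AtomicInteger visited = new AtomicInteger();

    // Each graph node is its own task, so the task tag doubles as
    // the "already claimed" mark.
    static final class Node extends RecursiveAction {
        final List<Node> neighbors = new ArrayList<>();

        // Assumption: tag 0 means unmarked, tag 1 means marked.
        boolean tryMark() {
            return compareAndSetForkJoinTaskTag((short) 0, (short) 1);
        }

        @Override
        protected void compute() {
            visited.incrementAndGet();
            List<Node> claimed = new ArrayList<>();
            for (Node n : neighbors)
                if (n.tryMark()) // only one visitor wins each node
                    claimed.add(n);
            invokeAll(claimed); // run and join only the nodes we claimed
        }
    }

    // Single-use: tasks cannot be rerun, so build a fresh graph
    // for each traversal.
    static int traverse(Node root) {
        visited.set(0);
        root.tryMark(); // claim the root before running it
        ForkJoinPool pool = new ForkJoinPool();
        try {
            pool.invoke(root);
        } finally {
            pool.shutdown();
        }
        return visited.get();
    }
}
```

Because a node is joined only by the task that won its mark, the
joins form a tree even when the graph has cycles, so each node is
processed once and the traversal terminates.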

4. Better tolerance for GC/allocation stalls: It's not uncommon
for a "lead" task to stall while producing subtasks because of GC,
causing others to give up and block, which then requires expensive
unblocking when it finally resumes. A new slower ramp-down scheme
reduces the performance impact. (Still, the best guidance is to
remember Amdahl's law and minimize the sequential overhead needed
to produce a task.)
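
One way to keep the sequential overhead per task small, sketched
under assumptions: split recursively by halves, so that every
task does only constant work before handing off a subtask,
rather than having one lead task loop to create all subtasks.
The class SumTask, the THRESHOLD of 1000, and the use of
ForkJoinPool.commonPool() (JDK 8) are illustrative choices, not
anything prescribed by the updates.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example class; not part of the ForkJoin updates.
final class SumTask extends RecursiveTask<Long> {
    static final int THRESHOLD = 1000; // assumed sequential cutoff
    final long[] a;
    final int lo, hi; // half-open range [lo, hi)

    SumTask(long[] a, int lo, int hi) {
        this.a = a;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= THRESHOLD) {
            long s = 0;
            for (int i = lo; i < hi; i++)
                s += a[i];
            return s;
        }
        int mid = (lo + hi) >>> 1;
        SumTask left = new SumTask(a, lo, mid);
        left.fork(); // O(1) work, then a subtask is available to steal
        long right = new SumTask(a, mid, hi).compute();
        return right + left.join();
    }

    static long sum(long[] a) {
        return ForkJoinPool.commonPool().invoke(new SumTask(a, 0, a.length));
    }
}
```

With this shape, a stall in any single task delays only the
subtree below it, since the other half was already forked.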

5. Other minor changes that give a few percent improvement in
common FJ task processing. On the other hand, this version is
even more prone to GC cardmark contention. So if using hotspot
on a multiprocessor (or even >4core multicore) you absolutely
must run with -XX:+UseCondCardMark or -XX:+UseG1GC. (Also, it is
better behaved with biased locking disabled: -XX:-UseBiasedLocking.)

As always, suggestions and comments based on usage experience
would be very welcome.

-Doug



