[concurrency-interest] ForkJoin updates
dl at cs.oswego.edu
Wed Jan 25 20:06:18 EST 2012
As promised for a while now, some updates to ForkJoin
are available from the usual places linked from
-- both the java.util.concurrent and jsr166y versions.
I suppose these are targeted for JDK8, but there is no
reason to place these updates only in the jsr166e package.
There are a few very minor further planned improvements, but
it should be ready to use.
1. Substantially better throughput when lots of clients
submit lots of tasks. (I've measured up to 60X speedups
on microbenchmarks). The idea is to treat external submitters
in a similar way as workers -- using randomized queuing and
stealing. (This required a big internal refactoring to
disassociate work queues and workers.) This also greatly
improves throughput when all tasks are async and submitted
to the pool rather than forked, which becomes a reasonable
way to structure actor frameworks, as well as many plain
services that you might otherwise use ThreadPoolExecutor for.
These improvements also lead to a less hostile stance about
submitting possibly-blocking tasks. An added parag in
the ForkJoinTask documentation provides some guidance
(basically: we like them if they are small (even if numerous)
and don't have dependencies).
2. More cases are handled that allow threads to help others rather
than generating compensation threads. Including most cases of
naive backward joins. (I'm not sure whether it is a bug or a
feature that some such cases are now only about twice as
slow as structuring joins correctly).
3. One small API addition: Explicit support for task marking.
It was cruel to tell people that they could use FJ for things
like graph traversal but not have a simple way to mark tasks
so they won't be revisited while processing a graph (among a few
other common use cases). Because they weren't supported initially,
marking methods need crummy names that won't conflict with
existing usages: markForkJoinTask and isMarkedForkJoinTask.
4. Better tolerance for GC/allocation stalls: It's not uncommon
for a "lead" task to stall producing subtasks because of GC, causing
others to give and block, requiring expensive unblocking when it
finally resumes. A new slower ramp-down scheme reduces performance
impact. (Although still, the best guidance is to remember Amdahl's
law, and minimize the sequential overhead needed to produce a task).
5. Other minor changes that give a few percent improvement in
common FJ task processing. On the other hand, this version is
even more prone to GC cardmark contention. So if using hotspot
on a multiprocessor (or even >4core multicore) you absolutely
must run in -XX:UseCondCardMark or -XX:+UseG1GC. (Also, it is
better behaved with biased locking disabled -XX:-UseBiasedLocking).
As always, suggestions and comments based on usage experience
would be very welcome.
More information about the Concurrency-interest