[concurrency-interest] Use-cases and issues with the ForkJoinPool

Romain Colle rco at quartetfs.com
Fri Jan 13 05:06:59 EST 2012

Hi Doug & everyone,

With the current improvements being done to the ForkJoinPool,
I wanted to share some of the "stories" and concerns we had
lately with it.

Our company uses the ForkJoinPool a lot in our flagship product,
a main-memory aggregation engine.
To be fair, this is the only kind of thread pool that we actually
use for the most important parts.
Our typical usage is that when a new request comes in, we wrap it
up in a task, submit it in the pool and wait for it to be done.
Therefore, most of our code runs within the pool and heavily uses
its fork/join abilities since our software primarily relies on
massively parallel computations.

We had two main issues that we faced with this approach.

Fork/Join out of order
A request submitted by a user is essentially a query that is being
split into multiple subqueries. These subqueries can themselves be
split, so on and so forth ...
The additional twist here is that some of these subqueries need to
wait for the result of some other subqueries before they can run
(we are guaranteed to be cycle-free). Therefore, they simply join
the associated ForkJoinTask and we rely on the work-stealing
capabilities of the pool to use the waiting thread to participate
in the joined task's computation.

This joined task can basically be found in 3 different places:
 1) This is the last task forked by the current thread.
    In this case the current thread simply deques and executes
 2) It has been stolen by another thread that was looking for work
    and is currently executing somewhere else. The current thread
    will try to find where and help.
 3) It is in the current thread's queue, but not on top. In this
    case, the current thread cannot do anything and will have to
    wait for another thread in the pool to steal this task, execute
    it and signal it.
    It will also mark itself as non-active, but this will only
    trigger the creation of a new "compensation" thread if NO threads
    are active anymore, which is not a usual case in a steady state.

As you might have guessed, our problem is case #3.
We find this to happen a lot, especially under heavy load with several
computation-heavy requests having to be served concurrently.
This, of course, decreases the computation concurrency a lot, reverting
to a monothreaded computation at times on a 16 cores machine (which kind
of defeats the whole purpose of multithreading!).

We managed to solve the problem in most cases by adding a planning step
to our queries and creating a DAG of dependencies, then by forking and
joining the tasks in the "correct" order (a.k.a. invokeAll).

However, there are still some places and cases where we cannot do that
and have to take the performance hit.

This problem happens because of the previous use-case (out of order
fork/join) and the fact that, inevitably, some of our forked tasks need to
take locks to execute.
The reason why this is is that, as mentioned before, everything runs in
the pool, including callbacks to our users' code. And more often than not,
it is "synchronized" or makes use of locks.

A simple case where we can see a deadlock is the following.
Consider a ForkJoinPool of size 2, two ForkJoinTasks T1 and T2 and a lock L.
The tasks' compute method does the following:
 - lock L
 - fork 2 subtasks Ta and Tb (in that order)
 - join Ta
 - join Tb
 - unlock L

If T1 and T2 are both running in one of the pool's worker, only one of them
can get the lock L. The other thread will be blocked waiting for L, but will
still be considered active by the FJPool (since this is not a "managed"
The thread that got the lock will fork Ta and Tb, then join Ta and find
in case #3 described in the section above.
It will therefore wait for another thread in the pool to execute Ta for it,
none are available, leading to a deadlock.
No compensation threads will be created since there is one thread
active in the pool (even though it is actually waiting for a lock).

This is a very simple example, but it illustrates the problem we are facing.

In this case, we solved the issue by implementing an execJoin() method on
special kind of tasks: execJoin() executes the task if this is the first
this method is called on this task, or joins it otherwise.
This requires an extra field (I didn't want to hijack that ForkJoinTask's
"status" field) and a CAS.

The more general issue in both cases is that a worker thread can only
execute a
task that is at the top of its queue on a join().
What is the reason why localHelpJoinTask() cannot find a task anywhere in
local queue, CAS its slot to null in the queue and execute it? From what I
null slots are well-handled throughout the code.

Anyway, thanks a lot for all the pool maintenance and improvements, this is
great piece of software that we are very thankful for!


Romain Colle
Senior R&D Engineer
2 rue Jean Lantier, 75001 Paris, France
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://cs.oswego.edu/pipermail/concurrency-interest/attachments/20120113/b340cb9e/attachment-0001.html>

More information about the Concurrency-interest mailing list