[concurrency-interest] Use-cases and issues with the ForkJoinPool
dl at cs.oswego.edu
Fri Jan 13 11:37:01 EST 2012
On 01/13/12 05:06, Romain Colle wrote:
> Hi Doug & everyone,
> With the current improvements being done to the ForkJoinPool,
> I wanted to share some of the "stories" and concerns we had
> lately with it.
Thanks! This is the perfect time for everyone to post experiences
and suggestions about FJ. The most surprising thing we've found is that
most of the heaviest uses don't fall into classic parallel
computation categories, but instead are the results of people finding
that the basic decentralized task framework (work-stealing etc)
is so much faster than alternatives on larger systems that they use
it for all sorts of other things.
We'd like to better support these other things.
Some questions and answers, out of order...
> Consider a ForkJoinPool of size 2, two ForkJoinTasks T1 and T2 and a lock L.
> The tasks' compute method does the following:
> - lock L
> - fork 2 subtasks Ta and Tb (in that order)
> - join Ta
> - join Tb
> - unlock L
> What is the reason why localHelpJoinTask() cannot find a task anywhere in the
> local queue, CAS its slot to null in the queue and execute it? From what I saw
> null slots are well-handled throughout the code.
The ultimate reason is that if someone says ta.join(); tb.join(); we
are not in general allowed to reorder them. However, when
the joins are dependent (as in your lock example), callers
don't really mean that we must join in that order.
Unfortunately, we have no way of letting them tell us this in
the case of joins. We do for invoke, so in the above example, the best
usage would be:
One possibility is to add method
The main trouble is that people would need to learn to use it.
Already we see too many cases of people naively doing:
ta.fork(); tb.fork(); ta.join(); tb.join();
and then seeing it "work" but with a huge blowup of compensation
threads and crummy performance when it is recursive.
One way to address this is to use a "claim" bit on
every task (somewhat similar to your execJoin workaround).
This would cost one more atomic operation on every task
execution but would eliminate a set of misusage problems
(among other things it would allow your suggested
I've contemplated this many times over the years, without
being able to convince myself that it was worthwhile
to penalize all the well-structured FJ usages for the
sake of improving robustness in the face of more
questionable usages. But it is worth some exploration to see
if this can be done with small enough impact to adopt.
> Fork/Join out of order
> A request submitted by a user is essentially a query that is being
> split into multiple subqueries. These subqueries can themselves be
> split, so on and so forth ...
> The additional twist here is that some of these subqueries need to
> wait for the result of some other subqueries before they can run
> (we are guaranteed to be cycle-free). Therefore, they simply join
> the associated ForkJoinTask and we rely on the work-stealing
> capabilities of the pool to use the waiting thread to participate
> in the joined task's computation.
To clarify: The user is a non-FJ thread and the queries
say q1, q2, are submitted using submit (not invoke), and then joined?
If so, you are right that this can be a problem (in any execution
framework) because the pool doesn't know that q1 must be performed
before q2. (Calling p.invoke(q1); p.invoke(q2) would ensure this.)
Under upcoming more relaxed submission queues, out of order processing
will become even more common.
> 3) It is in the current thread's queue, but not on top. In this
> case, the current thread cannot do anything and will have to
> wait for another thread in the pool to steal this task, execute
> it and signal it.
> It will also mark itself as non-active, but this will only
> trigger the creation of a new "compensation" thread if NO threads
> are active anymore, which is not a usual case in a steady state.
> As you might have guessed, our problem is case #3.
> We find this to happen a lot, especially under heavy load with several
> computation-heavy requests having to be served concurrently.
> This, of course, decreases the computation concurrency a lot, reverting
> to a monothreaded computation at times on a 16 cores machine (which kind
> of defeats the whole purpose of multithreading!).
Yes. Some might recall that we had in earlier versions a
"maintainParallelism" method that was meant for exactly this
situation. It was pulled for the same reason as other dynamic
adjustment methods -- nothing sensible happens if you dynamically
change settings while running. Suggestions about somehow supporting
this again would be welcome.
More information about the Concurrency-interest