[concurrency-interest] fork-join usage questions

Geert Denys gdenys at yahoo.com
Thu Dec 30 05:46:22 EST 2010

FJ is quite suited for tasks that need to wait (join) for results of lower level 
tasks. If one of those branch tasks is blocking (waiting for I/O), the FJ 
framework should select other tasks that are runnable instead of blocking to 
join the waiting branch task.

We use FJ to generate and store image tiles and recombine them to higher levels. 
We have had very good experiences with FJ. I notice we join child tasks in the 
reverse order of forking them, because we had issues with that at some point, 
but that may not be needed anymore.

As for running independent computation in a single FJ pool, that should be no 
problem since the recommended practise is to have a single FJ pool execute all 
your different kinds of FJ tasks.


From: Alex Miller <alexdmiller at yahoo.com>
To: concurrency-interest at cs.oswego.edu
Sent: Thu, December 30, 2010 2:12:42 AM
Subject: [concurrency-interest] fork-join usage questions

I'm looking for some guidance on a processing system where I'm considering the 
use of fork/join.  Processing involves executing a compute-only tree (a DAG) of 
computation over a base set of data that exists at the leaf nodes.  I'd like to 
be able to execute independent nodes in parallel and also use parallelism to 
execute data within individual nodes. I think all of that seems like a good 
match for fork/join.

The kicker is that the base data has been requested and is arriving 
asynchronously.  I don't want to wait until all data is retrieved to begin 
processing; I'd like to process portions of the tree on the data that has 
arrived as much as possible.  Clearly waiting for async I/O is not a good match 
for  fork/join.

I've been contemplating a system that would watch for the data to arrive and as 
a usefully large chunk accumulates it would trigger a top-level FJtask, which 
would fork through it's computation tree as necessary.  These tasks would either 
bottom out at data and process it or need more data from a branch and bail out. 
The process would restart when more data arrives.  Eventually, all data would 
arrive, all computation would complete, and the results would be complete.    

I've built something similar to this in the past using an Executor style system 
(this was before java.util.concurrent existed but was very similar in using a 
worker thread pool and an execution queue).  I'm looking to build a 
high-throughput system (more so than low latency) and one that can fully 
leverage many-core systems.

My questions:
1) Is the overhead of  restarting the task execution for each and all that so 
high that I will swamp any potential benefits from using FJ over a classic 
worker/queue style system?  Assuming I have many of these in-flight at once, I'm 
assuming the threads are always busy and there is presumably lots of contention 
on a single worker queue.
2) Are there issues with running potentially many such *independent* computation 
trees in a FJ pool?  
3) Is there some way to more cleanly deal with the asynchronous waits in a 
custom FJ task?
4) Is there a much different paradigm I should be looking at?  I've hacked 
around a bit on both continuation and actor style systems for this but it seems 
like they might have too much overhead.  

Any suggestions appreciated....
Alex Miller

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://cs.oswego.edu/pipermail/concurrency-interest/attachments/20101230/17ae378c/attachment.html>

More information about the Concurrency-interest mailing list