[concurrency-interest] fork-join usage questions

Alex Miller alexdmiller at yahoo.com
Wed Dec 29 20:12:42 EST 2010

I'm looking for some guidance on a processing system where I'm considering the 
use of fork/join.  Processing involves executing a compute-only tree (a DAG) of 
computation over a base set of data that exists at the leaf nodes.  I'd like to 
be able to execute independent nodes in parallel and also use parallelism to 
execute data within individual nodes. I think all of that seems like a good 
match for fork/join.

The kicker is that the base data has been requested and is arriving 
asynchronously.  I don't want to wait until all data is retrieved to begin 
processing; I'd like to process portions of the tree on the data that has 
arrived as much as possible.  Clearly waiting for async I/O is not a good match 
for fork/join.

I've been contemplating a system that would watch for the data to arrive and as 
a usefully large chunk accumulates it would trigger a top-level FJtask, which 
would fork through it's computation tree as necessary.  These tasks would either 
bottom out at data and process it or need more data from a branch and bail out. 
The process would restart when more data arrives.  Eventually, all data would 
arrive, all computation would complete, and the results would be complete.    

I've built something similar to this in the past using an Executor style system 
(this was before java.util.concurrent existed but was very similar in using a 
worker thread pool and an execution queue).  I'm looking to build a 
high-throughput system (more so than low latency) and one that can fully 
leverage many-core systems.

My questions:
1) Is the overhead of restarting the task execution for each and all that so 
high that I will swamp any potential benefits from using FJ over a classic 
worker/queue style system?  Assuming I have many of these in-flight at once, I'm 
assuming the threads are always busy and there is presumably lots of contention 
on a single worker queue.
2) Are there issues with running potentially many such *independent* computation 
trees in a FJ pool?  
3) Is there some way to more cleanly deal with the asynchronous waits in a 
custom FJ task?
4) Is there a much different paradigm I should be looking at?  I've hacked 
around a bit on both continuation and actor style systems for this but it seems 
like they might have too much overhead.  

Any suggestions appreciated....
Alex Miller
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://cs.oswego.edu/pipermail/concurrency-interest/attachments/20101229/7dbc4e99/attachment.html>

More information about the Concurrency-interest mailing list