[concurrency-interest] fork-join usage questions
alexdmiller at yahoo.com
Wed Dec 29 20:12:42 EST 2010
I'm looking for some guidance on a processing system where I'm considering the
use of fork/join. Processing involves executing a compute-only tree (a DAG) of
computation over a base set of data that exists at the leaf nodes. I'd like to
be able to execute independent nodes in parallel and also use parallelism to
execute data within individual nodes. I think all of that seems like a good
match for fork/join.
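To make the shape of the computation concrete, here is a minimal sketch of the fully synchronous case, where all leaf data is already present. The names `Node`, `SumRange`, `EvalTask`, and `TreeDemo` are hypothetical, summation stands in for the real per-node computation, and the threshold is an arbitrary value. It assumes a tree; a true DAG with shared nodes would need result caching so shared subtrees are not evaluated twice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical node type: a leaf holds data, an inner node combines its children.
final class Node {
    final long[] data;          // non-null only for leaves
    final List<Node> children;  // empty for leaves

    Node(long[] data) { this.data = data; this.children = Collections.emptyList(); }
    Node(Node... children) { this.data = null; this.children = Arrays.asList(children); }
}

// Parallelism within a node: sum a range of leaf data by recursive halving.
final class SumRange extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000; // arbitrary cutoff for this sketch
    private final long[] data;
    private final int lo, hi;

    SumRange(long[] data, int lo, int hi) { this.data = data; this.lo = lo; this.hi = hi; }

    @Override
    protected Long compute() {
        if (hi - lo <= THRESHOLD) {
            long sum = 0;
            for (int i = lo; i < hi; i++) sum += data[i];
            return sum;
        }
        int mid = (lo + hi) >>> 1;
        SumRange left = new SumRange(data, lo, mid);
        left.fork();                                        // run left half asynchronously
        long right = new SumRange(data, mid, hi).compute(); // run right half in this thread
        return right + left.join();
    }
}

// Parallelism across nodes: independent children are evaluated as separate tasks.
final class EvalTask extends RecursiveTask<Long> {
    private final Node node;

    EvalTask(Node node) { this.node = node; }

    @Override
    protected Long compute() {
        if (node.data != null) {
            return new SumRange(node.data, 0, node.data.length).compute();
        }
        List<EvalTask> tasks = new ArrayList<>();
        for (Node child : node.children) tasks.add(new EvalTask(child));
        invokeAll(tasks);                                   // fork all children, wait for all
        long total = 0;
        for (EvalTask t : tasks) total += t.join();
        return total;
    }
}

final class TreeDemo {
    static long evaluate(Node root) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new EvalTask(root));
        } finally {
            pool.shutdown();
        }
    }
}
```

In a real system the pool would be created once and shared rather than built per call as in `TreeDemo.evaluate`.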
The kicker is that the base data has been requested and is arriving
asynchronously. I don't want to wait until all data is retrieved to begin
processing; I'd like to process portions of the tree on the data that has
arrived as much as possible. Clearly waiting for async I/O is not a good match
for fork/join.
I've been contemplating a system that would watch for the data to arrive and,
once a usefully large chunk accumulates, trigger a top-level FJ task, which
would fork through its computation tree as necessary. These tasks would either
bottom out at data and process it or need more data from a branch and bail out.
The process would restart when more data arrives. Eventually, all data would
arrive, all computation would complete, and the results would be complete.
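A rough sketch of the bail-out-and-restart idea, under several assumptions: `AsyncNode` and `PassTask` are hypothetical names, summation stands in for the real computation, and only one pass runs over a given tree at a time. Each pass reports whether its subtree finished, and finished nodes cache their result so a later pass skips them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical node: leaves receive their data asynchronously via deliver().
final class AsyncNode {
    final List<AsyncNode> children;
    volatile long[] data;   // leaf input; null until it arrives
    volatile Long result;   // cached once this subtree has been computed

    AsyncNode(List<AsyncNode> children) { this.children = children; }

    static AsyncNode leaf() { return new AsyncNode(Collections.<AsyncNode>emptyList()); }

    void deliver(long[] d) { data = d; }

    boolean isLeaf() { return children.isEmpty(); }
}

// One pass over the tree. Returns true if the subtree is complete,
// false if it had to bail out because some leaf data has not arrived.
final class PassTask extends RecursiveTask<Boolean> {
    private final AsyncNode node;

    PassTask(AsyncNode node) { this.node = node; }

    @Override
    protected Boolean compute() {
        if (node.result != null) return true;   // finished in an earlier pass
        if (node.isLeaf()) {
            long[] d = node.data;
            if (d == null) return false;        // data not here yet: bail out, do not block
            long sum = 0;
            for (long v : d) sum += v;
            node.result = sum;
            return true;
        }
        List<PassTask> tasks = new ArrayList<>();
        for (AsyncNode child : node.children) tasks.add(new PassTask(child));
        invokeAll(tasks);                       // children that can make progress do so
        boolean allDone = true;
        for (PassTask t : tasks) allDone &= t.join();
        if (!allDone) return false;             // retry this node on a later pass
        long total = 0;
        for (AsyncNode child : node.children) total += child.result;
        node.result = total;
        return true;
    }
}
```

The watcher would call something like `pool.invoke(new PassTask(root))` each time a chunk arrives and stop once it returns true. The cost of a restart is then roughly one traversal of the unfinished part of the tree.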
I've built something similar to this in the past using an Executor style system
(this was before java.util.concurrent existed but was very similar in using a
worker thread pool and an execution queue). I'm looking to build a
high-throughput system (more so than low latency) and one that can fully
leverage many-core systems.
1) Is the overhead of restarting the task execution each time more data
arrives so high that it will swamp any potential benefit of using FJ over a
classic worker/queue style system? With many of these in flight at once, I'm
assuming the threads are always busy and there is presumably a lot of contention
on a single worker queue.
2) Are there issues with running potentially many such *independent* computation
trees in an FJ pool?
3) Is there some way to more cleanly deal with the asynchronous waits in a
custom FJ task?
4) Is there a much different paradigm I should be looking at? I've hacked
around a bit on both continuation and actor style systems for this but it seems
like they might have too much overhead.
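On question 3, one hook I'm aware of is `ForkJoinPool.ManagedBlocker`, which lets a task tell the pool it is about to block so the pool can compensate with a spare worker if needed. Below is a minimal sketch of how a wait for a data chunk might be wrapped that way; `Chunk` and `ChunkSum` are hypothetical names, and I have not measured whether this beats the bail-out approach.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical holder for one chunk of asynchronously arriving data.
final class Chunk implements ForkJoinPool.ManagedBlocker {
    private final CountDownLatch arrived = new CountDownLatch(1);
    private volatile long[] data;

    // Called by the I/O side when the data shows up.
    void deliver(long[] d) {
        data = d;
        arrived.countDown();
    }

    // True when no blocking is necessary.
    @Override
    public boolean isReleasable() {
        return arrived.getCount() == 0;
    }

    // Blocks until the data arrives; returning true means no further blocking is needed.
    @Override
    public boolean block() throws InterruptedException {
        arrived.await();
        return true;
    }

    // Waits for the data, letting the pool know this worker may block.
    long[] await() throws InterruptedException {
        ForkJoinPool.managedBlock(this);
        return data;
    }
}

// A task that needs the chunk and waits for it through the managed blocker.
final class ChunkSum extends RecursiveTask<Long> {
    private final Chunk chunk;

    ChunkSum(Chunk chunk) { this.chunk = chunk; }

    @Override
    protected Long compute() {
        try {
            long sum = 0;
            for (long v : chunk.await()) sum += v;
            return sum;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for data", e);
        }
    }
}
```

With many trees in flight this could leave a lot of threads parked in waits, which is part of why I'm unsure it is the right answer.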
Any suggestions appreciated....