[concurrency-interest] fork-join usage questions
gdenys at yahoo.com
Thu Dec 30 05:46:22 EST 2010
FJ is quite suited for tasks that need to wait (join) for results of lower level
tasks. If one of those branch tasks is blocking (waiting for I/O), the FJ
framework should select other tasks that are runnable instead of blocking to
join the waiting branch task.
We use FJ to generate and store image tiles and recombine them to higher levels.
We have had very good experiences with FJ. I notice we join child tasks in the
reverse order of forking them, because we had issues with that at some point,
but that may not be needed anymore.
As for running independent computation in a single FJ pool, that should be no
problem since the recommended practise is to have a single FJ pool execute all
your different kinds of FJ tasks.
From: Alex Miller <alexdmiller at yahoo.com>
To: concurrency-interest at cs.oswego.edu
Sent: Thu, December 30, 2010 2:12:42 AM
Subject: [concurrency-interest] fork-join usage questions
I'm looking for some guidance on a processing system where I'm considering the
use of fork/join. Processing involves executing a compute-only tree (a DAG) of
computation over a base set of data that exists at the leaf nodes. I'd like to
be able to execute independent nodes in parallel and also use parallelism to
execute data within individual nodes. I think all of that seems like a good
match for fork/join.
The kicker is that the base data has been requested and is arriving
asynchronously. I don't want to wait until all data is retrieved to begin
processing; I'd like to process portions of the tree on the data that has
arrived as much as possible. Clearly waiting for async I/O is not a good match
I've been contemplating a system that would watch for the data to arrive and as
a usefully large chunk accumulates it would trigger a top-level FJtask, which
would fork through it's computation tree as necessary. These tasks would either
bottom out at data and process it or need more data from a branch and bail out.
The process would restart when more data arrives. Eventually, all data would
arrive, all computation would complete, and the results would be complete.
I've built something similar to this in the past using an Executor style system
(this was before java.util.concurrent existed but was very similar in using a
worker thread pool and an execution queue). I'm looking to build a
high-throughput system (more so than low latency) and one that can fully
leverage many-core systems.
1) Is the overhead of restarting the task execution for each and all that so
high that I will swamp any potential benefits from using FJ over a classic
worker/queue style system? Assuming I have many of these in-flight at once, I'm
assuming the threads are always busy and there is presumably lots of contention
on a single worker queue.
2) Are there issues with running potentially many such *independent* computation
trees in a FJ pool?
3) Is there some way to more cleanly deal with the asynchronous waits in a
custom FJ task?
4) Is there a much different paradigm I should be looking at? I've hacked
around a bit on both continuation and actor style systems for this but it seems
like they might have too much overhead.
Any suggestions appreciated....
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the Concurrency-interest