[concurrency-interest] Matrix multiply with parallelized inner product

Doug Lea dl at cs.oswego.edu
Mon Feb 4 20:07:04 EST 2008

Tim Peierls wrote:
> On Feb 4, 2008 1:16 PM, Joe Bowbeer <joe.bowbeer at gmail.com 
> <mailto:joe.bowbeer at gmail.com>> wrote:
>     I suspect that nesting PAs would defeat its heuristics for
>     partitioning the computation.
> PAs run on ForkJoinExecutors that might be doing other things, why not 
> nested things?

This is usually the best attitude.

As Joe noticed, the main FJ techniques used in PA split problems
into many more tasks than you have processors, and rely on
the underlying mechanics to load-balance etc. (So in this sense,
worker threads are always "doing other things".)
And normally, this works about equally well regardless of
nesting. (The "about" here is shorthand for a long story with
little practical impact in most cases.)
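To make the "many more tasks than processors" idea concrete, here is a minimal recursive-decomposition sketch, written against the ForkJoinPool/RecursiveTask API as it later shipped in java.util.concurrent (this thread predates that, so names differ slightly from the jsr166y preview; SumTask and the THRESHOLD value are illustrative choices, not anything from PA itself):

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Split until segments are small, yielding far more tasks than
// processors; the pool's work-stealing does the load balancing.
class SumTask extends RecursiveTask<Long> {
    static final int THRESHOLD = 1_000; // leaf size; a tuning assumption
    final long[] a; final int lo, hi;
    SumTask(long[] a, int lo, int hi) { this.a = a; this.lo = lo; this.hi = hi; }
    protected Long compute() {
        if (hi - lo <= THRESHOLD) {      // small enough: just loop
            long s = 0;
            for (int i = lo; i < hi; i++) s += a[i];
            return s;
        }
        int mid = (lo + hi) >>> 1;
        SumTask left = new SumTask(a, lo, mid);
        left.fork();                                  // run left half asynchronously
        long right = new SumTask(a, mid, hi).compute(); // right half in this worker
        return right + left.join();
    }
}

public class SumDemo {
    public static void main(String[] args) {
        long[] a = new long[100_000];
        for (int i = 0; i < a.length; i++) a[i] = i;
        long sum = new ForkJoinPool().invoke(new SumTask(a, 0, a.length));
        System.out.println(sum); // 4999950000, i.e. n*(n-1)/2
    }
}
```

Because leaves are much smaller than (array length / number of CPUs), an idle worker can always steal work from a busy one, which is why nesting rarely changes the picture much.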

However, it is often the case that you can replace nested parallelism
with more efficient non-nested algorithms. (Matrix multiply
happens to be a fairly well known example of this; see the old FJTask
version I mentioned.) To implement this here, you
would need to move from PA-level to FJ-level programming, which I
think might be a natural progression for people trying to fine-tune
performance. ParallelArray does most common things more efficiently
than most people could do themselves using ForkJoinTasks, and does
so without forcing them to learn how to do parallel recursive
decomposition. For those things PA does not do, we provide the
underlying FJ tools. (OK, so they are underdocumented tools
at the moment, but hopefully that will improve.)
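As a sketch of what the non-nested FJ-level version looks like: instead of a parallel loop over rows each containing a nested parallel inner product, decompose once over row ranges of the result and compute each leaf's inner products sequentially. The class and threshold below are illustrative assumptions (the old FJTask demo recursed over matrix quadrants; this simpler row-range split shows the same un-nesting idea):

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// One level of parallel decomposition (over rows of C = A*B);
// the inner product itself stays sequential at the leaves.
class MultiplyTask extends RecursiveAction {
    static final int THRESHOLD = 16; // rows per leaf; a tuning assumption
    final double[][] a, b, c; final int loRow, hiRow;
    MultiplyTask(double[][] a, double[][] b, double[][] c, int lo, int hi) {
        this.a = a; this.b = b; this.c = c; this.loRow = lo; this.hiRow = hi;
    }
    protected void compute() {
        if (hiRow - loRow <= THRESHOLD) {
            int n = b[0].length, k = b.length;
            for (int i = loRow; i < hiRow; i++)
                for (int j = 0; j < n; j++) {
                    double s = 0;                      // plain sequential
                    for (int x = 0; x < k; x++)        // inner product
                        s += a[i][x] * b[x][j];
                    c[i][j] = s;
                }
        } else {
            int mid = (loRow + hiRow) >>> 1;           // split row range
            invokeAll(new MultiplyTask(a, b, c, loRow, mid),
                      new MultiplyTask(a, b, c, mid, hiRow));
        }
    }
}

public class MultiplyDemo {
    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}}, b = {{5, 6}, {7, 8}};
        double[][] c = new double[2][2];
        new ForkJoinPool().invoke(new MultiplyTask(a, b, c, 0, 2));
        System.out.println(c[0][0] + " " + c[1][1]); // 19.0 50.0
    }
}
```

The sequential leaf loops avoid the per-element task overhead that a nested parallel inner product would pay.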

As someone who has written a lot of fork-join programs over the past
decade, I think they are fun and natural to program.
But even I prefer using PA for common apply-to-all constructions
and the like rather than manually setting up little recursive
tasks to do it.
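For comparison, the apply-to-all style being described is a one-liner. ParallelArray itself lived in the jsr166 preview packages; the sketch below uses java.util.Arrays.parallelSetAll, which later shipped in the JDK and does the same kind of FJ decomposition internally:

```java
import java.util.Arrays;

public class ApplyAllDemo {
    public static void main(String[] args) {
        double[] a = new double[1_000_000];
        // Apply a function to every element in parallel; the recursive
        // task setup is handled by the library, not the caller.
        Arrays.parallelSetAll(a, i -> i * 2.0);
        System.out.println(a[10]); // 20.0
    }
}
```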
