[concurrency-interest] Matrix multiply with parallelized inner product

jason marshall jdmarshall at gmail.com
Tue Feb 5 10:46:03 EST 2008


I'm curious about the magic number of 8 (or "p << 3").

I understand that you're trying to accommodate asymmetries in the
workload, to reduce latency at the tail end (caused by unequal units
of work or unequal CPU loading), but my understanding of partitioning
was that if multiple tasks are queued, then you need, or can even
tolerate, less decomposition to achieve good throughput.
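
For concreteness, here's the arithmetic I'm picturing (just a sketch
of my reading of the heuristic, not the actual ParallelArray source;
the variable names are mine):

    // Sketch of the granularity heuristic as I read it; 'p' is the
    // pool's parallelism level and 'n' is the problem size.
    int p = Runtime.getRuntime().availableProcessors();
    int n = 1000000;                       // e.g., elements to process
    int tasks = p << 3;                    // the magic number: ~8 leaf tasks per worker
    int leafSize = Math.max(1, n / tasks); // stop splitting below this size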

So two things spring to mind.  One, if you're using an executor for
one-off or infrequent loading cycles, then you might want a different
magic number than if new work shows up constantly.  Two, recursive
divide-and-conquer is probably never going to give someone what they
expect unless the magic number was chosen too conservatively.
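
To make the second point concrete, here's a hypothetical recursive
apply-to-all (package and method names may not match the jsr166y
preview exactly; the class, fields, and threshold are mine, and
Math.sqrt just stands in for real work):

    import java.util.concurrent.ForkJoinPool;
    import java.util.concurrent.RecursiveAction;

    class ApplyAll extends RecursiveAction {
        final double[] a;
        final int lo, hi, leafSize;

        ApplyAll(double[] a, int lo, int hi, int leafSize) {
            this.a = a; this.lo = lo; this.hi = hi;
            this.leafSize = leafSize;
        }

        protected void compute() {
            if (hi - lo <= leafSize) {
                for (int i = lo; i < hi; i++)
                    a[i] = Math.sqrt(a[i]);  // stand-in for real work
            } else {                         // split in half, recurse in parallel
                int mid = (lo + hi) >>> 1;
                invokeAll(new ApplyAll(a, lo, mid, leafSize),
                          new ApplyAll(a, mid, hi, leafSize));
            }
        }
    }

    // Usage: leafSize derived from the same p << 3 heuristic.
    // double[] a = ...;
    // ForkJoinPool pool = new ForkJoinPool();
    // int leafSize = Math.max(1, a.length / (pool.getParallelism() << 3));
    // pool.invoke(new ApplyAll(a, 0, a.length, leafSize));

Whether that leafSize is the right one depends on exactly the loading
questions above: a quiet pool wants finer pieces than one already
saturated with other work.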

-Jason


On Feb 4, 2008 5:07 PM, Doug Lea <dl at cs.oswego.edu> wrote:

> Tim Peierls wrote:
> > On Feb 4, 2008 1:16 PM, Joe Bowbeer <joe.bowbeer at gmail.com> wrote:
> >
> >     I suspect that nesting PAs would defeat its heuristics for
> >     partitioning the computation.
> >
> >
> > PAs run on ForkJoinExecutors that might be doing other things, why not
> > nested things?
> >
>
> This is usually the best attitude.
>
> As Joe noticed, the main FJ techniques used in PA split problems
> into many more tasks than you have processors, and rely on
> the underlying mechanics to load-balance etc. (So in this sense,
> worker threads are always "doing other things".)
> And normally, this works about equally well regardless of
> nesting. (The "about" here is shorthand for a long story with
> little practical impact in most cases.)
>
> However, it is often the case that you can replace nested parallelism
> with more efficient non-nested algorithms. (Matrix multiply
> happens to be a fairly well known example of this; see the old FJTask
> version I mentioned.) To implement this here, you
> would need to move from PA-level to FJ-level programming. Which I
> think might be a natural progression for people trying to fine-tune
> performance. ParallelArray does most common things more efficiently
> than most people could do themselves using ForkJoinTasks, and does
> so without forcing them to learn how to do parallel recursive
> decomposition. For those things PA does not do, we provide the
> underlying FJ tools. (OK, so they are underdocumented tools
> at the moment, but hopefully that will improve.)
>
> As someone who has written a lot of fork-join programs over the past
> decade, I think they are fun and natural to program.
> But even I prefer using PA for common apply-to-all constructions
> and the like rather than manually setting up little recursive
> tasks to do it.
>
> -Doug
>



-- 
- Jason