[concurrency-interest] Matrix multiply with parallelized inner product

Joe Bowbeer joe.bowbeer at gmail.com
Mon Feb 4 17:43:19 EST 2008


On Feb 4, 2008 2:14 PM, Tim Peierls wrote:
> On Feb 4, 2008 1:16 PM, Joe Bowbeer wrote:
>
> > I suspect that nesting PAs would defeat its heuristics for partitioning
> > the computation.
>
> PAs run on ForkJoinExecutors that might be doing other things, why not
> nested things?
>

That's what I'm wondering.

PAs slice the array into chunks of size

  threshold = (p > 1) ? (1 + n / (p << 3)) : n;

where p is roughly the number of available processors, and n is the
size of the array.

If there's only one processor available, there's only one slice;
otherwise there are about p * 8 slices (i.e., tasks), since
n / threshold is roughly p << 3.
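
To make the arithmetic concrete, here is a minimal sketch that evaluates
the threshold expression quoted above and counts the resulting slices.
The class name, the method names, and the ceiling-division slice count
are my own assumptions for illustration, not the library's actual
splitting code, and the sketch assumes n > 0.

```java
// Hypothetical helper: evaluates the quoted threshold expression and
// estimates how many chunks of that size cover an array of length n.
class SliceHeuristic {
    // Chunk size: p is the processor estimate, n the array length.
    static int threshold(int n, int p) {
        return (p > 1) ? (1 + n / (p << 3)) : n;
    }

    // Assumed slice count: ceiling of n / threshold (requires n > 0).
    static int slices(int n, int p) {
        int t = threshold(n, p);
        return (n + t - 1) / t;
    }

    public static void main(String[] args) {
        int n = 1_000_000; // illustrative array length
        for (int p : new int[] {1, 2, 4, 8}) {
            System.out.printf("p=%d threshold=%d slices=%d%n",
                    p, threshold(n, p), slices(n, p));
        }
    }
}
```

For n = 1,000,000 this gives 1, 16, 32, and 64 slices for p = 1, 2, 4,
and 8, which matches the "about p * 8" estimate.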

If you have a bunch of PAs sharing the same executor, I think the
number of processors effectively available to each PA is reduced, and
the heuristic becomes correspondingly less effective.  It would be
better to have each thread operate on larger chunks than to divide the
work up into more tasks that will only have to wait for threads to
become available.
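
A rough way to see the concern, as a sketch under stated assumptions:
suppose k PAs share one executor with p threads, and each PA slices its
array as if it owned all p threads. The "fair share" figure below
assumes each PA effectively gets p / k processors. That model, the class
name, and the method names are hypothetical and not something the
library computes.

```java
// Hypothetical model of k PAs sharing one executor with p threads.
class SharedExecutorEstimate {
    // Chunk size from the threshold expression quoted earlier in this message.
    static int threshold(int n, int p) {
        return (p > 1) ? (1 + n / (p << 3)) : n;
    }

    // Assumed slice count: ceiling of n / threshold (requires n > 0).
    static int slices(int n, int p) {
        int t = threshold(n, p);
        return (n + t - 1) / t;
    }

    // Total tasks when every PA slices as if all p threads were its own.
    static int tasksAsSliced(int n, int p, int k) {
        return k * slices(n, p);
    }

    // Total tasks if each PA sliced for an assumed fair share of p / k.
    static int tasksWithFairShare(int n, int p, int k) {
        int effectiveP = Math.max(1, p / k);
        return k * slices(n, effectiveP);
    }

    public static void main(String[] args) {
        int n = 1_000_000; // illustrative elements per array
        int p = 8;         // illustrative thread count of the shared executor
        for (int k : new int[] {1, 2, 4, 8}) {
            System.out.printf("k=%d asSliced=%d fairShare=%d%n",
                    k, tasksAsSliced(n, p, k), tasksWithFairShare(n, p, k));
        }
    }
}
```

With these numbers, 8 PAs produce 512 tasks for 8 threads as sliced,
versus 8 larger tasks under the fair-share assumption.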

--Joe

