[concurrency-interest] Matrix multiply with parallelized inner product

Hanson Char hanson.char at gmail.com
Mon Feb 4 01:01:08 EST 2008

Hi Tim,

In the wiki example "Matrix multiply with parallelized inner product",
the text says:

"It is much, much slower than the version that just parallelizes the
outer loop."

Did you know this for a fact before benchmarking? Does this mean that
too much parallelism via PA results in slower performance? If so, is
there any guideline or recipe for how far one should go in using PA
without causing such a slowdown (other than trial and error)?
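
To make the comparison concrete, here is a rough sketch of the two
variants as I understand them. It is not the wiki code: it uses
java.util.stream parallel streams in place of PA, and the class and
method names (MatMulSketch, multiplyOuterParallel,
multiplyInnerParallel) are made up for illustration. It only shows
where the parallelism sits in each version and makes no claim about
timings.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical sketch: parallel streams stand in for PA here.
public class MatMulSketch {

    // Variant 1: only the outer loop is parallel (one task per result row).
    public static double[][] multiplyOuterParallel(double[][] a, double[][] b) {
        int n = a.length, k = b.length, m = b[0].length;
        double[][] c = new double[n][m];
        IntStream.range(0, n).parallel().forEach(i -> {
            for (int j = 0; j < m; j++) {
                double sum = 0.0;
                for (int x = 0; x < k; x++) {
                    sum += a[i][x] * b[x][j];
                }
                c[i][j] = sum; // each task writes only to its own row
            }
        });
        return c;
    }

    // Variant 2: the outer loops are sequential and each inner product
    // is computed as its own small parallel reduction.
    public static double[][] multiplyInnerParallel(double[][] a, double[][] b) {
        int n = a.length, k = b.length, m = b[0].length;
        double[][] c = new double[n][m];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < m; j++) {
                final int row = i, col = j; // lambdas need effectively final copies
                c[i][j] = IntStream.range(0, k)
                        .parallel()
                        .mapToDouble(x -> a[row][x] * b[x][col])
                        .sum();
            }
        }
        return c;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};
        double[][] b = {{5, 6}, {7, 8}};
        // Both lines print [[19.0, 22.0], [43.0, 50.0]]
        System.out.println(Arrays.deepToString(multiplyOuterParallel(a, b)));
        System.out.println(Arrays.deepToString(multiplyInnerParallel(a, b)));
    }
}
```

In this sketch the first variant starts one parallel operation for the
whole multiply, while the second starts a separate parallel reduction
for every element of the result.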


