[concurrency-interest] jsr166y.forkjoin API comments
jdmarshall at gmail.com
Fri Jan 25 20:04:20 EST 2008
Now that I've stirred the hornet's nest, let me ask you all a higher-level
question:
Do you think the serious design wins for ForkJoin are going to come from
parallelizing tasks that only take a few instruction cycles, or do you
think ForkJoin's biggest win will be with tasks that take thousands or
millions of cycles?
Most of the posts in this thread are dealing with things that happen near
the noise floor, where the cost of a single address lookup is as big as the
function you plan to execute over the value (i.e., basic arithmetic). If I'm
looking for prime numbers, then building it out of Integers or ints probably
isn't going to matter very much in the overall timeslice. If I'm looking
for the min element, sure, it'll be faster. But how often do I really need
to have automated parallelization of min(int)?
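To make the comparison concrete, here is a minimal sketch of the two min variants I have in mind. The class and method names are made up for illustration and are not part of jsr166y; the only difference is that the boxed version has to follow a reference and unbox each element before comparing.

```java
public class MinSketch {
    // Primitive version: values are stored inline in the array.
    static int minPrimitive(int[] values) {
        int min = Integer.MAX_VALUE;
        for (int v : values) {
            if (v < min) min = v;
        }
        return min;
    }

    // Boxed version: each element is a reference that must be followed
    // and unboxed before the comparison can happen.
    static int minBoxed(Integer[] values) {
        int min = Integer.MAX_VALUE;
        for (Integer v : values) {
            if (v < min) min = v;
        }
        return min;
    }

    public static void main(String[] args) {
        int[] prim = {7, 3, 9, -2, 5};
        Integer[] boxed = {7, 3, 9, -2, 5};
        System.out.println(minPrimitive(prim));
        System.out.println(minBoxed(boxed));
    }
}
```

Both methods return the same answer; whatever difference exists is purely in the per-element access cost, which is exactly the noise-floor territory I'm asking about.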
My understanding is that, if you really want to make a function over
primitives go very, very fast, you're going to convert the code to SIMD
instructions on the processor, or on the GPU. Therefore, the Big Win here
is not in writing an API that tries to make the Java code run as fast as
possible, but instead figuring out how to get Hotspot to turn your code into
SIMD calls for you.
Can you do that with Integer or Double? Maybe, maybe not. But until
Hotspot does that with ints and floats, then it hardly matters, does it?
On Jan 25, 2008 9:54 AM, Doug Lea <dl at cs.oswego.edu> wrote:
> jason marshall wrote:
> > Also, as a tangent back to my motivation for subscribing: I don't agree
> > that a 2x increase in performance (no actual numbers were cited, just
> > 'noticeably faster') is worth all of this extra code.
> Often 4X and up to 16X in some cases, which is enough to
> outweigh benefits of parallelism on the majority of platforms (i.e.,
> less than 16-ways) it will run on for the next few years at least.
> Yes, it is ugly. Please help us find ways to overcome the past
> decade's lack of attention to properly integrating scalars.
> > Haven't we
> > learned by now not to fight the JVM? Wouldn't a better solution be to
> > figure out how we can -help- the JVM optimize the code instead of trying
> > to do the job for it?
> Maybe so, but the only way to put pressure on language and JVM folks
> to take on this unglamorous topic is to actually make something that
> runs well but is awkward to use, and in some ways ugly even to look at
> (but we are trying hard on that front).
> > And aren't the Number Objects allocated for the
> > Object array likely to have locality of reference anyway, because they
> > will have temporal locality (be allocated sequentially)?
> That's only the tip of the iceberg. Other issues include
> (1) It adds significant GC overhead clearing large arrays of refs.
> (2) card-marking GC (google it) creates serious scalability
> issues because of card table write contention (this is often the
> biggest performance issue). (3) It increases method
> call chain length to access actual values, which interacts poorly
> with JVM inlining rules. (4) When combined with function interfaces (like
> the ones in Ops) it offers few opportunities to bypass boxing.
> (5) The consequent inability to open up loop bodies means that few
> classic loop optimizations (hoisting etc) that you take for granted
> in plain loops can be applied.
> Across each of these aspects, there's a "boxing is not so bad" mentality
> that justifies lack of attention. But ParallelArray entails a perfect
> storm combining them all to lead to serious performance issues.
> If we released something that wasted all 16 of your new multicore/MP
> CPUs to almost obtain simple sequential performance, there wouldn't
> be much reason to use it. But we can, with some loss in uglification,
> do a lot better. So we do.
> If you or anyone would like to work on language, compiler, and
> JVM techniques that improve this state of affairs, the world
> will thank you when you make the specialized APIs irrelevant
> so we can deprecate them.
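Points (3) and (4) in the quoted list can be illustrated with a small sketch. The `Reducer` and `IntReducer` interfaces below are hypothetical stand-ins, loosely in the spirit of a generic function interface and a primitive-specialized one; they are not the actual declarations from Ops. The sketch shows only where boxing enters the call path, not how large the cost is.

```java
public class ReducerSketch {
    // Hypothetical generic function interface: arguments and result are
    // objects, so an int computation boxes and unboxes on every call.
    interface Reducer<T> { T combine(T a, T b); }

    // Hypothetical primitive-specialized counterpart: no boxing involved.
    interface IntReducer { int combine(int a, int b); }

    static Integer reduceBoxed(Integer[] values, Reducer<Integer> r, Integer base) {
        Integer acc = base;
        for (Integer v : values) acc = r.combine(acc, v);
        return acc;
    }

    static int reduceInt(int[] values, IntReducer r, int base) {
        int acc = base;
        for (int v : values) acc = r.combine(acc, v);
        return acc;
    }

    public static void main(String[] args) {
        Reducer<Integer> boxedSum = new Reducer<Integer>() {
            // a + b unboxes both arguments, then the result is boxed again.
            public Integer combine(Integer a, Integer b) { return a + b; }
        };
        IntReducer intSum = new IntReducer() {
            public int combine(int a, int b) { return a + b; }
        };
        System.out.println(reduceBoxed(new Integer[] {1, 2, 3, 4}, boxedSum, 0));
        System.out.println(reduceInt(new int[] {1, 2, 3, 4}, intSum, 0));
    }
}
```

The specialized interface is the "extra code" being debated: it duplicates the generic one for each primitive type, in exchange for keeping values out of wrapper objects.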