[concurrency-interest] jsr166y.forkjoin API comments

Doug Lea dl at cs.oswego.edu
Fri Jan 25 12:54:21 EST 2008

jason marshall wrote:

> Also, as a tangent back to my motivation for subscribing:  I don't agree 
> that a 2x increase in performance (no actual numbers were cited, just 
> 'noticeably faster') is worth all of this extra code. 

The difference is often 4X, and up to 16X in some cases, which is enough
to outweigh the benefits of parallelism on the majority of platforms (i.e.,
fewer than 16 ways) it will run on for at least the next few years.

Yes, it is ugly. Please help us find ways to overcome the past
decade's lack of attention to properly integrating scalars.

> Haven't we 
> learned by now not to fight the JVM?  Wouldn't a better solution be to 
> figure out how we can -help- the JVM optimize the code instead of trying 
> to do the job for it? 

Maybe so, but the only way to put pressure on language and JVM folks
to take on this unglamorous topic is to actually make something that
runs well but is awkward to use, and in some ways ugly even to look at
(but we are trying hard on that front).

> And aren't the Number Objects allocated for the 
> Object array likely to have locality of reference anyway, because they 
> will have temporal locality (be allocated sequentially)?

That's only the tip of the iceberg. Other issues include:
(1) It adds significant GC overhead clearing large arrays of refs.
(2) Card-marking GC (google it) creates serious scalability
    issues because of card table write contention (this is often the
    biggest performance issue).
(3) It increases method call chain length to access actual values,
    which interacts poorly with JVM inlining rules.
(4) When combined with function interfaces (like the ones in Ops),
    it offers few opportunities to bypass boxing.
(5) The consequent inability to open up loop bodies means that few
    classic loop optimizations (hoisting etc.) that you take for granted
    in plain loops can be applied.
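
To make points (3) through (5) concrete, here is a minimal sketch of the
two styles side by side. The names (BoxingSketch, Reducer, DoubleReducer)
are hypothetical and are not the actual Ops interfaces; they only
illustrate the shape of the problem. The sketch demonstrates no speedup
by itself. It just shows where the boxed path must go through object
references and where the specialized path stays in scalars.

```java
public class BoxingSketch {
    // Hypothetical generic function interface: operands and result
    // must be objects, so every element passes through a Double.
    interface Reducer<T> { T combine(T a, T b); }

    // Hypothetical scalar-specialized counterpart: no boxing anywhere.
    interface DoubleReducer { double combine(double a, double b); }

    // Boxed path: an array of references, each dereferenced to reach the
    // actual value, and (in general) a new Double allocated per step.
    static Double reduceBoxed(Double[] a, Reducer<Double> r, Double base) {
        Double acc = base;
        for (Double x : a) acc = r.combine(acc, x);
        return acc;
    }

    // Primitive path: a flat double[] and a loop body the JIT can open
    // up and optimize like a plain loop.
    static double reducePrimitive(double[] a, DoubleReducer r, double base) {
        double acc = base;
        for (double x : a) acc = r.combine(acc, x);
        return acc;
    }

    public static void main(String[] args) {
        int n = 1000;
        Double[] boxed = new Double[n];
        double[] prim = new double[n];
        for (int i = 0; i < n; i++) {
            boxed[i] = (double) i;  // allocates (or looks up) a Double
            prim[i] = i;            // stores the scalar directly
        }
        // Same logical operation, written against each interface.
        System.out.println(reduceBoxed(boxed, (x, y) -> x + y, 0.0));
        System.out.println(reducePrimitive(prim, (x, y) -> x + y, 0.0));
    }
}
```

The price of the second form is exactly the extra code being complained
about: one interface and one method per scalar type, instead of a single
generic version.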

Across each of these aspects, there's a "boxing is not so bad" mentality
that justifies lack of attention. But ParallelArray entails a perfect
storm combining them all to lead to serious performance issues.
If we released something that wasted all 16 of your new multicore/MP
CPUs to almost obtain simple sequential performance, there wouldn't
be much reason to use it. But we can, at some cost in ugliness,
do a lot better. So we do.

If you or anyone would like to work on language, compiler, and
JVM techniques that improve this state of affairs, the world
will thank you when you make the specialized APIs irrelevant
so we can deprecate them.

