[concurrency-interest] forkjoin.ParallelArray and friends
dl at cs.oswego.edu
Tue Aug 28 07:34:28 EDT 2007
Here's a question about another intentional lack of complete
generality we'd like some feedback on:
Background: We supply special scalar-based ParallelDoubleArray,
ParallelLongArray, and ParallelIntArray. Operations
on a ParallelIntArray are in general noticeably faster than
a ParallelArray<Integer> (and similarly for the others).
In fact, the differences between boxed and unboxed array
operations on scalars are usually more extreme for parallel
vs sequential use because of locality issues -- the elements of a
ParallelIntArray are always nearby each other in memory,
but the Integers of a ParallelArray<Integer> may become
scattered across memory, depending on GC/mutation patterns.
GC runtime mechanics usually also cause array element updates
(as in a[i] = x) of references to contend with each other
more than do scalar writes, which can limit scalability.
I suspect that as time goes by, the gap between boxed and
unboxed versions may shrink. But it will still surely be wider
for parallel than sequential processing for the medium-term future.
On the other hand, supporting special scalar forms in
an almost-uniform (*) way requires APIs/code that grow
with the square of the number of forms. So we don't want
to add thousands of lines of special-case code (and provide
lots more WithMapping etc classes to cross-map) to
handle cases that practically never arise.
So the question is: Do you have or know of parallelizable
usages that require support for byte, short, char, float
or boolean arrays? If there's a strong case to be made for
any of them, we should at least consider supporting.
There are a lot of common usages for ints, longs, and doubles,
but I just don't know of any compelling ones for the
others. Even for floats -- in most usages, people would
rather use twice the space, for doubles, to preserve
better accuracy. In fact, even int (vs long) is slightly
questionable for such reasons but still common enough to support.
(*) "almost-uniform" because a few differences between
scalars and objects show up in APIs. For example max()
for objects returns null for empty arrays
(or empty selections) for objects, but Double.MIN_VALUE
for doubles. (It would be possible to throw an exception
instead in both cases to make them uniform, but this
would not interact well with the base-case processing
conventions people normally use.)
More information about the Concurrency-interest