[concurrency-interest] ConcurrentHashMap bulk parallel operations

Doug Lea dl at cs.oswego.edu
Mon Aug 13 12:08:50 EDT 2012


As most of you know, lambda-based bulk operations with optional
parallelism are planned for JDK8. We've been working with JSR335 folks
on these. However, in addition we plan to offer access to bulk
operations designed for use on truly concurrent data structures, in
particular ConcurrentHashMaps. These methods are designed for use on
"live" data, registeries, caches, in-memory data stores, chunks of
Hadoop-style "big data" sets. and so on that might be concurrently
updated during the course of computations. The form and style of this
API are targeted to support operations that can be sensibly applied in
such contexts.  This boils down to only three basic forms: forEach,
search, and reduce (each with multiple variations), expressed in an
imperative style -- no fluency, stateful Streams, etc that are planned
for the JDK8 java.util-based framework. There is no sense in
compromising support for either of these kinds of target usages, so we
won't.  However, the functionality coverage is essentially identical
for those operations that do apply, so we anticipate that the JDK8
java.util-based framework will be able to layer on top of this when
applicable.

The API is in two layers. Nested (static) class "ForkJoinTasks"
returns task objects that, when invoked, provide the given
functionality, but may also be used in other ways. Nested (inner)
class "Parallel" provides an API for using them with a given
ForkJoinPool.  The class-level javadocs for CHM(V8).Parallel are pasted
below.  There will surely be some further API changes in the course of
JDK8 integration. However, in the mean time, we are releasing a
stand-alone form, intended to be usable by both current
ConcurrentHashMapV8 users running JDK7, as well as those experimenting
with current JDK8 preview lambda builds (at
http://jdk8.java.net/lambda/) The current javadocs don't have any
usage examples, because they look vastly different in JDK7 vs JDK8.

Doing this forces a bit of disruption on everyone though.

1. To avoid FJ version mismatches, the current jsr166y FJ classes are
duplicated into jsr166e.

2. To avoid JDK version mismatches, the j.u.c version (plain
"ConcurrentHashMap" without the "V8") is committed in main repository,
while keeping its "V8" in package jsr166e. (This also required an
initial merge of jsr166e.LongAdder and related classes.)

3. To avoid current and future naming problems, a set of function
interfaces are nested within ConcurrentHashMap, with names
intentionally different than those currently used in JDK8 previews
(for example "Action" instead of "Block").  For lambda-enabled
JDK8-preview users, this won't much matter because lambda expressions
will still match as expected.  However, others tediously using this
with emulated-lambdas via static instances of classes implementing the
interfaces will have to bear with future name changes of these
interfaces. This forbearance starts immediately, because the
previously named nested MappingFunction and RemappingFunction are
already changed so as to be applicable across the extended
APIs. Sorry.


... pasting from
http://gee.cs.oswego.edu/dl/jsr166/dist/jsr166edocs/jsr166e/ConcurrentHashMapV8.Parallel.html


public class ConcurrentHashMapV8.Parallel


An extended view of a ConcurrentHashMap supporting bulk parallel operations. 
These operations are designed to be be safely, and often sensibly, applied even 
with maps that are being concurrently updated by other threads; for example, 
when computing a snapshot summary of the values in a shared registry. There are 
three kinds of operation, each with four forms, accepting functions with Keys, 
Values, Entries, and (Key, Value) arguments and/or return values. Because the 
elements of a ConcurrentHashMap are not ordered in any particular way, and may 
be processed in different orders in different parallel executions, the 
correctness of supplied functions should not depend on any ordering, or on any 
other objects or values that may transiently change while computation is in 
progress; and except for forEach actions, should ideally be side-effect-free.

     * forEach: Perform a given action on each element. A variant form applies a 
given transformation on each element before performing the action.
     * search: Return the first available non-null result of applying a given 
function on each element; skipping further search when a result is found.
     * reduce: Accumulate each element. The supplied reduction function cannot 
rely on ordering (more formally, it should be both associative and commutative). 
There are five variants:
           o Plain reductions. (There is not a form of this method for (key, 
value) function arguments since there is no corresponding return type.)
           o Mapped reductions that accumulate the results of a given function 
applied to each element.
           o Reductions to scalar doubles, longs, and ints, using a given basis 
value.

The concurrency properties of the bulk operations follow from those of 
ConcurrentHashMap: Any non-null result returned from get(key) and related access 
methods bears a happens-before relation with the associated insertion or update. 
The result of any bulk operation reflects the composition of these per-element 
relations (but is not necessarily atomic with respect to the map as a whole 
unless it is somehow known to be quiescent). Conversely, because keys and values 
in the map are never null, null serves as a reliable atomic indicator of the 
current lack of any result. To maintain this property, null serves as an 
implicit basis for all non-scalar reduction operations. For the double, long, 
and int versions, the basis should be one that, when combined with any other 
value, returns that other value (more formally, it should be the identity 
element for the reduction). Most common reductions have these properties; for 
example, computing a sum with basis 0 or a minimum with basis MAX_VALUE.

Search and transformation functions provided as arguments should similarly 
return null to indicate the lack of any result (in which case it is not used). 
In the case of mapped reductions, this also enables transformations to serve as 
filters, returning null (or, in the case of primitive specializations, the 
identity basis) if the element should not be combined. You can create compound 
transformations and filterings by composing them yourself under this "null means 
there is nothing there now" rule before using them in search or reduce operations.

Methods accepting and/or returning Entry arguments maintain key-value 
associations. They may be useful for example when finding the key for the 
greatest value. Note that "plain" Entry arguments can be supplied using new 
AbstractMap.SimpleEntry(k,v).

Bulk operations may complete abruptly, throwing an exception encountered in the 
application of a supplied function. Bear in mind when handling such exceptions 
that other concurrently executing functions could also have thrown exceptions, 
or would have done so if the first exception had not occurred.

Parallel speedups compared to sequential processing are common but not 
guaranteed. Operations involving brief functions on small maps may execute more 
slowly than sequential loops if the underlying work to parallelize the 
computation is more expensive than the computation itself. Similarly, 
parallelization may not lead to much actual parallelism if all processors are 
busy performing unrelated tasks.

All arguments to all task methods must be non-null.

jsr166e note: During transition, this class uses nested functional interfaces 
with different names but the same forms as those expected for JDK8.


More information about the Concurrency-interest mailing list