[concurrency-interest] ConcurrentHashMap bulk parallel operations
Doug Lea
dl at cs.oswego.edu
Mon Aug 13 12:08:50 EDT 2012
As most of you know, lambda-based bulk operations with optional
parallelism are planned for JDK8. We've been working with JSR335 folks
on these. However, in addition we plan to offer access to bulk
operations designed for use on truly concurrent data structures, in
particular ConcurrentHashMaps. These methods are designed for use on
"live" data, registries, caches, in-memory data stores, chunks of
Hadoop-style "big data" sets, and so on that might be concurrently
updated during the course of computations. The form and style of this
API are targeted to support operations that can be sensibly applied in
such contexts. This boils down to only three basic forms: forEach,
search, and reduce (each with multiple variations), expressed in an
imperative style -- none of the fluency, stateful Streams, etc. that are planned
for the JDK8 java.util-based framework. There is no sense in
compromising support for either of these kinds of target usages, so we
won't. However, the functionality coverage is essentially identical
for those operations that do apply, so we anticipate that the JDK8
java.util-based framework will be able to layer on top of this when
applicable.
The API is in two layers. Nested (static) class "ForkJoinTasks"
returns task objects that, when invoked, provide the given
functionality, but may also be used in other ways. Nested (inner)
class "Parallel" provides an API for using them with a given
ForkJoinPool. The class-level javadocs for CHM(V8).Parallel are pasted
below. There will surely be some further API changes in the course of
JDK8 integration. However, in the meantime, we are releasing a
stand-alone form, intended to be usable both by current
ConcurrentHashMapV8 users running JDK7 and by those experimenting
with current JDK8 preview lambda builds (at
http://jdk8.java.net/lambda/). The current javadocs don't have any
usage examples, because they look vastly different in JDK7 vs JDK8.
Doing this forces a bit of disruption on everyone though.
1. To avoid FJ version mismatches, the current jsr166y FJ classes are
duplicated into jsr166e.
2. To avoid JDK version mismatches, the j.u.c version (plain
"ConcurrentHashMap" without the "V8") is committed in main repository,
while keeping its "V8" in package jsr166e. (This also required an
initial merge of jsr166e.LongAdder and related classes.)
3. To avoid current and future naming problems, a set of function
interfaces are nested within ConcurrentHashMap, with names
intentionally different than those currently used in JDK8 previews
(for example "Action" instead of "Block"). For lambda-enabled
JDK8-preview users, this won't much matter because lambda expressions
will still match as expected. However, others tediously using this
with emulated-lambdas via static instances of classes implementing the
interfaces will have to bear with future name changes of these
interfaces. This forbearance starts immediately, because the
previously named nested MappingFunction and RemappingFunction are
already changed so as to be applicable across the extended
APIs. Sorry.
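To illustrate point 3, here is a minimal sketch of why a rename affects emulated-lambda users but not lambda users. The Action interface, its apply method, and the forEach helper below are hypothetical stand-ins written for this example, not the actual jsr166e declarations.

```java
import java.util.ArrayList;
import java.util.List;

class ActionNamingSketch {
    // Hypothetical stand-in for a nested function interface such as "Action";
    // the real interface may declare a different method name.
    interface Action<A> {
        void apply(A a);
    }

    // Hypothetical bulk-style helper that applies an action to every element.
    static <A> void forEach(List<A> elements, Action<A> action) {
        for (A e : elements) {
            action.apply(e);
        }
    }

    // Emulated lambda: a class that names the interface explicitly, so this
    // source must be edited if the interface is ever renamed.
    static final class Recorder implements Action<String> {
        final List<String> seen = new ArrayList<>();

        @Override
        public void apply(String s) {
            seen.add(s);
        }
    }

    static List<String> viaNamedClass(List<String> in) {
        Recorder r = new Recorder();
        forEach(in, r);
        return r.seen;
    }

    // Lambda: the interface name never appears, so a rename goes unnoticed
    // as long as the shape of the single method stays the same.
    static List<String> viaLambda(List<String> in) {
        List<String> seen = new ArrayList<>();
        forEach(in, s -> seen.add(s));
        return seen;
    }
}
```

Both methods record the same elements; only viaNamedClass mentions Action by name.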
... pasting from
http://gee.cs.oswego.edu/dl/jsr166/dist/jsr166edocs/jsr166e/ConcurrentHashMapV8.Parallel.html
public class ConcurrentHashMapV8.Parallel
An extended view of a ConcurrentHashMap supporting bulk parallel operations.
These operations are designed to be safely, and often sensibly, applied even
with maps that are being concurrently updated by other threads; for example,
when computing a snapshot summary of the values in a shared registry. There are
three kinds of operation, each with four forms, accepting functions with Keys,
Values, Entries, and (Key, Value) arguments and/or return values. Because the
elements of a ConcurrentHashMap are not ordered in any particular way, and may
be processed in different orders in different parallel executions, the
correctness of supplied functions should not depend on any ordering, or on any
other objects or values that may transiently change while computation is in
progress; and except for forEach actions, should ideally be side-effect-free.
  * forEach: Perform a given action on each element. A variant form applies a
    given transformation on each element before performing the action.
  * search: Return the first available non-null result of applying a given
    function on each element; skipping further search when a result is found.
  * reduce: Accumulate each element. The supplied reduction function cannot
    rely on ordering (more formally, it should be both associative and
    commutative). There are five variants:
      o Plain reductions. (There is not a form of this method for (key,
        value) function arguments since there is no corresponding return type.)
      o Mapped reductions that accumulate the results of a given function
        applied to each element.
      o Reductions to scalar doubles, longs, and ints, using a given basis
        value.
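The three kinds of operation can be sketched as follows. This is an illustration only: it assumes the bulk methods of the JDK 8 java.util.concurrent.ConcurrentHashMap (the later form of this design) rather than the jsr166e Parallel view, whose exact signatures are not shown here. The class name, the sample data, and the parallelism threshold of 1 are choices made for the example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

class BulkFormsSketch {
    static ConcurrentHashMap<String, Integer> sample() {
        ConcurrentHashMap<String, Integer> m = new ConcurrentHashMap<>();
        m.put("a", 1);
        m.put("b", 2);
        m.put("c", 3);
        return m;
    }

    // forEach: perform a (side-effecting) action on each (key, value) pair.
    // LongAdder is used because the action may run on several threads.
    static long countEntries(ConcurrentHashMap<String, Integer> m) {
        LongAdder n = new LongAdder();
        m.forEach(1L, (k, v) -> n.increment());
        return n.sum();
    }

    // search: the first non-null result ends the search; returning null
    // from the function means "nothing here, keep looking".
    static String keyWithValue(ConcurrentHashMap<String, Integer> m, int wanted) {
        return m.search(1L, (k, v) -> v == wanted ? k : null);
    }

    // reduce: a plain reduction over values with an associative and
    // commutative function; the result is null for an empty map.
    static Integer sumValues(ConcurrentHashMap<String, Integer> m) {
        return m.reduceValues(1L, Integer::sum);
    }
}
```

The first argument is the JDK 8 parallelism threshold; 1 asks for as much splitting as possible, which mainly serves to exercise the parallel path in a small example.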
The concurrency properties of the bulk operations follow from those of
ConcurrentHashMap: Any non-null result returned from get(key) and related access
methods bears a happens-before relation with the associated insertion or update.
The result of any bulk operation reflects the composition of these per-element
relations (but is not necessarily atomic with respect to the map as a whole
unless it is somehow known to be quiescent). Conversely, because keys and values
in the map are never null, null serves as a reliable atomic indicator of the
current lack of any result. To maintain this property, null serves as an
implicit basis for all non-scalar reduction operations. For the double, long,
and int versions, the basis should be one that, when combined with any other
value, returns that other value (more formally, it should be the identity
element for the reduction). Most common reductions have these properties; for
example, computing a sum with basis 0 or a minimum with basis MAX_VALUE.
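As a sketch of the identity-basis rule for scalar reductions, assuming the JDK 8 ConcurrentHashMap method reduceValuesToLong (not the jsr166e signatures, which are not reproduced here), a sum uses basis 0 and a minimum uses basis Long.MAX_VALUE. The class and method names are illustrative.

```java
import java.util.concurrent.ConcurrentHashMap;

class BasisSketch {
    static ConcurrentHashMap<String, Long> sample() {
        ConcurrentHashMap<String, Long> m = new ConcurrentHashMap<>();
        m.put("x", 5L);
        m.put("y", 2L);
        m.put("z", 9L);
        return m;
    }

    // Sum with basis 0: combining 0 with any value returns that value.
    static long sum(ConcurrentHashMap<String, Long> m) {
        return m.reduceValuesToLong(1L, Long::longValue, 0L, Long::sum);
    }

    // Minimum with basis Long.MAX_VALUE: min(MAX_VALUE, v) is always v.
    static long min(ConcurrentHashMap<String, Long> m) {
        return m.reduceValuesToLong(1L, Long::longValue, Long.MAX_VALUE, Long::min);
    }
}
```

Because a parallel run may fold the basis in more than once, a non-identity basis (say, 1 for a sum) could produce results that vary with how the work happens to be split. On an empty map the scalar forms simply return the basis.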
Search and transformation functions provided as arguments should similarly
return null to indicate the lack of any result (in which case it is not used).
In the case of mapped reductions, this also enables transformations to serve as
filters, returning null (or, in the case of primitive specializations, the
identity basis) if the element should not be combined. You can create compound
transformations and filterings by composing them yourself under this "null means
there is nothing there now" rule before using them in search or reduce operations.
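A minimal sketch of a transformation doubling as a filter in a mapped reduction, again assuming the JDK 8 ConcurrentHashMap.reduce method rather than the jsr166e form. The prefix filter and the names used are made up for the example.

```java
import java.util.concurrent.ConcurrentHashMap;

class FilterSketch {
    static ConcurrentHashMap<String, Integer> sample() {
        ConcurrentHashMap<String, Integer> m = new ConcurrentHashMap<>();
        m.put("ax", 1);
        m.put("ay", 2);
        m.put("b", 10);
        return m;
    }

    // Sum the values whose keys start with the given prefix. The transformer
    // returns null for elements that should not be combined, so they are
    // skipped; if every element is skipped the overall result is null.
    static Integer sumWithPrefix(ConcurrentHashMap<String, Integer> m, String prefix) {
        return m.<Integer>reduce(1L,
                (k, v) -> k.startsWith(prefix) ? v : null,
                Integer::sum);
    }
}
```

A null return from sumWithPrefix therefore means "no matching element at the moment", which is distinguishable from a genuine sum of 0.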
Methods accepting and/or returning Entry arguments maintain key-value
associations. They may be useful for example when finding the key for the
greatest value. Note that "plain" Entry arguments can be supplied using new
AbstractMap.SimpleEntry(k,v).
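Finding the key for the greatest value might look like the sketch below. It assumes the JDK 8 ConcurrentHashMap.reduceEntries method; the class name and sample data are illustrative only.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class MaxEntrySketch {
    static ConcurrentHashMap<String, Integer> sample() {
        ConcurrentHashMap<String, Integer> m = new ConcurrentHashMap<>();
        m.put("a", 1);
        m.put("b", 7);
        m.put("c", 3);
        return m;
    }

    // Reduce over entries so that the key travels with its value; the
    // reducer keeps whichever of the two entries has the larger value.
    static String keyOfGreatest(ConcurrentHashMap<String, Integer> m) {
        Map.Entry<String, Integer> best =
                m.reduceEntries(1L, (x, y) -> x.getValue() >= y.getValue() ? x : y);
        return best == null ? null : best.getKey();   // null for an empty map
    }
}
```

If several keys share the greatest value, which one is returned may differ between runs, since processing order is not fixed.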
Bulk operations may complete abruptly, throwing an exception encountered in the
application of a supplied function. Bear in mind when handling such exceptions
that other concurrently executing functions could also have thrown exceptions,
or would have done so if the first exception had not occurred.
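As a sketch of abrupt completion, assuming the JDK 8 ConcurrentHashMap.forEachValue method (the jsr166e form may differ in detail), an exception thrown by a supplied function surfaces to the caller of the bulk operation. The validation rule here is invented for the example.

```java
import java.util.concurrent.ConcurrentHashMap;

class AbruptCompletionSketch {
    static ConcurrentHashMap<String, Integer> sample(int last) {
        ConcurrentHashMap<String, Integer> m = new ConcurrentHashMap<>();
        m.put("a", 1);
        m.put("b", 2);
        m.put("c", last);
        return m;
    }

    // Returns true if the bulk operation completed abruptly. Only the
    // exception type is inspected: other elements may or may not have been
    // visited, and other failures may have gone unreported.
    static boolean failsOnNegative(ConcurrentHashMap<String, Integer> m) {
        try {
            m.forEachValue(1L, v -> {
                if (v < 0) {
                    throw new IllegalStateException("negative value: " + v);
                }
            });
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }
}
```

The caller should not infer from a single caught exception that it describes the only bad element in the map.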
Parallel speedups compared to sequential processing are common but not
guaranteed. Operations involving brief functions on small maps may execute more
slowly than sequential loops if the underlying work to parallelize the
computation is more expensive than the computation itself. Similarly,
parallelization may not lead to much actual parallelism if all processors are
busy performing unrelated tasks.
All arguments to all task methods must be non-null.
jsr166e note: During transition, this class uses nested functional interfaces
with different names but the same forms as those expected for JDK8.