[concurrency-interest] ConcurrentHashMap bulk parallel operations

Rémi Forax forax at univ-mlv.fr
Tue Aug 14 17:58:21 EDT 2012


On 08/14/2012 05:33 PM, Doug Lea wrote:
> On 08/14/12 10:42, Rémi Forax wrote:
>
>> Doug, can you do the same test but, instead of switching to a more
>> expensive sum function, just switch to the same function declared in
>> another inner class, and then switch again to the same function
>> declared in a third inner class?
>
> Sure. Here's a variant that first does x + y:
>    long psum = pm.reduceValuesToLong((x) -> x.longValue(), 0L,
>                                      (x, y) -> x + y);
> then y + x:
>    long psum = pm.reduceValuesToLong((x) -> x.longValue(), 0L,
>                                      (x, y) -> y + x);
> then explicit function objects "pls":
>    long psum = pm.reduceValuesToLong(toLong, 0L, longPlus);
> then back to x+y
>
> Then just for algorithmic comparison, one that doesn't
> bother with reduction, but instead just uses a forEach into a LongAdder:
>    pm.forEachValue((x) -> adr.add(x.longValue()));
>
> All running on a current jdk8-lambda binary build, on 64-bit Nehalem Linux,
> with a ConcurrentHashMapV8<Long, Long> with random long keys.
> There are a few other setup differences from the
> version of the program whose results I showed earlier (including
> some that reduce GC impact during the first parallel run, which
> otherwise accentuates the terribleness).

That's amazing: you can see the opt/deopt dance done by HotSpot in 
reduceValuesToLong, and the resulting profile pollution that degrades 
the performance.
Note that this is not the worst case. The worst case is when a lambda 
does boxing: the VM will not be able to remove the boxing because of 
the profile pollution, so you end up with an allocation in your tight loop.
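To make the call-site morphism concrete, here is a minimal sketch (the reduce helper is hypothetical, not ConcurrentHashMap's API): each lambda expression compiles to its own class, so passing three distinct lambdas through the same call site takes HotSpot from a monomorphic guard, to a bimorphic one, to a plain vtable call, mirroring the timings below.

```java
import java.util.function.LongBinaryOperator;

public class CallSiteMorphism {
    // One shared call site: every lambda passed in is invoked through
    // the same invokeinterface below, so the type profile of that site
    // accumulates every receiver class it has ever seen.
    static long reduce(long[] values, long identity, LongBinaryOperator op) {
        long acc = identity;
        for (long v : values) {
            acc = op.applyAsLong(acc, v);  // the profiled call site
        }
        return acc;
    }

    public static void main(String[] args) {
        long[] data = new long[1000];
        for (int i = 0; i < data.length; i++) data[i] = i;

        // Each call adds a distinct lambda class to the site's profile:
        long a = reduce(data, 0L, (x, y) -> x + y);  // 1 type: monomorphic
        long b = reduce(data, 0L, (x, y) -> y + x);  // 2 types: bimorphic
        long c = reduce(data, 0L, Long::sum);        // 3+ types: vtable call
        System.out.println(a + " " + b + " " + c);   // prints 499500 499500 499500
    }
}
```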

>
> seq time   1.081
> seq time   1.115
> par x+y time   3.137
> par x+y time   0.048
> par x+y time   0.503
> par x+y time   0.035 <- one guard is installed (monomorphic)
> par x+y time   0.036
> par x+y time   0.034
> par x+y time   0.034
> par x+y time   0.034
> par x+y time   0.033
> par x+y time   0.033
> par y+x time   0.481 <- deoptimization, there are two lambdas now
> par y+x time   0.038 <- two guards now (bimorphic)
> par y+x time   0.037
> par y+x time   0.038
> par y+x time   0.036
> par y+x time   0.037
> par y+x time   0.037
> par y+x time   0.036
> par y+x time   0.037
> par y+x time   0.038
> par pls time   0.734 <- deoptimization, there are more than two lambdas
> par pls time   0.045 <- give up, do a vtable call (polymorphic)
> par pls time   0.041
> par pls time   0.044
> par pls time   0.043
> par pls time   0.041
> par pls time   0.051
> par pls time   0.042
> par pls time   0.042
> par pls time   0.042
> par x+y time   0.062 <- new lambda, still a vtable call, no deopt
> par x+y time   0.042
> par x+y time   0.041
> par x+y time   0.043
> par x+y time   0.042
> par x+y time   0.042
> par x+y time   0.042
> par x+y time   0.042
> par x+y time   0.041
> par x+y time   0.043

Rémi


