[concurrency-interest] Thoughts about LongAdder

Nathan Reynolds nathan.reynolds at oracle.com
Thu Aug 18 19:05:21 EDT 2011

On 8/18/2011 1:41 PM, Doug Lea wrote:
> Sorry for the delay on this!
> On 08/10/11 12:37, Nathan Reynolds wrote:
>> Here's a list of ideas of where
>> LongAdder and ConcurrentCounter differ.
>>    1. I would recommend striping the counter *not* by thread but by
>>       processor/core. This will result in almost 0 contention and 
>> significantly
>>       reduce cache coherence traffic.
> This would be a good idea if we had a ProcessorLocal class.
> Which is probably a good idea. Here's a minimal sketch
> Add
>   class java.lang.Processor {
>      int getId();
>      int getMaxId();
>      static int proximity(Processor a, Processor b); // metrics TBD
>      // non-public ProcessorLocal table
>   }
> Add to class java.lang.Thread:
>       Processor getProcessor();
> Add
>    class java.lang.ProcessorLocal {
>      // mostly like ThreadLocal
>   }
In another email I sent on 8/10/2011 at 3:50 pm, it has my attempt at 
ProcessorLocal and ProcessorIndex.  The latter class maps raw CPU/OS 
processor ID into a 0 to N-1 index.  It doesn't provide a Processor 
class or a proximity() metric.
> However, in the mean time, the internal randomization done
> inside LongAdder gives performance that seems to closely approximate
> what you might get by doing this.
>>    2. LongAdder pads each Cell to avoid false sharing. Good idea. 
>> However, the
>>       amount of memory required can be quite costly in terms of 
>> capacity,
>>       wasting cache space and bandwidth. ConcurrentCounter (attached) 
>> assumes
>>       the JVM allocator is NUMA aware.
> I agree that padding is a bit hacky, but I don't know how a JVM
> can know to do this without either the assertion of or evidence
> of a variable being contended. The best suggestion I've heard on
> this is to add an annotation @Contended.
>>       I am interested in results showing padding vs no padding...
>>       assuming all of the JVM requirements are met.
> I see more than a factor of 6 slowdown on Intel i7 and AMD 6172 test 
> machines.
>>    3. reset() should replace the array of Cell instead of setting the 
>> values to
>>       0. This will increase GC and allocator load but reset() can 
>> then be
>>       atomic.
> Thanks for the suggestion, but I don't know of use cases for which
> reset(), but not sum(), need to be atomic -- the only use cases I
> know are to reset when it is otherwise known that threads are quiescent.

Thanks for pointing that out.  I can't think of a use case either.  My 
implementation provided a simple way to make them atomic and so I did.

> Doing this would also disable the nice property of current LongAdder
> that it allocates no extra space until contended but instead uses
> "base" field.
>>    5. ConcurrentCounter supports a set() API. It basically does the 
>> reset()
>>       trick I mentioned above but the new set of "Cells" is already 
>> assigned the
>>       given value.
> Ditto.
>>    7. retryUpdate() is extremely complicated code (due to thread 
>> striping and
>>       balancing).
> We do the complicated stuff so you don't have to  :-)

Oops.  I now realize what I said is a bit rude, but I didn't intend it 
as such.  Thank you for not taking offense.  What I mean to say is that 
ProcessorLocal will make LongAdder much simpler to implement.  But, you 
already understood that.

> -Doug
> _______________________________________________
> Concurrency-interest mailing list
> Concurrency-interest at cs.oswego.edu
> http://cs.oswego.edu/mailman/listinfo/concurrency-interest

Nathan Reynolds 
<http://psr.us.oracle.com/wiki/index.php/User:Nathan_Reynolds> | 
Consulting Member of Technical Staff | 602.333.9091
Oracle PSR Engineering <http://psr.us.oracle.com/> | Server Technology

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://cs.oswego.edu/pipermail/concurrency-interest/attachments/20110818/a8e12c04/attachment-0001.html>

More information about the Concurrency-interest mailing list