[concurrency-interest] Thoughts about LongAdder
Nathan Reynolds
nathan.reynolds at oracle.com
Thu Aug 18 19:05:21 EDT 2011
On 8/18/2011 1:41 PM, Doug Lea wrote:
> Sorry for the delay on this!
>
> On 08/10/11 12:37, Nathan Reynolds wrote:
>> Here's a list of ideas of where
>> LongAdder and ConcurrentCounter differ.
>>
>> 1. I would recommend striping the counter *not* by thread but by
>> processor/core. This will result in almost 0 contention and
>> significantly
>> reduce cache coherence traffic.
>
> This would be a good idea if we had a ProcessorLocal class.
> Which is probably a good idea. Here's a minimal sketch
> Add
> class java.lang.Processor {
> int getId();
> int getMaxId();
> static int proximity(Processor a, Processor b); // metrics TBD
> // non-public ProcessorLocal table
> }
> Add to class java.lang.Thread:
> Processor getProcessor();
> Add
> class java.lang.ProcessorLocal {
> // mostly like ThreadLocal
> }
In another email I sent on 8/10/2011 at 3:50 pm, it has my attempt at
ProcessorLocal and ProcessorIndex. The latter class maps raw CPU/OS
processor ID into a 0 to N-1 index. It doesn't provide a Processor
class or a proximity() metric.
> However, in the mean time, the internal randomization done
> inside LongAdder gives performance that seems to closely approximate
> what you might get by doing this.
>
>
>> 2. LongAdder pads each Cell to avoid false sharing. Good idea.
>> However, the
>> amount of memory required can be quite costly in terms of
>> capacity,
>> wasting cache space and bandwidth. ConcurrentCounter (attached)
>> assumes
>> the JVM allocator is NUMA aware.
>
> I agree that padding is a bit hacky, but I don't know how a JVM
> can know to do this without either the assertion of or evidence
> of a variable being contended. The best suggestion I've heard on
> this is to add an annotation @Contended.
>
>
>> I am interested in results showing padding vs no padding...
>> assuming all of the JVM requirements are met.
>
> I see more than a factor of 6 slowdown on Intel i7 and AMD 6172 test
> machines.
>
>> 3. reset() should replace the array of Cell instead of setting the
>> values to
>> 0. This will increase GC and allocator load but reset() can
>> then be
>> atomic.
>
> Thanks for the suggestion, but I don't know of use cases for which
> reset(), but not sum(), need to be atomic -- the only use cases I
> know are to reset when it is otherwise known that threads are quiescent.
Thanks for pointing that out. I can't think of a use case either. My
implementation provided a simple way to make them atomic and so I did.
> Doing this would also disable the nice property of current LongAdder
> that it allocates no extra space until contended but instead uses
> "base" field.
>
>> 5. ConcurrentCounter supports a set() API. It basically does the
>> reset()
>> trick I mentioned above but the new set of "Cells" is already
>> assigned the
>> given value.
>
> Ditto.
>
>
>> 7. retryUpdate() is extremely complicated code (due to thread
>> striping and
>> balancing).
>
>
> We do the complicated stuff so you don't have to :-)
>
Oops. I now realize what I said is a bit rude, but I didn't intend it
as such. Thank you for not taking offense. What I mean to say is that
ProcessorLocal will make LongAdder much simpler to implement. But, you
already understood that.
> -Doug
> _______________________________________________
> Concurrency-interest mailing list
> Concurrency-interest at cs.oswego.edu
> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Nathan Reynolds
<http://psr.us.oracle.com/wiki/index.php/User:Nathan_Reynolds> |
Consulting Member of Technical Staff | 602.333.9091
Oracle PSR Engineering <http://psr.us.oracle.com/> | Server Technology
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://cs.oswego.edu/pipermail/concurrency-interest/attachments/20110818/a8e12c04/attachment-0001.html>
More information about the Concurrency-interest
mailing list