[concurrency-interest] Thoughts about LongAdder

Nathan Reynolds nathan.reynolds at oracle.com
Wed Aug 10 12:37:49 EDT 2011

By way of introduction, I work at Oracle in the PSR team (Performance, 
Scalability and Reliability).  This team is focuses on optimizing the 
applications and middleware.  My task is to optimize Weblogic server, 
the JVM, kernel and network for Oracle's Exalogic, a rack of Westmere-EP 
servers.  Most of my focus is on Weblogic server and some what on the 
other areas as needed.

Monday, I came up with a "new" data structure, ConcurrentCounter, to 
take advantage of a processor id API I proposed to the Oracle HotSpot 
team.  Alan Bateman sent me a link to an email in this list to 
LongAdder.  I reviewed the LongAdder implementation and learned a bit 
from it.  I enhanced ConcurrentCounter based on some of the ideas in 
LongAdder.  Here's a list of ideas of where LongAdder and 
ConcurrentCounter differ.  Hopefully, this list will help improve the 
final result .

 1. I would recommend striping the counter *not* by thread but by
    processor/core.  This will result in almost 0 contention and
    significantly reduce cache coherence traffic.  If striped by thread,
    then the counter will probably consume more memory, the thread to
    Cell mapping may not stabilize and when a thread migrates to another
    processor, the cache lines have to be transferred.  If striped by
    processor, then thread migration has no impact (unless it was in the
    middle of updating a Cell).  Because of this, striping by processor
    will outperform striping by thread.  (More on this in another
    email.)  Also, the number of Cells required will be fixed to the
    number of processors which is almost always fewer than the number of
    threads in server processes.  The attached ConcurrentCounter stripes
    by processor.
 2. LongAdder pads each Cell to avoid false sharing.  Good idea. 
    However, the amount of memory required can be quite costly in terms
    of capacity, wasting cache space and bandwidth.  ConcurrentCounter
    (attached) assumes the JVM allocator is NUMA aware.  Thus, each
    "Cell" will be allocated in memory local to the processor.  False
    sharing shouldn't be possible and if it happens then the contention
    will only happen among sibling cores.  If GC is NUMA aware, then
    each "Cell" will remain in local memory.  Again, false sharing isn't
    likely to be a problem.  Thus, padding is hopefully not needed.  I
    am interested in results showing padding vs no padding... assuming
    all of the JVM requirements are met.
 3. reset() should replace the array of Cell instead of setting the
    values to 0.  This will increase GC and allocator load but reset()
    can then be atomic.  ConcurrentCounter.clear() implements this logic
    quite simply.
 4. getAndReset() should also replace the array of Cell.  The get() can
    be computed on the old array of Cell and should be much more
    accurate since the concurrent updates should start using the new
    array right away.  The inaccuracies will come from threads that
    context switched off while in the middle of an update and then
    resume later using the old array.  ConcurrentCounter.getAndClear()
    does this.
 5. ConcurrentCounter supports a set() API.  It basically does the
    reset() trick I mentioned above but the new set of "Cells" is
    already assigned the given value.
 6. ConcurrentCounter executes only 1 atomic instruction per update (in
    the main path) and never uses non-blocking locks.
 7. retryUpdate() is extremely complicated code (due to thread striping
    and balancing).  ConcurrentCounter doesn't have such complexity.

Nathan Reynolds 
<http://psr.us.oracle.com/wiki/index.php/User:Nathan_Reynolds> | 
Consulting Member of Technical Staff | 602.333.9091
Oracle PSR Engineering <http://psr.us.oracle.com/> | Server Technology
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://cs.oswego.edu/pipermail/concurrency-interest/attachments/20110810/9c2b44f5/attachment.html>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ConcurrentCounter.java
URL: <http://cs.oswego.edu/pipermail/concurrency-interest/attachments/20110810/9c2b44f5/attachment.ksh>

More information about the Concurrency-interest mailing list