[concurrency-interest] Min and Max for Atomics

Nathan and Ila Reynolds nathanila at gmail.com
Thu Aug 10 14:29:43 EDT 2017

See the MESI protocol: https://en.wikipedia.org/wiki/MESI_protocol

Every time a core writes to a cache line (with a normal store or an 
atomic operation), it has to own the cache line in the exclusive (or 
modified) state.  If the cache line is not already exclusively owned, 
then the core has to send an invalidation message to the caches of 
every other core in the entire system.  This could mean going to 
another chip or even 7 other chips.  This invalidation message removes 
the cache line from all other cores' caches.  The invalidating core 
can stall for a long time if the cache line is heavily contended.  
This looks like 100% CPU usage but very sluggish progress.

I am very aware of this problem because I had to figure out that this 
was what was happening and then optimize some C++ code.  The 
interesting part of this story is that as Intel produced 4 newer chips 
over a period of 4 years, I had to revisit the code and improve the 
optimization.  I tried a ThreadLocal variable, but as a thread 
migrated between cores, the cache line had to migrate with it via 
cache invalidation.  I finally came up with an optimization which no 
longer suffers from this problem.  The final solution was to assign a 
cache line (i.e. variable) to 1 core.  In other words, it was a 
CoreLocal variable, if you will.  This reduced the cache invalidations 
and I have not had to revisit this code for 6 years.
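In Java there is no way to pin a variable to a core, but the idea can be approximated by striping the state across several cells, LongAdder-style, so that threads running on different cores usually hit different cache lines and a reader folds the cells together. A hypothetical sketch of a striped maximum (the cell count and the thread-id indexing are illustrative choices, not taken from my code):

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical sketch: spread contention across cells so threads on
// different cores usually CAS different slots. Note that adjacent
// array slots can still share a cache line; a padded-cell layout (as
// in java.util.concurrent.atomic.Striped64) would be needed to avoid
// false sharing entirely.
final class StripedMax {
    private static final int CELLS = 16;          // power of two, illustrative
    private final AtomicLongArray cells = new AtomicLongArray(CELLS);

    StripedMax() {
        for (int i = 0; i < CELLS; i++) cells.set(i, Long.MIN_VALUE);
    }

    void record(long sample) {
        // Pick a cell by thread id; a per-thread probe would work too.
        int i = (int) (Thread.currentThread().getId() & (CELLS - 1));
        long cur = cells.get(i);
        while (sample > cur && !cells.compareAndSet(i, cur, sample)) {
            cur = cells.get(i);                   // lost a race; re-check
        }
    }

    long get() {
        long m = Long.MIN_VALUE;                  // fold all cells together
        for (int i = 0; i < CELLS; i++) m = Math.max(m, cells.get(i));
        return m;
    }
}
```

The write path only touches one cell, so contention on any single cache line is divided by the number of cells; the reader pays for scanning all of them, which is the usual LongAdder-style trade-off.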

So, updateAndGet() suffers from cache invalidation even if a write is 
not necessary.  It also suffers from CAS latency and a memory fence.  I 
realize that in some cases, this is exactly what one would want to pay 
for.  In my case, the updates are not frequent enough to warrant the cost.


On 8/10/2017 11:56 AM, Andrew Haley wrote:
> On 10/08/17 18:35, Nathan and Ila Reynolds wrote:
>> Yes, I get the same behavior, but I will have to pay for a cache
>> invalidation, CAS and a memory fence with each call.  For example, if I
>> am tracking a high-water mark then at the beginning the updates should
>> be very often and then taper off to nothing.  Thus, over time the cost
>> is reduced to a load from cache or RAM.
> What is this cache invalidation of which you speak?  After the
> AtomicReference.updateAndGet() discussion last time around, it was
> clear enough that no such thing was necessary.  And besides that,
> the message is clear: use a VarHandle for such things.
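A hedged sketch of what the VarHandle version Andrew alludes to might look like (JDK 9+; the field name and the choice of memory modes are my illustrative assumptions, not from the thread). The fast path uses an opaque read, which carries no fence, and a CAS happens only when the maximum must actually grow:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical sketch: opaque load on the fast path, CAS only when
// the mark needs to rise. Memory-mode choice is an assumption.
final class VhMax {
    private volatile long max = Long.MIN_VALUE;
    private static final VarHandle MAX;

    static {
        try {
            MAX = MethodHandles.lookup()
                    .findVarHandle(VhMax.class, "max", long.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    void record(long sample) {
        long cur = (long) MAX.getOpaque(this);   // no fence on fast path
        while (sample > cur) {
            if (MAX.compareAndSet(this, cur, sample)) {
                return;
            }
            cur = (long) MAX.getOpaque(this);    // lost a race; re-check
        }
    }

    long get() {
        return max;                              // volatile read
    }
}
```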


More information about the Concurrency-interest mailing list