[concurrency-interest] AtomicReference.updateAndGet() mandatory updating

Nathan and Ila Reynolds nathanila at gmail.com
Fri May 26 12:15:17 EDT 2017

The particular situation I encountered was on x86 hardware. However, 
even on other memory architectures, replacing writes with full fences is 
definitely faster.  Here's why.  x86 hardware implements MESI cache 
protocol.  These other architectures implement MERSI cache protocol.

Background information:  On some (all?) x86 architectures, CAS is 
implemented in L3 cache.  The core executing CAS has to send a message 
to the L3 cache and wait for the L3 cache to reply stating if the 
operation succeed or failed.  This explains why CAS takes so many 
cycles.  The benefit is that the L3 cache can keep the only copy of the 
cache line and all of the cores on the processor can send CAS operations 
and get higher throughput.  If CAS were implemented in L1 cache and the 
core's pipeline, then it would take about 9 cycles.  3 cycles to load 
the data into the core's pipeline.  3 cycles to execute the comparison.  
3 cycles to store data into the cache line.  This would yield higher 
performance (lower CAS latency) but would lower overall throughput since 
the cache line would have to travel among the cores in the processor.

In both cache protocols, a write forces the cache line into the M 
(modified) state.  Thus, the cache line can only exist in 1 cache and 
hence must move from L3 cache to L3 cache in order to be modified by the 
cores in each processor.  This will cause cores to stall while they 
attempt to get the cache line into its own L3 cache in the E (exclusive) 
or M (modified) states.

Instead, if we replace the write with a full fence, then the cache line 
can stay longer in the S (shared) state.  Thus, the cache line can exist 
in all caches.  All of the cores can progress at a higher throughput.  
The penalty is that they have to implement a full fence.  The number of 
cycles to implement a full fence is less than or equal to stalling 
waiting for a highly contended cache line.  It depends on the case.  In 
the case where the loads and stores can proceed quickly because all of 
the cache lines are available, the full fence will have very little to 
no impact.  In the case where a load or store before the full fence is 
waiting for a cache line, then the pipeline will stall.  If the load or 
store is waiting for that highly contended cache line, then the full 
fence will be no faster (or slower).

So, if we can replace writes with full fences, we will get faster 
performance and better cache scalability since the cache line doesn't 
have to travel among processors.


On 5/26/2017 6:56 AM, Andrew Dinn wrote:
> On 26/05/17 13:46, Doug Lea wrote:
>> On 05/26/2017 04:44 AM, Andrew Haley wrote:
>>>> a compareAndSet has the memory semantics of a volatile
>>>> write regardless of whether or not the write occurred.
>>> That property of compareAndSet is very ugly, and is a pain to
>>> implement. Most of the CAS instructions I know about don't have such
>>> a property, and making it work requires either an unconditional full
>>> fence after every CAS or a conditional one if it fails.  Either way
>>> sucks mightily and is unlikely to be useful except in the most
>>> pathological cases.
>> Initially (in Java5) requiring it has led to some questionable reliance.
>>   So we cannot change it. But there's not much motivation to do so anyway:
>> As implied by Nathan Reynolds, encountering some (local) fence overhead
>> on CAS failure typically reduces contention and may improve throughput.
> It would be useful to know if that reduction in contention is specific
> to, say, x86 hardware or also occurs on weak memory architectures like
> AArch64 or ppc. Perhaps Nathan could clarify that?
> regards,
> Andrew Dinn
> -----------
> Senior Principal Software Engineer
> Red Hat UK Ltd
> Registered in England and Wales under Company Registration No. 03798903
> Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander
> _______________________________________________
> Concurrency-interest mailing list
> Concurrency-interest at cs.oswego.edu
> http://cs.oswego.edu/mailman/listinfo/concurrency-interest


More information about the Concurrency-interest mailing list