[concurrency-interest] x86 NOOP memory barriers

Martin Thompson mjpt777 at gmail.com
Tue Aug 6 16:09:28 EDT 2013

> > I'm also surprised by the lazy vs. volatile differences in the shared
> > case.  It seems to me the time should be completely dominated by coherence
> > misses in either case.  There may be some unexpected odd optimization or
> > lack thereof happening here.  In my limited experience, the impact of
> > memory fences, etc. commonly decreases as scale increases, since those
> > slowdowns are local to each core, and don't affect the amount of memory
> > traffic.  See for example the microbenchmark measurements in
> > http://www.hpl.hp.com/techreports/2012/HPL-2012-218.html .

> It is my understanding that lazy set tends to dampen false sharing effects
> as the value is not immediately 'flushed' and can be modified while in the
> write queue. The less you 'force' the write, the less you contend on the
> cache line.

The lazySet results in a vanilla MOV, which can benefit from the write-combining
buffers. In addition, the core does not have to wait for the store buffer to
drain, as it does after a volatile write, which on x86 is followed by a
StoreLoad barrier (a LOCK-prefixed instruction or MFENCE). No flushing is
involved.
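For illustration, a minimal Java sketch of the two write styles being compared, assuming AtomicLong as the holder: `set` is a volatile store and `lazySet` an ordered store. The class and method names (LazySetDemo, volatileWrites, lazyWrites) are hypothetical. It is a naive timing loop rather than a JMH benchmark, so any numbers it prints are only indicative, and it only exercises the unshared, single-writer case.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical demo class, not from the original thread.
// AtomicLong.set is a volatile store: on HotSpot/x86 it is typically a MOV
// followed by a StoreLoad barrier (LOCK-prefixed instruction or MFENCE) that
// waits for the store buffer to drain. AtomicLong.lazySet is an ordered store:
// on x86 it is a plain MOV, since x86 already keeps stores in program order.
public class LazySetDemo {
    static final AtomicLong counter = new AtomicLong();

    // Publish each new value with a volatile write.
    static long volatileWrites(long iterations) {
        for (long i = 1; i <= iterations; i++) {
            counter.set(i);
        }
        return counter.get();
    }

    // Publish each new value with an ordered (lazy) write.
    static long lazyWrites(long iterations) {
        for (long i = 1; i <= iterations; i++) {
            counter.lazySet(i);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        long n = 20_000_000L;
        // Repeat several rounds so the JIT has compiled both loops.
        for (int round = 0; round < 5; round++) {
            long t0 = System.nanoTime();
            volatileWrites(n);
            long t1 = System.nanoTime();
            lazyWrites(n);
            long t2 = System.nanoTime();
            System.out.printf("set: %d ms, lazySet: %d ms%n",
                    (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
        }
    }
}
```

Running it with -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly (which requires the hsdis plugin) is one way to check which instructions HotSpot actually emits for each loop on a given JVM.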