[concurrency-interest] x86 NOOP memory barriers

Nitsan Wakart nitsanw at yahoo.com
Fri Aug 2 09:15:42 EDT 2013


So, because a putOrdered is a write to memory, it cannot be reordered with other writes, as per "8.2.3.2 Neither Loads Nor Stores Are Reordered with Like Operations".
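
To make that concrete, here is a minimal Java sketch of the single-writer publish pattern under discussion. The class, field and method names are made up for illustration, and AtomicLong.lazySet stands in for the Unsafe putOrdered* calls. The plain store to payload may not be moved after the lazySet, so a reader that sees the flag through a volatile get should also see the payload.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical example class; none of these names come from the thread.
public class LazySetPublish {
    private long payload;                               // plain (non-volatile) field
    private final AtomicLong ready = new AtomicLong(0); // publication flag

    void publish(long value) {
        payload = value;   // plain store
        ready.lazySet(1);  // ordered store: the plain store above stays before it
    }

    long awaitPayload() {
        while (ready.get() == 0) { // volatile load, re-read on every iteration
            Thread.yield();
        }
        return payload;            // visible once the flag has been observed
    }

    // Runs one writer thread against the calling thread acting as reader.
    static long runOnce(long value) {
        LazySetPublish box = new LazySetPublish();
        Thread writer = new Thread(() -> box.publish(value));
        writer.start();
        long seen = box.awaitPayload();
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(runOnce(42L)); // prints 42
    }
}
```

On x86 both stores in publish would normally compile to plain MOVs; the ordering comes from the JIT not being allowed to swap them, plus the hardware rule quoted above.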

________________________________
 From: Michael Barker <mikeb01 at gmail.com>
To: Nitsan Wakart <nitsanw at yahoo.com> 
Cc: Vitaly Davidovich <vitalyd at gmail.com>; "concurrency-interest at cs.oswego.edu" <concurrency-interest at cs.oswego.edu> 
Sent: Friday, August 2, 2013 2:15 PM
Subject: Re: [concurrency-interest] x86 NOOP memory barriers
 

Hi Nitsan,

In short, no.  The place to look (as definitive as it gets for the
hardware) is section 8.2.2, volume 3A of the Intel Software Developer's
Manual[0].  It lists the rules that apply to the reordering of memory
operations under the x86 memory model.  I've summarised the relevant
ones here:

- Reads are not reordered with other reads.
- Writes are not reordered with older reads.
- Writes to memory are not reordered with other writes*.
- Reads may be reordered with older writes to different locations but
not with older writes to the same location.

The only reordering that will occur on x86 is that reads may be
executed before older writes (to other locations), hence the need for a
LOCKed instruction to enforce the store/load barrier.  As you can see
from the above rules, stores are not reordered with older stores and
loads are not reordered with older loads, so a series of plain MOV
instructions is sufficient for a store/store or a load/load barrier.
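
The store/load case can be sketched in Java as the classic two-flag test. The names below are invented for illustration, and the remark about LOCKed instructions describes what a JIT typically emits on x86, not something the language spec mandates. Because x and y are volatile, each thread's store must stay ahead of its following load, so both threads reading 0 is not a permitted outcome.

```java
// Hypothetical example class; names are not from the Intel manual or the JMM cookbook.
public class StoreLoadDemo {
    static volatile int x, y; // volatile: store followed by load needs a store/load barrier
    static int r1, r2;        // results, read only after join()

    // Counts iterations in which both threads read 0 from the other's flag.
    static int countBothZero(int iterations) {
        int bothZero = 0;
        for (int i = 0; i < iterations; i++) {
            x = 0;
            y = 0;
            Thread a = new Thread(() -> { x = 1; r1 = y; }); // store x, then load y
            Thread b = new Thread(() -> { y = 1; r2 = x; }); // store y, then load x
            a.start();
            b.start();
            try {
                a.join();
                b.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            if (r1 == 0 && r2 == 0) {
                bothZero++;
            }
        }
        return bothZero;
    }

    public static void main(String[] args) {
        System.out.println(countBothZero(1000)); // prints 0
    }
}
```

If x and y were plain fields, the both-zero outcome would become legal, since this is exactly the read-before-older-write reordering that x86 permits, although it may be hard to observe with threads started this way.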

I'm not really an authority either, but you'll be fairly safe
referencing the Intel manuals with regard to the hardware
behaviour.

*There are exceptions for writes that come into play when using
specific instructions, but they won't be a factor here.  E.g. you can
use non-temporal stores, which are weakly ordered and so sidestep the
normal write-ordering rules.

Mike.

[0] http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html


On 2 August 2013 23:25, Nitsan Wakart <nitsanw at yahoo.com> wrote:
> Can the writes get re-ordered on the hardware level?
> Don't volatile reads (LOAD/LOAD) also require reads to not get re-ordered to
> maintain happens-before/after relationships?
>
> ________________________________
> From: Vitaly Davidovich <vitalyd at gmail.com>
> To: Nitsan Wakart <nitsanw at yahoo.com>
> Cc: Concurrency Interest <concurrency-interest at cs.oswego.edu>
> Sent: Friday, August 2, 2013 12:46 PM
> Subject: Re: [concurrency-interest] x86 NOOP memory barriers
>
> I'm not an authority so take this for what it's worth...
> Yes, volatile loads and lazySet do not cause any cpu fence/barrier
> instructions to be generated - in that sense, they're nop at the hardware
> level.  However, they are also compiler barriers, which is where the "cheap
> but ain't free" phrase may apply.  The compiler cannot reorder these
> instructions in ways that violate their documented/spec'd memory ordering
> effects.  So for example, a plain store followed by lazySet cannot actually
> be moved after the lazySet; whereas if you have two plain stores, the
> compiler can technically reorder them as it sees fit (if we look at just
> those two and disregard other surrounding code).
> So, it may happen that the compiler cannot do certain code motion/optimizations
> due to these compiler fences and therefore you have some penalty vs using
> plain loads and stores.  For volatile loads, the compiler cannot enregister the
> value like it would with a plain load, but even this may not have a noticeable
> perf diff if the data is in L1 dcache, for example.
> HTH,
> Vitaly
> Sent from my phone
> On Aug 2, 2013 5:55 AM, "Nitsan Wakart" <nitsanw at yahoo.com> wrote:
>
> Hi,
> For clarity's sake I'd like an official explanation for the often quoted
> "all barriers except STORE/LOAD are a no-op on x86" statement from the JMM
> cookbook.
> Can someone (of authority, so I can later say: "But Mr. Authority here
> says...") please confirm/expand on/deny that while a volatile read or an
> AtomicLong.lazySet are a CPU noop (in the sense that they are a MOV like any
> other), they are also compiler instructions. One cannot simply replace a
> lazySet with a plain write to get the same effect. They might be cheap but
> they ain't free...
> I would appreciate some more careful wording on this topic.
> Many thanks,
> Nitsan
>
> _______________________________________________
> Concurrency-interest mailing list
> Concurrency-interest at cs.oswego.edu
> http://cs.oswego.edu/mailman/listinfo/concurrency-interest