[concurrency-interest] x86 NOOP memory barriers

Vitaly Davidovich vitalyd at gmail.com
Fri Aug 2 13:28:08 EDT 2013


I think store-to-load forwarding can satisfy a load directly from the store
buffer.

A write-combining buffer will combine stores to the same cache line.  I
don't think a write goes into the store buffer before its address has been
computed, so I don't think you'll have two adjacent stores to the same
address sitting there at the same time (I could be wrong on that, but it
seems the core should know the address by then, since it would have
detected a cache miss on it and thus dropped the store into the buffer).
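
To make the visible effect of the store buffer concrete, here is a rough
sketch of the classic store-buffering litmus test in Java. It is my own
illustration, not something from the Intel manual, and the class and method
names are made up. Each thread stores to one location and then loads the
other. With plain fields both loads could in principle return 0, since each
store may still be sitting in its core's store buffer when the other core's
load executes. Declaring the fields volatile asks the JIT for a store/load
barrier after each store, so the JMM rules that outcome out.

```java
final class StoreBufferLitmus {
    // volatile: the JMM forbids the (r1 == 0, r2 == 0) outcome below.
    // Assumption: on x86 the JIT enforces this with a store/load barrier.
    static volatile int x, y;
    // Plain fields are fine here: they are only read after join().
    static int r1, r2;

    // Runs the two-thread test repeatedly and counts (0, 0) outcomes.
    static int countBothZero(int iterations) {
        int bothZero = 0;
        try {
            for (int i = 0; i < iterations; i++) {
                x = 0;
                y = 0;
                Thread t1 = new Thread(() -> { x = 1; r1 = y; });
                Thread t2 = new Thread(() -> { y = 1; r2 = x; });
                t1.start();
                t2.start();
                t1.join();
                t2.join();
                if (r1 == 0 && r2 == 0) {
                    bothZero++;
                }
            }
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return bothZero;
    }

    public static void main(String[] args) {
        System.out.println("both zero: " + countBothZero(2000));
    }
}
```

Dropping volatile from x and y gives the unfenced version. Whether the
(0, 0) outcome actually shows up then depends on the JIT and on timing, so I
wouldn't rely on any particular count.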

On Aug 2, 2013 12:58 PM, "Nathan Reynolds" <nathan.reynolds at oracle.com>
wrote:

>  Just to add, if the address of a load or store is not known yet (i.e.
> the core is still computing it), then no loads may reorder in front of it.
> This is because if the store and load are for the same address then the
> load needs to pick up the value of the store.
>
> Ignore any fences... Let's say there is a store sitting in the load/store
> buffer inside the core.  Let's say the address for the store has been
> computed.  Could a load from that same address simply use the value that is
> going to be stored?  Just checking if my understanding is correct.
>
> Ignore any fences... Let's say there are 2 adjacent stores sitting in the
> load/store buffer inside the core.  Let's say the addresses for the stores
> have been computed and are identical.  Could the stores be combined into 1
> store so that only the latter store actually pushes its data to L1D cache?
> Again, just checking if my understanding is correct.
>
> -Nathan
>
> On 8/2/2013 5:15 AM, Michael Barker wrote:
>
> Hi Nitsan,
>
> In short, no.  The place to look (as definitive as it gets for the
> hardware) is section 8.2.2, volume 3A of the Intel Software Developer's
> Manual[0].  It lists the rules that govern the reordering of memory
> operations under the x86 memory model.  I've summarised the relevant
> ones here:
>
> - Reads are not reordered with other reads.
> - Writes are not reordered with older reads.
> - Writes to memory are not reordered with other writes*.
> - Reads may be reordered with older writes to different locations but
> not with older writes to the same location.
>
> The only reordering that will occur on x86 is that reads may be
> executed ahead of older writes (to other locations), hence the need for a
> LOCKed instruction to enforce the store/load barrier.  As you can see
> from the above rules, stores are not reordered with older stores and
> loads are not reordered with older loads, so plain MOV instructions
> already provide store/store and load/load ordering.
>
> I'm not really an authority either, but you'll be fairly safe
> referencing the Intel manuals with regard to the hardware
> behaviour.
>
> *There are exceptions for writes that come into play when using
> specific instructions, but they won't be a factor here.  E.g. you can
> use non-temporal stores, which are not bound by the normal
> write-ordering rules.
>
> Mike.
>
> [0] http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html
>
>
> On 2 August 2013 23:25, Nitsan Wakart <nitsanw at yahoo.com> wrote:
>
> Can the writes get re-ordered at the hardware level?
> Don't volatile reads (LOAD/LOAD) also require that reads not be re-ordered,
> to maintain happens-before/after relationships?
>
> ________________________________
> From: Vitaly Davidovich <vitalyd at gmail.com>
> To: Nitsan Wakart <nitsanw at yahoo.com>
> Cc: Concurrency Interest <concurrency-interest at cs.oswego.edu>
> Sent: Friday, August 2, 2013 12:46 PM
> Subject: Re: [concurrency-interest] x86 NOOP memory barriers
>
> I'm not an authority so take this for what it's worth...
> Yes, volatile loads and lazySet do not cause any CPU fence/barrier
> instructions to be generated - in that sense, they're a no-op at the hardware
> level.  However, they are also compiler barriers, which is where the "cheap
> but ain't free" phrase may apply.  The compiler cannot reorder these
> instructions in ways that violate their documented/spec'd memory ordering
> effects.  So for example, a plain store followed by a lazySet cannot
> be moved after the lazySet, whereas if you have two plain stores, the
> compiler can technically reorder them as it sees fit (if we look at just
> those two and disregard other surrounding code).
> So it may happen that the compiler cannot do certain code motion/optimizations
> due to these compiler fences, and therefore you pay some penalty vs using
> plain loads and stores.  For volatile loads, the compiler cannot enregister the
> value like it would with a plain load, but even this may not make a noticeable
> perf difference if the data is in the L1 dcache, for example.
> HTH,
> Vitaly
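
A minimal sketch of the distinction drawn in the quoted message above, for
illustration only. The class and field names are made up, and it assumes
lazySet has release-store semantics (which is how the Javadoc describes it
since Java 9, via VarHandle.setRelease). The plain store to payload may not
be moved after the lazySet, so a reader that observes the new sequence value
through a volatile read also observes the payload.

```java
import java.util.concurrent.atomic.AtomicLong;

final class LazySetPublish {
    static long payload;                              // plain field
    static final AtomicLong seq = new AtomicLong(0);  // publication flag

    // Publishes payload with lazySet and returns what the reader observed.
    static long runOnce() {
        payload = 0;
        seq.set(0);
        final long[] seen = new long[1];
        Thread reader = new Thread(() -> {
            // get() is a volatile read: a plain MOV on x86, but the JIT may
            // not hoist it out of the loop or move the payload read above it.
            while (seq.get() == 0) {
                Thread.yield();
            }
            seen[0] = payload;
        });
        reader.start();
        payload = 42;     // plain store
        seq.lazySet(1);   // ordered store: also a plain MOV on x86, but the
                          // payload store above cannot be moved below it
        try {
            reader.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("reader saw " + runOnce());
    }
}
```

If seq were a plain long written with an ordinary assignment, the compiler
would be free to reorder the two stores, and to hoist the read out of the
spin loop, which is exactly the cost difference being discussed.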
> On Aug 2, 2013 5:55 AM, "Nitsan Wakart" <nitsanw at yahoo.com> wrote:
>
> Hi,
> For clarity's sake I'd like an official explanation for the often quoted
> "all barriers except STORE/LOAD are a no-op on x86" statement from the JMM
> cookbook.
> Can someone (of authority, so I can later say: "But Mr. Authority here
> says...") please confirm/expand on/deny that while a volatile read or an
> AtomicLong.lazySet are a CPU noop (in the sense that they are a MOV like any
> other), they are also compiler instructions. One cannot simply replace a
> lazySet with a plain write to get the same effect. They might be cheap but
> they ain't free...
> I would appreciate some more careful wording on this topic.
> Many thanks,
> Nitsan
>
> _______________________________________________
> Concurrency-interest mailing list
> Concurrency-interest at cs.oswego.edu
> http://cs.oswego.edu/mailman/listinfo/concurrency-interest