[concurrency-interest] Does StampedLock need a releaseFence in theory?
boehm at acm.org
Sun Jul 24 12:22:24 EDT 2016
I think a good analogy is to compare the Aarch64 CAS implementation with
CAS implemented on top of a roach-motel lock associated with the CAS
location. The ordering properties are very similar for both.
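To make the analogy concrete, here is a minimal Java sketch of a CAS built on top of a lock tied to the CAS location. The class name `LockBasedCas` and its methods are hypothetical, chosen only for illustration. The point is that a lock only needs acquire semantics on entry and release semantics on exit, so ordinary memory operations before or after the call may, in principle, be moved into the critical section ("roach motel") but not out of it.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration: a CAS cell guarded by its own lock.
final class LockBasedCas {
    private final ReentrantLock lock = new ReentrantLock();
    private long value; // only accessed while holding the lock

    // Atomically sets value to update if it currently equals expected.
    boolean compareAndSet(long expected, long update) {
        lock.lock();          // acquire: later accesses cannot move above this
        try {
            if (value != expected) {
                return false;
            }
            value = update;
            return true;
        } finally {
            lock.unlock();    // release: earlier accesses cannot move below this
        }
    }

    // Reads the current value under the same lock.
    long get() {
        lock.lock();
        try {
            return value;
        } finally {
            lock.unlock();
        }
    }
}
```

Nothing in this sketch prevents an unrelated plain store placed after `compareAndSet` from becoming visible before an unrelated plain load placed before it, which is the same freedom being attributed to the ldaxr/stlxr sequence.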
This is a bit unfamiliar because most traditional lock implementations have
included fences, and hence have not allowed full roach-motel reordering at
the hardware level. But Itanium had fence-less lock implementations before
Aarch64 did, with even weaker acquire/release operations.
(On Itanium, I believe
x = a;
y = a;
does not order the two stores, since release and acquire operations can be
reordered. On Aarch64, it happens to do so. None of which is detectable
in data-race-free programs.)
On Tue, Jul 19, 2016 at 7:42 PM, Martin Buchholz <martinrb at google.com> wrote:
> On Fri, Jul 15, 2016 at 12:59 AM, David Holmes <davidcholmes at aapt.net.au> wrote:
>> Also note that the C++11 Cmpxchg-SeqCst mapping for Aarch64 does not add
>> any additional explicit barriers:
> I've also been struggling to understand this, having thought of strong CAS
> as a single atomic bidirectional fenced operation.
> Cmpxchg SeqCst is implemented using a ldaxr followed by a stlxr.
> I believe it is legal for relaxed memory ops before the ldaxr to be
> reordered with the ldaxr
> and likewise
> I believe it is legal for relaxed memory ops after the stlxr to be
> reordered with the stlxr
> and then to be reordered with each other (roach motel style)
> without violating sequential consistency of ldaxr and stlxr and without
> interfering with the use of these operations for implementing traditional
> locks. But seqlocks are not traditional locks - they're a little
> I even have a mental model that justifies such behavior. Suppose there is
> a slow read in progress and the cpu happens to already have exclusive
> access to the cache line containing the cas word. It knows that the cas
> will succeed because it owns the cache line. But because of release
> semantics, the release write cannot complete until the slow read
> completes. CPUs hate stalls, so the CPU starts executing subsequent relaxed
> stores. Unlike the stlxr, which has to wait for the slow read, there is
> nothing in the spec to prevent the subsequent stores from being written to
> memory immediately. If the fast write and the slow read are to the same
> memory location, the read before the cas can see the write after the cas!
> """The Store-Release places no additional ordering constraints on any
> loads or stores appearing after the
> Store-Release instruction."""
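The concern in the quoted message can be sketched as a seqlock-style writer in Java. This is an illustration under stated assumptions, not StampedLock's actual code: the class `SeqLockSketch` and its fields are hypothetical, and it uses the `VarHandle` fence methods from Java 9 and later. The explicit `VarHandle.releaseFence()` after the CAS is the fence in question: it keeps the plain data stores from being reordered ahead of the store that marks the sequence number odd.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical seqlock sketch; names are illustrative only.
final class SeqLockSketch {
    private static final VarHandle SEQ;
    static {
        try {
            SEQ = MethodHandles.lookup()
                    .findVarHandle(SeqLockSketch.class, "seq", long.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private volatile long seq; // even = no writer, odd = write in progress
    private int x, y;          // plain data, read optimistically

    void write(int newX, int newY) {
        long s;
        do {
            s = seq;
        } while ((s & 1L) != 0 || !SEQ.compareAndSet(this, s, s + 1));
        // Without this fence, a roach-motel CAS could let the stores
        // below become visible before the odd sequence number does.
        VarHandle.releaseFence();
        x = newX;
        y = newY;
        seq = s + 2; // volatile store publishes the data, seq is even again
    }

    int[] read() {
        while (true) {
            long s = seq;             // volatile load
            if ((s & 1L) != 0) {      // writer active, retry
                Thread.onSpinWait();
                continue;
            }
            int rx = x, ry = y;       // racy plain reads, validated below
            VarHandle.acquireFence(); // keep data reads before the re-check
            if (seq == s) {
                return new int[] { rx, ry };
            }
        }
    }
}
```

On hardware or JIT output where the CAS already acts as a full two-way fence, the `releaseFence()` call is redundant; the question in this thread is whether correctness may rely on that.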