[concurrency-interest] Does StampedLock need a releaseFence in theory?

Hans Boehm boehm at acm.org
Sun Jul 24 12:22:24 EDT 2016

I think a good analogy is to compare the Aarch64 CAS implementation with a
CAS implemented on top of a roach-motel lock associated with the CAS
location. The ordering properties are very similar for both.
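To make the analogy concrete, here is a minimal Java sketch of a CAS built on a lock associated with the CAS location. The class name LockedCasCell is hypothetical and does not come from this thread. At the language level, lock() has acquire semantics and unlock() has release semantics, so ordinary accesses around the CAS may move into the critical section (roach-motel style) but never out of it. That is the kind of ordering being compared with the Aarch64 ldaxr/stlxr sequence.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration: a CAS emulated with one lock per CAS location.
// lock() acts as an acquire and unlock() as a release, so surrounding
// accesses may be moved into the critical section, but never out of it.
final class LockedCasCell {
    private final ReentrantLock lock = new ReentrantLock();
    private int value; // guarded by lock

    LockedCasCell(int initial) {
        value = initial; // assumes the cell itself is safely published
    }

    boolean compareAndSet(int expected, int update) {
        lock.lock(); // acquire
        try {
            if (value != expected) {
                return false;
            }
            value = update;
            return true;
        } finally {
            lock.unlock(); // release
        }
    }

    int get() {
        lock.lock();
        try {
            return value;
        } finally {
            lock.unlock();
        }
    }
}
```

A retry loop such as `do { v = cell.get(); } while (!cell.compareAndSet(v, v + 1));` then behaves like an atomic increment.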

This is a bit unfamiliar because most traditional lock implementations have
included fences, and hence have not allowed full roach-motel reordering at
the hardware level. But Itanium had fence-less lock implementations before
Aarch64 did, with even weaker acquire/release operations.
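For comparison, a fence-less lock in this sense can be sketched in Java with VarHandle access modes: lock() uses only an acquire-mode CAS and unlock() only a release-mode store, with no standalone fence anywhere. This is a hypothetical test-and-set spin lock (AcqRelSpinLock is an illustrative name), not how any particular JDK lock is implemented.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical sketch: a spin lock built only from an acquire-mode CAS and a
// release-mode store. Accesses before lock() may sink into the critical
// section and accesses after unlock() may rise into it (roach-motel).
final class AcqRelSpinLock {
    private static final VarHandle LOCKED;
    static {
        try {
            LOCKED = MethodHandles.lookup()
                    .findVarHandle(AcqRelSpinLock.class, "locked", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private int locked; // 0 = free, 1 = held; accessed only through LOCKED

    void lock() {
        // The weak CAS may fail spuriously, so it is retried in a loop.
        while (!LOCKED.weakCompareAndSetAcquire(this, 0, 1)) {
            Thread.onSpinWait();
        }
    }

    void unlock() {
        LOCKED.setRelease(this, 0); // release store, no trailing fence
    }
}
```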

(On Itanium, I believe

x = a;
unlock l1;
lock l2;
y = a;

does not order the two stores, since a release followed by an acquire can be
reordered. On Aarch64, it happens to order them. Neither behavior is
detectable in data-race-free programs.)

On Tue, Jul 19, 2016 at 7:42 PM, Martin Buchholz <martinrb at google.com> wrote:

> On Fri, Jul 15, 2016 at 12:59 AM, David Holmes <davidcholmes at aapt.net.au>
> wrote:
>> Also note that the C++11 Cmpxchg SeqCst mapping for Aarch64 does not add
>> any additional explicit barriers:
>> https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html
> I've also been struggling to understand this, having thought of a strong
> CAS as a single atomic, bidirectionally fenced operation.
> Cmpxchg SeqCst is implemented using an ldaxr followed by an stlxr.
> I believe it is legal for relaxed memory ops before the ldaxr to be
> reordered with the ldaxr and, likewise, for relaxed memory ops after the
> stlxr to be reordered with the stlxr, and then for those ops to be
> reordered with each other (roach-motel style), without violating the
> sequential consistency of the ldaxr and stlxr and without interfering with
> the use of these operations to implement traditional locks. But seqlocks
> are not traditional locks; they're a little "backwards".
> I even have a mental model that justifies such behavior. Suppose there is
> a slow read in progress and the CPU already has exclusive access to the
> cache line containing the CAS word. It knows that the CAS will succeed
> because it owns the cache line. But because of release semantics, the
> release write cannot complete until the slow read completes. CPUs hate
> stalls, so it starts executing subsequent relaxed stores. Unlike the stlxr,
> which has to wait for the slow read, nothing in the spec prevents the
> subsequent stores from being written to memory immediately. If the fast
> write and the slow read are to the same memory location, the read before
> the CAS can see the write after the CAS!
> """The Store-Release places no additional ordering constraints on any
> loads or stores appearing after the
> Store-Release instruction."""
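The seqlock case can be sketched in Java. This is a hypothetical single-pair seqlock (SeqLockPair is an illustrative name), not the JDK's StampedLock code. The writer makes the sequence number odd with a CAS and then writes the data. If those data stores could become visible before the CAS's store, as the quoted message argues the Aarch64 mapping permits, an optimistic reader could see new data together with the old, even sequence number and accept a torn read. The sketch assumes that hazard and places a VarHandle.releaseFence() right after the CAS, the kind of fence the subject line asks about; the reader uses VarHandle.acquireFence() before re-checking the sequence number.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical seqlock protecting a pair (x, y). An odd sequence number
// means a write is in progress. A sketch, not StampedLock's implementation.
final class SeqLockPair {
    private static final VarHandle SEQ;
    static {
        try {
            SEQ = MethodHandles.lookup()
                    .findVarHandle(SeqLockPair.class, "seq", long.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private long seq;  // accessed only through SEQ
    private int x, y;  // plain fields, read racily by optimistic readers

    void write(int nx, int ny) {
        // Wait for an even sequence number, then CAS it to odd.
        long s = (long) SEQ.getAcquire(this);
        while ((s & 1L) != 0L || !SEQ.compareAndSet(this, s, s + 1)) {
            Thread.onSpinWait();
            s = (long) SEQ.getAcquire(this);
        }
        // Assumed necessary: keep loads and stores before this point
        // (including the CAS) from being reordered with the data stores below.
        VarHandle.releaseFence();
        x = nx;
        y = ny;
        // Release store: the data stores above are ordered before the even seq.
        SEQ.setRelease(this, s + 2);
    }

    int[] read() {
        while (true) {
            long s = (long) SEQ.getAcquire(this);
            if ((s & 1L) != 0L) { // writer active, try again
                Thread.onSpinWait();
                continue;
            }
            int rx = x, ry = y;        // optimistic, possibly racy reads
            VarHandle.acquireFence();  // order the data loads before the re-check
            if ((long) SEQ.getOpaque(this) == s) {
                return new int[] {rx, ry};
            }
        }
    }
}
```

A test on x86 is unlikely to expose the reordering either way; it can only check that readers never observe x != y.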
