[concurrency-interest] Does StampedLock need a releaseFence in theory?

David Holmes davidcholmes at aapt.net.au
Fri Jul 15 03:59:24 EDT 2016



“CAS is implemented using a ldaxr followed by stlxr which is efficient, but allows subsequent writes to move in between the ldaxr and the stlxr.”


“We” don’t think this is the case. Unfortunately the spec does not clearly state this, but informally:

- Any such writes would potentially invalidate the reservation, so moving writes into there seems a bad idea unless you expend effort determining that it won’t invalidate the reservation.

- Any such writes would be speculative and potentially need undoing, so this also seems like far too much effort for little if any gain.

- If this were truly an issue, then the “Barrier Litmus Test” Appendix in the ARMv8 Architecture manual would flag it and show the use of explicit memory barriers.


It would be good if the ARM folk could clarify this, and if necessary get the “Barrier Litmus test” text updated.


Also note that the C++11 Cmpxchg-SeqCst mapping for AArch64 does not add any additional explicit barriers:






From: Concurrency-interest [mailto:concurrency-interest-bounces at cs.oswego.edu] On Behalf Of Martin Buchholz
Sent: Friday, July 15, 2016 1:27 AM
To: Andrew Haley <aph at redhat.com>
Cc: Doug Lea <dl at cs.oswego.edu>; concurrency-interest <concurrency-interest at cs.oswego.edu>
Subject: Re: [concurrency-interest] Does StampedLock need a releaseFence in theory?




On Thu, Jul 14, 2016 at 1:23 AM, Andrew Haley <aph at redhat.com> wrote:

On 14/07/16 01:53, Hans Boehm wrote:
> An ARMv8 compareAndSet operation (using only acquire and release
> operations, not dmb, as it should be implemented) will behave like the
> lock-based one in this respect.  I think the current code above is
> incorrect on ARMv8 (barring compensating pessimizations elsewhere).

Umm, what?  The ARMv8 compareAndSet has a sequentially consistent store.
I guess I must be missing something important.


(Pretending to be Hans here ...)

The idea is that all ARMv8 "load-acquire/store-release" operations (including those used for implementing CAS) are sequentially consistent when considered as a group in the same way that all "synchronization actions" in Java are, but they can still be reordered with plain reads/writes, just like Java plain variable access can be reordered with volatile variable access (unless a happens-before relationship exists).
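At the Java level, the reordering described above can be sketched as follows. This is only an illustration: the class and its members are hypothetical names, not code from StampedLock or from this thread.

```java
// Hypothetical illustration of a plain write following a volatile write.
class FlagThenData {
    private volatile boolean busy; // volatile access: a synchronization action
    private int data;              // plain field

    void writer(int v) {
        busy = true; // volatile write
        data = v;    // plain write: nothing in the Java memory model stops another
                     // thread from seeing this before it sees busy == true
    }

    // Reads the plain field first, then the flag.
    // Returns {data, 1} if busy was observed true, otherwise {data, 0}.
    int[] observer() {
        int d = data;     // plain read
        boolean b = busy; // volatile read
        return new int[] { d, b ? 1 : 0 };
    }
}
```

Run in a single thread, observer() called after writer(7) sees both the new data and the flag. A concurrent observer has no happens-before edge with the plain write, so it may legally see the new data while busy still reads false.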


The section in


on aarch64 should be useful.


ldar corresponds to Java volatile read

stlr corresponds to Java volatile write

CAS is implemented using a ldaxr followed by stlxr which is efficient, but allows subsequent writes to move in between the ldaxr and the stlxr.
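To make the concern concrete, here is a minimal sequence-lock sketch in Java. It is not StampedLock's actual code: the class, fields, and methods are hypothetical, the instruction names in the comments simply repeat the mapping listed above, and the full fence is a deliberately conservative choice that assumes the reordering under discussion can occur at all.

```java
import java.lang.invoke.VarHandle;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sequence lock; illustrates where a fence after the CAS would go.
class SeqLockSketch {
    // Even value = unlocked, odd value = write in progress.
    private final AtomicLong seq = new AtomicLong();
    private long x, y; // plain (non-volatile) data fields

    boolean tryWrite(long newX, long newY) {
        long s = seq.get();                      // volatile read (ldar in the mapping above)
        if ((s & 1L) != 0 || !seq.compareAndSet(s, s + 1)) { // CAS (ldaxr ... stlxr)
            return false;                        // already locked, or lost the race
        }
        // Conservative assumption: without a fence, the plain stores below might
        // become visible before the store half of the CAS.
        VarHandle.fullFence();
        x = newX;                                // plain writes
        y = newY;
        seq.set(s + 2);                          // volatile write (stlr) releases the lock
        return true;
    }

    // Optimistic read: returns {x, y}, or null if a writer interfered.
    long[] tryRead() {
        long s = seq.get();
        if ((s & 1L) != 0) {
            return null;                         // write in progress
        }
        long rx = x, ry = y;                     // plain reads
        VarHandle.acquireFence();                // keep the plain reads before the re-check
        return seq.get() == s ? new long[] { rx, ry } : null;
    }
}
```

If the informal arguments against the reordering hold, the fence in tryWrite is redundant on ARMv8; a weaker fence may also be enough, and the sketch uses the strongest one only to stay on the safe side.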


(Back to being Martin ...)

Reordering a plain store from after to before a stlxr (rather than non-exclusive stlr) is still rather surprising because it looks like a speculative store - we don't know yet whether the stlxr will succeed.  Unlike the case where we implement CAS via a lock.  Am I thinking too atomically?  Perhaps the stlxr instruction implementation exclusively acquires the cache line, sees that it surely will succeed, but will be slow because pending memory operations must be completed first.  But no reason we can't start executing future instructions in the meantime!?
