[concurrency-interest] AtomicXXX.lazySet and happens-before reasoning

Boehm, Hans hans.boehm at hp.com
Sat Oct 1 12:32:05 EDT 2011

From my perspective, the main argument for fences in the C++11 memory model is to allow easy conversion of existing applications.  Another argument brought forward, especially by the original proposal, was that fences allow you to further tweak certain pieces of code, e.g. if you were releasing two custom-implemented spin-locks in a row.  Both arguments are valid to some extent, but the latter has always struck me as a very minor issue.  (IIRC, some of the examples in the original paper can be sped up slightly with fences.  Some of the other examples don't actually benefit.)  I supported the addition of fences based on the argument about legacy code.  (Also the original version was much simpler to add and, as it turned out, mostly useless.  As often happens, the actual complexity didn't appear until later.)  I would not recommend that people use C++ fences for new code.
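
As a rough illustration of the two-spin-locks case, here is a minimal sketch in standard C++11.  The SpinLock type and the unlock_both helper are hypothetical names invented for this example; they are not taken from the original fence proposal.  The sketch replaces two release stores with one release fence followed by two relaxed stores.  Whether that is measurably faster depends on the target hardware.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical minimal spin-lock, for illustration only.
struct SpinLock {
    std::atomic<bool> locked{false};

    void lock() {
        // Acquire on success so the critical section cannot move above the lock.
        while (locked.exchange(true, std::memory_order_acquire)) {
            std::this_thread::yield();
        }
    }

    // Conventional unlock: one release store per lock.
    void unlock() {
        locked.store(false, std::memory_order_release);
    }

    // Unlock that relies on the caller having already issued a release fence.
    void unlock_relaxed() {
        locked.store(false, std::memory_order_relaxed);
    }
};

// Releases two distinct locks with one release fence instead of two
// release stores. The fence orders all earlier writes before both stores.
inline void unlock_both(SpinLock& a, SpinLock& b) {
    assert(&a != &b);
    std::atomic_thread_fence(std::memory_order_release);
    a.unlock_relaxed();
    b.unlock_relaxed();
}
```

A thread holding both locks would call unlock_both(a, b) where it would otherwise call a.unlock() followed by b.unlock().  The acquire exchange in lock() that observes either relaxed store synchronizes with the fence, so the critical section stays ordered as before.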

We've had discussions about alternative designs for C++ atomics elsewhere.  And I'm sure we will continue to disagree about a lot of it, though I wouldn't be opposed to eventual support for noncompeting accesses to atomics.  And none of this is directly applicable to Java.


> -----Original Message-----
> From: Alexander Terekhov [mailto:TEREKHOV at de.ibm.com]
> Sent: Saturday, October 01, 2011 4:10 AM
> To: Boehm, Hans
> Cc: Ruslan Cheremin; Doug Lea; concurrency-interest at cs.oswego.edu
> Subject: Re: [concurrency-interest] AtomicXXX.lazySet and happens-before reasoning
> "Boehm, Hans" <hans.boehm at hp.com> wrote:
> [...]
> > C++ model . . . I do not believe you ever want to reason in terms of
> > fences.
> For programs targeted to run on POWER/PPC and ARM hardware, reasoning
> and coding in terms of fences is the best and that is why the latest
> C++11 standard does have acq/rel/acq_rel/SC fences (expressed in
> terms of 'synchronizes with', see [atomics.fences]), no?
> regards,
> alexander.
> P.S. I don't like C++11 atomics, I think that atomic loads and
> stores ought to support the following 'modes':
>   Whether load/store is competing (default) or not. Competing load
>   means that there might be concurrent store (to the same object).
>   Competing store means that there might be concurrent load or
>   store. Non-competing load/store can be performed non-atomically.
>   Whether competing load/store needs remote write atomicity (default
>   is no remote write atomicity). A remote-write-atomicity-yes load
>   triggers undefined behavior in the case of a concurrent
>   remote-write-atomicity-no store.
>   Whether load/store has specified reordering constraint (default
>   is no constraint specified) in terms of the following reordering
>   modes:
>     Whether preceding loads (in program order) can be reordered
>     across it (can by default).
>     Whether preceding stores (in program order) can be reordered
>     across it (can by default).
>     Whether subsequent loads (in program order) can be reordered
>     across it (can by default). For load, the set of constrained
>     subsequent loads can be limited to only dependent loads (aka
>     'consume' mode).
>     Whether subsequent stores (in program order) can be reordered
>     across it (can by default). For load, there is an implicit
>     reordering constraint regarding dependent stores (no need to
>     specify it).
>     A fence/barrier operation can be used to specify reordering
>     constraint using basically the same modes.
> Re C++11 MM, I'm still missing more fine-grained memory order
> labels such as in pseudo C++ example below.
> (I mean mo::noncompeting, mo::ssb/ssb_t (sink store barrier, a
> release not affecting preceding loads), slb/slb_t (a release not
> affecting preceding stores) below, and somesuch for relaxed acquire)
> // Introspection (for bool argument below) aside for a moment
> template<typename T, bool copy_ctor_or_dtor_can_mutate_object>
> class mutex_and_condvar_free_single_producer_single_consumer {
>   typedef isolated< aligned_storage< T > > ELEM;
>   size_t           m_size; // > 1
>   ELEM *           m_elem; // array of elements, init'ed by ctor
>   atomic< ELEM * > m_head; // initially == m_elem
>   atomic< ELEM * > m_tail; // initially == m_elem
>   ELEM * advance(ELEM * elem) const {
>     return (++elem < m_elem + m_size) ? elem : m_elem;
>   }
> public:
>   mutex_and_condvar_free_single_producer_single_consumer(); // ctor
>  ~mutex_and_condvar_free_single_producer_single_consumer(); // dtor
>   void producer(const T & value) {
>     ELEM * tail = m_tail.load(mo::noncompeting); // may be nonatomic
>     ELEM * next = advance(tail);
>     while (next == m_head.load(mo::relaxed)) usleep(1000);
>     new(tail) T(value); // placement copy ctor (make queued copy)
>     m_tail.store(next, mo::ssb); // cheaper than mo::release
>   }
>   T consumer() {
>     ELEM * head = m_head.load(mo::noncompeting); // may be nonatomic
>     while (head == m_tail.load(mo::consume)) usleep(1000);
>     T value(*head); // T's copy ctor (make a copy to return)
>     head->~T(); // T's dtor (cleanup for queued copy)
>     m_head.store(advance(head), type_list< mo::slb_t, mo::rel_t >::
>       element<copy_ctor_or_dtor_can_mutate_object>::type());
>     return value; // return copied T
>   }
> };
> Note also that, given that the example above presumes that no more
> than one thread can read from the relevant atomic locations while
> they are written concurrently, there is definitely no need to pay the
> price of remote write atomicity even if it is run on a 3+ way
> multiprocessor... IOW, hwsync is unneeded even if all mo::* above
> are changed to SC... but the C++11 MM doesn't allow expressing
> no-need-for-remote-write-atomicity for SC atomics.
> "Boehm, Hans" <hans.boehm at hp.com>@cs.oswego.edu on 30.09.2011 22:25:27
> Sent by:    concurrency-interest-bounces at cs.oswego.edu
> To:    Ruslan Cheremin <cheremin at gmail.com>, Doug Lea
> <dl at cs.oswego.edu>
> cc:    "concurrency-interest at cs.oswego.edu"
>        <concurrency-interest at cs.oswego.edu>
> Subject:    Re: [concurrency-interest] AtomicXXX.lazySet and
>        happens-before   reasoning
> > c) Does it mean that the next JMM release will throw out HB reasoning
> > and move to memory barriers/fences notation?
> >
> Adding to Doug's answer, the answer to this question is a resounding
> "no".
> Happens-before reasoning largely remains valid with lazySet, as you can
> see in the C++ model (which is unfortunately also significantly
> complicated by memory_order_consume, which currently has no Java analog,
> though it's closely related to some of the final field extension
> discussions).  What I believe goes away is any meaningful notion of a
> single total synchronization order.
> I do not believe you ever want to reason in terms of fences.  Such
> reasoning is not usually sound for Java, since the memory model is
> carefully designed to allow elimination of synchronization on e.g. a
> volatile accessed by only a single thread.  Java volatiles etc. do not
> have fence semantics.  I do not know of any language-level memory
> models that have been successfully expressed in terms of fences.
> See http://dl.acm.org/citation.cfm?doid=1988915.1988919 for a
> discussion of why I don't consider a couple of better known attempts to
> be fully successful.
> Hans

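For comparison with the quoted pseudo-C++ queue, here is a minimal sketch of a single-producer/single-consumer ring that uses only the memory orders C++11 actually provides.  The class name SpscRing and the try_push/try_pop interface are hypothetical and are not part of the quoted code.  The sketch assumes T is default-constructible and copy-assignable, uses indices instead of pointers, and returns false instead of sleeping.  The proposed mo::noncompeting loads become relaxed loads, mo::ssb and mo::slb become release stores, and mo::consume becomes acquire, so it gives up exactly the fine-grained savings the quoted message argues for.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical name; a simplified, non-blocking variant of the quoted queue.
template <typename T>
class SpscRing {
    std::vector<T> slots_;              // one slot is always left empty
    std::atomic<std::size_t> head_{0};  // written only by the consumer
    std::atomic<std::size_t> tail_{0};  // written only by the producer

    std::size_t advance(std::size_t i) const {
        return (i + 1 == slots_.size()) ? 0 : i + 1;
    }

public:
    explicit SpscRing(std::size_t capacity) : slots_(capacity + 1) {
        assert(capacity > 0);
    }

    // Producer thread only. Returns false when the ring is full.
    bool try_push(const T& value) {
        // Only the producer writes tail_, so a relaxed load is enough here.
        std::size_t tail = tail_.load(std::memory_order_relaxed);
        std::size_t next = advance(tail);
        // Acquire pairs with the consumer's release store to head_, so the
        // consumer has finished reading the slot before it is overwritten.
        if (next == head_.load(std::memory_order_acquire)) return false;
        slots_[tail] = value;
        // Release publishes the slot contents together with the new tail.
        tail_.store(next, std::memory_order_release);
        return true;
    }

    // Consumer thread only. Returns false when the ring is empty.
    bool try_pop(T& out) {
        // Only the consumer writes head_, so a relaxed load is enough here.
        std::size_t head = head_.load(std::memory_order_relaxed);
        // Acquire pairs with the producer's release store to tail_.
        if (head == tail_.load(std::memory_order_acquire)) return false;
        out = slots_[head];
        // Release hands the slot back to the producer.
        head_.store(advance(head), std::memory_order_release);
        return true;
    }
};
```

Each index is written by exactly one thread, and every hand-off of a slot goes through a release store paired with an acquire load, so correctness can be argued purely in happens-before terms without reasoning about fences.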