[concurrency-interest] AtomicXXX.lazySet and happens-before reasoning

Alexander Terekhov TEREKHOV at de.ibm.com
Sat Oct 1 07:10:08 EDT 2011

"Boehm, Hans" <hans.boehm at hp.com> wrote:
> C++ model . . . I do not believe you ever want to reason in terms of
> fences.

For programs targeted to run on POWER/PPC and ARM hardware, reasoning
and coding in terms of fences is the best approach, and that is why the
latest C++11 standard does have acq/rel/acq_rel/SC fences (expressed in
terms of 'synchronizes with'; see [atomics.fences]), no?


P.S. I don't like C++11 atomics, I think that atomic loads and
stores ought to support the following 'modes':

  Whether load/store is competing (default) or not. Competing load
  means that there might be concurrent store (to the same object).
  Competing store means that there might be concurrent load or
  store. Non-competing load/store can be performed non-atomically.

  Whether competing load/store needs remote write atomicity (default
  is no remote write atomicity). A remote-write-atomicity-yes load
  triggers undefined behavior in the case of a concurrent remote-
  write-atomicity-no store.

  Whether load/store has specified reordering constraint (default
  is no constraint specified) in terms of the following reordering

    Whether preceding loads (in program order) can be reordered
    across it (can by default).

    Whether preceding stores (in program order) can be reordered
    across it (can by default).

    Whether subsequent loads (in program order) can be reordered
    across it (can by default). For load, the set of constrained
    subsequent loads can be limited to only dependent loads (aka
    'consume' mode).

    Whether subsequent stores (in program order) can be reordered
    across it (can by default). For load, there is an implicit
    reordering constraint regarding dependent stores (no need to
    specify it).

    A fence/barrier operation can be used to specify reordering
    constraint using basically the same modes.

Re C++11 MM, I'm still missing more fine-grained memory order
labels, such as in the pseudo-C++ example below.

(I mean mo::noncompeting; mo::ssb/ssb_t (sink store barrier, a
release not affecting preceding loads); mo::slb/slb_t (a release
not affecting preceding stores); and some such analog for a
relaxed acquire.)

// Introspection (for bool argument below) aside for a moment
template<typename T, bool copy_ctor_or_dtor_can_mutate_object>
class mutex_and_condvar_free_single_producer_single_consumer {

  typedef isolated< aligned_storage< T > > ELEM;

  size_t           m_size; // > 1
  ELEM *           m_elem; // array of elements, init'ed by ctor
  atomic< ELEM * > m_head; // initially == m_elem
  atomic< ELEM * > m_tail; // initially == m_elem

  ELEM * advance(ELEM * elem) const {
    return (++elem < m_elem + m_size) ? elem : m_elem;
  }

public:

  mutex_and_condvar_free_single_producer_single_consumer(); // ctor
 ~mutex_and_condvar_free_single_producer_single_consumer(); // dtor

  void producer(const T & value) {
    ELEM * tail = m_tail.load(mo::noncompeting); // may be nonatomic
    ELEM * next = advance(tail);
    while (next == m_head.load(mo::relaxed)) usleep(1000);
    new(tail) T(value); // placement copy ctor (make queued copy)
    m_tail.store(next, mo::ssb); // cheaper than mo::release
  }

  T consumer() {
    ELEM * head = m_head.load(mo::noncompeting); // may be nonatomic
    while (head == m_tail.load(mo::consume)) usleep(1000);
    T value(*head); // T's copy ctor (make a copy to return)
    head->~T(); // T's dtor (cleanup for queued copy)
    m_head.store(advance(head), type_list< mo::slb_t, mo::rel_t >::
      at< copy_ctor_or_dtor_can_mutate_object >());
      // 'at<>' is a hypothetical selector (truncated in the archive):
      // mo::slb_t, or mo::rel_t when T's copy ctor/dtor can mutate the object
    return value; // return copied T
  }
};
Note also that, since the example above presumes that no more than
one thread reads from the relevant atomic locations while they are
written concurrently, there is definitely no need to pay the price
of remote write atomicity even when it runs on a 3+ way
multiprocessor... IOW, hwsync is unneeded even if all mo::* above
are changed to SC... but the C++11 MM doesn't allow one to express
no-need-for-remote-write-atomicity for SC atomics.

"Boehm, Hans" <hans.boehm at hp.com>@cs.oswego.edu on 30.09.2011 22:25:27

Sent by:    concurrency-interest-bounces at cs.oswego.edu

To:    Ruslan Cheremin <cheremin at gmail.com>, Doug Lea <dl at cs.oswego.edu>
cc:    "concurrency-interest at cs.oswego.edu"
       <concurrency-interest at cs.oswego.edu>
Subject:    Re: [concurrency-interest] AtomicXXX.lazySet and
       happens-before reasoning

> c) Does it mean that the next JMM release will throw out HB reasoning
> and move to memory barriers/fences notation?
Adding to Doug's answer, the answer to this question is a resounding "no".
Happens-before reasoning largely remains valid with lazySet, as you can see
in the C++ model (which is unfortunately also significantly complicated by
memory_order_consume, which currently has no Java analog, though it's
closely related to some of the final field extension discussions).  What I
believe goes away is any meaningful notion of a single total
synchronization order.

I do not believe you ever want to reason in terms of fences.  Such
reasoning is not usually sound for Java, since the memory model is
carefully designed to allow elimination of synchronization on, e.g., a
volatile accessible from only a single thread.  Java volatiles
etc. do not have fence semantics.  I do not know of any language-level
memory models that have been successfully expressed in terms of fences.
See http://dl.acm.org/citation.cfm?doid=1988915.1988919 for a discussion
of why I don't consider a couple of better-known attempts to be fully
satisfactory.
