[concurrency-interest] Suggestion: .hardGet() for atomicvariables

David Holmes davidcholmes at aapt.net.au
Thu Jan 19 23:20:05 EST 2012


In general there need not be any specific ordering constraints relating to
the "CAS", hence the j.u.c requirement that they act as volatile read+write.
x86 with its "lock: cmpxchg" is probably the minority case here. SPARC under
TSO is similar to x86 in this regard. But load-linked/store-conditional
based primitives, such as on ARM and PPC, don't impose special ordering
constraints between the ll/sc actions and other loads and stores - hence the
need for memory synchronization instructions (like DMB on ARM) to insert
necessary "fences".

See the JSR-133 Cookbook and the table showing whether atomic instructions
include barriers:

http://g.oswego.edu/dl/jmm/cookbook.html

David
  -----Original Message-----
  From: Vitaly Davidovich [mailto:vitalyd at gmail.com]
  Sent: Friday, 20 January 2012 1:50 PM
  To: Boehm, Hans
  Cc: concurrency-interest at cs.oswego.edu; Ruslan Cheremin; dholmes at ieee.org
  Subject: RE: [concurrency-interest] Suggestion: .hardGet() for
atomicvariables


  Hans,

  I agree on your point and example of a volatile read not being sufficient
and allowing for movement of non volatile read after the 2nd volatile load.
However, my comments were geared towards the CAS loop that was mentioned.
What architectures provide different ordering for successful vs unsuccessful
CAS operations? I'm legitimately interested as I can't see a hardware vendor
that would pick such an option to implement but I'd love to hear of one.

  Additionally, David mentioned that CAS is specified as providing the
ordering without mentioning failure case so it seems like it should work
with the CAS loop we drew from Doug's javadoc.

  Sent from my phone

  On Jan 19, 2012 9:08 PM, "Boehm, Hans" <hans.boehm at hp.com> wrote:

    I’m not sure we’re understanding each other quite correctly.  The
problem is with read-only uses of seq_lock, using awaitAvailability and
getSequence in Doug’s interface.  The problem is that this effectively
performs



    Read sequence number (volatile read)

    Read data (ordinary read for the questionable case; the problem goes
away if we make it volatile, avoiding data races)

    Read sequence number (volatile read)



    This doesn’t work (with the ordinary read in the middle) because, by
roach motel rules, the middle read can be moved to the end, i.e. outside the
critical section.  Thus we need to replace the final read by something else
that disallows the roach motel movement.  Either getAndAdd or a CAS loop
works.  A single failing CAS (probably, officially) does not.



    I’m not sure that any of this bothers me.  This (i.e. without the
volatile middle read) needs to remain at most an obscure hack because of the
weird and unavoidable restrictions as to what you can put inside the
“read-only critical section”.  And GetAndAdd(0) seems to have exactly the
right semantics to make the obscure and dangerous hack work.  Real fences
are quite hard to define correctly, as we’ve seen in C++.  The fact that
anybody who really wants to implement this obscure and dangerous hack
already has an equally obscure construct to do it with seems almost
satisfying, actually.



    None of which denies that there are other problems in this area that
need solving.



    Hans



    From: Vitaly Davidovich [mailto:vitalyd at gmail.com]
    Sent: Thursday, January 19, 2012 5:23 PM
    To: Boehm, Hans
    Cc: Ruslan Cheremin; dholmes at ieee.org;
concurrency-interest at cs.oswego.edu
    Subject: RE: [concurrency-interest] Suggestion: .hardGet() for
atomicvariables



    Also, having to do getAndAdd(0) to simulate a fence is further evidence
that a Fences API would be useful.  Doing that dummy CAS is a performance
penalty for no good reason otherwise, it seems.

    Sent from my phone

    On Jan 19, 2012 8:15 PM, "Vitaly Davidovich" <vitalyd at gmail.com> wrote:

    OK maybe I'm looking at this too x86-centric but under which
architectures would spinning on a seqlock not be enough? That is, why does
the data you read inside the loop need to be volatile, meaning you can't
assume that the spinning CAS won't provide the barrier? Intuitively, even
though the CAS loop reads unrelated memory to the body of the loop, I think
it's reasonable to assume that it provides an acquire fence, at least for a
successful CAS.  Granted that's an assumption but I'm curious under what
circumstances that wouldn't be the right thing to do.

    Honestly, I'm really starting to think that the Fences API that Doug
proposed is the right way to go, similar to C++11x atomics - this would
allow explicit intent rather than mentally reconciling the JMM with actual
hardware.

    Vitaly

    Sent from my phone

    On Jan 19, 2012 8:00 PM, "Boehm, Hans" <hans.boehm at hp.com> wrote:

    Yes.   It does require volatile, as it should.



    Getting back to the earlier topic, I’m not at all convinced that a
failed CAS behaves like a volatile write according to the Java memory model.
There is no write that can synchronize with anything.  It probably does hold
on X86.



    By my reading, getAndAdd(0) is required to write, and hence does the
right thing.  If an implementation optimizes it to a volatile load, it’s
wrong.  If it uses a CAS loop, that’s fine, since the last one should
succeed.



    Hans



    From: concurrency-interest-bounces at cs.oswego.edu
[mailto:concurrency-interest-bounces at cs.oswego.edu] On Behalf Of David
Holmes
    Sent: Thursday, January 19, 2012 4:20 PM
    To: Vitaly Davidovich; Ruslan Cheremin
    Cc: concurrency-interest at cs.oswego.edu
    Subject: Re: [concurrency-interest] Suggestion: .hardGet() for
atomicvariables



    Vitaly,



    Doug's SequenceLock requires that the data being read is volatile.



    David

    -------

* <p> Methods {@code awaitAvailability} and {@code getSequence} can * be
used together to define (partially) optimistic read-only methods * that are
usually more efficient than ReadWriteLocks when they * apply.  These methods
should in general be structured as loops that * await lock availability,
then read {@code volatile} fields into * local variables (and may further
read other values derived from * these, for example the {@code length} of a
{@code volatile} array), * and retry if the sequence number changed while
doing so.-----Original Message-----
      From: concurrency-interest-bounces at cs.oswego.edu
[mailto:concurrency-interest-bounces at cs.oswego.edu]On Behalf Of Vitaly
Davidovich
      Sent: Friday, 20 January 2012 7:42 AM
      To: Ruslan Cheremin
      Cc: concurrency-interest at cs.oswego.edu
      Subject: Re: [concurrency-interest] Suggestion: .hardGet() for
atomicvariables

      Failed cas won't write anything to cache (afterall you didn't modify
anything).  However, it does achieve same memory fencing/ordering as a
successful cas.  On x86/64 that's because the cmpxchg instruction is
prefixed with LOCK, which by itself makes the instruction serializing
irrespective of whether the cmpxchg succeeds.

      Also, small addendum - processor doesn't always issue a RFO (request
for ownership) before writing - if the cache line is in exclusive state in
the writing processor, it doesn't need to do that.

      Also Doug Lea has a version of seqlock in his CVS repo for jsr166e -
you can take a look at it for details.  I'll tell you that there is no funny
business there with dummy cas operations - he trusts volatile reads :).

      Sent from my phone

      On Jan 19, 2012 4:05 PM, "Ruslan Cheremin" <cheremin at gmail.com> wrote:

      I think, it depends on what you name "write". Failed CAS will be "like
      write" in sense of cache coherence traffic -- at least AFAIK -- it
      will request cache line to be in M state (read-for-update), so if
      cache line was in some other core's cache -- it will be invalidated.
      But failed CAS does not really update cache line value, so it seems
      like writeback to main memory not needed. I do not know, does current
      CPUs actually optimize this writeback.

      2012/1/20 Raph Frank <raphfrk at gmail.com>:
      > On Thu, Jan 19, 2012 at 8:48 PM, Ruslan Cheremin
<cheremin at gmail.com> wrote:
      >> current == value you've just read by .get() few lines ago.
      >
      > Ahh, right.
      >
      > For references, would .compareAndSet(null, null), also add in the
syncing?
      >
      > Does a compare and set that fails to update count as a write, or
just a read?
      > _______________________________________________
      > Concurrency-interest mailing list
      > Concurrency-interest at cs.oswego.edu
      > http://cs.oswego.edu/mailman/listinfo/concurrency-interest
      _______________________________________________
      Concurrency-interest mailing list
      Concurrency-interest at cs.oswego.edu
      http://cs.oswego.edu/mailman/listinfo/concurrency-interest
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://cs.oswego.edu/pipermail/concurrency-interest/attachments/20120120/246c5342/attachment-0001.html>


More information about the Concurrency-interest mailing list