[concurrency-interest] Suggestion: .hardGet() for atomicvariables

Boehm, Hans hans.boehm at hp.com
Thu Jan 19 21:06:34 EST 2012


I'm not sure we're understanding each other quite correctly.  The problem is with read-only uses of seq_lock, using awaitAvailability and getSequence in Doug's interface.  The problem is that this effectively performs

Read sequence number (volatile read)
Read data (ordinary read for the questionable case; the problem goes away if we make it volatile, avoiding data races)
Read sequence number (volatile read)

This doesn't work (with the ordinary read in the middle) because, by roach motel rules, the middle read can be moved to the end, i.e. outside the critical section.  Thus we need to replace the final read by something else that disallows the roach motel movement.  Either getAndAdd or a CAS loop works.  A single failing CAS (probably, officially) does not.

I'm not sure that any of this bothers me.  This (i.e. without the volatile middle read) needs to remain at most an obscure hack because of the weird and unavoidable restrictions as to what you can put inside the "read-only critical section".  And GetAndAdd(0) seems to have exactly the right semantics to make the obscure and dangerous hack work.  Real fences are quite hard to define correctly, as we've seen in C++.  The fact that anybody who really wants to implement this obscure and dangerous hack already has an equally obscure construct to do it with seems almost satisfying, actually.

None of which denies that there are other problems in this area that need solving.

Hans

From: Vitaly Davidovich [mailto:vitalyd at gmail.com]
Sent: Thursday, January 19, 2012 5:23 PM
To: Boehm, Hans
Cc: Ruslan Cheremin; dholmes at ieee.org; concurrency-interest at cs.oswego.edu
Subject: RE: [concurrency-interest] Suggestion: .hardGet() for atomicvariables


Also, having to do getAndAdd(0) to simulate a fence is further evidence that a Fences API would be useful.  Doing that dummy CAS is a performance penalty for no good reason otherwise, it seems.

Sent from my phone
On Jan 19, 2012 8:15 PM, "Vitaly Davidovich" <vitalyd at gmail.com<mailto:vitalyd at gmail.com>> wrote:

OK maybe I'm looking at this too x86-centric but under which architectures would spinning on a seqlock not be enough? That is, why does the data you read inside the loop need to be volatile, meaning you can't assume that the spinning CAS won't provide the barrier? Intuitively, even though the CAS loop reads unrelated memory to the body of the loop, I think it's reasonable to assume that it provides an acquire fence, at least for a successful CAS.  Granted that's an assumption but I'm curious under what circumstances that wouldn't be the right thing to do.

Honestly, I'm really starting to think that the Fences API that Doug proposed is the right way to go, similar to C++11x atomics - this would allow explicit intent rather than mentally reconciling the JMM with actual hardware.

Vitaly

Sent from my phone
On Jan 19, 2012 8:00 PM, "Boehm, Hans" <hans.boehm at hp.com<mailto:hans.boehm at hp.com>> wrote:
Yes.   It does require volatile, as it should.

Getting back to the earlier topic, I'm not at all convinced that a failed CAS behaves like a volatile write according to the Java memory model.  There is no write that can synchronize with anything.  It probably does hold on X86.

By my reading, getAndAdd(0) is required to write, and hence does the right thing.  If an implementation optimizes it to a volatile load, it's wrong.  If it uses a CAS loop, that's fine, since the last one should succeed.

Hans

From: concurrency-interest-bounces at cs.oswego.edu<mailto:concurrency-interest-bounces at cs.oswego.edu> [mailto:concurrency-interest-bounces at cs.oswego.edu<mailto:concurrency-interest-bounces at cs.oswego.edu>] On Behalf Of David Holmes
Sent: Thursday, January 19, 2012 4:20 PM
To: Vitaly Davidovich; Ruslan Cheremin
Cc: concurrency-interest at cs.oswego.edu<mailto:concurrency-interest at cs.oswego.edu>
Subject: Re: [concurrency-interest] Suggestion: .hardGet() for atomicvariables

Vitaly,

Doug's SequenceLock requires that the data being read is volatile.

David
-------

* <p> Methods {@code awaitAvailability} and {@code getSequence} can

 * be used together to define (partially) optimistic read-only methods

 * that are usually more efficient than ReadWriteLocks when they

 * apply.  These methods should in general be structured as loops that

 * await lock availability, then read {@code volatile} fields into

 * local variables (and may further read other values derived from

 * these, for example the {@code length} of a {@code volatile} array),

 * and retry if the sequence number changed while doing so.
-----Original Message-----
From: concurrency-interest-bounces at cs.oswego.edu<mailto:concurrency-interest-bounces at cs.oswego.edu> [mailto:concurrency-interest-bounces at cs.oswego.edu<mailto:concurrency-interest-bounces at cs.oswego.edu>]On Behalf Of Vitaly Davidovich
Sent: Friday, 20 January 2012 7:42 AM
To: Ruslan Cheremin
Cc: concurrency-interest at cs.oswego.edu<mailto:concurrency-interest at cs.oswego.edu>
Subject: Re: [concurrency-interest] Suggestion: .hardGet() for atomicvariables

Failed cas won't write anything to cache (afterall you didn't modify anything).  However, it does achieve same memory fencing/ordering as a successful cas.  On x86/64 that's because the cmpxchg instruction is prefixed with LOCK, which by itself makes the instruction serializing irrespective of whether the cmpxchg succeeds.

Also, small addendum - processor doesn't always issue a RFO (request for ownership) before writing - if the cache line is in exclusive state in the writing processor, it doesn't need to do that.

Also Doug Lea has a version of seqlock in his CVS repo for jsr166e - you can take a look at it for details.  I'll tell you that there is no funny business there with dummy cas operations - he trusts volatile reads :).

Sent from my phone
On Jan 19, 2012 4:05 PM, "Ruslan Cheremin" <cheremin at gmail.com<mailto:cheremin at gmail.com>> wrote:
I think, it depends on what you name "write". Failed CAS will be "like
write" in sense of cache coherence traffic -- at least AFAIK -- it
will request cache line to be in M state (read-for-update), so if
cache line was in some other core's cache -- it will be invalidated.
But failed CAS does not really update cache line value, so it seems
like writeback to main memory not needed. I do not know, does current
CPUs actually optimize this writeback.

2012/1/20 Raph Frank <raphfrk at gmail.com<mailto:raphfrk at gmail.com>>:
> On Thu, Jan 19, 2012 at 8:48 PM, Ruslan Cheremin <cheremin at gmail.com<mailto:cheremin at gmail.com>> wrote:
>> current == value you've just read by .get() few lines ago.
>
> Ahh, right.
>
> For references, would .compareAndSet(null, null), also add in the syncing?
>
> Does a compare and set that fails to update count as a write, or just a read?
> _______________________________________________
> Concurrency-interest mailing list
> Concurrency-interest at cs.oswego.edu<mailto:Concurrency-interest at cs.oswego.edu>
> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
_______________________________________________
Concurrency-interest mailing list
Concurrency-interest at cs.oswego.edu<mailto:Concurrency-interest at cs.oswego.edu>
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://cs.oswego.edu/pipermail/concurrency-interest/attachments/20120120/7a8e8bff/attachment.html>


More information about the Concurrency-interest mailing list