[concurrency-interest] AtomicReferenceFieldUpdater vs Unsafe

Boehm, Hans hans.boehm at hp.com
Tue Nov 29 00:37:20 EST 2011

I'm primarily worried about hardware reordering.  On architectures like Itanium and PowerPC it takes extra fences to prevent that, and that can cost quite a few cycles.

I haven't looked at the j.u.c.l. implementation, and whether that's sometimes optimized to use JVM-specific facilities.  But I know of no JVM that implements the built-in monitorentry and monitorexit bytecodes with volatiles or atomics.    I haven't worked on any JVMs for a while, but I suspect that's still true.


From: Vitaly Davidovich [mailto:vitalyd at gmail.com]
Sent: Monday, November 28, 2011 9:03 PM
To: Boehm, Hans
Cc: Joe Bowbeer; concurrency-interest
Subject: Re: [concurrency-interest] AtomicReferenceFieldUpdater vs Unsafe


What's the benefit of moving a prior write inside a critical section (i.e. across a subsequent lock())? What's a use case where a compiler would elect to do that? I can see moving loads earlier (across a lock()) as possibly beneficial, but don't see a case for writes.  Or are you referring to hardware reordering here?

As for the CAS example on PowerPC (or Itanium, but that arch is dead :)), this would only be the case if the java code for the lock() was not using atomics/volatiles -- correct? Are you aware of a production JVM that implements the j.u.c.l locking primitives with something other than volatile/atomics? I get that you're concerned about the spec and we shouldn't rely on what current implementations do, but I'm just curious here.


On Mon, Nov 28, 2011 at 11:36 PM, Boehm, Hans <hans.boehm at hp.com<mailto:hans.boehm at hp.com>> wrote:
I see your point if the lock operation is implemented in terms of Java volatiles or atomics.  I don't think there is any guarantee that something like j.u.c.l.ReentrantLock is implemented that way.  Certainly that wasn't true for the basic MonitorEntry and Exit implementations for any of the JVMs that I have worked on in the past.  And there are architectures for which there is substantial benefit in using implementations that allow roach motel reordering across lock acquisitions.  This is true on Itanium, which provides a CAS instruction with only acquire semantics.  And I believe the same is true of the usual PowerPC lock acquisition sequence:  The CAS equivalent is unordered.  That is normally followed by a fence (typically isync, possibly lwsync) that prevents reordering of the lock acquisition with critical section operations.  But nothing prevents reordering of the CAS with prior operations.

I was originally responding to a thread arguing for a tryLock-like operation on built-in monitors.


From: concurrency-interest-bounces at cs.oswego.edu<mailto:concurrency-interest-bounces at cs.oswego.edu> [mailto:concurrency-interest-bounces at cs.oswego.edu<mailto:concurrency-interest-bounces at cs.oswego.edu>] On Behalf Of Vitaly Davidovich
Sent: Monday, November 28, 2011 8:02 PM
To: Joe Bowbeer
Cc: concurrency-interest

Subject: Re: [concurrency-interest] AtomicReferenceFieldUpdater vs Unsafe


If you forget about the notion of critical sections and such and just think about what code will be emitted here (even at the bytecode level), there's going to be a volatile write before the write to x; I can't see how the JIT (or interpreter, for that matter) can re-arrange the code in that case.

The assignment of x can move after the volatile read which will occur inside lock(), but it cannot reorder with the actual volatile store.  The compiler doesn't know what lock() actually is -- it just sees the bytecodes inside it (and whatever else it calls), so I don't see how things can be rearranged.

On Mon, Nov 28, 2011 at 10:39 PM, Joe Bowbeer <joe.bowbeer at gmail.com<mailto:joe.bowbeer at gmail.com>> wrote:
On Mon, Nov 28, 2011 at 7:08 PM, Vitaly Davidovich wrote:

Ok I see - you're going by purely what JMM dictates.  In practice though I can't imagine a lock() impl that won't issue a fence at some point internally before the write.  Therefore I don't see how assignment to x can move past it; the expanded lock would look something like:
- volatile load/acquire
- volatile write/cas/etc (assume lock succeeds, as in this example)
- x = 17

So I still don't understand how, in practice, x = 17 can move above the lock.

According to the "roach motel" semantics for critical sections, x=17 can move after the lock in Thread 1, into the "critical section", but it can never leave the critical section by moving past the unlock.  This is a transformation that the JIT compiler might perform.

Except in this case there is no unlock in Thread 1...

On Nov 28, 2011 9:57 PM, "David Holmes" wrote:
The HB edge only arises between an unlock and subsequent lock. Here there is no unlock  in T1 and hence no subsequent lock in T2 and so no HB. (Which is not to say that the necessary memory barriers have not been issued as a side-effect of other things - but there is no HB edge as per the JMM).

lock() has acquire semantics as does volatile read, hence regular load/stores can move past them. So here:

x = 17;

can become

x = 17;

which in this case just makes it more likely that T2 would in practice see 17.

-----Original Message-----
From: Vitaly Davidovich
Sent: Tuesday, 29 November 2011 12:49 PM
To: David Holmes

Cc: Boehm, Hans; concurrency-interest at cs.oswego.edu<mailto:concurrency-interest at cs.oswego.edu>; Doug Lea
Subject: RE: [concurrency-interest] AtomicReferenceFieldUpdater vs Unsafe

Yes I realized that I misread the snippet right after I sent the email - thanks though :).

However doing a tryLock will entail reading a volatile that was set during lock, so shouldn't that provide the HB? Granted I'm talking about a specific impl in the JDK.

I don't see how roach motel applies here since the lock() writes a volatile and prior stores can't reorder with it (subsequent ones can move in though).
On Nov 28, 2011 9:39 PM, "David Holmes" wrote:

Thread 2 only reads x if the lock is not available, which means thread 1 must have locked it, which occurs after setting x=17.

However in terms of happens-before, there is no HB edge here so no guarantee of x being 17 AFAICS. Thread 2 has to acquire the lock after thread 1 to get a HB edge that guarantees x == 17. Further, roach-motel tells us that x=17 could be reordered with the lock() anyway.

-----Original Message-----
From: concurrency-interest-bounces at cs.oswego.edu<mailto:concurrency-interest-bounces at cs.oswego.edu> [mailto:concurrency-interest-bounces at cs.oswego.edu<mailto:concurrency-interest-bounces at cs.oswego.edu>]On Behalf Of Vitaly Davidovich
Sent: Tuesday, 29 November 2011 12:30 PM
To: Boehm, Hans
Cc: Doug Lea; concurrency-interest at cs.oswego.edu<mailto:concurrency-interest at cs.oswego.edu>
Subject: Re: [concurrency-interest] AtomicReferenceFieldUpdater vs Unsafe

Hi Hans,

In your example why do you say that x should be 17 (or rather, why would someone assume that)? If thread one writes to x but gets descheduled before locking, then clearly thread 2 is not guaranteed to read 17; how is this different from if instead of using a lock, a volatile y was used?

If thread 1 did lock before tryLock in thread 2 then the store to a volatile inside lock and a read of that volatile in tryLock should be sufficient to create the happens-before edge.

I must be missing your point though.

On Nov 28, 2011 8:23 PM, "Boehm, Hans" wrote:
> From: Doug Lea
> On 11/28/11 14:23, Boehm, Hans wrote:
> > There's another problem with tryLock()-like methods, which I pointed
> out in a
> > 2007 PPoPP paper.  I think the current j.u.c.tryLock() is not
> correctly
> > specified.  The problem is illustrated by the following badly
> designed code:
> >
> > Thread 1: x = 17; l.lock();
> >
> > Thread 2: while (l.tryLock()) l.unlock(); ... x ...  // x should be
> 17 here!
> >
> >
> > A solution, more or less adopted by C++11, is to specify tryLock() as
> > allowing spurious failures,
> I think we are OK on this. The Lock spec defines tryLock in terms
> of the lock being "available", which means different things
> across different Lock implementations, and doesn't rule out
> spurious false returns.
That interpretation sounds good to me.  It does mean that a lock that was constructed before the tryLock call and has never been accessed by anything else might not be "available".


Concurrency-interest mailing list
Concurrency-interest at cs.oswego.edu<mailto:Concurrency-interest at cs.oswego.edu>

617-548-7007<tel:617-548-7007> (mobile)

617-548-7007<tel:617-548-7007> (mobile)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://cs.oswego.edu/pipermail/concurrency-interest/attachments/20111129/4e412656/attachment-0001.html>

More information about the Concurrency-interest mailing list