[concurrency-interest] Stricter read ordering

Roman Elizarov elizarov at devexperts.com
Thu Apr 24 12:00:29 EDT 2014


Also, it would be an interesting toy project to figure out which version (CAS or two volatile read/write ops in proper order) works faster in practise on different CPUs. Neither is satisfactory in practise anyway. Too bad we have direct access to fences only in Java 8 and still only via unsafe.

For my project, I would take the same approach that is used in StampedLock backport. I need CAS on write for lock anyway and I'll have to believe that Unsafe.getXXXVolatile provides the neccessary barrier in practise (I have not looked yet at its intinsic source to see why that should be true, though). I need a lock-free atomic snapshot (cannot spin forever while writer is suspended in the middle of write), so it has a writer-helping-reader path, too, but the core logic is still StampedLock-like.

On 24 апр. 2014 г., at 16:32, "Oleksandr Otenko" <oleksandr.otenko at oracle.com<mailto:oleksandr.otenko at oracle.com>> wrote:

Yes, I need to retract my statement about CAS. I checked the CAS thread again. It does say that "just" a pair of load/store is not adequate for Java. So, then the CAS-based version must be OK. I would still like to point out that my version doesn't rely on underspecified semantics of CAS.

Alex

On 24/04/2014 09:52, Roman Elizarov wrote:
Below is a solution to the original problem posted by Tobias using Java 8 StamedLock. I’ve expanded the actual implementation of StampedLock’s operations in comments for the ease of reference in the further discussion:

class StampedData {
    // THE NEXT LINE ESSENTIALLY CONTAINS long state
    private final StampedLock lock = new StampedLock();

    // this is the data I protect, in reality there is much more protected state
    private int x, y;

    public synchronized void update(int x, int y) {
        // I guarantee single writers to this method,
        // illustrated with 'synchronized' in this simplification
        // first acquire write lock
        // THE NEXT LINE ESSENTIALLY EXPANDS TO: stamp = state; state.CAS(stamp, stamp += WBIT);
        long stamp = lock.tryWriteLock(); // always succeeds in single-writer case
        // then write protected data (non-volatile writes)
        this.x = x;
        this.y = y;
        // then release write lock
        // THE NEXT LINE ESSENTIALLY EXPANDS TO: state = stamp + WBIT;
        lock.unlockWrite(stamp);
    }

    public DataCarrier read() {
        // I allow multiple readers, so this method is not synchronized
        int x, y;
        int v0;
        while (true) {
            // first try optimistic read
            // THE NEXT LINE ESSENTIALLY EXPANDS TO: stamp = state; // if was not locked by writer
            long stamp = lock.tryOptimisticRead();
            if (stamp == 0)
                continue; // immediately abort, because write in progress was detected
            // then read protected data
            x = this.x;
            y = this.y;
            // validate to make sure that the read was consistent
            // THE NEXT LINE ESSENTIALLY EXPANDS TO: loadFence; if (state == stamp) break;
            if (lock.validate(stamp))
                break;
        }
        return new DataCarrier(x, y);
    }
}

As you can see, it is virtually the same code as I’ve presented in my original solution where “version” is replaced by StampedLock’s “state”. The primary difference can be found in the read path, when StampedLock uses loadFence to ensure LoadLoad barrier after the read of the protected data and before checking that state did not change. There is no Unsafe.loadFence before Java 8, so my solution had to resort to what Hans Boehm named “read-dont-modify-write” operation using CAS.

Now, let us take a closer look at the write path in update method. You can see that StampedLock’s write path is the same as the write path that was used in my original solution (version is updated with CAS before writing protected data and updated again with volatile write after it). So, what happens if this CAS is not atomic from the standpoint of the memory model, but is modelled as two separate operations, the first being volatile read, the second one being volatile write?

As you’ve correctly observed, in this model non-volatile writes to the protected state can be reordered in between the volatile read and volatile write that constitute CAS. If this happens, reader can read an inconsistent snapshot of the protected state without detecting this fact (validate will return true), if all reader’s actions take effect in between those volatile read and volatile write that constitute CAS. Q.E.D. From the memory model perspective (in JLS Chapter 17 terms) CAS has to be represented as one “inter-thread action” that has effect of both volatile read and volatile write. Otherwise, StampedLock idiom is broken.

From: Oleksandr Otenko [mailto:oleksandr.otenko at oracle.com]
Sent: Thursday, April 24, 2014 1:20 AM
To: Roman Elizarov
Cc: concurrency-interest at cs.oswego.edu<mailto:concurrency-interest at cs.oswego.edu>
Subject: Re: [concurrency-interest] Stricter read ordering

Can you quote the exact spot? I see it is quite different from your suggestion, and where I look it doesn't seem atomicity of CAS matters.


Alex

On 23/04/2014 20:57, Roman Elizarov wrote:
You are right. You cannot just split CAS into load and store this way. Actually, if you analyze required barriers for write path in the same way as I did for read path (remember, that writer has to have StoreStore barrier before version write and subsequent non-volatile writes), you see that you need to insert volatile read _after_ the version.write() in the write path, so it becomes:

Writer does (in PO): version.volatileWrite();  {StoreLoad emitted}; something.volatileRead(); {LoadStore emitted} x.write(); y.write(); {StoreStore emitted}; version.write();

This effectively gives you StoreStore barrier between version write and a first non-volatile write of protected data state. Ordering those two volatile operations at the beginning of write path the other way around, as in your example below, does not provide the necessary barrier to guarantee correctness of this idiom.

CAS has to do both volatile load and store in atomic and indivisible way, so that you cannot reorder subsequent normal write into the “middle of” CAS. This is not explicitly stated in the CAS spec as I can see (the wording about “atomic” behavior of CAS and about its load/store semantics are rather far apart, so it is debatable), but this is the property of CAS that Java 8’s implementation of StampedLock relies upon, too.

From: concurrency-interest-bounces at cs.oswego.edu<mailto:concurrency-interest-bounces at cs.oswego.edu> [mailto:concurrency-interest-bounces at cs.oswego.edu] On Behalf Of Oleksandr Otenko
Sent: Wednesday, April 23, 2014 11:12 PM
Cc: concurrency-interest at cs.oswego.edu<mailto:concurrency-interest at cs.oswego.edu>
Subject: Re: [concurrency-interest] Stricter read ordering

No. If CAS is implemented as a volatile load followed by a volatile store, the effect I describe is possible. There is no need for a weaker implementation for CAS.

Watch:

replace version.compareAndSet with get and set:

v0=version.get();
version.set(v0 | WRITE);
this.x=x; // normal write can go ahead of the volatile write, as per the cookbook

Or, in JMM terms, there is no edge enforcing this.x=x by the writer to appear strictly after x=this.x by the reader.

Alex
On 23/04/2014 19:34, Roman Elizarov wrote:
There was no light in that discussion. CAS is currently spec'd as volatile load and volatile store.  Period. If someone implements CAS on ARM with weaker guarantees that violate this spec, then StampedLock in Java 8 solution also becomes broken for the same reason you've outlined below.

On 23 апр. 2014 г., at 21:42, "Oleksandr Otenko" <oleksandr.otenko at oracle.com<mailto:oleksandr.otenko at oracle.com>> wrote:
In light of the recent discussion of the semantics of CAS, this solution is not correct.

The outcome of that discussion is that the atomicity of CAS is only guaranteed with respect to the variable being CASed. This does not matter on x86, but on ARM, as I understand, the CAS will be implemented as two instructions that only preclude concurrent stores to the variable being CASed - which does not preclude reordering the normal stores following CAS with the "volatile store" part of CAS.

So, here:




        version.compareAndSet(v0, v0 | WRITE); // always succeeds in single-writer case, ensures proper HB edges for JMM

        // then write protected data (non-volatile writes)

        this.x = x;

        this.y = y;
this.x=x; may be observed before compareAndSet completes.

The correct solution should produce:

writer:
vwrite(v,v+1)
vread(s,_)
write(x,_)
write(y,_)
vwrite(v,v+1)

reader:
vread(v,_)
read(x,_)
read(y,_)
vwrite(s,_)
vread(v,_)

For example:

writer:
// synchronized
long next = this.version;
this.version=next+1; // odd version means mutation underway
long tmp=static_volatile_dummy;
this.x=x;
this.y=y;
this.version=next+2; // even version means immutable

reader:
long v;
do{
  while((v=this.version) &1 != 0); // wait for version to become even
  x=this.x;
  y=this.y;
  static_volatile_dummy=v;
}while(this.version != v); // check the version didn't change


Proof:

It is not necessary to consider all possible reorderings individually. It is sufficient to consider relative ordering of volatile reads and writes.

All writers ensure exclusive access to this.x and this.y, so we only need to consider one writer performing modifications in a loop.

If volatile read of static_volatile_dummy appears after a volatile write of static_volatile_dummy in synchronization order, the former synchronizes-with the latter, and the normal reads of this.x and this.y by the reader happen-before the normal writes of those variables by the writer (transitive closure of program orders). So, the readers never see normal writes of those writers.

Now consider the writers whose read of static_volatile_dummy does not synchronize-with the writes. The reader may temporarily observe some values, but will only terminate the loop after observing the values of this.x and this.y whose write appear before the volatile write of a even version: the first loop ensures we only look at writes preceding a write of a even version, the condition of the outer loop ensures no other volatile writes to version appear in synchronization order. This means that the first loop synchronizes-with the last write of a even version, hence, through transitive closure of program orders the normal writes of this.x and this.y happen-before the normal reads. The outer condition ensures that since no other volatile writes of version appear in synchronization order, the volatile read of static_volatile_dummy after all such writes will synchronize-with the volatile write of static_volatile_dummy of the reader (and their normal writes won't be observed, as per the proof in the previous paragraph).


This concludes the proof that the reader always observes the consistent view of this.x and this.y.



Alex


On 23/04/2014 15:16, Roman Elizarov wrote:

The original problem itself is solvable, albeit with a different algorithm. It's funny, that we've discussed the same problem today internally and then I see this thread on concurrency-interest list.



Let me state the problem first. There is a multi-word state that we have to read and write in non-volatile way (if reads and writes of our state can be made volatile, then solution is trivial and is not of a much interest). We need to implement single-writer multi-reader atomic snapshot of this multi-word state with the primitives described in and under the constraints of the Java Memory Model (think Java 5+). The read does not have to be lock-free. It just needs to detect that snapshot being read is not consistent. In original question author was Ok for reader to spin in case of the read/write conflict, but we actually have a different problem where it is Ok for reader to abandon the read attempt altogether. The key is to detect this conflict or to return an atomic snapshot of the protected state if there is no read/write conflict.



So, here is a solution. It can be proven to work on current version of JMM (starting Java 5 and later) in the JMM's model, by analyzing possible executions and their corresponding SO, SW, and HB relations. It requires just one extra int of state (instead of two), but this int needs to be CAS-ed both on read and on write to ensure proper happens-before edges and to guarantee the consistent read of non-volatile state. The proof of this algorithm's correctness is left as an exercise for the reader. It will not scale, though, if there are many concurrent readers, because of the CAS in read path.



class VersionedData {

    // bit that we'll use to indicate that the state is being written to

    private static final int WRITE = 1 << 31;

    // we need to CAS version (in practise we'll do it via Unsafe to avoid extra object)

    private final AtomicInteger version = new AtomicInteger();



    // this is the data I protect, in reality there is much more protected state

    private int x, y;



    public synchronized void update(int x, int y) {

        // I guarantee single writers to this method,

        // illustrated with 'synchronized' in this simplification

        // first use CAS to mark version as being written to

        int v0 = version.get(); // can be non-volatile read, but then the following CAS can fail and needs to retry

        version.compareAndSet(v0, v0 | WRITE); // always succeeds in single-writer case, ensures proper HB edges for JMM

        // then write protected data (non-volatile writes)

        this.x = x;

        this.y = y;

        // then increment version and reset write bit

        version.set((v0 + 1) & ~WRITE);

    }



    public DataCarrier read() {

        // I allow multiple readers, so this method is not synchronized

        int x, y;

        int v0;

        do {

            // first read version

            v0 = version.get();

            if ((v0 & WRITE) != 0)

                continue; // immediately abort, because write in progress was detected

            // then read protected data

            x = this.x;

            y = this.y;

            // use CAS to check that version is still the same and to ensure proper HB edges for JMM at the same time

        } while (!version.compareAndSet(v0, v0));

        return new DataCarrier(x, y);

    }

}



-----Original Message-----

From: concurrency-interest-bounces at cs.oswego.edu<mailto:concurrency-interest-bounces at cs.oswego.edu> [mailto:concurrency-interest-bounces at cs.oswego.edu] On Behalf Of Aleksey Shipilev

Sent: Wednesday, April 23, 2014 5:36 PM

To: Tobias Lindaaker; concurrency-interest at cs.oswego.edu<mailto:concurrency-interest at cs.oswego.edu>

Subject: Re: [concurrency-interest] Stricter read ordering



On 04/23/2014 05:05 PM, Tobias Lindaaker wrote:

Yes, I had a look at StampedLock and Unsafe.loadFence(), and it seems

to do exactly what I want, and if I was fortunate enough to be able to

move to Java 8 I would use it. Unfortunately we are still stuck on

Java 7. We even have customers who are still strongly requesting Java

6 compatibility.

These constraints make the problem unresolvable.



You might want to look for pre-JDK8 prototype for StampedLock [1]:



     * As noted in Boehm's paper (above), sequence validation (mainly

     * method validate()) requires stricter ordering rules than apply

     * to normal volatile reads (of "state").  In the absence of (but

     * continual hope for) explicit JVM support of intrinsics with

     * double-sided reordering prohibition, or corresponding fence

     * intrinsics, we for now uncomfortably rely on the fact that the

     * Unsafe.getXVolatile intrinsic must have this property

     * (syntactic volatile reads do not) for internal purposes anyway,

     * even though it is not documented.



    public boolean validate(long stamp) {

        return (stamp & SBITS) == (U.getLongVolatile(this, STATE) & SBITS);

    }



But if Unsafe.loadFence() is risky since it is not specified (yet) JMM-wise, and so interactions with other volatile ops and fences is just undocumented... then using Unsafe.getXVolatile is double-risky because the behavioral effect of read ordering is *REALLY* implementation-specific, and you if are using it for read ordering, you are five miles past the gateway to Hell already.



Thanks,

-Aleksey.



[1]

http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/jsr166e/StampedLock.java?revision=1.1

_______________________________________________

Concurrency-interest mailing list

Concurrency-interest at cs.oswego.edu<mailto:Concurrency-interest at cs.oswego.edu>

http://cs.oswego.edu/mailman/listinfo/concurrency-interest



_______________________________________________

Concurrency-interest mailing list

Concurrency-interest at cs.oswego.edu<mailto:Concurrency-interest at cs.oswego.edu>

http://cs.oswego.edu/mailman/listinfo/concurrency-interest

_______________________________________________
Concurrency-interest mailing list
Concurrency-interest at cs.oswego.edu<mailto:Concurrency-interest at cs.oswego.edu>
http://cs.oswego.edu/mailman/listinfo/concurrency-interest



_______________________________________________
Concurrency-interest mailing list
Concurrency-interest at cs.oswego.edu<mailto:Concurrency-interest at cs.oswego.edu>
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://cs.oswego.edu/pipermail/concurrency-interest/attachments/20140424/0603d3f9/attachment-0001.html>


More information about the Concurrency-interest mailing list