[concurrency-interest] JLS 17.7 Non-atomic treatment of double and long : Android

Vitaly Davidovich vitalyd at gmail.com
Tue Apr 30 13:05:03 EDT 2013


Right, writes would be atomic but reads would not: read one half, another
core updates the value, and the second half is then read from a different
value.
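
To make that interleaving concrete, here is a minimal single-threaded model of it in Java. It is a sketch only: the class and method names are made up for illustration, the "other core" is simulated by a plain method call between the two half-reads, and reading the low half first is an assumption (nothing above says which half goes first).

```java
// Deterministic model of a torn 64-bit read on a machine that loads a long
// as two 32-bit halves. No threads are used; the "other core" is simulated
// by overwriting the value between the two half-reads.
final class TornReadModel {
    // Hypothetical stand-in for one 64-bit memory location, kept as two words.
    static int lowWord;
    static int highWord;

    // Models an atomic 64-bit store (e.g. via cmpxchg8b): both halves change together.
    static void atomicStore(long value) {
        lowWord = (int) value;
        highWord = (int) (value >>> 32);
    }

    // Models a non-atomic load: read the low half, let the writer run,
    // then read the high half. The read order is an assumption.
    static long tornLoad(long valueWrittenInBetween) {
        int low = lowWord;
        atomicStore(valueWrittenInBetween);   // another core updates the value
        int high = highWord;
        return ((long) high << 32) | (low & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        atomicStore(0L);
        long seen = tornLoad(-1L);
        // prints "-4294967296 = 0xffffffff00000000"
        System.out.println(seen + " = 0x" + Long.toHexString(seen));
    }
}
```

Starting from -1L and writing 0L in between gives 4294967295 instead. These are the two mixed values Tim reports seeing on Android further down the thread.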

As for SSE, yeah it's possible, but is that true? JIT skips integer
registers for scalar long operations? I find that hard to believe as it
would miss out on large register file/renaming opportunities.

Sent from my phone
On Apr 30, 2013 12:58 PM, "Nathan Reynolds" <nathan.reynolds at oracle.com>
wrote:

>  The processor can do whatever it wants in registers without other threads
> being able to see intermediate values.  Registers are private to the
> hardware thread.  So, we can use multiple instructions to load the ecx:ebx
> registers and then execute the cmpxchg8b to do a single write to globally
> visible cache.
>
> Nathan Reynolds<http://psr.us.oracle.com/wiki/index.php/User:Nathan_Reynolds>| Architect |
> 602.333.9091
> Oracle PSR Engineering <http://psr.us.oracle.com/> | Server Technology
>  On 4/30/2013 9:53 AM, Vitaly Davidovich wrote:
>
> But this requires the src value to be in ecx:ebx so how would you load it
> there without two loads (and possibly observe tearing) in the first place?
>
> Sent from my phone
> On Apr 30, 2013 12:45 PM, "Nathan Reynolds" <nathan.reynolds at oracle.com>
> wrote:
>
>>  On 32-bit x86, the cmpxchg8b can be used to write a long in 1
>> instruction.  This instruction has been "present on most post-80486
>> processors" (Wikipedia).  There might be cheaper ways to write a long but
>> there is at least 1 way.
>>
>>  On 4/30/2013 9:37 AM, Vitaly Davidovich wrote:
>>
>> Curious how x86 would move a long in 1 instruction? There's no memory to
>> memory mov so has to go through register, and thus needs 2 registers (and
>> hence split).  Am I missing something?
>>
>> Sent from my phone
>> On Apr 30, 2013 12:23 PM, "Nathan Reynolds" <nathan.reynolds at oracle.com>
>> wrote:
>>
>>>  You might want to print the assembly using HotSpot (and OpenJDK?).  If
>>> the assembly uses 1 instruction to do the write, then no splitting can
>>> ever happen (because alignment takes care of cache line splits).  If the
>>> assembly uses 2 instructions to do the write, then it is only a matter of
>>> timing.
>>>
>>> With a single processor system, you are waiting for the thread's quantum
>>> to end right after the first instruction but before the second
>>> instruction.  This will allow the other thread to see the split write.
>>>
>>> With a dual processor system, the reader thread simply has to get a copy
>>> of the cache line after the first write and before the second write.  This
>>> is much easier to do.
>>>
>>> HotSpot will do a lot of optimizations on single processor systems.  For
>>> example, it gets rid of the "lock" prefix in front of atomic instructions
>>> since the instruction's execution can't be split.  It also doesn't output
>>> memory fences.  Both of these give good performance boosts.  I wonder if
>>> with one processor, OpenJDK is using 2 instructions to do the write whereas
>>> with multiple processors it plays it safe and uses 1 instruction.
>>>
>>> Note: If you disable all of the processors but 1 and then start HotSpot,
>>> HotSpot will start in single processor mode.  If you then enable those
>>> processors while HotSpot is running, a lot of things break and the JVM will
>>> crash.  Because single processor systems are rare, the default might be
>>> changed to assume multiple processors unless the command line specifies 1
>>> processor.
>>>
>>>  On 4/30/2013 8:48 AM, Tim Halloran wrote:
>>>
>>>  Aleksey, correct -- more trials show what you predicted. Thanks for
>>> the nudge.
>>>
>>>  Mark,
>>>
>>>  Very helpful, in fact, we are seeing quick failures except for the
>>> dual-processor case -- on a dual processor hardware or VM (Virtual Box) we
>>> have yet to get a failure.  The two programs attached are what I'm running.
>>>  I stripped out my benchmark framework (so they are easy to run on OpenJDK
>>> but not on Android).  The difference is that one uses two threads (one
>>> writer one reader) the other three (two writers one reader) -- both seem to
>>> produce similar results.
>>>
>>>  With one processor on OpenJDK 1.6.0_27, I see the split write almost
>>> immediately. With two processors we can't get a failure yet. We get more
>>> failures as the processor count goes up -- but after a few failures, we
>>> don't get any more (the program tries to get 10 to happen)...we can't get
>>> to 10.
>>>
>>>  It seems that while this can happen on OpenJDK it is rarer than on
>>> Android where ten failures takes less than a second to happen.
>>>
>>>  Best, Tim
>>>
>>>
>>>
>>> On Tue, Apr 30, 2013 at 11:26 AM, Mark Thornton <mthornton at optrak.com> wrote:
>>>
>>>>   On 30/04/13 15:36, Tim Halloran wrote:
>>>>
>>>> On Mon, Apr 29, 2013 at 4:59 PM, Aleksey Shipilev <
>>>> aleksey.shipilev at oracle.com> wrote:
>>>>
>>>>> Yes, that's exactly what I had in mind:
>>>>>  a. Declare "long a"
>>>>>  b. Ramp up two threads.
>>>>>  c. Make thread 1 write 0L and -1L over and over to field $a
>>>>>  d. Make thread 2 observe the field a, and count the observed values
>>>>>  e. ...
>>>>>  f. PROFIT!
>>>>>
>>>>> P.S. It is important to do some action on the value read in thread 2,
>>>>> so that the read is not hoisted out of the loop, since $a is not
>>>>> supposed to be volatile.
>>>>>
>>>>> -Aleksey.
>>>>>
>>>>>
>>>>  This discussion is getting a bit far afield, I guess, but to get back
>>>> onto the topic: I followed Aleksey's advice and wrote an implementation
>>>> that tests this.  I used two separate threads to write 0L and -1L into the
>>>> long field "a", but that is the only real change I made. (I already had
>>>> some scaffolding code to run things on Android or desktop Java.)
>>>>
>>>>  *Android: splits writes to longs into two parts.*
>>>>
>>>>  On a Samsung Galaxy II with Android 4.0.4 and a Nexus 4 phone with
>>>> Android 4.2.2 I saw non-atomic treatment of long. The value -4294967296
>>>> (0xFFFFFFFF00000000) showed up as well as 4294967295 (0x00000000FFFFFFFF).
>>>>
>>>>  So it looks like Android does not follow the (albeit optional) advice
>>>> in the Java language specification about this.
>>>>
>>>>  *JDK: DOES NOT split writes to longs into two parts (even 32-bit
>>>> implementations)*
>>>>
>>>>  Of course we couldn't get this to happen on any 64-bit JVM, but we
>>>> also tried it under Linux on 32-bit OpenJDK 1.7.0_21, and it does NOT
>>>> happen there either. The 32-bit JVM implementations follow the
>>>> recommendation of the Java language specification.
>>>>
>>>>  An interesting curio. I wonder how many programmers are going to
>>>> lose sleep tracking down crashes from this one in "working" Java code
>>>> moved from desktop Java onto Android.
>>>>
>>>>
>>>>
>>>>  Last time I tried this sort of test, a split write would be observed
>>>> in under a second on a true dual processor. However, with only one
>>>> processor available, it would typically take around 20 minutes. So you
>>>> might have to run a very long test to have any real confidence in the lack
>>>> of splitting.
>>>>
>>>> Mark Thornton
>>>>
>>>>
>>>> _______________________________________________
>>>> Concurrency-interest mailing list
>>>> Concurrency-interest at cs.oswego.edu
>>>> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>
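
For reference, the recipe Aleksey gives in the quoted thread (declare a long, have one thread write 0L and -1L over and over, have another thread observe the field and count what it sees) could be sketched roughly as below. This is not the program Tim attached. The class name, the `stop` flag, the run durations, and the extra volatile variant are all assumptions added for illustration. The volatile variant is there because JLS 17.7 states that reads and writes of volatile long values are always atomic, which gives a control case.

```java
// Sketch of the test from the thread: a writer alternates a long field
// between 0L and -1L while a reader counts any other value it observes.
final class LongTearingTest {
    static long plain;               // NOT volatile: JLS 17.7 permits split access
    static volatile long guarded;    // volatile: JLS 17.7 requires atomic access
    static volatile boolean stop;    // hypothetical stop flag, not part of the recipe

    // Runs writer and reader for about `millis` ms; returns the torn-value count.
    static long run(boolean useVolatile, long millis) throws InterruptedException {
        stop = false;
        final long[] torn = new long[1];
        Thread writer = new Thread(() -> {
            long next = 0L;
            while (!stop) {
                if (useVolatile) guarded = next; else plain = next;
                next = ~next;        // alternate between 0L and -1L
            }
        });
        Thread reader = new Thread(() -> {
            long count = 0;
            // The volatile read of `stop` on every iteration discourages the
            // JIT from hoisting the field read out of the loop.
            while (!stop) {
                long v = useVolatile ? guarded : plain;
                if (v != 0L && v != -1L) count++;   // act on the value read
            }
            torn[0] = count;
        });
        writer.start();
        reader.start();
        Thread.sleep(millis);
        stop = true;
        writer.join();
        reader.join();               // join makes torn[0] visible to this thread
        return torn[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("plain long, torn values seen:    " + run(false, 2000));
        System.out.println("volatile long, torn values seen: " + run(true, 2000));
    }
}
```

The volatile count should always be zero. The plain count depends on the JVM, the hardware, and timing. As Mark notes in the thread, a short run with no failures says little, so the 2000 ms used here is only a placeholder.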