[concurrency-interest] JLS 17.7 Non-atomic treatment of double and long : Android

Stanimir Simeonoff stanimir at riflexo.com
Tue Apr 30 13:15:06 EDT 2013



> As for SSE, yeah it's possible, but is that true? Does the JIT skip integer
> registers for scalar long operations? I find that hard to believe as it
> would miss out on large register file/renaming opportunities.
>

I know that by looking at the assembly. I can still check with the current
version.

Stanimir



> Sent from my phone
> On Apr 30, 2013 12:58 PM, "Nathan Reynolds" <nathan.reynolds at oracle.com>
> wrote:
>
>>  The processor can do whatever it wants in registers without other
>> threads being able to see intermediate values.  Registers are private to
>> the hardware thread.  So, we can use multiple instructions to load the
>> ecx:ebx registers and then execute the cmpxchg8b to do a single write to
>> globally visible cache.
>>
>> Nathan Reynolds <http://psr.us.oracle.com/wiki/index.php/User:Nathan_Reynolds> | Architect |
>> 602.333.9091
>> Oracle PSR Engineering <http://psr.us.oracle.com/> | Server Technology
>>
>> On 4/30/2013 9:53 AM, Vitaly Davidovich wrote:
>>
>> But this requires the src value to be in ecx:ebx so how would you load it
>> there without two loads (and possibly observe tearing) in the first place?
>>
>> Sent from my phone
>> On Apr 30, 2013 12:45 PM, "Nathan Reynolds" <nathan.reynolds at oracle.com>
>> wrote:
>>
>>>  On 32-bit x86, cmpxchg8b can be used to write a long with a single
>>> instruction.  This instruction has been "present on most post-80486
>>> processors" (Wikipedia).  There might be cheaper ways to write a long but
>>> there is at least 1 way.
>>>
>>>  On 4/30/2013 9:37 AM, Vitaly Davidovich wrote:
>>>
>>> Curious how x86 would move a long in 1 instruction? There's no memory to
>>> memory mov so has to go through register, and thus needs 2 registers (and
>>> hence split).  Am I missing something?
>>>
>>> Sent from my phone
>>> On Apr 30, 2013 12:23 PM, "Nathan Reynolds" <nathan.reynolds at oracle.com>
>>> wrote:
>>>
>>>>  You might want to print the assembly using HotSpot (and OpenJDK?).
>>>> If the assembly uses 1 instruction to do the write, then no splitting can
>>>> ever happen (because alignment takes care of cache line splits).  If the
>>>> assembly uses 2 instructions to do the write, then it is only a matter of
>>>> timing.
>>>>
>>>> With a single processor system, you are waiting for the thread's
>>>> quantum to end right after the first instruction but before the second
>>>> instruction.  This will allow the other thread to see the split write.
>>>>
>>>> With a dual processor system, the reader thread simply has to get a
>>>> copy of the cache line after the first write and before the second write.
>>>> This is much easier to do.
>>>>
>>>> HotSpot will do a lot of optimizations on single processor systems.
>>>> For example, it gets rid of the "lock" prefix in front of atomic
>>>> instructions since the instruction's execution can't be split.  It also
>>>> doesn't output memory fences.  Both of these give good performance boosts.
>>>> I wonder if with one processor, OpenJDK is using 2 instructions to do the
>>>> write whereas with multiple processors it plays it safe and uses 1
>>>> instruction.
>>>>
>>>> Note: If you disable all of the processors but 1 and then start
>>>> HotSpot, HotSpot will start in single processor mode.  If you then enable
>>>> those processors while HotSpot is running, a lot of things break and the
>>>> JVM will crash.  Because single processor systems are rare, the default
>>>> might be changed to assume multiple processors unless the command line
>>>> specifies 1 processor.
>>>>
>>>>  On 4/30/2013 8:48 AM, Tim Halloran wrote:
>>>>
>>>>  Aleksey, correct -- more trials show what you predicted. Thanks for
>>>> the nudge.
>>>>
>>>>  Mark,
>>>>
>>>>  Very helpful, in fact, we are seeing quick failures except for the
>>>> dual-processor case -- on a dual processor hardware or VM (Virtual Box) we
>>>> have yet to get a failure.  The two programs attached are what I'm running.
>>>>  I stripped out my benchmark framework (so they are easy to run on OpenJDK
>>>> but not on Android).  The difference is that one uses two threads (one
>>>> writer one reader) the other three (two writers one reader) -- both seem to
>>>> produce similar results.
>>>>
>>>>  With one processor, OpenJDK 1.6.0_27, I see the split write almost
>>>> immediately. With dual processors we can't get a failure yet. We get more
>>>> failures as the processor count goes up, but after a few failures we don't
>>>> get any more (the program tries to get 10 to happen)...we can't get to 10.
>>>>
>>>>  It seems that while this can happen on OpenJDK it is rarer than on
>>>> Android where ten failures takes less than a second to happen.
>>>>
>>>>  Best, Tim
>>>>
>>>>
>>>>
>>>> On Tue, Apr 30, 2013 at 11:26 AM, Mark Thornton <mthornton at optrak.com> wrote:
>>>>
>>>>>   On 30/04/13 15:36, Tim Halloran wrote:
>>>>>
>>>>> On Mon, Apr 29, 2013 at 4:59 PM, Aleksey Shipilev <
>>>>> aleksey.shipilev at oracle.com> wrote:
>>>>>
>>>>>> Yes, that's exactly what I had in mind:
>>>>>>  a. Declare "long a"
>>>>>>  b. Ramp up two threads.
>>>>>>  c. Make thread 1 write 0L and -1L over and over to field $a
>>>>>>  d. Make thread 2 observe the field a, and count the observed values
>>>>>>  e. ...
>>>>>>  f. PROFIT!
>>>>>>
>>>>>> P.S. It is important to do some action on the value read in thread 2,
>>>>>> so that the read does not get hoisted from the loop, since $a is not
>>>>>> supposed to be volatile.
>>>>>>
>>>>>> -Aleksey.
>>>>>>
>>>>>>
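
The recipe above can be sketched roughly as follows. This is only a minimal
sketch assuming Java 8 lambdas; the class name LongTearingProbe, the isTorn
helper, and the time-bounded loops are illustrative choices and are not taken
from the attached programs. Since the field is not volatile, the JIT is free
to hoist the read, so a result of zero proves nothing.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical probe class, for illustration only.
final class LongTearingProbe {
    // Deliberately NOT volatile: JLS 17.7 permits a write to a non-volatile
    // long to be performed as two separate 32-bit writes.
    static long a;

    /** A value is torn if it is neither of the two values the writer stores. */
    static boolean isTorn(long v) {
        return v != 0L && v != -1L;
    }

    /** Runs one writer and one reader for about the given time; returns torn reads seen. */
    static long countTornReads(long millis) {
        final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(millis);
        final long[] torn = new long[1];

        Thread writer = new Thread(() -> {
            long v = 0L;
            while (System.nanoTime() < deadline) {
                a = v;      // alternately stores 0L and -1L
                v = ~v;
            }
        });
        Thread reader = new Thread(() -> {
            long count = 0;
            while (System.nanoTime() < deadline) {
                long seen = a;          // plain read of the shared field
                if (isTorn(seen)) {     // act on the value that was read
                    count++;
                }
            }
            torn[0] = count;            // published to the caller by join()
        });

        writer.start();
        reader.start();
        try {
            writer.join();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return torn[0];
    }

    public static void main(String[] args) {
        System.out.println("torn reads observed: " + countTornReads(2000));
    }
}
```

Running it for longer, and on machines with different processor counts,
matters more than the exact loop shape.
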
>>>>>  This discussion is getting a bit far afield, I guess, but to get
>>>>> back onto the topic: I followed Aleksey's advice and wrote an
>>>>> implementation that tests this.  I used two separate threads to write 0L
>>>>> and -1L into the long field "a" but that is the only real change I made. (I
>>>>> already had some scaffolding code to run things on Android or desktop Java).
>>>>>
>>>>>  *Android: splits writes to longs into two parts.*
>>>>>
>>>>>  On a Samsung Galaxy II with Android 4.0.4 and a Nexus 4 phone with
>>>>> Android 4.2.2 I saw non-atomic treatment of long. The value -4294967296
>>>>> (0xFFFFFFFF00000000) showed up as well as 4294967295 (0x00000000FFFFFFFF).
>>>>>
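
To see why exactly those two values appear, here is a small sketch that
combines one 32-bit half of -1L with one 32-bit half of 0L. The class name
TornValues and the combine helper are made up for illustration. It only
models the arithmetic and does not claim to show how any particular VM
performs the two stores.

```java
// Hypothetical helper class, for illustration only.
final class TornValues {
    /** Builds a 64-bit value from two independently written 32-bit halves. */
    static long combine(int high, int low) {
        // Mask the low half so its sign bit does not smear into the high half.
        return ((long) high << 32) | (low & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        int zeroHalf = 0;   // either half of 0L
        int onesHalf = -1;  // either half of -1L (0xFFFFFFFF)

        // High half from -1L, low half from 0L.
        System.out.println(combine(onesHalf, zeroHalf)); // prints -4294967296
        // High half from 0L, low half from -1L.
        System.out.println(combine(zeroHalf, onesHalf)); // prints 4294967295
    }
}
```
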
>>>>>  So it looks like Android does not follow the (albeit optional) advice
>>>>> in the Java language specification about this.
>>>>>
>>>>>  *JDK: DOES NOT split writes to longs into two parts (even 32-bit
>>>>> implementations)*
>>>>>
>>>>>  Of course we couldn't get this to happen on any 64-bit JVM, but we
>>>>> also tried it out under Linux on 32-bit OpenJDK 1.7.0_21 and it does NOT
>>>>> happen. The 32-bit JVM implementations follow the recommendation of the
>>>>> Java language specification.
>>>>>
>>>>>  An interesting curio. I wonder how many programmers are going to
>>>>> lose sleep tracking down crashes caused by this in "working" Java code
>>>>> moved from desktop Java onto Android.
>>>>>
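
For code that has to run on both, JLS 17.7 itself says that writes and reads
of volatile long and double values are always atomic, so one possible way to
avoid the problem is sketched below. The class and method names are invented
for illustration. AtomicLong is the standard
java.util.concurrent.atomic.AtomicLong.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical holder class, for illustration only.
final class AtomicLongFields {
    // JLS 17.7: reads and writes of volatile long/double are always atomic.
    private volatile long viaVolatile;

    // AtomicLong gives the same single-value atomicity plus compare-and-set.
    private final AtomicLong viaAtomic = new AtomicLong();

    void write(long v) {
        viaVolatile = v;
        viaAtomic.set(v);
    }

    long readVolatile() {
        return viaVolatile;
    }

    long readAtomic() {
        return viaAtomic.get();
    }

    public static void main(String[] args) {
        AtomicLongFields f = new AtomicLongFields();
        f.write(-1L);
        System.out.println(f.readVolatile() + " " + f.readAtomic());
    }
}
```

Either form costs more than a plain field, so it is a trade-off and not a
free fix.
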
>>>>>
>>>>>
>>>>>  Last time I tried this sort of test, a split write would be observed
>>>>> in under a second on a true dual processor. However, with only one
>>>>> processor available, it would typically take around 20 minutes. So you
>>>>> might have to run a very long test to have any real confidence in the lack
>>>>> of splitting.
>>>>>
>>>>> Mark Thornton
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Concurrency-interest mailing list
>>>>> Concurrency-interest at cs.oswego.edu
>>>>> http://cs.oswego.edu/mailman/listinfo/concurrency-interest