[concurrency-interest] JLS 17.7 Non-atomic treatment of double and long : Android

Nathan Reynolds nathan.reynolds at oracle.com
Tue Apr 30 12:59:42 EDT 2013


Thanks for reminding me of SSE.  SSE would be much faster than cmpxchg8b.

Nathan Reynolds 
<http://psr.us.oracle.com/wiki/index.php/User:Nathan_Reynolds> | 
Architect | 602.333.9091
Oracle PSR Engineering <http://psr.us.oracle.com/> | Server Technology
On 4/30/2013 9:57 AM, Stanimir Simeonoff wrote:
>
>
> On Tue, Apr 30, 2013 at 7:37 PM, Vitaly Davidovich <vitalyd at gmail.com> wrote:
>
>     Curious how x86 would move a long in 1 instruction? There's no
>     memory to memory mov so has to go through register, and thus needs
>     2 registers (and hence split).  Am I missing something?
>
> It uses an SSE instruction; those registers are wider.
>
> Stanimir
>
>     Sent from my phone
>
>     On Apr 30, 2013 12:23 PM, "Nathan Reynolds"
>     <nathan.reynolds at oracle.com> wrote:
>
>         You might want to print the assembly using HotSpot (and
>         OpenJDK?).  If the assembly uses 1 instruction to do the
>         write, then no splitting can ever happen (because alignment
>         takes care of cache line splits).  If the assembly uses 2
>         instructions to do the write, then it is only a matter of timing.
>
>         With a single processor system, you are waiting for the
>         thread's quantum to end right after the first instruction but
>         before the second instruction.  This will allow the other
>         thread to see the split write.
>
>         With a dual processor system, the reader thread simply has to
>         get a copy of the cache line after the first write and before
>         the second write.  This is much easier to do.
>
>         HotSpot will do a lot of optimizations on single processor
>         systems.  For example, it gets rid of the "lock" prefix in
>         front of atomic instructions since the instruction's execution
>         can't be split. It also doesn't output memory fences.  Both of
>         these give good performance boosts.  I wonder if with one
>         processor, OpenJDK is using 2 instructions to do the write
>         whereas with multiple processors it plays it safe and uses 1
>         instruction.
>
>         Note: If you disable all of the processors but 1 and then
>         start HotSpot, HotSpot will start in single processor mode. 
>         If you then enable those processors while HotSpot is running,
>         a lot of things break and the JVM will crash.  Because single
>         processor systems are rare, the default might be changed to
>         assume multiple processors unless the command line specifies 1
>         processor.
>
>         On 4/30/2013 8:48 AM, Tim Halloran wrote:
>>         Aleksey, correct -- more trials show what you predicted.
>>         Thanks for the nudge.
>>
>>         Mark,
>>
>>         Very helpful, in fact, we are seeing quick failures except
>>         for the dual-processor case -- on a dual processor hardware
>>         or VM (Virtual Box) we have yet to get a failure.  The two
>>         programs attached are what I'm running.  I stripped out my
>>         benchmark framework (so they are easy to run on OpenJDK but
>>         not on Android).  The difference is that one uses two threads
>>         (one writer, one reader) and the other three (two writers, one
>>         reader) -- both seem to produce similar results.
>>
>>         With one processor on OpenJDK 1.6.0_27, I see the split write
>>         almost immediately. With two processors we can't get a failure,
>>         yet we get more failures as the processor count goes up -- but
>>         after a few failures we don't get any more (the program tries
>>         to get 10 to happen)...we can't get to 10.
>>
>>         It seems that while this can happen on OpenJDK it is rarer
>>         than on Android where ten failures takes less than a second
>>         to happen.
>>
>>         Best, Tim
>>
>>
>>
>>         On Tue, Apr 30, 2013 at 11:26 AM, Mark Thornton
>>         <mthornton at optrak.com> wrote:
>>
>>             On 30/04/13 15:36, Tim Halloran wrote:
>>>             On Mon, Apr 29, 2013 at 4:59 PM, Aleksey Shipilev
>>>             <aleksey.shipilev at oracle.com> wrote:
>>>
>>>                 Yes, that's exactly what I had in mind:
>>>                  a. Declare "long a"
>>>                  b. Ramp up two threads.
>>>                  c. Make thread 1 write 0L and -1L over and over to
>>>                 field $a
>>>                  d. Make thread 2 observe the field a, and count the
>>>                 observed values
>>>                  e. ...
>>>                  f. PROFIT!
>>>
>>>                 P.S. It is important to do some action on the value
>>>                 read in thread 2, so that the read is not hoisted
>>>                 out of the loop, since $a is not supposed to be
>>>                 volatile.
>>>
>>>                 -Aleksey.
>>>
>>>
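
The steps quoted above can be sketched roughly as follows. This is only a minimal illustration, not the program that was attached to the thread: the class name TornLongProbe and the helpers isTorn and run are made up for this example, and the fixed run time is an arbitrary choice. The reader counts suspicious values so that the read does real work, and both loops poll a stop flag through AtomicBoolean.get(), a volatile read that in practice also keeps the plain read of the field inside the loop.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical probe for split (torn) writes to a plain long field. */
class TornLongProbe {
    // Deliberately NOT volatile: JLS 17.7 permits such a write to be
    // performed as two separate 32-bit writes.
    static long a;

    /** The writer only ever stores 0L and -1L, so anything else is a torn value. */
    static boolean isTorn(long v) {
        return v != 0L && v != -1L;
    }

    /** Runs one writer and one reader for roughly millis ms; returns the torn reads seen. */
    static long run(long millis) {
        final AtomicBoolean stop = new AtomicBoolean(false);
        final long[] tornCount = new long[1];

        Thread writer = new Thread(() -> {
            long next = 0L;
            while (!stop.get()) {
                a = next;       // one plain 64-bit store per iteration
                next = ~next;   // alternates between 0L and -1L
            }
        });
        Thread reader = new Thread(() -> {
            long count = 0;
            while (!stop.get()) {
                if (isTorn(a)) {
                    count++;    // act on the value so the read is actually used
                }
            }
            tornCount[0] = count; // visible to the caller after join()
        });

        writer.start();
        reader.start();
        try {
            Thread.sleep(millis);
            stop.set(true);
            writer.join();
            reader.join();
        } catch (InterruptedException e) {
            stop.set(true);
            Thread.currentThread().interrupt();
        }
        return tornCount[0];
    }

    public static void main(String[] args) {
        System.out.println("torn reads observed: " + run(1000));
    }
}
```

A count of zero from a short run says little by itself; as noted elsewhere in this thread, how quickly a split shows up depends heavily on the JVM and on the number of processors.
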
>>>             This discussion is getting a bit far afield, I guess,
>>>             but to get back onto the topic: I followed Aleksey's
>>>             advice and wrote an implementation that tests this.  I
>>>             used two separate threads to write 0L and -1L into the
>>>             long field "a", but that is the only real change I made.
>>>             (I already had some scaffolding code to run things on
>>>             Android or desktop Java).
>>>
>>>             *Android: splits writes to longs into two parts.*
>>>
>>>             On a Samsung Galaxy II with Android 4.0.4 and a Nexus 4
>>>             phone with Android 4.2.2 I saw non-atomic treatment of
>>>             long. The value -4294967296 (0xFFFFFFFF00000000) showed
>>>             up as well as 4294967295 (0x00000000FFFFFFFF).
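
For anyone wondering where those two numbers come from: each is what you get when one 32-bit half of the field comes from the -1L write and the other half from the 0L write. A small sketch of the arithmetic (the class TornValues and the method splice are names invented for this illustration):

```java
/** Hypothetical helper showing how the two observed values arise. */
class TornValues {
    /** Combines the upper 32 bits of one long with the lower 32 bits of another. */
    static long splice(long highSource, long lowSource) {
        return (highSource & 0xFFFFFFFF00000000L) | (lowSource & 0x00000000FFFFFFFFL);
    }

    public static void main(String[] args) {
        // Upper half from -1L, lower half from 0L.
        System.out.println(splice(-1L, 0L)); // prints -4294967296
        // Upper half from 0L, lower half from -1L.
        System.out.println(splice(0L, -1L)); // prints 4294967295
    }
}
```
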
>>>
>>>             So it looks like Android does not follow the (albeit
>>>             optional) advice in the Java language specification
>>>             about this.
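
JLS 17.7 also states that writes and reads of volatile long and double values are always atomic, so code that shares a 64-bit field between threads does not have to rely on the VM's optional behaviour. A minimal sketch of two ways to get that guarantee, assuming a hypothetical holder class (SharedLong and its method names are invented here):

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical holder showing two ways to make 64-bit accesses atomic. */
class SharedLong {
    // Option 1: volatile long; reads and writes are atomic per JLS 17.7.
    private volatile long viaVolatile;

    // Option 2: AtomicLong, which additionally offers compare-and-set and increments.
    private final AtomicLong viaAtomic = new AtomicLong();

    void set(long v) {
        viaVolatile = v;
        viaAtomic.set(v);
    }

    long getVolatile() {
        return viaVolatile;
    }

    long getAtomic() {
        return viaAtomic.get();
    }

    public static void main(String[] args) {
        SharedLong s = new SharedLong();
        s.set(-1L);
        System.out.println(s.getVolatile() + " " + s.getAtomic());
    }
}
```

Either option rules out the split values on any conforming implementation, at the cost of the ordering guarantees that volatile brings with it.
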
>>>
>>>             *JDK: DOES NOT split writes to longs into two parts
>>>             (even 32-bit implementations)*
>>>
>>>             Of course we couldn't get this to happen on any 64-bit
>>>             JVM, but when we tried it out under Linux on 32-bit
>>>             OpenJDK 1.7.0_21, it did NOT happen either. The 32-bit
>>>             JVM implementations follow the recommendation of the
>>>             Java language specification.
>>>
>>>             An interesting curio. I wonder how many programmers are
>>>             going to lose sleep tracking down crashes like this one
>>>             in "working" Java code moved from desktop Java onto Android.
>>>
>>>
>>
>>             Last time I tried this sort of test, a split write would
>>             be observed in under a second on a true dual processor.
>>             However, with only one processor available, it would
>>             typically take around 20 minutes. So you might have to
>>             run a very long test to have any real confidence in the
>>             lack of splitting.
>>
>>             Mark Thornton
>>
>>
>>             _______________________________________________
>>             Concurrency-interest mailing list
>>             Concurrency-interest at cs.oswego.edu
>>             <mailto:Concurrency-interest at cs.oswego.edu>
>>             http://cs.oswego.edu/mailman/listinfo/concurrency-interest
