[concurrency-interest] JLS 17.7 Non-atomic treatment of double and long : Android

oleksandr otenko oleksandr.otenko at oracle.com
Tue Apr 30 13:05:46 EDT 2013


Coprocessor, XMM and others.

Alex

On 30/04/2013 17:37, Vitaly Davidovich wrote:
>
> Curious how x86 would move a long in 1 instruction? There's no memory 
> to memory mov so has to go through register, and thus needs 2 
> registers (and hence split).  Am I missing something?
>
> Sent from my phone
>
> On Apr 30, 2013 12:23 PM, "Nathan Reynolds" 
> <nathan.reynolds at oracle.com> wrote:
>
>     You might want to print the assembly using HotSpot (and
>     OpenJDK?).  If the assembly uses 1 instruction to do the write,
>     then no splitting can ever happen (because alignment takes care of
>     cache line splits).  If the assembly uses 2 instructions to do
>     the write, then it is only a matter of timing.
>
>     With a single processor system, you are waiting for the thread's
>     quantum to end right after the first instruction but before the
>     second instruction.  This will allow the other thread to see the
>     split write.
>
>     With a dual processor system, the reader thread simply has to get
>     a copy of the cache line after the first write and before the
>     second write.  This is much easier to do.
>
>     HotSpot will do a lot of optimizations on single processor
>     systems.  For example, it gets rid of the "lock" prefix in front
>     of atomic instructions since the instruction's execution can't be
>     split.  It also doesn't output memory fences.  Both of these give
>     good performance boosts.  I wonder if with one processor, OpenJDK
>     is using 2 instructions to do the write whereas with multiple
>     processors it plays it safe and uses 1 instruction.
>
>     Note: If you disable all of the processors but 1 and then start
>     HotSpot, HotSpot will start in single processor mode.  If you then
>     enable those processors while HotSpot is running, a lot of things
>     break and the JVM will crash. Because single processor systems are
>     rare, the default might be changed to assume multiple processors
>     unless the command line specifies 1 processor.
>
>     Nathan Reynolds
>     <http://psr.us.oracle.com/wiki/index.php/User:Nathan_Reynolds> |
>     Architect | 602.333.9091
>     Oracle PSR Engineering <http://psr.us.oracle.com/> | Server Technology
>     On 4/30/2013 8:48 AM, Tim Halloran wrote:
>>     Aleksey, correct -- more trials show what you predicted. Thanks
>>     for the nudge.
>>
>>     Mark,
>>
>>     Very helpful. In fact, we are seeing quick failures except for
>>     the dual-processor case -- on dual-processor hardware or a VM
>>     (VirtualBox) we have yet to get a failure.  The two programs
>>     attached are what I'm running.  I stripped out my benchmark
>>     framework (so they are easy to run on OpenJDK but not on
>>     Android).  The difference is that one uses two threads (one
>>     writer one reader) the other three (two writers one reader) --
>>     both seem to produce similar results.
>>
>>     With one processor, on OpenJDK 1.6.0_27, I see the split write
>>     almost immediately. With two we can't get a failure; yet we get
>>     more failures as the processor count goes up -- but after a few
>>     failures, we don't get any more (the program tries to get 10 to
>>     happen)...we can't get to 10.
>>
>>     It seems that while this can happen on OpenJDK it is rarer than
>>     on Android where ten failures takes less than a second to happen.
>>
>>     Best, Tim
>>
>>
>>
>>     On Tue, Apr 30, 2013 at 11:26 AM, Mark Thornton
>>     <mthornton at optrak.com> wrote:
>>
>>         On 30/04/13 15:36, Tim Halloran wrote:
>>>         On Mon, Apr 29, 2013 at 4:59 PM, Aleksey Shipilev
>>>         <aleksey.shipilev at oracle.com> wrote:
>>>
>>>             Yes, that's exactly what I had in mind:
>>>              a. Declare "long a"
>>>              b. Ramp up two threads.
>>>              c. Make thread 1 write 0L and -1L over and over to field $a
>>>              d. Make thread 2 observe the field a, and count the
>>>             observed values
>>>              e. ...
>>>              f. PROFIT!
>>>
>>>             P.S. It is important to do some action on the value read in
>>>             thread 2, so that the read is not hoisted out of the loop,
>>>             since $a is not supposed to be volatile.
>>>
>>>             -Aleksey.
>>>
>>>
>>>         This discussion is getting a bit far afield, I guess, but to
>>>         get back onto the topic: I followed Aleksey's advice and
>>>         wrote an implementation that tests this.  I used two
>>>         separate threads to write 0L and -1L into the long field "a"
>>>         but that is the only real change I made. (I already had some
>>>         scaffolding code to run things on Android or desktop Java).
>>>
>>>         *Android: splits writes to longs into two parts.*
>>>
>>>         On a Samsung Galaxy II with Android 4.0.4 and a Nexus 4 phone
>>>         with Android 4.2.2 I saw non-atomic treatment of long. The
>>>         value -4294967296 (0xFFFFFFFF00000000) showed up as well as
>>>         4294967295 (0x00000000FFFFFFFF).
>>>
>>>         So it looks like Android does not follow the (albeit optional)
>>>         advice in the Java language specification about this.
>>>
>>>         *JDK: DOES NOT split writes to longs into two parts (even
>>>         32-bit implementations)*
>>>
>>>         Of course we couldn't get this to happen on any 64-bit JVM,
>>>         but we also tried it out under Linux on 32-bit OpenJDK
>>>         1.7.0_21, and it does NOT happen there. The 32-bit JVM
>>>         implementations follow the recommendation of the Java
>>>         language specification.
>>>
>>>         An interesting curio. I wonder how many programmers are going
>>>         to lose sleep tracking down crashes caused by this one in
>>>         "working" Java code moved from desktop Java onto Android.
>>>
>>>
>>
>>         Last time I tried this sort of test, a split write would be
>>         observed in under a second on a true dual processor. However,
>>         with only one processor available, it would typically take
>>         around 20 minutes. So you might have to run a very long test
>>         to have any real confidence in the lack of splitting.
>>
>>         Mark Thornton
>>
>>
>>         _______________________________________________
>>         Concurrency-interest mailing list
>>         Concurrency-interest at cs.oswego.edu
>>         <mailto:Concurrency-interest at cs.oswego.edu>
>>         http://cs.oswego.edu/mailman/listinfo/concurrency-interest
>>
>>
>>
>>
>
>
>
>
>
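
For reference, Aleksey's recipe quoted above can be sketched in Java
roughly as follows. This is only a minimal sketch, not the program Tim
attached: the class name TornLongProbe, the helpers isTorn and probe, and
the 2-second run time are illustrative assumptions. One writer alternates
0L and -1L in a plain (non-volatile) long field, and one reader counts
every value that is neither of the two.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical probe: one writer, one reader, one plain long field. */
final class TornLongProbe {
    // Deliberately not volatile: JLS 17.7 allows a JVM to treat a write
    // to such a field as two separate 32-bit writes.
    static long a;

    /** A read is torn if it is neither value the writer ever stores. */
    static boolean isTorn(long v) {
        return v != 0L && v != -1L;
    }

    /** Runs for about the given time and returns the torn reads seen. */
    static long probe(long millis) throws InterruptedException {
        AtomicBoolean stop = new AtomicBoolean(false);
        long[] torn = new long[1];

        Thread writer = new Thread(() -> {
            long next = 0L;
            while (!stop.get()) {
                a = next;
                next = ~next; // flips between 0L and -1L
            }
        });
        Thread reader = new Thread(() -> {
            long count = 0;
            while (!stop.get()) {
                long v = a;
                // Acting on the value keeps the read inside the loop.
                if (isTorn(v)) {
                    count++;
                }
            }
            torn[0] = count;
        });

        writer.start();
        reader.start();
        Thread.sleep(millis);
        stop.set(true);
        writer.join();
        reader.join(); // join makes torn[0] visible to this thread
        return torn[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("torn reads observed: " + probe(2000));
    }
}
```

Going by the reports in this thread, a 64-bit JVM should keep the count
at 0, while a 32-bit VM or Android may not, and how quickly a torn read
shows up depends on processor count and timing. Declaring the field
volatile rules the split out, since JLS 17.7 makes reads and writes of
volatile long values always atomic.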

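The two odd values Tim reports are what you get when a reader sees one
32-bit half from the 0L write and the other half from the -1L write. A
small worked example, with the class name TornHalves and the helper mix
being assumptions made up for illustration:

```java
/** Hypothetical helper showing how the two torn values arise. */
final class TornHalves {
    /** Combines the high 32 bits of one value with the low 32 bits of another. */
    static long mix(long highFrom, long lowFrom) {
        return (highFrom & 0xFFFFFFFF00000000L) | (lowFrom & 0x00000000FFFFFFFFL);
    }

    public static void main(String[] args) {
        long highOnly = mix(-1L, 0L); // high half from -1L, low half from 0L
        long lowOnly = mix(0L, -1L);  // high half from 0L, low half from -1L
        System.out.printf("%d (0x%016X)%n", highOnly, highOnly);
        // prints: -4294967296 (0xFFFFFFFF00000000)
        System.out.printf("%d (0x%016X)%n", lowOnly, lowOnly);
        // prints: 4294967295 (0x00000000FFFFFFFF)
    }
}
```

Which of the two shows up at a given moment depends on which half the
writer has already stored when the reader looks.
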