[concurrency-interest] Numerical Stream code

Nathan Reynolds nathan.reynolds at oracle.com
Thu Feb 14 12:25:56 EST 2013


 > Is it really important to avoid writing to shared arrays from 
multiple threads (of course without synchronization, not even volatile 
writes/reads) when indexes are not shared (each thread writes/reads its 
own disjoint subset)?

I realize that this is a Java mailing list, but most of my knowledge is 
of Intel x86 processors.  Keep that in mind when reading my response.  
Other processors might behave differently.

Since the threads are reading and writing their own disjoint subsets, 
the data will never be corrupted.  No fences or locks are required.  
Right before a core writes, the cache line is invalidated in all other 
cores and the update happens in the only remaining copy of the cache 
line.  If another core needs to read or write that cache line, it has 
to get a copy from the core that holds it.  This is because Intel x86 
processors are cache coherent.
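
A minimal Java sketch of the pattern being discussed might look like the 
following.  The class and method names (DisjointWrites, fillInParallel) 
are made up for illustration.  Each worker does plain writes to its own 
slice of a shared array, with no locks and no volatile.  The join() calls 
are there for a Java memory model reason rather than a hardware one: they 
are what allow the calling thread to read the workers' results reliably 
afterwards.

```java
public class DisjointWrites {
    // Fills data[i] = 2 * i using nThreads workers, each owning one
    // contiguous, non-overlapping slice of the shared array.
    static long[] fillInParallel(int length, int nThreads) {
        long[] data = new long[length];
        Thread[] workers = new Thread[nThreads];
        int chunk = (length + nThreads - 1) / nThreads; // ceiling division
        for (int t = 0; t < nThreads; t++) {
            final int from = Math.min(length, t * chunk);
            final int to = Math.min(length, from + chunk);
            workers[t] = new Thread(() -> {
                for (int i = from; i < to; i++) {
                    data[i] = 2L * i; // plain write: no lock, no volatile
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join(); // makes the worker's writes visible to the caller
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return data;
    }

    public static void main(String[] args) {
        long[] data = fillInParallel(1_000_000, 4);
        long sum = 0;
        for (long v : data) {
            sum += v;
        }
        System.out.println("sum = " + sum);
    }
}
```

Only the elements at the slice boundaries can end up in a cache line that 
two workers touch, so with large contiguous slices the sharing cost 
should be small.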

This access pattern is called false sharing: threads update the same 
cache line, but at different locations.  The more threads that hit a 
given cache line concurrently, the higher the cache miss rate.  The 
cores stall waiting to get hold of the cache line, and the inter-core 
and inter-processor interconnect is flooded, or even bottlenecked, with 
cache invalidation and cache line fetch traffic.
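
One common way to cut down this traffic is to have each task accumulate 
into a local variable and touch the shared array only once at the end.  
The sketch below assumes a simple summing workload; the names 
(LocalAccumulate, parallelSum, partial) are hypothetical and not taken 
from the code under discussion.

```java
public class LocalAccumulate {
    // Sums input in nThreads slices. Each worker keeps a local running
    // total and writes to the shared partial[] array exactly once.
    static long parallelSum(int[] input, int nThreads) {
        long[] partial = new long[nThreads];
        Thread[] workers = new Thread[nThreads];
        int chunk = (input.length + nThreads - 1) / nThreads; // ceiling division
        for (int t = 0; t < nThreads; t++) {
            final int id = t;
            final int from = Math.min(input.length, t * chunk);
            final int to = Math.min(input.length, from + chunk);
            workers[t] = new Thread(() -> {
                long local = 0; // a local, never in a shared array slot
                for (int i = from; i < to; i++) {
                    local += input[i];
                }
                partial[id] = local; // the only write to the shared array
            });
            workers[t].start();
        }
        long total = 0;
        for (int t = 0; t < nThreads; t++) {
            try {
                workers[t].join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            total += partial[t];
        }
        return total;
    }

    public static void main(String[] args) {
        int[] input = new int[1_000];
        java.util.Arrays.fill(input, 1);
        System.out.println("total = " + parallelSum(input, 4));
    }
}
```

The partial[] slots are still adjacent, but each is written once per 
task instead of once per element, so the line changes hands only a few 
times.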

Here's the parallelism gotcha.  In the worst-case scenario, every thread 
is updating its own byte in a single cache line in a very tight loop.  
You have now serialized all of the cores on one cache line.  You will 
get much worse performance than if a single thread did all of the work.  
This is because with a single thread, the cache line stays in L1 cache 
and the memory accesses are extremely fast (e.g. about 4 cycles).  With 
multiple threads, once a core has the cache line in its L1, the memory 
access still takes about 4 cycles; however, the cache line spends a lot 
of its time inaccessible as it travels among the cores.
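
A rough way to observe this is to time threads bumping adjacent slots 
against threads bumping slots spaced far apart.  The sketch below is not 
a rigorous benchmark.  It assumes 64-byte cache lines, the names 
(FalseSharingSketch, STRIDE, run) are invented for the example, and the 
JIT is free to keep a counter in a register, which can hide the effect 
entirely.  No particular timing is promised.

```java
public class FalseSharingSketch {
    // Assumption: 64-byte cache lines, i.e. 8 longs per line. A stride of
    // 16 longs leaves slack because array alignment is not guaranteed.
    static final int STRIDE = 16;

    // Each thread increments its own slot (t * stride) `iterations` times.
    // Returns the elapsed wall-clock time in nanoseconds.
    static long run(long[] counters, int nThreads, int stride, long iterations) {
        Thread[] workers = new Thread[nThreads];
        long start = System.nanoTime();
        for (int t = 0; t < nThreads; t++) {
            final int slot = t * stride;
            workers[t] = new Thread(() -> {
                for (long i = 0; i < iterations; i++) {
                    counters[slot]++; // only this thread touches this slot
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int nThreads = 4;
        long iterations = 50_000_000L;
        long[] adjacent = new long[nThreads];        // slots likely share a line
        long[] padded = new long[nThreads * STRIDE]; // slots on separate lines
        long tAdjacent = run(adjacent, nThreads, 1, iterations);
        long tPadded = run(padded, nThreads, STRIDE, iterations);
        System.out.println("adjacent: " + tAdjacent / 1_000_000 + " ms");
        System.out.println("padded:   " + tPadded / 1_000_000 + " ms");
    }
}
```

Whatever the timings turn out to be, the final counts are the same in 
both layouts, which matches the point that false sharing is a 
performance problem and not a correctness problem.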

 > Do element sizes matter (byte vs. short vs. int  vs. long)?

I don't think so.  All of this assumes that the proper instruction is 
used.  For example, if 2 threads are writing to adjacent bytes, then the 
"mov" instruction must write only that byte.  If the compiler decides to 
read 32 bits, mask in the 8 bits, and write 32 bits back, then the data 
will be corrupted.  I believe that HotSpot will only generate the 
byte-sized mov instruction.
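
On the Java side, the Java Language Specification (section 17.6, "Word 
Tearing") requires that updates to adjacent byte array elements from 
different threads do not interfere.  A small sketch of what that 
guarantee means in practice, with invented names (AdjacentBytes, 
updateNeighbours):

```java
public class AdjacentBytes {
    // Two threads update neighbouring bytes of the same array with plain
    // (non-volatile, unlocked) accesses. One counts up, the other down.
    static byte[] updateNeighbours(int rounds) {
        byte[] cells = new byte[2];
        Thread up = new Thread(() -> {
            for (int i = 0; i < rounds; i++) {
                cells[0]++; // wraps around as a byte
            }
        });
        Thread down = new Thread(() -> {
            for (int i = 0; i < rounds; i++) {
                cells[1]--; // wraps around as a byte
            }
        });
        up.start();
        down.start();
        try {
            up.join();
            down.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return cells;
    }

    public static void main(String[] args) {
        int rounds = 1_000_000;
        byte[] cells = updateNeighbours(rounds);
        // If a wider read-mask-write were used, updates could be lost and
        // these comparisons could fail.
        System.out.println("cells[0] ok: " + (cells[0] == (byte) rounds));
        System.out.println("cells[1] ok: " + (cells[1] == (byte) -rounds));
    }
}
```

A passing run does not prove the absence of tearing on a given machine; 
it only shows the behaviour the specification asks for.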

Nathan Reynolds 
<http://psr.us.oracle.com/wiki/index.php/User:Nathan_Reynolds> | 
Architect | 602.333.9091
Oracle PSR Engineering <http://psr.us.oracle.com/> | Server Technology

On 2/14/2013 8:56 AM, Peter Levart wrote:
> On 02/14/2013 03:45 PM, Brian Goetz wrote:
>>> The parallel version is almost certainly suffering false cache line
>>> sharing when adjacent tasks are writing to the shared arrays u0, etc.
>>> Nothing to do with streams, just a standard parallelism gotcha.
>> Cure: don't write to shared arrays from parallel tasks.
>>
>>
> Hi,
>
> I would like to discuss this a little bit (hence the cc: 
> concurrency-interest - the conversation can continue on this list only).
>
> Is it really important to avoid writing to shared arrays from multiple 
> threads (of course without synchronization, not even volatile 
> writes/reads) when indexes are not shared (each thread writes/reads 
> its own disjoint subset)?
>
> Do element sizes matter (byte vs. short vs. int  vs. long)?
>
> I had a (false?) feeling that cache lines are not invalidated when 
> writes are performed without fences.
>
> Also I don't know how short (byte, char) writes are combined into 
> memory words on the hardware when they come from different cores and 
> whether this is connected to any performance issues.
>
> Thanks,
>
> Peter