[concurrency-interest] Numerical Stream code

Nitsan Wakart nitsanw at yahoo.com
Thu Feb 14 12:09:01 EST 2013


There's no issue with writing to disjoint areas of the same array. If you are worried about visibility, you'll need some correct publication mechanism to make sure all the updates have happened before the reads do.
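As a minimal sketch of the publication point above (the class and method names here are made up for illustration, and Thread.join() is just one possible publication mechanism), two threads can fill disjoint halves of one array with plain writes and the caller can read the results safely after joining both:

```java
public class DisjointWrites {
    // Fills a shared array from two threads; each thread owns its own index range.
    static long[] fill(int n) {
        long[] data = new long[n];
        int mid = n / 2;
        // Plain (non-volatile) writes; no index is touched by both threads.
        Thread lo = new Thread(() -> { for (int i = 0; i < mid; i++) data[i] = i; });
        Thread hi = new Thread(() -> { for (int i = mid; i < n; i++) data[i] = i; });
        lo.start();
        hi.start();
        try {
            // join() is the publication step: everything a thread wrote
            // happens-before the return of join() on that thread.
            lo.join();
            hi.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for writers", e);
        }
        return data;
    }

    public static void main(String[] args) {
        long[] d = fill(1_000);
        long sum = 0;
        for (long v : d) sum += v;
        System.out.println("sum = " + sum);
    }
}
```

This only addresses correctness. The two halves still meet at one boundary, so the cache line that straddles it may be shared between the two writers.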
Writing in larger chunks is faster, so writing longs directly is better than writing individual bytes. If you are using a HeapByteBuffer, be aware that the writes are translated into individual byte writes. On a DirectByteBuffer each typed write ends up as an Unsafe.putType call, so you don't get that effect. With DirectByteBuffers you can also control alignment (see this blog post: http://psy-lob-saw.blogspot.com/2013/01/direct-memory-alignment-in-java.html and the follow-up here: http://psy-lob-saw.blogspot.com/2013/02/alignment-concurrency-and-torture-x86.html) and make better choices about how to split chunks between threads so that no false sharing ever happens.
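A rough sketch of the chunk-splitting idea, under a few assumptions that are mine and not from the posts linked above: a 64-byte cache line, a Java 9+ runtime (ByteBuffer.alignedSlice did not exist in 2013; the blog posts describe an Unsafe-based approach instead), and hypothetical names AlignedChunks, chunkStart and writeChunks. Each worker gets a byte range that starts and ends on a cache-line boundary and fills it with long writes:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class AlignedChunks {
    static final int CACHE_LINE = 64; // assumed cache line size in bytes

    // Start offset of chunk `index` when `capacity` bytes are split between
    // `chunks` workers; always a multiple of CACHE_LINE or equal to capacity.
    static int chunkStart(int capacity, int chunks, int index) {
        int lines = (capacity + CACHE_LINE - 1) / CACHE_LINE;
        int linesPerChunk = (lines + chunks - 1) / chunks;
        return Math.min(capacity, index * linesPerChunk * CACHE_LINE);
    }

    static ByteBuffer writeChunks(int capacity, int workers) {
        // Over-allocate, then slice so that offset 0 sits on a cache-line boundary.
        ByteBuffer buf = ByteBuffer.allocateDirect(capacity + 2 * CACHE_LINE)
                                   .alignedSlice(CACHE_LINE)
                                   .order(ByteOrder.nativeOrder());
        Thread[] threads = new Thread[workers];
        for (int w = 0; w < workers; w++) {
            final int start = chunkStart(capacity, workers, w);
            final int end = chunkStart(capacity, workers, w + 1);
            final long value = w + 1;
            threads[w] = new Thread(() -> {
                // Absolute puts only: the shared position field is never touched.
                for (int pos = start; pos + Long.BYTES <= end; pos += Long.BYTES) {
                    buf.putLong(pos, value);
                }
            });
            threads[w].start();
        }
        for (Thread t : threads) {
            try {
                t.join(); // publication: the writes are visible after join() returns
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for writers", e);
            }
        }
        return buf;
    }

    public static void main(String[] args) {
        ByteBuffer b = writeChunks(1024, 4);
        System.out.println(b.getLong(0) + " " + b.getLong(256) + " " + b.getLong(512) + " " + b.getLong(768));
    }
}
```

Because both the buffer base and every chunk boundary are multiples of the assumed line size, no two workers write to the same cache line. The real line size is hardware dependent and should be checked for the target machine.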


________________________________
 From: Peter Levart <peter.levart at gmail.com>
To: lambda-dev at openjdk.java.net 
Cc: concurrency-interest at cs.oswego.edu 
Sent: Thursday, February 14, 2013 3:56 PM
Subject: Re: [concurrency-interest] Numerical Stream code
 
On 02/14/2013 03:45 PM, Brian Goetz wrote:
>> The parallel version is almost certainly suffering false cache line
>> sharing when adjacent tasks are writing to the shared arrays u0, etc.
>> Nothing to do with streams, just a standard parallelism gotcha.
> Cure: don't write to shared arrays from parallel tasks.
> 
> 
Hi,

I would like to discuss this a little bit (hence the cc: concurrency-interest - the conversation can continue on this list only).

Is it really important to avoid writing to shared arrays from multiple threads (of course without synchronization, not even volatile writes/reads) when indexes are not shared (each thread writes/reads its own disjoint subset)?

Do element sizes matter (byte vs. short vs. int vs. long)?

I had a (false?) feeling that cache lines are not invalidated when writes are performed without fences.

Also I don't know how short (byte, char) writes are combined into memory words on the hardware when they come from different cores and whether this is connected to any performance issues.

Thanks,

Peter

_______________________________________________
Concurrency-interest mailing list
Concurrency-interest at cs.oswego.edu
http://cs.oswego.edu/mailman/listinfo/concurrency-interest