[concurrency-interest] Blocking vs. non-blocking

Dennis Sosnoski dms at sosnoski.com
Thu Jun 26 08:08:45 EDT 2014

On 06/26/2014 01:35 PM, Arcadiy Ivanov wrote:
> Based on what I've read in the benchmark code supplied, I have a 
> sneaking suspicion that the problem is the benchmark itself, since I'm 
> not sure it's measuring "thread switching performance". :)
> I would *highly* recommend talking to Aleksey about using JMH for all 
> your benchmarking needs and would, personally, dramatically simplify 
> the benchmark by using exclusively LockSupport.park/unpark if your 
> intention is to measure thread parking/unparking time.

I'm really looking to get a handle on the overhead involved in blocking 
thread switches. To make this more representative of the blocking 
operations I'm interested in, I've switched the code over to using 
CompletableFutures for switching between threads: each thread, once it 
is done, completes the future the next thread is waiting on. That did 
lower the overhead significantly, so clearly CompletableFuture is using 
more efficient handling than the crude old synchronized block approach. 
If you're interested, you can see a comparison between the 
CompletableFuture and synchronized versions at 
http://sosnoski.com/threads-linux-notebook.png 
I didn't bother rerunning the timing on Windows.
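
For anyone who wants the gist without reading the full test code, here is 
a minimal sketch of the hand-off pattern. It is not the benchmark itself: 
the names FutureRing, run and gates are made up for illustration, the 
futures are simply pre-allocated one per hand-off, and there is no 
per-thread data block or warm-up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch: threads run in strict rotation, each blocking on a
// future that its predecessor completes.
class FutureRing {

    /** Passes a token around `threads` threads for `laps` laps; returns the hand-off count. */
    static long run(int threads, int laps) {
        int total = threads * laps;
        // gates.get(k) is completed when step k may run; step k belongs to
        // thread k % threads. The extra last gate signals the main thread.
        List<CompletableFuture<Long>> gates = new ArrayList<>(total + 1);
        for (int k = 0; k <= total; k++) {
            gates.add(new CompletableFuture<>());
        }

        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            final int id = i;
            workers[i] = new Thread(() -> {
                for (int lap = 0; lap < laps; lap++) {
                    int k = lap * threads + id;
                    long count = gates.get(k).join();      // block until the predecessor hands off
                    gates.get(k + 1).complete(count + 1);  // wake the successor
                }
            });
            workers[i].start();
        }

        gates.get(0).complete(0L);              // inject the token
        long handOffs = gates.get(total).join(); // wait for the last hand-off
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return handOffs;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long handOffs = run(4, 100_000);
        long elapsed = System.nanoTime() - start;
        System.out.println(handOffs + " hand-offs, about "
                + (elapsed / handOffs) + " ns each (including thread startup)");
    }
}
```

The figure printed by main includes thread creation and JIT warm-up, so 
treat it as a rough indication only.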

I considered JMH to be overkill for this case, where I want some simple 
standalone code, but I'll definitely give it a try in the future.

> Be aware, that System.nanoTime (I would not recommend measuring 
> anything faster than your average human heartbeat with 
> System.currentTimeMillis() - resolution is 1ms at best) performs 
> weirdly under contention 
> (http://shipilev.net/blog/2014/nanotrusting-nanotime/). I.e. you may 
> not be able to measure what you want to measure using Java benchmark 
> to begin with and may need lower level native OS/CPU metrics tools.

In this case the time measurements are in seconds, so 
currentTimeMillis() resolution or accuracy is really not an issue - but 
I changed the code to use nanoTime() and ran a comparison just to make sure.
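
The cross-check amounts to bracketing the same run with both clocks, 
roughly as in the sketch below. ClockCompare and timeBoth are 
hypothetical names, not part of the benchmark code, and the sleep only 
stands in for a run that lasts whole seconds.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: time one task with both JDK clocks and compare.
class ClockCompare {

    /** Runs the task once; returns { ms by currentTimeMillis(), ms by nanoTime() }. */
    static long[] timeBoth(Runnable task) {
        long ms0 = System.currentTimeMillis();
        long ns0 = System.nanoTime();
        task.run();
        long ns1 = System.nanoTime();
        long ms1 = System.currentTimeMillis();
        return new long[] { ms1 - ms0, TimeUnit.NANOSECONDS.toMillis(ns1 - ns0) };
    }

    public static void main(String[] args) {
        long[] elapsed = timeBoth(() -> {
            try {
                Thread.sleep(2000); // stand-in for a benchmark run of a few seconds
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("currentTimeMillis: " + elapsed[0] + " ms, nanoTime: "
                + elapsed[1] + " ms");
    }
}
```

For runs measured in seconds, a difference of a few milliseconds between 
the two readings is noise.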


   - Dennis

> Also, I haven't looked at or, frankly, used synchronized in quite a 
> while, but I think it results, assuming lock elision doesn't kick in, 
> in libc mutexes of some sort (pthread? libthread? win32 mutex?), i.e. 
> you may be measuring lock performance in a libc on the specific OSes, 
> not thread switching performance.
> Hope this helps,
> - Arcadiy
> On 2014-06-25 21:05, Dennis Sosnoski wrote:
>> On 06/14/2014 02:32 PM, Arcadiy Ivanov wrote:
>>> If memory serves me right, Mr Shipilev mentioned in one of his 
>>> presentations in Oracle Spb DC re FJP optimization challenges (in 
>>> Russian, sorry, https://www.youtube.com/watch?v=t0dGLFtRR9c#t=3096) 
>>> that thread scheduling overhead of "sane OSes" (aka Linux) is approx 
>>> 50 us on average, while 'certain not-quite-sane OS named starting 
>>> with "W"' is much more than that.
>>> Loaded Linux kernel can produce latencies in *tens of seconds* 
>>> (http://www.versalogic.com/downloads/whitepapers/real-time_linux_benchmark.pdf, 
>>> page 13) without RT patches, and tens of us with RT ones. YMMV 
>>> dramatically depending on kernel, kernel version, scheduler, 
>>> architecture and load.
>> I actually found that Windows 7 did much better at thread switching 
>> performance than my Linux system with same-era kernel when running on 
>> my laptop system (Windows 7 Home Premium, Linux 3.4.63, Toshiba 
>> Satellite P750D with AMD A8-3520M APU). You can see my timing results 
>> here: http://www.sosnoski.com/thread-linux-windows.png The data block 
>> size relates to a block of per-thread data run through on every thread 
>> switch to show caching effects. Threads are executed in strict 
>> rotation, each notifying the next to run. The actual code is at: 
>> https://github.com/dsosnoski/concur3/blob/master/src/com/sosnoski/concur/article3/ThreadSwitch.java
>> So now I'm wondering if recent Windows versions actually have lower 
>> thread switching overhead in general, or if there are perhaps some 
>> OS-specific optimizations for the particular hardware (the Windows 
>> installation came with the laptop; I added Linux myself, generic 
>> OpenSUSE without any optimizations). Anyone have any thoughts on this?
>> Thanks,
>>   - Dennis
