[concurrency-interest] Blocking vs. non-blocking

Dennis Sosnoski dms at sosnoski.com
Sat Jun 14 00:31:21 EDT 2014


On 06/14/2014 02:32 PM, Arcadiy Ivanov wrote:
> If memory serves me right, Mr Shipilev mentioned in one of his 
> presentations in Oracle Spb DC re FJP optimization challenges (in 
> Russian, sorry, https://www.youtube.com/watch?v=t0dGLFtRR9c#t=3096) 
> that thread scheduling overhead of "sane OSes" (aka Linux) is approx 
> 50 us on average, while 'certain not-quite-sane OS named starting with 
> "W"' is much more than that.
> Loaded Linux kernel can produce latencies in *tens of seconds* 
> (http://www.versalogic.com/downloads/whitepapers/real-time_linux_benchmark.pdf, 
> page 13) without RT patches, and tens of us with RT ones. YMMV 
> dramatically depending on kernel, kernel version, scheduler, 
> architecture and load.

Sounds scary, glad my kernel seems reasonable even without RT (stock 
OpenSUSE 12.3). I have a recentish W... OS installation on a laptop 
drive (I pulled it and replaced it with an SSD, but keep the original 
around for when I need to restore my keyboard backlight - you don't want 
to know). Maybe I'll give that a try to see how it compares to the Linux 
performance on the same system, just for fun.

>
> That said, uncontended AbstractQueuedSynchronizer and everything based 
> on it (ReentrantLock, Semaphore, CountDownLatch etc) is a single 
> succeeding CAS (in best case scenario it could even be a cached 
> volatile read such as in 0-count CountDownLatch), i.e. *relatively* 
> inexpensive.
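
For anyone who wants to poke at the uncontended case described above, here is a minimal sketch. The class and method names (UncontendedDemo, lockUnlock, awaitOpenLatch) are made up for illustration, and the timing in main() is a crude single-threaded loop rather than a proper benchmark, so treat any numbers it prints as rough at best.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo class; not from the original thread.
final class UncontendedDemo {

    /** Locks and unlocks a lock that no other thread touches; returns the number of acquisitions. */
    static int lockUnlock(int iterations) {
        ReentrantLock lock = new ReentrantLock();
        int acquired = 0;
        for (int i = 0; i < iterations; i++) {
            lock.lock();            // uncontended: no other thread competes for this lock
            try {
                acquired++;
            } finally {
                lock.unlock();
            }
        }
        return acquired;
    }

    /** Awaits a latch whose count is already zero; returns true if await() returned normally. */
    static boolean awaitOpenLatch() {
        CountDownLatch latch = new CountDownLatch(0);
        try {
            latch.await();          // count is already 0, so there is nothing to wait for
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        long start = System.nanoTime();
        int acquired = lockUnlock(n);
        long elapsed = System.nanoTime() - start;
        System.out.printf("%d uncontended lock/unlock pairs, ~%d ns each%n",
                acquired, elapsed / n);
        System.out.println("await() on open latch returned: " + awaitOpenLatch());
    }
}
```

Neither call blocks here because no second thread is involved; the costs discussed elsewhere in this thread only show up once a thread actually has to park and be woken again.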

I'm actually using direct wait()/notify() rather than a more 
sophisticated way of executing threads in turn, since I'm mostly 
interested in showing people why they should use callback-type event 
handling vs. blocking waits.
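
Roughly, the kind of turn-taking test I mean looks like the sketch below. This is an illustrative reconstruction, not the actual test code: the class name TurnTaking, the shared "turn" counter, and the use of notifyAll() are all assumptions made for the example. Each thread waits until the counter matches its own index, then hands the turn to the next thread.

```java
// Hypothetical sketch of threads executing in turn via wait()/notifyAll().
final class TurnTaking {
    private final Object lock = new Object();
    private final int threads;
    private final int rounds;
    private int turn = 0;   // index of the thread allowed to run next; guarded by lock

    TurnTaking(int threads, int rounds) {
        this.threads = threads;
        this.rounds = rounds;
    }

    /** Runs every thread for the configured number of rounds; returns the total hand-offs performed. */
    long run() {
        Thread[] workers = new Thread[threads];
        long[] counts = new long[threads];
        for (int i = 0; i < threads; i++) {
            final int id = i;
            workers[i] = new Thread(() -> {
                for (int r = 0; r < rounds; r++) {
                    synchronized (lock) {
                        while (turn != id) {          // loop guards against spurious wakeups
                            try {
                                lock.wait();
                            } catch (InterruptedException e) {
                                Thread.currentThread().interrupt();
                                return;
                            }
                        }
                        counts[id]++;
                        turn = (turn + 1) % threads;  // pass the turn to the next thread
                        lock.notifyAll();             // wake waiters so the next thread can proceed
                    }
                }
            });
        }
        for (Thread t : workers) {
            t.start();
        }
        long total = 0;
        try {
            for (Thread t : workers) {
                t.join();                             // join makes the workers' counts visible here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        for (long c : counts) {
            total += c;
        }
        return total;
    }

    public static void main(String[] args) {
        int threads = args.length > 0 ? Integer.parseInt(args[0]) : 2;
        int rounds = 100_000;
        long start = System.nanoTime();
        long handOffs = new TurnTaking(threads, rounds).run();
        long elapsed = System.nanoTime() - start;
        System.out.printf("%d threads: %d hand-offs, ~%d ns per hand-off%n",
                threads, handOffs, elapsed / handOffs);
    }
}
```

Note that notifyAll() wakes every waiting thread on each hand-off, so with many threads this variant measures more than a single switch; a one-monitor-per-thread design with notify() would isolate the hand-off better.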

>
> When talking about blocking vs non-blocking I would also take a close 
> look at Quasar (https://github.com/puniverse/quasar) when discussing a 
> scenario where one thread suspends after submitting a single task to 
> pool and awaiting result of that task executing in the pool on, 
> supposedly, other thread. Quasar implements continuations of sorts and 
> resolves a problem of thread park/unpark in that quite narrow case 
> while maintaining code Thread semantics (i.e. Fiber vs Thread) by 
> executing the scheduled task on the same thread and avoiding 
> park+wait/unpark.

Yes, I'd noted Quasar from an earlier discussion on the list. It looks 
like it would make a good topic for a future article in the series. :-)

   - Dennis

>
> On 2014-06-13 21:50, Dennis Sosnoski wrote:
>> On 06/14/2014 01:31 PM, Vitaly Davidovich wrote:
>>>
>>> I'd think the 1M cycle delays to get a thread running again are 
>>> probably due to OS scheduling it on a cpu that is in a deep c-state; 
>>> there can be significant delays as the cpu powers back on.
>>>
>>
>> That makes sense, but I'd think it would only be an issue for systems 
>> under light load.
>>
>>   - Dennis
>>
>>> On Jun 13, 2014 9:07 PM, "Dennis Sosnoski" <dms at sosnoski.com> wrote:
>>>
>>>     On 06/14/2014 11:57 AM, Doug Lea wrote:
>>>
>>>         On 06/13/2014 07:35 PM, Dennis Sosnoski wrote:
>>>
>>>             I'm writing an article where I'm discussing both
>>>             blocking waits and non-blocking
>>>             callbacks for handling events. As I see it, there are
>>>             two main reasons for
>>>             preferring non-blocking:
>>>
>>>             1. Threads are expensive resources (limited to on the
>>>             order of 10000 per JVM),
>>>             and tying one up just waiting for an event completion is
>>>             a waste of this resource
>>>             2. Thread switching adds substantial overhead to the
>>>             application
>>>
>>>             Are there any other good reasons I'm missing?
>>>
>>>
>>>         Also memory locality (core X cache effects).
>>>
>>>
>>>     I thought about that, though couldn't come up with any easy way
>>>     of demonstrating the effect. I suppose something more
>>>     memory-intensive would do this - perhaps having a fairly sizable
>>>     array of values for each thread, and having the thread do some
>>>     computation with those values each time it's run.
>>>
>>>
>>>
>>>             ...
>>>             So a big drop in performance going from one thread to
>>>             two, and again from 2 to
>>>             4, but after that just a slowly increasing trend. That's
>>>             about 19 microseconds
>>>             per switch with 4096 threads, about half that time for
>>>             just 2 threads. Do these
>>>             results make sense to others?
>>>
>>>
>>>         Your best case of approximately 20 thousand clock cycles is
>>>         not an
>>>         unexpected result on a single-socket multicore with all
>>>         cores turned
>>>         on (i.e., no power management, fusing, or clock-step effects)
>>>         and only a few bouncing cachelines.
>>>
>>>         We've seen cases of over 1 million cycles to unblock a thread
>>>         in some other cases. (Which can be challenging for us to deal
>>>         with in JDK8 Stream.parallel(). I'll post something on this
>>>         sometime.)
>>>         Maybe Aleksey can someday arrange to collect believable
>>>         systematic measurements across a few platforms.
>>>
>>>
>>>     The reason for the long delay being cache effects, right? I'll
>>>     try some experiments with associated data per thread to see if I
>>>     can demonstrate this on a small scale.
>>>
>>>     Thanks for the insights, Doug.
>>>
>>>       - Dennis
>>>
>>>     _______________________________________________
>>>     Concurrency-interest mailing list
>>>     Concurrency-interest at cs.oswego.edu
>>>     http://cs.oswego.edu/mailman/listinfo/concurrency-interest
>>>
>
