[concurrency-interest] Blocking vs. non-blocking

Vitaly Davidovich vitalyd at gmail.com
Fri Jun 13 21:31:28 EDT 2014


I'd think the 1M cycle delays to get a thread running again are probably
due to the OS scheduling it on a CPU that is in a deep C-state; there can
be significant delays while the CPU powers back up.
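
One rough way to look for this effect is to park a thread, leave the
machine idle for a while, then unpark it and time how long it takes to
run again. The following is only a sketch under assumptions: the class
and method names (WakeupLatency, wakeupNanos) are made up for
illustration, and it cannot show which C-state a core was in or which
core the OS picked, so a slower wakeup after a longer idle period is
suggestive rather than proof.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

// Hypothetical helper: times how long a parked thread takes to resume
// after the process has been idle for a given period.
class WakeupLatency {

    // Returns nanoseconds between the unpark request and the moment the
    // waiter thread is actually running again.
    static long wakeupNanos(long idleMillis) throws InterruptedException {
        final long[] wokeAt = new long[1];
        final AtomicBoolean go = new AtomicBoolean(false);

        Thread waiter = new Thread(() -> {
            // park() may return spuriously, so re-check the flag in a loop.
            while (!go.get()) {
                LockSupport.park();
            }
            wokeAt[0] = System.nanoTime();
        });
        waiter.start();

        // Give the waiter time to park and, possibly, let idle cores drop
        // into a deeper power state (an assumption; not verified here).
        Thread.sleep(idleMillis);

        long signalledAt = System.nanoTime();
        go.set(true);
        LockSupport.unpark(waiter);

        // join() makes the waiter's write to wokeAt[0] visible here.
        waiter.join();
        return wokeAt[0] - signalledAt;
    }

    public static void main(String[] args) throws InterruptedException {
        for (long idle : new long[] {1, 10, 100, 1000}) {
            wakeupNanos(idle); // discard one run per setting as warm-up
            System.out.printf("idle %4d ms -> wakeup %d ns%n",
                    idle, wakeupNanos(idle));
        }
    }
}
```

Single samples are noisy, so in practice one would repeat each idle
setting many times and look at the distribution, and compare runs with
deep C-states enabled and disabled in the BIOS or OS.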

On Jun 13, 2014 9:07 PM, "Dennis Sosnoski" <dms at sosnoski.com> wrote:

> On 06/14/2014 11:57 AM, Doug Lea wrote:
>
>> On 06/13/2014 07:35 PM, Dennis Sosnoski wrote:
>>
>>> I'm writing an article where I'm discussing both blocking waits and
>>> non-blocking
>>> callbacks for handling events. As I see it, there are two main reasons
>>> for
>>> preferring non-blocking:
>>>
>>> 1. Threads are expensive resources (limited to on the order of 10000 per
>>> JVM),
>>> and tying one up just waiting for an event completion is a waste of this
>>> resource
>>> 2. Thread switching adds substantial overhead to the application
>>>
>>> Are there any other good reasons I'm missing?
>>>
>>
>> Also memory locality (core X cache effects).
>>
>
> I thought about that, though couldn't come up with any easy way of
> demonstrating the effect. I suppose something more memory-intensive would
> do this - perhaps having a fairly sizable array of values for each thread,
> and having the thread do some computation with those values each time it's
> run.
>
>
>>
>>> ...
>>> So a big drop in performance going from one thread to two, and again
>>> from 2 to
>>> 4, but after that just a slowly increasing trend. That's about 19
>>> microseconds
>>> per switch with 4096 threads, about half that time for just 2 threads.
>>> Do these
>>> results make sense to others?
>>>
>>
>> Your best case of approximately 20 thousand clock cycles is not an
>> unexpected result on a single-socket multicore with all cores turned
>> on (i.e., no power management, fusing, or clock-step effects)
>> and only a few bouncing cachelines.
>>
>> We've seen cases of over 1 million cycles to unblock a thread
>> in some other cases. (Which can be challenging for us to deal
>> with in JDK8 Stream.parallel(). I'll post something on this sometime.)
>> Maybe Aleksey can someday arrange to collect believable
>> systematic measurements across a few platforms.
>>
>
> The reason for the long delay being cache effects, right? I'll try some
> experiments with associated data per thread to see if I can demonstrate
> this on a small scale.
>
> Thanks for the insights, Doug.
>
>   - Dennis
>