[concurrency-interest] Blocking vs. non-blocking

Vitaly Davidovich vitalyd at gmail.com
Fri Jun 13 22:09:32 EDT 2014

I think this mostly equates to "unexplained" latency spikes.  It's true
that if you can peg all cores on a multi-core machine, there won't be
enough idle cycles for the CPU to start powering down.

In practice, on machines with a heterogeneous mix of work running, it's
possible that things align such that some core powers down, and whenever
something gets scheduled to it there will be a longer delay than normal.

I think for the purposes of your article this is maybe a nice little
footnote, but not a headline - my 2 cents.

Sent from my phone
On Jun 13, 2014 9:50 PM, "Dennis Sosnoski" <dms at sosnoski.com> wrote:

>  On 06/14/2014 01:31 PM, Vitaly Davidovich wrote:
> I'd think the 1M cycle delays to get a thread running again are probably
> due to OS scheduling it on a cpu that is in a deep c-state; there can be
> significant delays as the cpu powers back on.
> That makes sense, but I'd think it would only be an issue for systems
> under light load.
>   - Dennis
>  Sent from my phone
> On Jun 13, 2014 9:07 PM, "Dennis Sosnoski" <dms at sosnoski.com> wrote:
>> On 06/14/2014 11:57 AM, Doug Lea wrote:
>>> On 06/13/2014 07:35 PM, Dennis Sosnoski wrote:
>>>> I'm writing an article where I'm discussing both blocking waits and
>>>> non-blocking
>>>> callbacks for handling events. As I see it, there are two main reasons
>>>> for
>>>> preferring non-blocking:
>>>> 1. Threads are expensive resources (limited to on the order of 10,000
>>>> per JVM),
>>>> and tying one up just waiting for an event completion is a waste of
>>>> this resource
>>>> 2. Thread switching adds substantial overhead to the application
>>>> Are there any other good reasons I'm missing?
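[A minimal sketch of the two styles being compared - a blocking wait versus a non-blocking callback - using `CompletableFuture`. This is an editorial illustration, not code from the thread; `fetchEvent` is a made-up stand-in for whatever produces the event.]

```java
// Contrast between blocking on an event and registering a callback:
// the blocking form parks the calling thread until the result arrives,
// while the callback form frees the caller immediately.
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class WaitStyles {
    // Hypothetical event source; stands in for any async operation.
    static CompletableFuture<String> fetchEvent() {
        return CompletableFuture.supplyAsync(() -> "event-data");
    }

    public static void main(String[] args) throws Exception {
        // Blocking: this thread is tied up until the result is ready.
        String blocking = fetchEvent().get();
        System.out.println("blocking result: " + blocking);

        // Non-blocking: the continuation runs on a pool thread when the
        // event completes; the caller's thread is free in the meantime.
        CompletableFuture<Void> done =
            fetchEvent().thenAccept(s -> System.out.println("got " + s));
        done.get(5, TimeUnit.SECONDS); // only to keep this demo alive
    }
}
```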
>>> Also memory locality (core X cache effects).
>> I thought about that, though couldn't come up with any easy way of
>> demonstrating the effect. I suppose something more memory-intensive would
>> do this - perhaps having a fairly sizable array of values for each thread,
>> and having the thread do some computation with those values each time it's
>> run.
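[A rough sketch of the experiment Dennis describes - my construction, not code from this thread. Each wakeup walks a sizable per-thread array, so any loss of cache residency between runs should appear as longer per-wakeup times. The array size and run count are illustrative.]

```java
// Per-thread working-set experiment: a worker thread repeatedly sums a
// large array and reports the time taken for each wakeup.
import java.util.concurrent.SynchronousQueue;

public class WorkingSetSketch {
    static final int INTS = 1 << 20;  // ~4 MB, larger than most L2 caches
    static volatile long sink;        // keeps the summation from being optimized away

    // Walk the whole working set once and return the sum.
    static long touch(int[] data) {
        long sum = 0;
        for (int v : data) sum += v;
        return sum;
    }

    public static void main(String[] args) throws Exception {
        int[] data = new int[INTS];
        for (int i = 0; i < data.length; i++) data[i] = i & 0xFF;

        SynchronousQueue<Long> results = new SynchronousQueue<>();
        Thread worker = new Thread(() -> {
            try {
                for (int run = 0; run < 5; run++) {
                    long t0 = System.nanoTime();
                    sink = touch(data);
                    results.put(System.nanoTime() - t0);
                }
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        worker.start();
        for (int run = 0; run < 5; run++)
            System.out.println("run " + run + ": " + results.take() + " ns");
        worker.join();
    }
}
```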
>>>> ...
>>>> So a big drop in performance going from one thread to two, and again
>>>> from 2 to
>>>> 4, but after that just a slowly increasing trend. That's about 19
>>>> microseconds
>>>> per switch with 4096 threads, about half that time for just 2 threads.
>>>> Do these
>>>> results make sense to others?
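[One way such per-switch numbers might be obtained - again my own sketch, not Dennis's actual benchmark: two threads hand a token back and forth through `SynchronousQueue`s, so each round trip costs two context switches.]

```java
// Two-thread handoff benchmark: measures the average cost of one
// thread-to-thread handoff (half a round trip).
import java.util.concurrent.SynchronousQueue;

public class SwitchBench {
    // Performs n round trips between the caller and a helper thread and
    // returns the average nanoseconds per one-way handoff.
    static double avgSwitchNanos(int n) throws InterruptedException {
        SynchronousQueue<Integer> ping = new SynchronousQueue<>();
        SynchronousQueue<Integer> pong = new SynchronousQueue<>();
        Thread helper = new Thread(() -> {
            try {
                for (int i = 0; i < n; i++) pong.put(ping.take());
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        helper.start();
        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) {
            ping.put(i);   // wake the helper
            pong.take();   // wait to be woken back
        }
        long elapsed = System.nanoTime() - t0;
        helper.join();
        return elapsed / (2.0 * n);  // two handoffs per round trip
    }

    public static void main(String[] args) throws Exception {
        System.out.printf("~%.0f ns per handoff%n", avgSwitchNanos(10_000));
    }
}
```

Numbers from a toy like this vary a lot with core count, pinning, and power management, which is part of what the rest of the thread is about.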
>>> Your best case of approximately 20 thousand clock cycles is not an
>>> unexpected result on a single-socket multicore with all cores turned
>>> on (i.e., no power management, fusing, or clock-step effects)
>>> and only a few bouncing cachelines.
>>> We've seen cases of over 1 million cycles to unblock a thread
>>> in some other cases. (Which can be challenging for us to deal
>>> with in JDK8 Stream.parallel(). I'll post something on this sometime.)
>>> Maybe Aleksey can someday arrange to collect believable
>>> systematic measurements across a few platforms.
>> The reason for the long delay being cache effects, right? I'll try some
>> experiments with associated data per thread to see if I can demonstrate
>> this on a small scale.
>> Thanks for the insights, Doug.
>>   - Dennis
>> _______________________________________________
>> Concurrency-interest mailing list
>> Concurrency-interest at cs.oswego.edu
>> http://cs.oswego.edu/mailman/listinfo/concurrency-interest