[concurrency-interest] Blocking vs. non-blocking

Dennis Sosnoski dms at sosnoski.com
Fri Jun 13 21:50:50 EDT 2014


On 06/14/2014 01:31 PM, Vitaly Davidovich wrote:
>
> I'd think the 1M cycle delays to get a thread running again are 
> probably due to OS scheduling it on a cpu that is in a deep c-state; 
> there can be significant delays as the cpu powers back on.
>

That makes sense, but I'd think it would only be an issue for systems 
under light load.
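
On a lightly loaded box, letting the cpu go idle before the wakeup should
show some of that penalty. A minimal sketch, nothing rigorous (park() can
return spuriously, hence the flag):

import java.util.concurrent.locks.LockSupport;

public class WakeupLatency {
    static volatile boolean released;
    static volatile long wakeTime;

    public static void main(String[] args) throws Exception {
        Thread waiter = new Thread(() -> {
            while (!released) {
                LockSupport.park();         // may return spuriously, so re-check the flag
            }
            wakeTime = System.nanoTime();   // when the waiter is actually running again
        });
        waiter.start();
        Thread.sleep(100);                  // let the waiter park and the cpu go idle
        long start = System.nanoTime();
        released = true;
        LockSupport.unpark(waiter);
        waiter.join();
        System.out.println("wakeup latency: " + (wakeTime - start) + " ns");
    }
}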

   - Dennis

>     On Jun 13, 2014 9:07 PM, "Dennis Sosnoski" <dms at sosnoski.com> wrote:
>
>     On 06/14/2014 11:57 AM, Doug Lea wrote:
>
>         On 06/13/2014 07:35 PM, Dennis Sosnoski wrote:
>
>             I'm writing an article where I'm discussing both blocking
>             waits and non-blocking
>             callbacks for handling events. As I see it, there are two
>             main reasons for
>             preferring non-blocking:
>
>             1. Threads are expensive resources (limited to on the
>             order of 10000 per JVM),
>             and tying one up just waiting for an event completion is a
>             waste of this resource
>             2. Thread switching adds substantial overhead to the
>             application
>
>             Are there any other good reasons I'm missing?
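>
>             For context, the contrast I have in mind is roughly this -- a
>             minimal sketch, where fetchAsync() is just a made-up stand-in
>             for any operation that completes on another thread:
>
>             import java.util.concurrent.CompletableFuture;
>
>             public class BlockingVsCallback {
>                 static CompletableFuture<String> fetchAsync() {
>                     return CompletableFuture.supplyAsync(() -> "result");
>                 }
>
>                 public static void main(String[] args) throws Exception {
>                     // Blocking wait: this thread is tied up until the value arrives.
>                     String blocked = fetchAsync().get();
>                     System.out.println("blocking: " + blocked);
>
>                     // Non-blocking callback: no thread sits waiting; the callback
>                     // runs whenever the value is ready.
>                     CompletableFuture<Void> done =
>                         fetchAsync().thenAccept(r -> System.out.println("callback: " + r));
>                     done.join();  // only to keep the demo alive; a server wouldn't block here
>                 }
>             }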
>
>
>         Also memory locality (core X cache effects).
>
>
>     I thought about that, though I couldn't come up with any easy way of
>     demonstrating the effect. I suppose something more
>     memory-intensive would do this - perhaps having a fairly sizable
>     array of values for each thread, and having the thread do some
>     computation with those values each time it's run.
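>
>     Something along these lines is what I have in mind -- a rough sketch
>     with a single worker and an arbitrary 512 KB working set (scaling up
>     the thread count would be the real test):
>
>     import java.util.concurrent.SynchronousQueue;
>
>     public class LocalityDemo {
>         static final int WORDS = 1 << 16;   // 64K longs = 512 KB, arbitrary
>
>         public static void main(String[] args) throws Exception {
>             SynchronousQueue<Long> in = new SynchronousQueue<>();
>             SynchronousQueue<Long> out = new SynchronousQueue<>();
>             // The worker owns a sizable array and sums it on every round trip,
>             // so running on a different core means refilling that core's cache.
>             Thread worker = new Thread(() -> {
>                 long[] data = new long[WORDS];
>                 try {
>                     while (true) {
>                         long token = in.take();
>                         long sum = 0;
>                         for (long v : data) sum += v;   // touch the whole working set
>                         out.put(sum + token);
>                     }
>                 } catch (InterruptedException e) { /* exit */ }
>             });
>             worker.setDaemon(true);
>             worker.start();
>
>             int rounds = 10_000;
>             long start = System.nanoTime();
>             for (int i = 0; i < rounds; i++) {
>                 in.put((long) i);
>                 out.take();
>             }
>             System.out.println((System.nanoTime() - start) / rounds + " ns per round trip");
>         }
>     }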
>
>
>
>             ...
>             So a big drop in performance going from one thread to two,
>             and again from 2 to
>             4, but after that just a slowly increasing trend. That's
>             about 19 microseconds
>             per switch with 4096 threads, about half that time for
>             just 2 threads. Do these
>             results make sense to others?
>
>
>         Your best case of approximately 20 thousand clock cycles is not an
>         unexpected result on a single-socket multicore with all cores
>         turned
>         on (i.e., no power management, fusing, or clock-step effects)
>         and only a few bouncing cachelines.
>
>         We've seen cases of over 1 million cycles to unblock a thread
>         in some other cases. (Which can be challenging for us to deal
>         with in JDK8 Stream.parallel(). I'll post something on this
>         sometime.)
>         Maybe Aleksey can someday arrange to collect believable
>         systematic measurements across a few platforms.
>
>
>     The reason for the long delay being cache effects, right? I'll try
>     some experiments with associated data per thread to see if I can
>     demonstrate this on a small scale.
>
>     Thanks for the insights, Doug.
>
>       - Dennis
>


