[concurrency-interest] Benchmark to demonstrate improvement in thread management over the years.

Oleksandr Otenko oleksandr.otenko at oracle.com
Tue Aug 13 06:07:24 EDT 2013

Since they are duals, the event state machine cannot get by with "a few 
KB" to represent the same thing that takes a few hundred KB in the 
threaded case.

Consider the contents of the stack. Why is it so large? That's because 
it holds function-local variables that will be needed upon the return 
from a function call. The calls to the functions are "synchronous" in 
the sense the caller's termination depends on the termination of the 
callee. (Disregard the "holes" on the stack, where the optimizer doesn't 
bother to reuse the space because the stack is cheap; closures will also 
have "holes": space reserved for a variable in the closure that 
doesn't necessarily get filled.)

A complete dual of this is a synchronous interaction between agents. The 
agents will keep the same variables that were in function-local 
variables as members of the closure capturing the state to be used by 
the agent receiving the response. It doesn't matter whether you 
represent this closure as the state of the agent or pass it around as 
part of the message: if you need those values after the response to the 
request is computed, you will have to preserve a reference to those 
values somehow.
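
To make the duality concrete, here is a minimal sketch in present-day Java (CompletableFuture did not exist when this thread started). The class name, the lookupPrice helpers and the pricing rule are all hypothetical, made up for illustration. The point is only that quantity and discount have to survive the call in both versions: in a stack frame in the first, in a heap-allocated closure in the second.

```java
import java.util.concurrent.CompletableFuture;

public class StackVersusClosure {
    // Hypothetical backend call, blocking flavour.
    static int lookupPrice(String item) {
        return item.length() * 10;
    }

    // Hypothetical backend call, asynchronous flavour.
    static CompletableFuture<Integer> lookupPriceAsync(String item) {
        return CompletableFuture.supplyAsync(() -> lookupPrice(item));
    }

    // Threaded style: quantity and discount stay in this stack frame
    // while the callee runs, and are still there when it returns.
    static int totalBlocking(String item, int quantity, int discount) {
        int price = lookupPrice(item);
        return price * quantity - discount;
    }

    // Evented style: the same two values are captured by the lambda,
    // which must stay reachable until the response arrives.
    static CompletableFuture<Integer> totalAsync(String item, int quantity, int discount) {
        return lookupPriceAsync(item)
                .thenApply(price -> price * quantity - discount);
    }

    public static void main(String[] args) {
        System.out.println(totalBlocking("widget", 3, 5));
        System.out.println(totalAsync("widget", 3, 5).join());
    }
}
```

Both calls compute the same value from the same retained state; only the place where that state lives differs.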

What does make a difference is that it is much harder to write 
synchronous interactions using an event state machine than using a 
threaded design. So the evented design encourages a different approach, a 
different solution, which is not, strictly speaking, */the/* dual. The 
converse is also true. If you translated this evented design into a 
threaded implementation, you'd have thin stacks and a bunch of message 
queues. Someone might even ask why it is called "threaded" in this case.
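
As a rough illustration of what such a translation might look like, here is a small Java sketch. The QueueAgents class, the STOP sentinel and the sum-of-squares task are hypothetical, chosen only to keep the example short. The agent is a real thread, but its stack never grows beyond a loop around take(): everything it needs to remember lives in one variable, and all interaction goes through queues.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class QueueAgents {
    // Hypothetical sentinel telling the agent to stop and report its state.
    static final int STOP = Integer.MIN_VALUE;

    static int sumOfSquares(int... values) {
        BlockingQueue<Integer> inbox = new ArrayBlockingQueue<>(16);
        BlockingQueue<Integer> outbox = new ArrayBlockingQueue<>(1);

        // The agent: a thread with a thin stack that reacts to one message at a time.
        Thread agent = new Thread(() -> {
            int total = 0; // the agent's entire state
            try {
                for (int msg = inbox.take(); msg != STOP; msg = inbox.take()) {
                    total += msg * msg;
                }
                outbox.put(total);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        agent.start();

        try {
            for (int v : values) {
                inbox.put(v);
            }
            inbox.put(STOP);
            int result = outbox.take();
            agent.join();
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while talking to the agent", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(1, 2, 3));
    }
}
```

Nothing here relies on a deep call chain, which is why one might hesitate to call it "threaded" at all.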

Also, take into account the allocation pattern difference. Allocating 
something on the stack is dirt-cheap: you just add a number to a 
register. This is very different from allocating a closure from the heap.


On 12/08/2013 23:16, Jason Koch wrote:
> In theory, they should both have similar behaviour, and the issue 
> is the implementation. This is a very good background 
> paper on the topic - 
> http://www.stanford.edu/class/cs240/readings/vonbehren.pdf - and is 
> well worth reading. In my interpretation, they present threads/events 
> as duals of each other.
> There is no reason we can't have lighter threading models. Erlang/BEAM 
> is known to scale to extremely high concurrent process counts with 
> ease, and Kilim on the JVM looks like a promising way to get 
> lightweight threading (though I'm yet to try it unfortunately). 
> Similarly, you can write a slow evented implementation if you choose to.
> In practice, OS threading tends to have a heavy footprint and 
> challenges with context switching, whereas events can have a very 
> controlled footprint. For example, a full thread stack is usually a few 
> hundred KB, whereas an event state machine can often be modeled in a 
> handful of bytes up to a few KB.
> So, in practice on Linux, with C or Java or on the MS/.net platform, 
> events and async IO tend to be much more scalable. There are tradeoffs 
> you need to make in your design, but these tradeoffs for certain types 
> of applications are very worthwhile - eg: web servers.
> Thanks
> Jason
> On Tue, Aug 13, 2013 at 1:06 AM, Vitaly Davidovich <vitalyd at gmail.com> wrote:
>     Yes, that's a good point.  I think LinkedIn had a presentation on
>     their use of Play, and touched upon this exact scenario (web
>     server having to aggregate/join data from different backend
>     systems).  The other problem with calling out to backend servers
>     using a threaded model (besides the memory cost) is that slowness in
>     just one or two of them can ripple throughout the entire
>     infrastructure, possibly bringing the entire site down.
>     Sent from my phone
>     On Aug 12, 2013 10:27 AM, "James Roper" <james.roper at typesafe.com> wrote:
>         It's also worth pointing out that the thread per request model
>         is becoming less feasible even for simple web apps. Modern
>         service oriented architectures often require that a single web
>         request may make many requests to other backend services. At
>         the extreme, we see users writing Play apps that make hundreds
>         of backend API calls per request. In order to provide
>         acceptable response times, these requests must be made in
>         parallel. With blocking IO, that would mean a single request
>         might take 100 threads. If you had just 100 concurrent
>         requests, that's 10,000 threads; if each thread stack takes
>         100 KB of real memory, that's 1 GB of memory just for thread
>         stacks. That's not cheap.
>         Regards,
>         James
>         On Aug 13, 2013 12:08 AM, "Vitaly Davidovich" <vitalyd at gmail.com> wrote:
>             I don't have any benchmarks to give, but I don't think the
>             touted benefits of an evented model include CPU
>             performance. Rather, using an evented model allows you to
>             scale.  Specific to a web server, you want to be able to
>             handle lots of concurrent connections (most of them are
>             probably idle at any given time) while minimizing resource
>             usage to accomplish that.
>             With a thread-per-request (threaded) model, you may end up
>             using lots of threads but most of them are blocked on i/o
>             at any given time.  A slow client/consumer can tie up a
>             thread for a very long time.  This also makes the server
>             susceptible to a DDoS attack whereby new connections are
>             established, but the clients are purposely slow to tie up
>             the server threads.  Resource usage is also much higher in
>             the threaded model when you have tens of thousands of
>             connections since you're going to pay for stack space for
>             each thread (granted it's VM space, but still).
>             With an evented model, you don't have the inefficiency of
>             having thousands of threads alive but
>             blocked/waiting on i/o.  A single thread dedicated to
>             multiplexing i/o across all the connections will probably
>             be sufficient.  The rest is worker threads (most likely =
>             # of CPUs for a dedicated machine) that actually handle
>             the request processing, but don't do any (significant)
>             i/o.  This design also means that you can handle slow
>             clients in a more robust manner.
>             So, the cost of threads can be "heavy" in the case of very
>             busy web servers. The Linux kernel should handle a few
>             thousand threads (most blocked on io) quite well, but I
>             don't think that will be the case for tens or hundreds of
>             thousands.  Even if there's sufficient RAM to handle that
>             many, there may be performance issues coming from the
>             kernel itself, e.g. scheduler.  At the very least, you'll
>             be using resources of the machine inefficiently under that
>             setup.
>             Vitaly
>             Sent from my phone
>             On Aug 12, 2013 9:13 AM, "Unmesh Joshi" <unmeshjoshi at gmail.com> wrote:
>                 Hi,
>                 Most of the books on node.js, Akka, Play or any other
>                 event IO based system frequently talk about 'Threads'
>                 being heavy, and say there is a cost we have to pay for
>                 all the bookkeeping the OS or the JVM has to do for all
>                 the threads.
>                 While I agree that there must be some cost, and that for
>                 CPU intensive tasks like matrix multiplication
>                 a fork-join kind of framework will be more
>                 performant, I am not sure that's the case for web server
>                 kind of IO intensive applications.
>                 On the contrary, I am seeing web servers running on
>                 Tomcat with 1000+ threads without issues.  I think
>                 that Linux-level thread management has improved a lot
>                 in the last 10 years. The same goes for the
>                 JVM.
>                 Do we have any benchmark which shows how much Linux
>                 thread management and JVM thread management have
>                 improved over the years?
>                 Thanks,
>                 Unmesh
>                 _______________________________________________
>                 Concurrency-interest mailing list
>                 Concurrency-interest at cs.oswego.edu
>                 http://cs.oswego.edu/mailman/listinfo/concurrency-interest
