[concurrency-interest] Blocking vs. non-blocking

DT dt at flyingtroika.com
Tue Aug 12 02:22:21 EDT 2014


Our goal is to be able to distribute connections across multiple 
nodes, which really boils down to the load balancer (whether a 
hardware- or software-based load balancer is a separate discussion). 
I can say that a single Linux server can handle roughly 60k concurrent 
connections, though that depends on the actual server and the number 
of NICs; the Linux kernel also has different scheduling algorithms 
that can be configured for processes/threads/IO, and experimenting 
with scheduling is interesting work in itself that can lead to quite 
different results.
I said concurrent, but the real picture is that multiple 
ThreadPoolExecutors handle this load, so 'concurrency' is bounded by 
the queue size (whether concurrency should be bounded at millisecond 
or nanosecond granularity is a good question too...). On top of that, 
distributed caching absorbs a lot of the load as well, e.g. data 
processing and fast data lookups; non-blocking data structures can 
help a lot there.
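
To make the bounded-queue point concrete, a queue-bounded executor 
might look like this (a minimal sketch; the pool and queue sizes are 
placeholders, not our production numbers):

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    class BoundedPool {
        // Total in-flight work is bounded by pool size + queue capacity;
        // once the queue fills up, CallerRunsPolicy pushes back on the
        // submitting thread instead of queueing without limit.
        static ThreadPoolExecutor newPool() {
            return new ThreadPoolExecutor(
                    16, 16,                    // core == max: fixed-size pool
                    0L, TimeUnit.MILLISECONDS, // no idle timeout needed
                    new ArrayBlockingQueue<Runnable>(10000), // the queue bound
                    new ThreadPoolExecutor.CallerRunsPolicy());
        }
    }
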
I will continue experimenting with non-blocking NIO sockets, though 
again, so far it helps more with big data streams than with small 
packets in terms of IOPS/CPU/latencies. The server is utilized better 
when handling big packets with NIO than with the threaded approach, 
so packet size can make a real difference in scaling. By the way, a 
good description of how NIO is implemented at the low level would 
certainly help.
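
For reference, the shape of the non-blocking side I keep 
experimenting with is the standard Selector loop; a bare-bones sketch 
(port and buffer size are arbitrary, error handling omitted):

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;
    import java.nio.channels.ServerSocketChannel;
    import java.nio.channels.SocketChannel;

    class SelectorLoop {
        public static void main(String[] args) throws IOException {
            Selector selector = Selector.open();
            ServerSocketChannel server = ServerSocketChannel.open();
            server.bind(new InetSocketAddress(8080));
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);

            // One big reusable buffer: large reads are where NIO shines.
            ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024);
            while (true) {
                selector.select();
                for (SelectionKey key : selector.selectedKeys()) {
                    if (key.isAcceptable()) {
                        SocketChannel ch = server.accept();
                        if (ch != null) {
                            ch.configureBlocking(false);
                            ch.register(selector, SelectionKey.OP_READ);
                        }
                    } else if (key.isReadable()) {
                        SocketChannel ch = (SocketChannel) key.channel();
                        buf.clear();
                        if (ch.read(buf) == -1) {
                            key.cancel();
                            ch.close();
                        }
                        // else: hand the buffered bytes off for processing
                    }
                }
                selector.selectedKeys().clear();
            }
        }
    }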

Practically speaking, it does not matter how many connections a 
single server can handle at any given time, because demand will 
always outgrow it and the number will have to keep increasing; of 
course we would still like to utilize the hardware as much as 
possible. The mechanism/algorithm by which the application 
distributes socket connections plays the more fundamental role in how 
the application can be scaled. That is why the non-blocking approach 
is of such interest to us, along with synchronization between nodes 
and scheduling.
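
As a toy, single-process illustration of that distribution idea (the 
same principle applies across nodes behind the balancer), an acceptor 
can deal new sockets out to worker event loops round-robin; the 
EventLoop interface here is hypothetical:

    import java.nio.channels.SocketChannel;
    import java.util.concurrent.atomic.AtomicLong;

    // Hypothetical: each worker runs its own Selector loop and adopts
    // channels handed to it.
    interface EventLoop {
        void adopt(SocketChannel ch);
    }

    class RoundRobinAcceptor {
        private final EventLoop[] loops;
        private final AtomicLong next = new AtomicLong();

        RoundRobinAcceptor(EventLoop[] loops) {
            this.loops = loops;
        }

        // Called for every accepted connection: spread the load evenly
        // so no single event loop becomes the bottleneck.
        void dispatch(SocketChannel ch) {
            loops[(int) (next.getAndIncrement() % loops.length)].adopt(ch);
        }
    }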

Thanks,
DT


On 8/6/2014 12:39 PM, Oleksandr Otenko wrote:
> I understand what you are saying. They don't look like performance 
> decisions; they smell more like a workaround for the inability to 
> support 100k+ stacks than a convenience.
>
>
> Alex
>
>
> On 06/08/2014 20:12, Vitaly Davidovich wrote:
>> "bytes read so far" isn't really computation -- that part would be 
>> handled by the i/o thread.  It's signalled that a channel is 
>> readable, proceeds to read whatever's in there, and if that doesn't 
>> form a full/processable request, it puts it aside in a buffer.  Yes, 
>> you need to keep some state around here, but I don't see this 
>> particular case as a big problem.  This also gives the server a 
>> chance to do graceful slow client (or deliberately malicious) 
>> handling by determining how much or how long data can be buffered 
>> before a full request is received.
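
A rough sketch of that per-channel buffering, with a cap for slow or 
malicious clients (the request-framing check is a protocol-specific 
placeholder):

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.SocketChannel;

    class ChannelState {
        private static final int MAX_BUFFERED = 64 * 1024; // per-connection cap
        private final ByteBuffer pending = ByteBuffer.allocate(MAX_BUFFERED);

        // Called when the selector reports the channel readable. Returns
        // a complete request once one has accumulated, or null if more
        // bytes are needed; closes the connection on EOF or at the cap.
        ByteBuffer onReadable(SocketChannel ch) throws IOException {
            if (ch.read(pending) == -1) {
                ch.close();                  // client went away
                return null;
            }
            if (isCompleteRequest(pending)) {
                pending.flip();
                return pending;
            }
            if (!pending.hasRemaining()) {
                ch.close();                  // exceeded the buffer cap
                return null;
            }
            return null;                     // keep buffering
        }

        private boolean isCompleteRequest(ByteBuffer b) {
            return false; // placeholder: protocol-specific framing check
        }
    }
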
>>
>> If you were going to do a thread-per-request model, what would happen 
>> if you have 100k+ connections? I'm not aware of any OS that can 
>> handle that many threads (even if only a small fraction of them is 
>> runnable, that's going to be a fairly chunky absolute number). 
>>  Memory footprint will be an issue as well, especially if the stack 
>> sizes get big.
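
For a sense of scale: at HotSpot's common 64-bit default of about 
1 MiB of stack per thread, 100k threads would reserve on the order of 
100 GiB of virtual address space for stacks alone; -Xss can shrink 
that, but only so far.
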
>>
>> Finally, as I mentioned earlier, for cases like HTTP servers it 
>> doesn't make sense to have a thread-per-connection if you're 
>> expecting lots and lots of concurrent connections.  There's a limit 
>> on i/o bandwidth and cpu consumption, so you may as well manage that 
>> part explicitly (e.g. 1 i/o thread servicing all file descriptors, 
>> and then ~N worker threads where N is # of cpus).  But to reiterate, 
>> if you're handling, say, a few thousand concurrent (but mostly idle 
>> at any given time) connections, you can probably get away with using 
>> a thread per request.
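
Sketching that shape, assuming the i/o thread hands fully-buffered 
requests to a CPU-sized pool (handleRequest is a stand-in):

    import java.nio.ByteBuffer;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    class IoPlusWorkers {
        // ~N workers where N is # of cpus, as described above.
        static final ExecutorService workers =
                Executors.newFixedThreadPool(
                        Runtime.getRuntime().availableProcessors());

        // The single i/o thread calls this once a full request has been
        // buffered, keeping CPU-bound work off the selector thread.
        static void onCompleteRequest(final ByteBuffer request) {
            workers.execute(new Runnable() {
                public void run() {
                    handleRequest(request);
                }
            });
        }

        static void handleRequest(ByteBuffer request) {
            // application logic
        }
    }
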
>>
>>
>> On Wed, Aug 6, 2014 at 2:53 PM, Oleksandr Otenko 
>> <oleksandr.otenko at oracle.com> wrote:
>>
>>     But it's contagious. It is difficult to combine data push
>>     (event-based IO) and data pull (blocking IO) in the same program.
>>
>>     If you have tens of thousands of connections, but the computation
>>     cannot make progress without reading a complete request, you still
>>     need to either switch to blocking IO, or switch to capturing the
>>     state of the computation (even if it's only the "bytes read so
>>     far") in some way. I don't see what we are meant to save by not
>>     using threads, which capture some of that state on the stack, and
>>     which offer language support (try/finally, synchronized, error
>>     propagation, etc).
>>
>>
>>     Alex
>>
>>
>>
>>     On 06/08/2014 18:14, Vitaly Davidovich wrote:
>>>     I don't think the point of NIO (or event-based i/o in general)
>>>     is to have better absolute latency/throughput in all cases than
>>>     blocking i/o.  Instead, it's really intended for being able to
>>>     scale to tens (and hundreds) of thousands of concurrent (and
>>>     mostly idle at a point in time) connections on a single server.
>>>      This makes intuitive sense since pretty much just one core
>>>     dedicated to doing i/o can saturate a NIC (given sufficient
>>>     read/write workload); the rest of the compute resources can be
>>>     dedicated to the CPU bound workload.  Creating a
>>>     thread-per-connection in those circumstances either doesn't make
>>>     sense or simply won't work at all.
>>>
>>>
>>>     On Wed, Aug 6, 2014 at 12:16 PM, DT <dt at flyingtroika.com> wrote:
>>>
>>>         We have done multiple experiments with the java nio and io
>>>         APIs, and we have not seen that much improvement in
>>>         throughput or latencies with NIO.
>>>         We got almost the same stats for udp, tcp and http based
>>>         packets (running on windows and linux platforms), though we
>>>         noticed that the more traffic we handled, the better the
>>>         results we got with the NIO implementation in terms of
>>>         latencies and overall application throughput (there is some
>>>         sort of threshold at which the system starts reacting
>>>         better). The idea was to move to the java NIO APIs, but
>>>         given these results we decided to do some more research.
>>>         It's difficult to benchmark, because even a small change in
>>>         the linux kernel/nic can lead to different results. When we
>>>         converted the java logic into C++/C code and used linux
>>>         non-blocking/event-based calls, we got much better
>>>         performance. A good example is to compare the nginx socket
>>>         event module and the Java NIO APIs.  Probably we should not
>>>         compare java non-blocking calls to c/c++ calls and
>>>         implementations, though I thought it was a good way to get
>>>         a benchmark.
>>>
>>>         Thanks,
>>>         DT
>>>
>>>         On 8/5/2014 4:51 PM, Zhong Yu wrote:
>>>
>>>             On Tue, Aug 5, 2014 at 6:41 PM, Stanimir Simeonoff
>>>             <stanimir at riflexo.com> wrote:
>>>
>>>
>>>
>>>
>>>
>>>                     There's a dilemma though - if the application
>>>                     code is writing bytes to
>>>                     the response body with blocking write(), isn't
>>>                     it tying up a thread if
>>>                     the client is slow? And if the write() is
>>>                     non-blocking, aren't we
>>>                     buffering up too much data? I think this problem
>>>                     can often be solved
>>>                     by a non-blocking write(obj) that buffers `obj`
>>>                     with lazy
>>>                     serialization, see "optimization technique" in
>>>                     http://bayou.io/draft/response.style.html#Response_body
>>>
>>>                     Zhong Yu
>>>
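
If I read the bayou.io idea right, the gist is to queue the object 
itself and defer byte production until the channel is actually 
writable; a guess at the shape, not bayou's real API:

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.SocketChannel;
    import java.util.ArrayDeque;
    import java.util.Queue;

    class LazyWriteQueue {
        private final Queue<Object> pending = new ArrayDeque<Object>();
        private ByteBuffer current; // partially-written head, if any

        // Non-blocking write: just enqueue the object reference, which
        // stays shareable across responses until serialization.
        void write(Object obj) {
            pending.add(obj);
        }

        // Called when the selector reports the channel writable.
        void onWritable(SocketChannel ch) throws IOException {
            while (true) {
                if (current == null) {
                    Object head = pending.poll();
                    if (head == null) return;  // nothing left to send
                    current = serialize(head); // serialize at the last moment
                }
                ch.write(current);
                if (current.hasRemaining()) return; // socket full; resume later
                current = null;
            }
        }

        private ByteBuffer serialize(Object obj) {
            return ByteBuffer.wrap(obj.toString().getBytes()); // placeholder
        }
    }
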
>>>
>>>                 The lazy serialization unfortunately requires the
>>>                 object to be fully fetched (not depending on any
>>>                 context or an active database connection), which is
>>>                 not that different from "buffering too much" - it's
>>>                 just not a plain ByteBuffer
>>>             There's a difference if the objects are shared among
>>>             responses which
>>>             is a reasonable assumption for a lot of web applications.
>>>
>>>                 (or byte[]).
>>>                 Personally I don't like lazy serialization, as it
>>>                 leaves objects in the queues, and that may have
>>>                 implications for module (class) redeploys with slow
>>>                 clients. It also makes it a lot harder to quantify
>>>                 the expected queue length per connection and to shut
>>>                 down slow connections.
>>>
>>>                 Stanimir
>>>
>>>
>>>
>>>                         Alex
>>>
>>>
>>>                         On 03/08/2014 20:06, Zhong Yu wrote:
>>>
>>>                                 Also, apparently, in heavy I/O
>>>                                 scenarios, you may have a much better
>>>                                 system
>>>                                 throughput waiting for things to
>>>                                 happen in I/O (blocking I/O) vs being
>>>                                 notified of I/O events
>>>                                 (Selector-based I/O):
>>>                                 http://www.mailinator.com/tymaPaulMultithreaded.pdf.
>>>                                 Paper is 6 years
>>>                                 old
>>>                                 and kernel/Java realities might have
>>>                                 changed, YMMV, but the difference
>>>                                 is(was?) impressive. Also, Apache
>>>                                 HTTP Client still swears by blocking
>>>                                 I/O
>>>                                 vs non-blocking one in terms of
>>>                                 efficiency:
>>>
>>>                                 http://wiki.apache.org/HttpComponents/HttpClient3vsHttpClient4vsHttpCore
>>>
>>>                             To add a small data point to this
>>>                             discussion, Tomcat with NIO is
>>>                             apparently slower than Tomcat with
>>>                             Blocking-IO by 1,700ns for a simple
>>>                             request-response, according to a
>>>                             benchmark I did recently [1]. But!
>>>                             The difference is very small, and I
>>>                             would argue that it is negligible.
>>>
>>>                             Paul Tyma's claim (that the throughput
>>>                             of Blocking-IO is 30% more than
>>>                             NIO) is not very meaningful for real
>>>                             applications. I did once
>>>                             replicate his claim with a test that
>>>                             does nothing with the bytes being
>>>                             transferred; but as soon as you at least
>>>                             read each byte once, the
>>>                             throughput difference becomes very
>>>                             unimpressive (and frankly I suspect
>>>                             it's largely due to Java's
>>>                             implementation of NIO).
>>>
>>>                             [1]
>>>                             http://bayou.io/draft/Comparing_Java_HTTP_Servers_Latencies.html
>>>
>>>                             Zhong Yu
>>>                             bayou.io
