[concurrency-interest] Blocking vs. non-blocking

Vitaly Davidovich vitalyd at gmail.com
Wed Aug 6 15:12:37 EDT 2014

"bytes read so far" isn't really computation -- that part would be handled
by the i/o thread.  It's signalled that a channel is readable, reads
whatever's in there, and if that doesn't form a full/processable request,
it puts the bytes aside in a buffer.  Yes, you need to keep some state
around here, but I don't see this particular case as a big problem.  This
also gives the server a chance to handle slow (or deliberately malicious)
clients gracefully by capping how much data is buffered, or for how long,
before a full request is received.
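
To make the "put it aside in a buffer" step concrete, here is a minimal
sketch of the per-connection state an i/o thread might keep.  The class
name, the status values, and the use of a blank line ("\r\n\r\n") as the
end-of-request marker are all assumptions made for illustration; they are
not taken from any particular server.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical per-connection state kept between "channel readable" events. */
final class RequestAccumulator {
    enum Status { NEED_MORE, COMPLETE, TOO_LARGE, TOO_SLOW }

    // Assumption: a request ends with a blank line, as HTTP headers do.
    private static final byte[] TERMINATOR =
            "\r\n\r\n".getBytes(StandardCharsets.US_ASCII);

    private final ByteBuffer pending;   // the "bytes read so far"
    private final long deadlineNanos;   // latest time a full request may arrive

    RequestAccumulator(int maxBytes, long maxWaitNanos, long nowNanos) {
        this.pending = ByteBuffer.allocate(maxBytes);
        this.deadlineNanos = nowNanos + maxWaitNanos;
    }

    /** Called by the i/o thread with whatever the channel had available. */
    Status onBytes(ByteBuffer chunk, long nowNanos) {
        if (nowNanos - deadlineNanos > 0) return Status.TOO_SLOW;   // slow client
        if (chunk.remaining() > pending.remaining()) return Status.TOO_LARGE;
        pending.put(chunk);
        return containsTerminator() ? Status.COMPLETE : Status.NEED_MORE;
    }

    /** Naive scan from the start; a real parser would remember its position. */
    private boolean containsTerminator() {
        int end = pending.position();
        for (int i = 0; i + TERMINATOR.length <= end; i++) {
            int j = 0;
            while (j < TERMINATOR.length && pending.get(i + j) == TERMINATOR[j]) j++;
            if (j == TERMINATOR.length) return true;
        }
        return false;
    }

    /** The bytes buffered so far, decoded as ASCII. */
    String request() {
        return new String(pending.array(), 0, pending.position(),
                StandardCharsets.US_ASCII);
    }
}
```

The i/o thread would call onBytes() after each read and close the
connection on TOO_LARGE or TOO_SLOW; only COMPLETE requests get handed to
a worker.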

If you were going to do a thread-per-request model, what would happen if
you have 100k+ connections? I'm not aware of any OS that can handle that
many threads (even if only a small fraction of them is runnable, that's
going to be a fairly chunky absolute number).  Memory footprint will be an
issue as well, especially if the stack sizes get big.
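
As a rough back-of-the-envelope check on the footprint point, the sketch
below multiplies a thread count by a per-thread stack size.  The 1 MiB
figure used in the note after it is an assumption (the usual HotSpot
default on 64-bit Linux, adjustable with -Xss), and the result is reserved
address space: the OS commits stack pages lazily, so resident memory is
normally lower.

```java
/** Hypothetical helper: upper bound on address space reserved for stacks. */
final class ThreadFootprint {
    private ThreadFootprint() { }

    /** Returns threads * stackBytesPerThread, failing loudly on overflow. */
    static long reservedStackBytes(long threads, long stackBytesPerThread) {
        return Math.multiplyExact(threads, stackBytesPerThread);
    }

    /** Converts a byte count to GiB for display. */
    static double toGiB(long bytes) {
        return bytes / (double) (1L << 30);
    }
}
```

With 100,000 threads and 1 MiB stacks, reservedStackBytes(100_000, 1L << 20)
comes to roughly 97.7 GiB of reserved stack space.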

Finally, as I mentioned earlier, for cases like HTTP servers it doesn't
make sense to have a thread-per-connection if you're expecting lots and
lots of concurrent connections.  There's a limit on the i/o bandwidth and
cpu consumption, so you may as well manage that part explicitly (e.g. 1 i/o
thread servicing all file descriptors, and then ~N worker threads where N
is the # of cpus).  But to reiterate, if you're handling, say, a few thousand
concurrent (but mostly idle at any given time) connections, you can probably
get away with using a thread per request.
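
The "1 i/o thread plus ~N workers" layout could be sketched roughly as
follows.  This is only an illustration under stated assumptions: the class
name SelectorServer is made up, upper-casing ASCII stands in for real
CPU-bound request processing, each read is treated as a complete request,
and the worker writes the response directly instead of handing it back to
the i/o thread with OP_WRITE as a production server likely would.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedSelectorException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical server: one selector thread for i/o, N workers for cpu work. */
final class SelectorServer implements Runnable, AutoCloseable {
    private final Selector selector = Selector.open();
    private final ServerSocketChannel server = ServerSocketChannel.open();
    private final ExecutorService workers =   // N = number of cpus
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

    SelectorServer(int port) throws IOException {
        server.bind(new InetSocketAddress("127.0.0.1", port));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    int port() throws IOException {
        return ((InetSocketAddress) server.getLocalAddress()).getPort();
    }

    /** The single i/o thread: waits for readiness events on every channel. */
    @Override public void run() {
        try {
            while (selector.isOpen()) {
                selector.select();
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (!key.isValid()) continue;
                    if (key.isAcceptable()) accept();
                    else if (key.isReadable()) read(key);
                }
            }
        } catch (IOException | ClosedSelectorException e) {
            // Selector was closed or failed: the i/o loop ends.
        }
    }

    private void accept() throws IOException {
        SocketChannel client = server.accept();
        if (client == null) return;
        client.configureBlocking(false);
        client.register(selector, SelectionKey.OP_READ);
    }

    private void read(SelectionKey key) {
        SocketChannel client = (SocketChannel) key.channel();
        ByteBuffer buf = ByteBuffer.allocate(4096);
        try {
            int n = client.read(buf);
            if (n < 0) { client.close(); return; }   // peer closed
            if (n == 0) return;
            buf.flip();
            workers.execute(() -> handle(client, buf));   // cpu work off the i/o thread
        } catch (IOException e) {
            closeQuietly(client);
        }
    }

    /** Runs on a worker thread; upper-casing stands in for real processing. */
    private static void handle(SocketChannel client, ByteBuffer request) {
        byte[] data = new byte[request.remaining()];
        request.get(data);
        for (int i = 0; i < data.length; i++) {
            if (data[i] >= 'a' && data[i] <= 'z') data[i] -= 32;
        }
        ByteBuffer out = ByteBuffer.wrap(data);
        try {
            while (out.hasRemaining()) client.write(out);   // simplification: may spin
        } catch (IOException e) {
            closeQuietly(client);
        }
    }

    private static void closeQuietly(SocketChannel ch) {
        try { ch.close(); } catch (IOException ignored) { }
    }

    @Override public void close() throws IOException {
        selector.close();    // wakes the i/o thread out of select()
        server.close();
        workers.shutdown();
    }
}
```

To try it, construct the server with port 0, run it on a dedicated thread
(new Thread(server).start()), and connect to the port reported by port().
The number of threads stays fixed at 1 + N no matter how many connections
are open, which is the whole point of the design.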

On Wed, Aug 6, 2014 at 2:53 PM, Oleksandr Otenko <
oleksandr.otenko at oracle.com> wrote:

>  But it's contagious. It is difficult to combine data push (event-based
> IO) and data pull (blocking IO) in the same program.
> If you have tens of thousands of connections, but computation cannot make
> progress without reading a complete request, you still need to either
> switch to blocking IO, or switch to capturing the state of the computation
> (even if it's only the "bytes read so far") in some way. I don't see what
> we are meant to save by not using threads, which capture some of the state
> on the stack, and which offer language support (try/finally, synchronized,
> error propagation, etc).
> Alex
> On 06/08/2014 18:14, Vitaly Davidovich wrote:
> I don't think the point of NIO (or event-based i/o in general) is to have
> better absolute latency/throughput in all cases than blocking i/o.
>  Instead, it's really intended for being able to scale to tens (and
> hundreds) of thousands of concurrent (and mostly idle at a point in time)
> connections on a single server.  This makes intuitive sense since pretty
> much just one core dedicated to doing i/o can saturate a NIC (given
> sufficient read/write workload); the rest of the compute resources can be
> dedicated to the CPU bound workload.  Creating a thread-per-connection in
> those circumstances either doesn't make sense or simply won't work at all.
> On Wed, Aug 6, 2014 at 12:16 PM, DT <dt at flyingtroika.com> wrote:
>> We have done multiple experiments with the Java NIO and IO APIs, and
>> we have not seen much improvement in throughput or latency with NIO.
>> We got almost the same stats for UDP, TCP and HTTP based packets (running
>> on Windows and Linux platforms). Though we noticed that the more traffic
>> we handle, the better results we got with the NIO implementation in terms
>> of latency and overall throughput of the application (there is some sort
>> of threshold beyond which the system starts reacting better). The idea was
>> to move to the Java NIO APIs, but due to the results we decided to do some
>> more research. It's difficult to make a benchmark because even a small
>> change in the Linux kernel/NIC can lead to different results. When we
>> converted the Java logic into C/C++ code and used Linux
>> non-blocking/event-based calls, we got much better performance. A good
>> example is to compare the nginx socket event module with the Java NIO
>> APIs. Probably we should not compare Java non-blocking calls to C/C++
>> calls and implementations, though I thought it was a good idea to get a
>> benchmark this way.
>> Thanks,
>> DT
>> On 8/5/2014 4:51 PM, Zhong Yu wrote:
>>> On Tue, Aug 5, 2014 at 6:41 PM, Stanimir Simeonoff <stanimir at riflexo.com>
>>> wrote:
>>>>> There's a dilemma though - if the application code is writing bytes to
>>>>> the response body with blocking write(), isn't it tying up a thread if
>>>>> the client is slow? And if the write() is non-blocking, aren't we
>>>>> buffering up too much data? I think this problem can often be solved
>>>>> by a non-blocking write(obj) that buffers `obj` with lazy
>>>>> serialization, see "optimization technique" in
>>>>> http://bayou.io/draft/response.style.html#Response_body
>>>>> Zhong Yu
>>>> The lazy serialization unfortunately requires the object to be fully
>>>> fetched (not depending on any context or an active database connection),
>>>> which is not that different from "buffering too much" - it's just not a
>>>> plain ByteBuffer (or byte[]).
>>> There's a difference if the objects are shared among responses, which
>>> is a reasonable assumption for a lot of web applications.
>>>> Personally I don't like lazy serialization, as it leaves objects in the
>>>> queues, which may have implications for module (class) redeploys with
>>>> slow clients. It also makes it a lot harder to quantify the expected
>>>> queue length per connection and to shut down slow connections.
>>>> Stanimir
>>>>>  Alex
>>>>>> On 03/08/2014 20:06, Zhong Yu wrote:
>>>>>>>> Also, apparently, in heavy I/O scenarios, you may have a much better
>>>>>>>> system throughput waiting for things to happen in I/O (blocking I/O)
>>>>>>>> vs being notified of I/O events (Selector-based I/O):
>>>>>>>> http://www.mailinator.com/tymaPaulMultithreaded.pdf. Paper is 6 years
>>>>>>>> old and kernel/Java realities might have changed, YMMV, but the
>>>>>>>> difference is(was?) impressive. Also, Apache HTTP Client still swears
>>>>>>>> by blocking I/O vs non-blocking one in terms of efficiency:
>>>>>>>> http://wiki.apache.org/HttpComponents/HttpClient3vsHttpClient4vsHttpCore
>>>>>>> To add a small data point to this discussion, Tomcat with NIO is
>>>>>>> apparently slower than Tomcat with Blocking-IO by 1,700ns for a simple
>>>>>>> request-response, according to a benchmark I did recently [1]. But!
>>>>>>> The difference is very small, and I would argue that it is negligible.
>>>>>>> Paul Tyma's claim (that the throughput of Blocking-IO is 30% more than
>>>>>>> NIO) is not very meaningful for real applications. I did once
>>>>>>> replicate his claim with a test that does nothing with the bytes being
>>>>>>> transferred; but as soon as you at least read each byte once, the
>>>>>>> throughput difference becomes very unimpressive (and frankly I suspect
>>>>>>> it's largely due to Java's implementation of NIO).
>>>>>>> [1] http://bayou.io/draft/Comparing_Java_HTTP_Servers_Latencies.html
>>>>>>> Zhong Yu
>>>>>>> bayou.io
