[concurrency-interest] Upcoming jdk9 j.u.c JEP
gregw at webtide.com
Fri Jul 24 04:18:51 EDT 2015
thanks for engaging in this discussion and I like high contrast colours.
Let me pick up on a number of key phrases in your response:
> "everyone can debug their own failures"
The location that a failure occurs does not imply ownership/blame of that
failure. I can think of lots of examples:
- A publisher calls onNext more than it was allowed to by a request(n)
call. This is a specification compliance problem of the publisher that is
detected in the subscriber. The subscriber does not own this problem and
thus cannot debug it. The publisher needs to be told
IllegalStateException, which can't be thrown by the onNext call, as that
would make the subscriber violate the specification.
- A GzipInflator processor receives a tiny message that inflates to an
enormous buffer. Either the OOME or preferably some limit violation
exception should be sent to the source of the too large message.
- A SerializingProcessor that receives an object that is not
serializable. Again this is an error of the publisher not the processor.
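The first case above can be sketched in Java against the proposed java.util.concurrent.Flow interfaces. This is only an illustration under assumptions: DemandCheckingSubscriber, RecordingSubscription and OverEmissionDemo are names made up here, and the subscriber requests a single item to keep the bookkeeping obvious. It shows that the subscriber can detect the violation, but that the only signal it can send upstream is a bare cancel().

```java
import java.util.concurrent.Flow;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical subscriber that tracks outstanding demand so it can detect a
// publisher calling onNext more often than request(n) allowed.
class DemandCheckingSubscriber<T> implements Flow.Subscriber<T> {
    private final AtomicLong outstanding = new AtomicLong();
    private Flow.Subscription subscription;
    private volatile IllegalStateException violation;

    @Override
    public void onSubscribe(Flow.Subscription s) {
        subscription = s;
        outstanding.addAndGet(1);   // record the demand before signalling it
        s.request(1);
    }

    @Override
    public void onNext(T item) {
        if (outstanding.decrementAndGet() < 0) {
            // The publisher broke the contract. onNext must not throw, and
            // cancel() takes no argument, so the reason stays on this side.
            violation = new IllegalStateException("onNext without demand: " + item);
            subscription.cancel();
            return;
        }
        // A real subscriber would process the item and request more here.
    }

    @Override public void onError(Throwable t) { }
    @Override public void onComplete() { }

    IllegalStateException violation() { return violation; }
}

// Stand-in for the upstream side; it only records whether cancel() was called.
class RecordingSubscription implements Flow.Subscription {
    boolean cancelled;
    @Override public void request(long n) { }
    @Override public void cancel() { cancelled = true; }
}

class OverEmissionDemo {
    public static void main(String[] args) {
        DemandCheckingSubscriber<String> sub = new DemandCheckingSubscriber<>();
        RecordingSubscription upstream = new RecordingSubscription();
        sub.onSubscribe(upstream);
        sub.onNext("first");    // covered by request(1)
        sub.onNext("second");   // one more than was requested
        System.out.println("cancelled upstream: " + upstream.cancelled);
        System.out.println("reason known only to the subscriber: " + sub.violation());
    }
}
```

The IllegalStateException ends up in a field of the subscriber, which is exactly the problem: the party that needs to see it is the publisher.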
> " why would the database need to know that some downstream component
choked on the data it emitted?"
In the example that I described, the database would not need to know about
the exception and would be perfectly able to ignore the associated
reason. However, note that in that example there was an
ApplicationProcessor between the JsonProcessor and the database
publisher. In that example it is probably the fault of the
ApplicationProcessor, which forwarded a DB object on to the JsonProcessor when
it should not have. A good implementation of the application would intercept
the resulting JSONException in the ApplicationProcessor and just pass on
either null or some ApplicationException to the database publisher.
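As a rough sketch of that interception, assuming the cancel(Throwable) variant I am proposing existed: ReasonedSubscription, TranslatingSubscription, RecordingUpstream, JsonEncodingException and this ApplicationException are all hypothetical names invented for the illustration, and none of them are part of java.util.concurrent.Flow or of any JSON library.

```java
import java.util.concurrent.Flow;

// Hypothetical extension under discussion; Flow.Subscription has no such method.
interface ReasonedSubscription extends Flow.Subscription {
    default void cancel(Throwable reason) { cancel(); }
}

// Stand-in for whatever exception a JSON library would throw.
class JsonEncodingException extends RuntimeException {
    JsonEncodingException(String message) { super(message); }
}

// Generic failure that the application is willing to show to the database.
class ApplicationException extends RuntimeException {
    ApplicationException(String message) { super(message); }
}

// The subscription the application processor hands to its downstream
// subscriber. It sits between the JSON stage and the database publisher.
class TranslatingSubscription implements ReasonedSubscription {
    private final ReasonedSubscription upstream;

    TranslatingSubscription(ReasonedSubscription upstream) { this.upstream = upstream; }

    @Override public void request(long n) { upstream.request(n); }
    @Override public void cancel() { upstream.cancel(); }

    @Override
    public void cancel(Throwable reason) {
        if (reason instanceof JsonEncodingException) {
            // The application forwarded something it should not have, so the
            // database only gets a generic reason, not the JSON details.
            upstream.cancel(new ApplicationException("application failed to deliver results"));
        } else {
            upstream.cancel(reason);
        }
    }
}

// Stand-in for the database side; it records what it was told.
class RecordingUpstream implements ReasonedSubscription {
    Throwable reason;
    boolean cancelled;
    @Override public void request(long n) { }
    @Override public void cancel() { cancelled = true; }
    @Override public void cancel(Throwable reason) { this.reason = reason; cancelled = true; }
}

class TranslationDemo {
    public static void main(String[] args) {
        RecordingUpstream database = new RecordingUpstream();
        ReasonedSubscription seenByJsonStage = new TranslatingSubscription(database);
        seenByJsonStage.cancel(new JsonEncodingException("cannot encode DbRow"));
        System.out.println("database was told: " + database.reason);
    }
}
```

The database publisher remains free to ignore the reason it receives; the sketch only shows that the application boundary is a natural place to decide how much detail travels further upstream.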
> " Arguably the data exist and are “correct” by definition"
Database developers are not beyond making errors, especially when
implementing new asynchronous reactive stream clients. Just because the
database calls onNext does not mean that it is correctly implementing the
RS specification. Just because it provides an object does not mean that the
object is entirely correct. A database can be corrupted and could send an
object containing a string with illegal Unicode code points, for example.
In general, if a subscriber has some assumptions about the data a publisher
should emit and those assumptions are not met, then either the subscriber
is wrong or the publisher is wrong. It is beyond the scope of the API
to resolve who is at fault, so it is best to tell both parties and let them
work out who is to blame. Assuming that publishers are always faultless
does not reflect reality.
> "Data are flowing towards the Servlet (destined for whichever client made
the request) and it is important to signal abnormal termination differently
from normal termination, hence the onError propagation in this direction."
I can bounce your earlier question back at you: why would an
HTTPResponseSubscriber need to know the specific reason that some upstream
component choked before it finished emitting all its data? Mostly the
subscribers don't really care about the reason and just need to know whether
the termination was abnormal or not, so onError() would do just as well as
onError(Throwable)... except in the few special cases when it does matter
and perhaps the HTTPResponseSubscriber wants to look at the exception to
decide if it should send a 500, 503, 403, or a chunk trailer.
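To illustrate one of those special cases, here is a minimal sketch of a container-side subscriber that inspects the Throwable. The class name, the "committed" flag and the particular exception-to-status mapping are assumptions made up for this example and do not describe any real container.

```java
import java.util.concurrent.Flow;
import java.util.concurrent.TimeoutException;

// Hypothetical container-side subscriber; the exception-to-status mapping is
// an assumption chosen for illustration only.
class HttpResponseSubscriber implements Flow.Subscriber<byte[]> {
    private boolean committed;      // true once any body bytes were written
    private int status = 200;
    private String trailer;         // set when a failure arrives after commit

    static int statusFor(Throwable t) {
        if (t instanceof SecurityException) return 403;
        if (t instanceof TimeoutException) return 503;
        return 500;
    }

    @Override public void onSubscribe(Flow.Subscription s) { s.request(Long.MAX_VALUE); }

    @Override public void onNext(byte[] chunk) { committed = true; }

    @Override
    public void onError(Throwable t) {
        if (committed) {
            // The status line has already gone out, so only a trailer is left.
            trailer = "error";
        } else {
            status = statusFor(t);
        }
    }

    @Override public void onComplete() { }

    int status() { return status; }
    String trailer() { return trailer; }
}

class StatusDemo {
    public static void main(String[] args) {
        HttpResponseSubscriber early = new HttpResponseSubscriber();
        early.onError(new SecurityException("denied"));
        System.out.println("status before commit: " + early.status());

        HttpResponseSubscriber late = new HttpResponseSubscriber();
        late.onNext(new byte[] {1});
        late.onError(new RuntimeException("upstream failed"));
        System.out.println("trailer after commit: " + late.trailer());
    }
}
```

A subscriber that does not care about the distinction can simply treat every Throwable as a 500; the reason costs it nothing.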
This type of argument applies equally to upstream and downstream. For the
most part the reason throwable is ignorable in both directions; however,
there are always going to be special cases when it is meaningful.
I contend that it is better to ignore a meaningless reason than to be
ignorant of a meaningful one.
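For concreteness, one possible shape of the upstream variant is sketched here. It is an assumption, not an agreed design: ReasonedSubscription and the two implementing classes are hypothetical names, and java.util.concurrent.Flow has no cancel(Throwable). The default method is what lets an implementation ignore a reason it finds meaningless.

```java
import java.util.concurrent.Flow;

// Hypothetical extension; not part of java.util.concurrent.Flow.
interface ReasonedSubscription extends Flow.Subscription {
    // Default: ignore the reason and behave exactly like cancel().
    default void cancel(Throwable reason) { cancel(); }
}

// A publisher-side subscription that does not care why it was cancelled.
class IndifferentSubscription implements ReasonedSubscription {
    boolean cancelled;
    @Override public void request(long n) { }
    @Override public void cancel() { cancelled = true; }
}

// A publisher-side subscription that keeps the reason for its own diagnostics.
class CuriousSubscription extends IndifferentSubscription {
    Throwable reason;
    @Override
    public void cancel(Throwable reason) {
        this.reason = reason;
        cancel();
    }
}

class CancelReasonDemo {
    public static void main(String[] args) {
        Throwable why = new IllegalStateException("onNext called without demand");

        IndifferentSubscription a = new IndifferentSubscription();
        a.cancel(why);      // reason dropped, stream still cancelled
        System.out.println("indifferent cancelled: " + a.cancelled);

        CuriousSubscription b = new CuriousSubscription();
        b.cancel(why);      // reason kept
        System.out.println("curious saw: " + b.reason.getMessage());
    }
}
```

Both implementations end up cancelled; only the one that chose to look learns why.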
On 24 July 2015 at 17:12, Roland Kuhn <rk at rkuhn.info> wrote:
> Hi Greg,
> my reply has obviously opened two different discussions (namely “why are
> things as they are?” and “what is the suggested change all about”), I think
> it would be most fruitful if we stash the first one for now and come back
> to it after the second one has been understood better—at least by myself.
> That will put us into a better situation for judging the big picture.
> Considering the flow of data from the DB via application and framework
> processors into the Servlet container, at any point along this line
> failures can happen. The component emitting the failure will use whatever
> means it has outside of Reactive Streams to log/audit/monitor and provide
> metrics, I assume that that is just part of all reasonable code; the
> database will do that, the application will do it, the framework will
> probably allow the application to configure how to do that, and the
> application server will be configured how to do that. This means that
> everyone can debug their own failures.
> Data are flowing towards the Servlet (destined for whichever client made
> the request) and it is important to signal abnormal termination differently
> from normal termination, hence the onError propagation in this direction.
> This also allows downstream components to see failures coming from
> upstream, but this is a byproduct of needing to generate the right kind of
> final response to the external client. Now the interesting question is: why
> would the database need to know that some downstream component choked on
> the data it emitted? How exactly would this information be used by the
> database or its operators/programmers? Arguably the data exist and are
> “correct” by definition, guarded by Java types, and any validation errors
> that occur are not stream failures (cf. this definition
> <http://www.reactivemanifesto.org/glossary#Failure>) and should be
> treated as normal data elements and sent downstream (or filtered out,
> depending on the requirements & protocol).
> I am deliberately painting with high contrast colors here in order to
> better understand what exactly it is that you want to achieve instead of
> just discussing the proposed solution, thanks for your patience!
> On 24 Jul 2015, at 08:31, Greg Wilkins <gregw at webtide.com> wrote:
> thanks for the response.
> But I don't understand why you consider a terminal exception being
> notified upstream as a data flow? It is data, but it is not a flow
> because it is terminal and cannot be used as a back channel.
> Implementations of the API are already required to send data upstream:
> Cancellation is a terminal boolean data state that must be sent upstream,
> and request(int) is a flow of integers that must be sent upstream [and as
> an aside, it is not beyond imagination that request(int) will be misused as
> a back channel for data - hey it might even get used to send an error code
> immediately prior/post to a cancel! ]
> Thus I don't see that there is any significant additional complexity with
> that cancellation having a reason associated with it. Implementations
> must already support upward bound data and any sequencing and/or race
> conditions that exist with cancel(Throwable) also exist with just cancel().
> I also dispute that a Subscriber will be under the control of the
> Publisher. In the example cited, an application is providing a
> Processor that is using a Publisher provided by a 3rd party database and
> a Subscriber provided by the Servlet container, with perhaps some
> framework-provided Processors for serialization. In this example there is
> the possibility of components from at least 4 different code sources being
> combined in a chain that crosses deployment administration boundaries of:
> database, application and server. The log & cancel handling of errors
> is going to be very difficult because many different log mechanisms may be
> in use and access may not be easily achieved, i.e. application developers
> may not have full visibility of database logs or servlet container logs.
> The types of error I'm concerned about are all terminal style errors and
> not intended to be a back flow of data, nor acknowledgement of messages
> sent. It is probable that the implementers of cancel(Throwable) would
> just log, cancel themselves and pass on the cancel(Throwable) to any of
> their Subscriptions. However, the point is that this would allow the reason
> for the failure to cross the administrative boundaries so that it can be
> known to all.
> I think that any argument that can be made for not sending a Throwable
> upstream can equally be made for not sending one downstream (or for not
> having any exceptions in the java language). Exceptions are very rarely
> handled in any meaningful way, but are extremely useful for passing details
> of a failure so that they may be known to all who may need to know.
> Without exceptions I'm imagining many, many Stack Overflow questions
> like "Why was my Subscription cancelled?" followed by obligatory "RTFLog
> Stupid!" responses!
> On 24 July 2015 at 15:55, Roland Kuhn <rk at rkuhn.info> wrote:
>> Hi Greg,
>> the reasoning behind the asymmetric RS design is that this communication
>> primitive targets unidirectional communication, bidirectional conversations
>> would utilize two such streams running in opposite directions. This means
>> that for a single stream data elements (of which onError is but a special
>> one) flow downstream and only demand flows upstream. Publishers only need
>> to know about when and if to produce the next element(s), hence we didn’t
>> see a use-case for propagating more information than “N elements needed”
>> and “no more elements needed”.
>> If a single Reactive Stream could transport data upstream then we would
>> need to implement back-pressure on that back channel as well, leading to
>> the same complexity as having two RS running in opposite directions.
>> Another reason why we made this separation lies in not burdening the API
>> designers of conforming implementations with an impossible task: the
>> combinators offered on stream transformation APIs flow with the (English)
>> language from left to right and describe sequences of transformation stages
>> but with data flowing upstream there would be the need for also describing
>> how to handle that—even if it is “only” an error channel—and since these
>> data flow in the opposite direction there would be no natural way to write
>> this down.
>> Learning about the reason behind cancellation seems geared towards
>> recovery in the sense that the Publisher would then construct and attach a
>> different Subscriber afterwards—please let me know if you have something
>> else in mind—and if you want to do that then the Subscriber will in any
>> case be under the Publisher’s control and can use a different channel to
>> communicate the onError signal back to the data source. Since that channel
>> would transport data it would be a separate one flowing in the opposite
>> direction as mentioned above, at least conceptually; with a single element
>> like you describe it could well be a simpler callback mechanism and might
>> not need full back-pressure.
>> I hope this clarifies some of the background behind the RS design. Please
>> share more of your intended use of an error back-channel so that we can
>> understand what exactly the upstream components would do with that data in
>> the example case you mention.
>> On 24 Jul 2015, at 00:35, Greg Wilkins <gregw at webtide.com> wrote:
>> On 24 July 2015 at 00:23, Doug Lea <dl at cs.oswego.edu> wrote:
>>> * Reactive-stream users may be disappointed that we do not include any
>>> net/IO-based Flow.Publisher/Subscriber classes, considering that
>>> reactive-streams are mainly motivated by net-based frameworks. The
>>> reasons for triaging these out are that (1) IO generally falls outside
>>> of java.util.concurrent (2) Most net-based frameworks seem to use
>>> custom data representation etc (e.g., JSON) that are even further out
>>> of scope. However class SubmissionPublisher can be used as an adaptor
>>> to turn just about any kind of source into a Publisher, so provides a
>>> nearly universal way of constructing a good non-custom Publisher even
>>> from IO-based sources. (Also notice that SubmissionPublisher can
>>> serve as the basis of other actor-like frameworks, including those
>>> turning off back-pressure by calling
>>> subscription.request(Long.MAX_VALUE) in onSubscribe).
>> Doug et al,
>> The Jetty project has been experimenting with the reactive streams API:
>> https://github.com/jetty-project/jetty-reactive albeit not with the
>> JDK-9 version of it, but inspired by the proposed inclusion of it.
>> We very much like the API and what it can bring to our space. We don't
>> see that it needs direct IO support and that its power is actually
>> bridging domains with a good asynchronous model that supports flow control.
>> We've also begun some preliminary discussions about developing an RS-based
>> proposal for the Servlet 4.0 specification. Currently the Servlet API
>> does well support asynchronous IO and behaviour, but the API is deceptively
>> difficult to use correctly and gives no support for back pressure. With
>> RS's we can envisage solutions that look like:
>> - A database provides a RS Producer that provides the large results
>> of a query asynchronously from a remote database server
>> - Some business logic is encapsulated as a RS Processor subscribed to
>> the database producer
>> - Some framework provided Processors subscribe to the business
>> logic Processor to perform a chain of functions such as serialization,
>> - A container provided Subscriber terminates the chain and sends the
>> resulting bytes out over HTTP/HTTP2 or Websocket. The flow control
>> mechanisms of these protocols would be the basis of the RS back pressure.
>> In such solutions, a full HTTP/2 flow control window would result in back
>> pressure on the remote database server, allowing threadless waiting without
>> unlimited queuing of data.
>> However, we have a significant concern with the API in that we do not
>> like its error handling design. Specifically that it is asymmetric and an
>> error in the middle of a chain of processors can be propagated downstream
>> with onError(Throwable) but can only be propagated upstream with cancel().
>> We believe that cancel without reason is an insufficient semantic to
>> build a robust ecosystem of RS Processors that can be used to build
>> applications. Consider the above example, it would be ideal if the object
>> serialization was handled by a 3rd party Processor (let's say
>> JSONEncodingProcessor). If the business logic erroneously sent a
>> non-jsonable object, or if the JSON converter was incorrectly configured,
>> then the JSONEncodingProcessor could encounter an error during its
>> onNext(Object item) handling and its only permitted handling of that is to
>> cancel the stream, without explanation.
>> I have raised this as an issue on the RS github and the current
>> recommendation is to log and cancel:
>> However I believe that log and cancel is an insufficient semantic. Logging
>> in assembled applications is often fraught as each component provider will
>> fight over which logging framework is best. RS chains may cross
>> jurisdictional boundaries and logs may not even be readily available.
>> The solution we see is to replace/augment cancel() with either
>> cancel(Throwable reason) or an upstream onError(Throwable reason). I
>> acknowledge that the passed reason may not always be meaningful to the
>> upstream processors and publishers, but it is better to ignore a
>> meaningless reason than to be ignorant of a meaningful one.
>> When considering this API, we have to look beyond usages that work well
>> and consider usages that will fail well also!
>> Greg Wilkins <gregw at webtide.com>
>> Concurrency-interest mailing list
>> Concurrency-interest at cs.oswego.edu
>> I'm a physicist: I have a basic working knowledge of the universe and
>> everything it contains!
>> - Sheldon Cooper (The Big Bang Theory)