[concurrency-interest] CompletableFuture in Java 8

√iktor Ҡlang viktor.klang at gmail.com
Fri Dec 5 08:41:34 EST 2014


Hey Josh,

On Fri, Dec 5, 2014 at 5:30 AM, Josh Humphries <jh at squareup.com> wrote:
>
> Hey, Viktor. I think I've touched on some of this already. But, since you
> said you're very much interested, I'll elaborate on my thinking.
>

Thanks for taking the time and spending the effort to elaborate, Josh, I
really appreciate it!

Rereading my reply I notice that we have strayed a bit from the initial
discussion, but it is an interesting topic, so I'll share my thoughts.

TL;DR: I think we both agree but have different cutoff points. :)

>
>
> Every decision is a trade-off. Mixing concerns can bloat the API and
> increase the cognitive burden of using it, but it can also provide greater
> functionality or make certain patterns of use easier. While the two
> concerns we're discussing may seem similar, they are very different (at
> least to me) regarding what they are actually providing to the developer,
> so the trade-offs are different.

Agreed. My stance is to err on the side of the Single Responsibility
Principle: it is easier to add API later (if defensible) than to deprecate
and remove it (removal has never happened in the JDK, AFAICT).

>
>
>
> Concern #1: Exposing methods to imperatively complete the future vs.
> having the future's value be provided implicitly (by the running of some
> unit of logic). We're not really talking about mixing the two here. My
> objection was that CompletionStage#toCompletableFuture leaks the imperative
> style in a way that is simply inappropriate.

I think both Doug(?) and I agree here; the problem is that there's no
protected scope for interface methods, so CompletableFuture would have to
wrap every CompletionStage that isn't a CompletableFuture, leading to a lot
of allocations in the worst case. Doug would be able to share more about
that.

>
> So my objection here is about poor encapsulation/abstraction. If the API
> had returned FutureTask, that too would have been bad. (I also griped about
> the lack of a FutureTask-like implementation of CompletionStage, but that
> is really a nit; not a major complaint.)

Personally I don't mind here; it's beyond trivial to "submit" a
CompletableFuture. But YMMV.

And with CompletableFuture you can choose whether to expose it as a
CompletableFuture, a CompletionStage, or a Future, depending on what
capabilities you want to expose, which does sound quite flexible?
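To make that "submit and choose what to expose" point concrete, here is a minimal Java sketch (the class and method names are mine, not from the thread): the computation runs on an executor and completes the future imperatively, but the caller only receives the CompletionStage view, so complete()/obtrude*() are not in the static type they see.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SubmitDemo {
    // "Submit" a computation: run it on the executor and complete the
    // future imperatively from inside the task.
    static CompletionStage<Integer> submitAdd(ExecutorService es, int a, int b) {
        CompletableFuture<Integer> cf = new CompletableFuture<>();
        es.execute(() -> {
            try {
                cf.complete(a + b);            // the unit of work
            } catch (Throwable t) {
                cf.completeExceptionally(t);   // propagate failure
            }
        });
        // Widening the return type to CompletionStage hides the
        // imperative completion methods from the caller.
        return cf;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService es = Executors.newSingleThreadExecutor();
        try {
            System.out.println(submitAdd(es, 21, 21).toCompletableFuture().get()); // prints 42
        } finally {
            es.shutdown();
        }
    }
}
```

The same CompletableFuture could instead be returned as Future (blocking inter-op only) or as CompletableFuture itself (full imperative surface), which is the flexibility being described.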

>
>
> As far as inter-op with legacy APIs, a #toFuture() method would have been
> much better for a few reasons:

Again, as far as I can see, toCompletableFuture was primarily needed for
CompletableFuture itself.

>
>    1. Future is an interface, so a *view* could be returned instead of
>    having to create a new stateful object that must be kept in sync with
>    the original.
>    2. Future provides inter-op but doesn't leak complete*/obtrude* methods
>    (the heart of my objections)
>    3. It could have been trivially implemented as a default method that
>    just returns a CompletableFuture that is set from a #whenComplete
>    stage, just as you've described.
>
> (I'm pretty certain we agree on #1. At least most of it.)

I would have much preferred to have a static method on Future called
"fromCompletionStage" so that CompletionStages do not need to know about
"the world". :-)
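A sketch of what that hypothetical Future.fromCompletionStage could look like. Note that no such method exists in the JDK; this is my assumption of its shape, built on the #whenComplete bridge Josh described:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.Future;

public class Bridge {
    // Hypothetical Future.fromCompletionStage: bridge any CompletionStage
    // to a plain Future, without CompletionStage having to know about Future.
    static <T> Future<T> fromCompletionStage(CompletionStage<T> stage) {
        CompletableFuture<T> cf = new CompletableFuture<>();
        stage.whenComplete((value, error) -> {
            if (error != null) cf.completeExceptionally(error);
            else cf.complete(value);
        });
        // Returned as Future, so complete()/obtrude*() are not part of
        // the static type the caller sees.
        return cf;
    }
}
```

Callers get blocking inter-op (get) without the imperative completion surface, though a cast back to CompletableFuture would of course re-expose it.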

>
>
>
> Concern #2: Exposing non-blocking mechanisms to consume/use a future
> value vs. blocking mechanisms. Mixing these is a very different beast from
> the above. It isn't poor encapsulation/abstraction as neither has anything
> to do with how the value is actually produced.

This I completely agree with, with the exception that I see mixing APIs
(blocking APIs and non-blocking APIs) as mixing concerns.

>
> Instead, one is a simple derivative of the other (e.g. given a
> non-blocking mechanism, a blocking mechanism can always be implemented on
> top of it).

Yes, and as such, I argue that if one wants to block, one can always do
that when given non-blocking APIs. Especially in the face of
Future.fromCompletionStage.
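That derivation is mechanical. As a sketch (method and class names are mine), here is a blocking wait implemented purely on top of the non-blocking callback, with no Future.get() involved:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicReference;

public class Await {
    // Blocking consumption built only from the non-blocking
    // whenComplete callback plus a latch.
    static <T> T await(CompletionStage<T> stage, long timeout, TimeUnit unit)
            throws Exception {
        CountDownLatch done = new CountDownLatch(1);
        AtomicReference<T> value = new AtomicReference<>();
        AtomicReference<Throwable> failure = new AtomicReference<>();
        stage.whenComplete((v, e) -> {
            if (e != null) failure.set(e); else value.set(v);
            done.countDown();              // release the waiter
        });
        if (!done.await(timeout, unit)) throw new TimeoutException();
        Throwable e = failure.get();
        if (e != null) throw new RuntimeException(e);
        return value.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(await(CompletableFuture.completedFuture("ok"),
                                 1, TimeUnit.SECONDS)); // prints ok
    }
}
```

The reverse direction does not hold: a blocking-only API gives you no way to register a continuation without dedicating a thread to wait.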

>
> Providing both facilitates simple, synchronous usage where it's
> appropriate (without boiler-plate) and asynchronous non-blocking usage
> where it isn't.

I think the problem here, for me, is "when appropriate". I'd argue that it
is so rarely appropriate that making it more of a hassle is worth it.
Blocking needs to be justified as it can lead to liveness problems (and
performance problems).

>
>
> You've already clearly expressed the opinion that blocking code is never
> appropriate. I think that's a reasonable assertion for many contexts, just
> not the JRE. Avoiding blocking altogether in core Java APIs is neither
> realistic nor (again, IMO) desirable.

Would you mind expanding on "just not the JRE"?
My view is that java.util.concurrent is about tools to facilitate
concurrent programming mainly targeted towards advanced users and library
writers.
Perhaps it is here we have different views?

>
>
> BTW, a #toFuture method, as I described above, would have been fine with
> me as a means of achieving the properties I want in the API. It allows
> blocking consumption without boiler-plate (stage.toFuture().get()) and it
> does not leak implementation details of the Future to clients.

def toFuture[T](stage: CompletionStage[T]): Future[T] =
  stage.toCompletableFuture()

Now it doesn't leak? (I would have preferred to not have the "to"-method on
CompletionStage at all)

>
>
>
> (Long-winded tangent ahead. Apologies if it sounds like a lecture.)

Don't worry about it, I'll take lectures any day over silence :)

>
>
> There is a spectrum. On one end (let's call it "simple"), you want a
> programming model that makes it easier to write correct code and that is
> easy to read, write, understand, and troubleshoot
>
> (at the extreme: all synchronous, all blocking flows -- very simple to
> understand but will often have poor performance and is incapable of taking
> advantage of today's multi-core computers). On the other end
> ("performance"), you want a programming model that enables taking maximum
> advantage of hardware, provides greater efficiency, and facilitates better
> performance (greater throughput, lower latency).

I think you may be conflating "simple" and "easy":
http://www.infoq.com/presentations/Simple-Made-Easy

To me, personally, it is mostly about performance, because that's what I
need. But for my users, it is important that one can reason about how the
code will behave.

I'll argue that async monadic-style programming is -simpler- than the
blocking equivalent. Yes, it may sound extremely weird at first, but hear
me out:

Let's take these two sections of code:

def addSync(f1: j.u.c.Future[Int], f2: j.u.c.Future[Int]): Int =
  f1.get() + f2.get()

Questions:
1) When is it safe to call `addSync`?
2) How do I, as the caller of `addSync` know when it is safe to call
`addSync`?
3) Will my program be able to run to completion if I call `addSync`?
4) How much code do I need to change if `addSync` causes performance or
liveness problems?

def addAsync(f1: AsyncFuture[Int], f2: AsyncFuture[Int])(implicit e: Executor): AsyncFuture[Int] =
  f1 zip f2 map { _ + _ }

Questions:
1) When is it safe to call `addAsync`?
2) How do I, as the caller of `addAsync` know when it is safe to call
`addAsync`?
3) Will my program be able to run to completion if I call `addAsync`?
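To make question 3 concrete for addSync, here is a small Java demonstration (my construction, not from the thread) of why the answer depends on who completes the futures: blocking on a future from the only thread that could ever complete it can never succeed. A timeout stands in for the deadlock so the example terminates.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class LivenessDemo {
    // Returns true if blocking inside the completing thread pool timed out.
    static boolean blocksForever() throws Exception {
        ExecutorService single = Executors.newSingleThreadExecutor();
        try {
            CompletableFuture<Integer> f1 = new CompletableFuture<>();
            Future<Integer> outer = single.submit(() -> {
                // The completing task is queued behind *this* task on the
                // same single thread, so it cannot run while we block.
                single.execute(() -> f1.complete(42));
                return f1.get(200, TimeUnit.MILLISECONDS);
            });
            try {
                outer.get();
                return false;
            } catch (ExecutionException e) {
                // Without the timeout this would be a genuine deadlock.
                return e.getCause() instanceof TimeoutException;
            }
        } finally {
            single.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(blocksForever()); // prints true
    }
}
```

The equivalent addAsync composition never parks a pool thread, so the same pool configuration completes normally.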

In my experience (as a contributor to Akka for 5 years, and the co-author
of Futures & Promises for Scala), the biggest risk with adding blocking
APIs is this: most people will fall back on what they know if that is
easier (less effort) than learning something new. (Akka Futures had
blocking APIs -on- the Future itself; Scala has them externally, on a
utility called Await, which fortunately employs managed blocking to try to
reduce the risk of liveness problems at the expense of performance.)
There's nothing -wrong- with that, it's just human nature! However, knowing
that, we must take it into consideration and make it easier (less of an
effort) to learn new things, especially if it leads to better programs
(maintainability, performance etc).

When the blocking methods were built into the Future itself (in Akka
originally), it was one of the biggest sources of problems reported
(related to Futures).
When the blocking methods were externalized (in scala.concurrent), it
remained one of the biggest sources of problems reported (related to
Futures).

Again, this is just my experience on the topic, so YMMV!

>
>
> If we're being pragmatic, economics is the real decider for where on the
> spectrum will strike the right balance. On the simple side, there's an
> advantage to spending less in engineer-hours: most developers are more
> productive writing simple synchronous code, and that style of code is much
> easier to debug. But this can incur greater capital costs since it may
> require more hardware to do the same job. On the performance side, it's the
> converse: get more out of the money spent on compute resources, but
> potentially spend more in engineering effort. (There are obviously other
> constraints, too, like whether a particular piece of software has any value
> at all if it can't meet certain performance requirements.)

I understand and can sympathize with this view. But I think it is more
complex than that: it is essentially trading a quick short-term gain for a
long-term loss. (Debugging deadlocks and concurrency issues, and disrupting
production systems, more often than not costs more in developer time than
was gained in initially writing the code.)

"It is easier to serve people desserts than greens, the question is what is
healthier." :)

>
>
> My experience is that most organizations find their maxima close to the
> middle, but nearer to the simple side. So there is an economic advantage
> for them to focus a little more on developer productivity than on software
> efficiency and performance.

For the user, I worry more about correctness and liveness than performance.
The performance concern is mine as a library writer (as the users can't opt
out of bad library performance).

>
>
> I want my users to be as productive as possible, even if it means they
> write blocking code. (And, let's face it, some of them will commit
> atrocities far worse than just using a blocking API.)

I understand this line of reasoning, but it always has to be qualified.
For example, the allure of RoR was high initial productivity, but what was
sacrificed was performance, maintainability, and scalability. So we need to
consider not only short-term "gains" but also long-term "losses".


>
>
>
> (A bit of an aside: I know it's from 2008, but still relevant:
> http://www.mailinator.com/tymaPaulMultithreaded.pdf)
>
>
>
> ----
> Josh Humphries
> Manager, Shared Systems  |  Platform Engineering
> Atlanta, GA  |  678-400-4867
> Square (www.squareup.com)
>
> On Thu, Dec 4, 2014 at 1:13 PM, √iktor Ҡlang <viktor.klang at gmail.com>
> wrote:
>>
>>
>>
>> On Thu, Dec 4, 2014 at 7:11 PM, Josh Humphries <jh at squareup.com> wrote:
>>>>
>>>>
>>>>> And I think Josh's point that blocking (invoking get()) is
>>>>> *orthogonal* to
>>>>> the ability for a reader/consumer to *write* the value of a
>>>>> computation,
>>>>
>>>>
>>>> Now that I think we can all agree on. But that was not how he phrased
>>>> it AFAICT.
>>>
>>>
>>> Admittedly, I didn't use exactly that phrase. But that is precisely
>>> what I meant when I wrote this:
>>>
>>> "So to me, splitting imperative completion and task-based implicit
>>> completion into different interfaces is a different concern than splitting
>>> blocking and non-blocking forms of consumption."
>>
>>
>> Thanks for clarifying. What I commented on was that mixing concerns
>> seemed appropriate in one case, and discouraged in the other, without any
>> rationale as to why that was OK for one thing but not the other. (I'm still
>> very much interested in this)
>>
>>
>>
>>
>> --
>> Cheers,
>>>
>

-- 
Cheers,
√

