[concurrency-interest] Coroutine-like control transfer?

Zhong Yu zhong.j.yu at gmail.com
Thu Aug 29 10:39:36 EDT 2013

On Wed, Aug 28, 2013 at 4:36 PM, Charles Oliver Nutter
<headius at headius.com> wrote:
> I've been struggling for years to find the right (i.e. fastest) way to
> do a coroutine-like control transfer between two threads, and I
> figured I'd try asking here again...
> What I need to emulate is a true coroutine, where a thread transfers
> execution (ip and stack) to a different piece of code, that code runs
> for a while, and then execution transfers back. There's always exactly
> one piece of code executing at a given time, since there's literally
> only one thread bouncing between stacks.
> On the JVM, where we don't have real coroutines, the best we can do is
> to emulate this with threads (ignoring bytecode-weaving stack tricks
> like Kilim for the moment). I do not believe any of the structures in
> j.u.concurrent currently have this exact pattern in mind.
> The patterns I've used to emulate this:
> 1. Explicit park/unpark.
> The parent thread starts up the coroutine, which immediately parks
> itself. Parent thread wakes up coroutine by unparking it and giving it
> an initial value, at which point parent parks itself. Child runs for a
> bit, then unparks parent, gives it a value, and parks itself.
> This is logically closest to what I want, but the park/unpark
> operations are too expensive for fast transfers. This mechanism ended
> up being the slowest way when measuring raw transfer rate (i.e. very
> little work being done between transfers)
> 2. SynchronousQueue
> Instead of using explicit parking and unparking, the parent pushes a
> value on child's sync queue, and then waits on its own sync queue.
> Child signals parent by pushing a value on parent's sync queue and
> then waits on its own.
> This was about 3x faster than explicit park/unpark.
> 3. Exchanger
> Substitute Exchanger for SynchronousQueue, where the "take" operation
> just exchanges null and the "put" operation ignores the result. This
> was the fastest...around 15x faster than explicit park/unpark and 5x
> faster than SynchronousQueue
> 4. Just spin
> Do nothing but spin on a null volatile field waiting for it to become
> non-null. Another 3x faster than Exchanger.
> ...
> It seems like I'm building a conceptually simple pattern on top of
> structures not designed for it (or rather, designed for more complex
> patterns). In SynchronousQueue's case, I have to do two put/take
> operations for every transfer. With Exchanger, two exchanges. What I
> really want is something like LockSupport.park/unpark that's more like
> LockSupport.parkMeAndUnparkHimPassingValue(value), explicitly passing
> control and a value to the other thread.

It seems that the more spin there is, the faster it is; or, the sooner
you do park(), the slower it is. If you do full spin, 50% CPU is
wasted, the throughput would be half of the true coroutine solution -
which is probably not too bad.

> What am I missing? Do I need to hand-roll this?
> - Charlie
> _______________________________________________
> Concurrency-interest mailing list
> Concurrency-interest at cs.oswego.edu
> http://cs.oswego.edu/mailman/listinfo/concurrency-interest

More information about the Concurrency-interest mailing list