[concurrency-interest] Coroutine-like control transfer?

Charles Oliver Nutter headius at headius.com
Wed Aug 28 17:36:18 EDT 2013

I've been struggling for years to find the right (i.e. fastest) way to
do a coroutine-like control transfer between two threads, and I
figured I'd try asking here again...

What I need to emulate is a true coroutine, where a thread transfers
execution (ip and stack) to a different piece of code, that code runs
for a while, and then execution transfers back. There's always exactly
one piece of code executing at a given time, since there's literally
only one thread bouncing between stacks.

On the JVM, where we don't have real coroutines, the best we can do is
to emulate this with threads (ignoring bytecode-weaving stack tricks
like Kilim for the moment). I do not believe any of the structures in
j.u.concurrent currently have this exact pattern in mind.

The patterns I've used to emulate this:

1. Explicit park/unpark.

The parent thread starts up the coroutine, which immediately parks
itself. Parent thread wakes up coroutine by unparking it and giving it
an initial value, at which point parent parks itself. Child runs for a
bit, then unparks parent, gives it a value, and parks itself.

This is logically closest to what I want, but the park/unpark
operations are too expensive for fast transfers. This mechanism ended
up being the slowest way when measuring raw transfer rate (i.e. very
little work being done between transfers)

2. SynchronousQueue

Instead of using explicit parking and unparking, the parent pushes a
value on child's sync queue, and then waits on its own sync queue.
Child signals parent by pushing a value on parent's sync queue and
then waits on its own.

This was about 3x faster than explicit park/unpark.

3. Exchanger

Substitute Exchanger for SynchronousQueue, where the "take" operation
just exchanges null and the "put" operation ignores the result. This
was the fastest...around 15x faster than explicit park/unpark and 5x
faster than SynchronousQueue

4. Just spin

Do nothing but spin on a null volatile field waiting for it to become
non-null. Another 3x faster than Exchanger.


It seems like I'm building a conceptually simple pattern on top of
structures not designed for it (or rather, designed for more complex
patterns). In SynchronousQueue's case, I have to do two put/take
operations for every transfer. With Exchanger, two exchanges. What I
really want is something like LockSupport.park/unpark that's more like
LockSupport.parkMeAndUnparkHimPassingValue(value), explicitly passing
control and a value to the other thread.

What am I missing? Do I need to hand-roll this?

- Charlie

More information about the Concurrency-interest mailing list