[concurrency-interest] Idempotent work stealing (2009)
kimo at webnetic.net
Mon Dec 28 15:39:59 EST 2009
This paper was interesting to me when it came out, both for the different
approach taken to work stealing and for its emphasis on avoiding atomic
instructions like CAS to improve speed.
Has there been any discussion on this list about implementing these
algorithms for j.u.c.?
(I'm surprised not to find a search function for this list's archives.)
*Idempotent work stealing (2009)*
Maged M. Michael<http://portal.acm.org/author_page.cfm?id=81332515587&coll=GUIDE&dl=GUIDE&trk=0&CFID=70118550&CFTOKEN=52758888>
Thomas J. Watson Research Center, Yorktown Heights, NY, USA
Martin T. Vechev, Thomas J. Watson Research Center, Hawthorne, NY, USA
Vijay A. Saraswat, Thomas J. Watson Research Center, Hawthorne, NY, USA
Load balancing is a technique that allows efficient parallelization of
irregular workloads, and a key component of many applications and
parallelizing runtimes. Work stealing is a popular technique for
implementing load balancing, where each parallel thread maintains its own
work set of items and occasionally steals items from the sets of other
threads.
The conventional semantics of work stealing guarantee that each inserted
task is eventually extracted exactly once. However, correctness of a wide
class of applications allows for relaxed semantics, because either: i) the
application already explicitly checks that no work is repeated or ii) the
application can tolerate repeated work.
In this paper, we introduce *idempotent work stealing*, and present several
new algorithms that exploit the relaxed semantics to deliver better
performance. The semantics of the new algorithms guarantee that each
inserted task is eventually extracted *at least* once, instead of *exactly*
once.
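To see why at-least-once extraction can be safe, consider case (i) above: an application that already checks that no work is repeated. The following minimal Java sketch (purely illustrative; the class and method names are my own, not the paper's) shows a graph-marking task where a duplicate extraction is a harmless no-op because an atomic visited flag is claimed exactly once:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical example: a task that tolerates being handed out twice.
// If a relaxed work-stealing queue gives the same node to two threads,
// only the thread that wins the compareAndSet expands the node's
// children; the other thread simply does nothing.
class MarkGraph {
    static class Node {
        final AtomicBoolean visited = new AtomicBoolean(false);
        final List<Node> children = new ArrayList<>();
    }

    // Returns the children to enqueue, or an empty list if this node
    // was already claimed by an earlier (duplicate) extraction.
    static List<Node> process(Node n) {
        if (n.visited.compareAndSet(false, true)) {
            return n.children;   // first extraction: expand the node
        }
        return List.of();        // duplicate extraction: harmless no-op
    }
}
```

With tasks shaped like this, "extracted at least once" produces the same final marking as "extracted exactly once", which is what the relaxed semantics exploit.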
*The relaxed semantics suffice for a wide range of applications dealing
with irregular computation patterns. Sample domains include: parallel
garbage collection, fixed-point computations in program analysis,
constraint solvers (e.g. SAT solvers), state space exploration in model
checking, as well as integer and mixed programming solvers.*
On mainstream processors, algorithms for conventional work stealing require
special atomic instructions (CAS) or store-load memory ordering fence
instructions in the owner's critical path operations. In general, these
instructions are substantially slower than regular memory access
instructions. By exploiting the relaxed semantics, our algorithms avoid
these instructions in the owner's operations.
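To make the trade-off concrete, here is a rough Java sketch of the idea, not the paper's actual code: a bounded LIFO work set whose owner-side put/take use only plain reads and an ordered (release) write, while the thief alone pays for a CAS. The packing scheme, field names, and omission of resizing are all assumptions for illustration:

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of idempotent work stealing with LIFO extraction. A single
// anchor word packs (tail, tag): tail is the stack top, tag is a
// version counter bumped on put so a thief's CAS can detect reuse.
// Because a task may be returned by both the owner and a racing thief,
// tasks must be idempotent (extracted at least once, possibly twice).
class IdempotentLifoDeque {
    private final Object[] tasks;              // fixed capacity; resizing omitted
    private final AtomicLong anchor = new AtomicLong(0L);

    IdempotentLifoDeque(int capacity) { tasks = new Object[capacity]; }

    private static int  tail(long a)            { return (int) a; }
    private static long tag(long a)             { return a >>> 32; }
    private static long pack(int tail, long tag) { return (tag << 32) | (tail & 0xffffffffL); }

    // Owner only: no CAS, no full fence. lazySet's release ordering makes
    // the task store visible before the new anchor value.
    void put(Object task) {
        long a = anchor.get();
        int t = tail(a);
        tasks[t] = task;
        anchor.lazySet(pack(t + 1, tag(a) + 1));
    }

    // Owner only: plain read-modify-write on the anchor. A racing thief
    // can at worst duplicate the returned task, never lose one.
    Object take() {
        long a = anchor.get();
        int t = tail(a);
        if (t == 0) return null;               // empty
        Object task = tasks[t - 1];
        anchor.lazySet(pack(t - 1, tag(a)));
        return task;
    }

    // Thief: must CAS the whole anchor; on failure the caller may retry.
    Object steal() {
        long a = anchor.get();
        int t = tail(a);
        if (t == 0) return null;               // empty
        Object task = tasks[t - 1];
        if (anchor.compareAndSet(a, pack(t - 1, tag(a)))) return task;
        return null;                           // lost the race
    }
}
```

The point of the sketch is the asymmetry: conventional deques (THE Cilk, Chase-Lev) need a CAS or a store-load fence in the owner's take, whereas here the owner's path is fence-free and only steal carries the expensive instruction.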
We evaluated our algorithms using common graph problems and micro-benchmarks
and compared them to well-known conventional work stealing algorithms, the
THE Cilk and Chase-Lev algorithms. We found that our best algorithm (with
LIFO extraction) outperforms existing algorithms in nearly all cases, and
often by a significant margin.