[concurrency-interest] Splitting long running jobs into short ones (Task-type affinity in thread pools?)

Peter Kovacs peter.kovacs.1.0rc at gmail.com
Fri Feb 23 17:33:58 EST 2007


Hi,

I have a potentially long running iterative operation which itself
takes its input by iterating on a producer (the producer and the
operation in question [the consumer] implement some kind of iterator
interface). Each iteration on the producer provides an input which is
independent of the inputs provided in previous and subsequent
iterations -- so processing of inputs is also independent.

My initial approach have been to split the long running consumer
operation into short tasks where inputs from each iteration are
processed in a Callable executed by a thread taken from a thread pool.
What makes this difficult to implement is that processing of the
independent inputs requires an object which is very expensive to
construct and this object has to be dedicated to the short-running
Callable for the duration of Callable.call. (An obvious source of lock
contention.)

The appeal of splitting the long running consumer operation into
short-running tasks is that execution can then be tuned through tuning
the thread pool(s) to which the short-running tasks are assigned. The
alternative approach of starting a number of long running threads with
their dedicated "heavy-to-construct" objects requires the introduction
of another mechanism for determining the number of threads to start.

Is there a way to deal efficiently with heavy-to-construct object in
the split-into-short-running task scenario? (pooling? see PS)

What would be a good mechanism for long running operations
(essentially a way to tune the number of threads dedicated to the
operations)?

Thanks
Peter

PS:
My initial approach to pooling the heavy object have been to use a
ConcurrentHashMap as the pool from which the short-running tasks
retrieve the heavy-to-construct object using Thread.currentThread() as
the key (the object is created, if not yet mapped to the thread).

While this approach is not too bad with small fixed thread pools, it
doesn't scale up very well. With a potentially large number of other
indepent operations/tasks in the same thread management system and
different task types randomly assigned to threads, there will be an
increasing tendency toward constructing much more of the "heavy
objects" required for this particular operation than will be actually
used (one "heavy object" for each thread in the thread pool).
Depending on the actual length of the "long-runnning" operation, this
could be counter-productive. (This is a *potentially* long-running
operation. It can also be, say, 1000 iterations -- which normally take
2-4 seconds to process, if just one heavy object needs to be created.
If there happens to be 1000 threads in the pool, there is the
possibility that such an object should be constructed for each
iteration [short-running task]. Which is bad.)

A way out of the "heavy-object-spreading" problem would be, if thread
pools had a "task-type affinity" feature: if a task of a given type is
requesting a thread from the pool, one is preferably given which has
recently been used by a task of the same type.

Another way out would be to replace the current thread's reference as
the key for ConcurrentHashMap. I can think, e.g., of the following
scheme:
(1) generate with a cheap RNG (xorShift?) a number between 0 and 16
(2) use this number to retrieve a LinkedList from the ConcurrentHashMap
(3) use the LinkedList as a pool.

But this starts to look awsome, isn't it?


More information about the Concurrency-interest mailing list