[concurrency-interest] Target CPU utilization
peter.kovacs.1.0rc at gmail.com
Fri Feb 16 05:40:24 EST 2007
Thank you for your advice, David! Honestly, I haven't thought about
this problem from the response time perspective. One of the reasons
may be that my performance requirements are mainly centered around
throughput. But there will certainly be applications where response
time will be important.
This leads to another interesting part of my task. The library I am
working with is itself layered. More elementary operations (like
aromatic ring discovery) are used in throughput-minded batch-style
jobs (like generating fingerprints during database import of compounds
-- fingerprints include the number of aromatic rings). But the more
elementary operations are also used in interactive GUI applications
such as "aromatizing" a displayed chemical structure -- where
response time has a clear priority over throughput.
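Just to make the two calling contexts concrete, here is a rough sketch
(class and method names are invented, not our real API) of the same
elementary operation driven from a batch import and from a GUI action:

import javax.swing.SwingUtilities;

// Invented names, only to illustrate the two calling contexts.
class AromaticityTools {
    // Elementary operation shared by batch and interactive code paths.
    static int countAromaticRings(String structure) {
        return 0; // real ring-perception algorithm elided
    }
}

class CallingContexts {
    // Throughput-minded batch path: fingerprints during database import.
    static void importCompound(String structure) {
        int rings = AromaticityTools.countAromaticRings(structure);
        // ... write rings into the fingerprint record ...
    }

    // Interactive path: stay off the event thread, then update the display.
    static void aromatizeDisplayedStructure(final String structure) {
        new Thread(new Runnable() {
            public void run() {
                final int rings = AromaticityTools.countAromaticRings(structure);
                SwingUtilities.invokeLater(new Runnable() {
                    public void run() { /* repaint using the computed rings */ }
                });
            }
        }).start();
    }
}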
In addition to the response time vs. throughput dilemma, the question
that really intrigues me about this layered aspect is the following.
Batch-style jobs are coarse-grained operations and are relatively
easy to make parallel. More elementary operations are fine-grained
and typically more difficult to make parallel. (The more elementary
operations are more difficult to make parallel partly due to
algorithmic constraints resulting from the nature of the problem they
solve, partly because of their different scale of performance: their
execution times are much more closely comparable to those of the
elementary platform operations, hence they are more sensitive to the
extra overhead associated with adding multithreaded capability.) The
obvious approach to getting ready for "Amdahl's era of computing" is
to start with making parallel the batch category of operations. But
should I not make my multithreading infrastructure already prepared
for 800 systems? And if I should, what should the underlying thread
management model look like? Should I use, for a highly layered library,
the traditional "flat" model of side-by-side thread pools, with
(internal) client code picking the thread pool that best fits its
work category? Shouldn't there be some "vertical" coordination between
layers in terms of thread management/multithreading for performance?
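By the "flat" model I mean something like the sketch below:
side-by-side pools, with internal client code picking the one that
matches its work category. (The class, the pool sizes and the category
names are placeholders I made up, not anything we have today.)

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch of the "flat" model: one pool per work category.
class WorkCategoryPools {
    enum WorkCategory { BATCH, INTERACTIVE }

    private static final int CPUS = Runtime.getRuntime().availableProcessors();

    // Coarse-grained, throughput-minded batch work.
    static final ExecutorService BATCH_POOL =
        Executors.newFixedThreadPool(CPUS);

    // Fine-grained / interactive work where response time matters most.
    static final ExecutorService INTERACTIVE_POOL =
        Executors.newFixedThreadPool(Math.max(1, CPUS / 2));

    // Internal client code picks the pool best fitting its work category.
    static ExecutorService poolFor(WorkCategory category) {
        return category == WorkCategory.BATCH ? BATCH_POOL : INTERACTIVE_POOL;
    }
}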
In fact, in my particular case, some of the "more elementary
operations" (such as a complex chemical reaction) can quickly scale up
through a combination of increased "problem size" and increased
problem complexity into a work unit which already warrants the
introduction of multi-threaded capability into the more elementary
operation (which is not so elementary any more) -- even on two- or
four-core systems.
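What I imagine for such an operation is a size/complexity threshold:
below it the operation stays single-threaded, above it it fans its
independent sub-problems out to an executor. Roughly like this (the
threshold, the four-way split and all the names are invented for
illustration):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Sketch: an "elementary" operation that only goes parallel above a threshold.
class ReactionProcessor {
    private static final int PARALLEL_THRESHOLD = 64; // placeholder tuning knob

    private final ExecutorService pool;

    ReactionProcessor(ExecutorService pool) { this.pool = pool; }

    int process(List<String> molecules) throws Exception {
        if (molecules.size() < PARALLEL_THRESHOLD) {
            return processSequentially(molecules); // overhead not worth it
        }
        // Split the independent sub-problems and run them in the pool.
        List<Callable<Integer>> parts = new ArrayList<Callable<Integer>>();
        int chunk = molecules.size() / 4;
        for (int i = 0; i < molecules.size(); i += chunk) {
            final List<String> part =
                molecules.subList(i, Math.min(i + chunk, molecules.size()));
            parts.add(new Callable<Integer>() {
                public Integer call() { return processSequentially(part); }
            });
        }
        int total = 0;
        for (Future<Integer> f : pool.invokeAll(parts)) {
            total += f.get();
        }
        return total;
    }

    private int processSequentially(List<String> molecules) {
        return molecules.size(); // stand-in for the real per-molecule work
    }
}

Of course, if process() is itself running inside a task submitted to the
same pool, invokeAll can stall when the pool is saturated -- which is
exactly the kind of cross-layer interaction I am worried about.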
I would like to give you a specific example of the layered structure I
have to deal with, but I'd rather not for fear of breaking some
confidentiality rule. The point is that I have at least 7 or 8 layers
with components in each that call components in the layer below and
(potentially) run multiple instances of the called components in
separate (pooled) threads. I feel this structure calls for some
vertical "coordination". Or am I overcomplicating things?
And how do I deal with the other aspect your reply made me aware of:
downstream layers should sometimes be optimized for response time
(when called from certain contexts in a GUI application) and
sometimes for throughput (when called in a batch job)?
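The only answer I can think of so far is to let the caller supply the
execution policy -- for example by passing an Executor down through the
layers, so a GUI hands in a small latency-oriented pool and a batch job
a throughput-oriented one. Roughly (names invented):

import java.util.concurrent.Executor;

// Sketch: a downstream layer that takes its execution policy from the caller.
class RingPerceptionService {
    private final Executor executor;

    // The caller decides: a small responsive pool for the GUI,
    // a large fixed pool for batch imports.
    RingPerceptionService(Executor executor) {
        this.executor = executor;
    }

    void perceiveRingsAsync(final String structure, final Runnable whenDone) {
        executor.execute(new Runnable() {
            public void run() {
                // ... do the ring perception on 'structure' ...
                whenDone.run();
            }
        });
    }
}

The GUI would then construct the service with a small pool (or even a
single background thread), while the batch importer would pass a pool
sized for the whole machine.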
On 2/16/07, David Holmes <dcholmes at optusnet.com.au> wrote:
> > > - You might not be the only application trying to run on the system.
> > I thought of that...but most modern operating systems will do a good
> > job scheduling the running applications fairly. (Even with a single
> > threaded application, I will not include sleeps in order to yield CPU
> > time to other apps in the system.)
> Yes but the response time will suffer. Two systems "tuned" for 100%
> utilization won't meet the throughput/response times they were tuned for if
> they only get 50% of the expected CPU.
> > > - You might want to allow room for handling transient overload.
> > I am not sure I understand this one. Please, could you elaborate?
> > (Assume my target is full CPU utilization. Assume I also allow for a
> > copious wait time ratio. I will create a superfluous number of
> > threads... Hmm... I'd instinctively think that just the opposite is
> > true: the lower the CPU utilization target, the less able I am to
> > handle transient overload. Isn't that so?)
> I'm thinking about this from a queuing theory perspective. You have an
> expected load on your application due to the expected number of tasks and
> the work those tasks perform. Assuming you can quantify both then you can
> work out expected response times, average queue length etc for a given
> number of worker threads and CPUs. If more work comes in than expected - a
> transient overload - then the queue length will grow and the response time
> will increase. If this queue growth triggers growth of the pool (ie blocking
> queue becomes full so pool creates threads from core size up to maximum
> size) then without spare CPU cycles you won't help with response times (just
> queue length). But if you've allowed for some CPU cycles then they are
> available to assist with dealing with the transient overload.
> Or to put it more simply: your normal processing mode allows for idle time
> on some CPUs and meets throughput goal X. Then under overload you can
> utilize the idle time to try and meet a revised throughput goal Y < X.
> Without the spare capacity under overload your throughput might drop to Z << Y.
> If you are providing pools as part of a library you need to provide the
> necessary tuning knobs as part of the API.
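To check that I follow the pool-growth scenario (and the point about
tuning knobs), this is roughly how I picture such a pool being set up
in the library -- a sketch only, with the sizing knobs exposed as
parameters instead of being hard-coded:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch: a library-supplied pool whose sizing knobs are part of the API.
class TunablePoolFactory {
    static ThreadPoolExecutor newWorkerPool(int coreThreads,
                                            int maxThreads,
                                            int queueCapacity) {
        // Normal load is handled by coreThreads workers and a bounded queue.
        // Under transient overload the queue fills up and the pool grows
        // toward maxThreads -- which only helps if some CPU headroom was
        // left spare in the first place.
        return new ThreadPoolExecutor(coreThreads, maxThreads,
                                      60L, TimeUnit.SECONDS,
                                      new ArrayBlockingQueue<Runnable>(queueCapacity));
    }
}

A caller that wants to leave headroom could pass something like
Runtime.getRuntime().availableProcessors() - 1 for coreThreads rather
than targeting 100% utilization.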