[concurrency-interest] Target CPU utilization

David Holmes dcholmes at optusnet.com.au
Mon Feb 19 18:31:47 EST 2007


Sorry for the delay in responding. The issues you are wrestling with go
beyond my experience in putting any of this into practice - sorry.


> -----Original Message-----
> From: concurrency-interest-bounces at cs.oswego.edu
> [mailto:concurrency-interest-bounces at cs.oswego.edu]On Behalf Of Peter
> Kovacs
> Sent: Friday, 16 February 2007 8:40 PM
> To: dholmes at ieee.org
> Cc: concurrency-interest
> Subject: Re: [concurrency-interest] Target CPU utilization
> Thank you for your advice, David! Honestly, I haven't thought about
> this problem from the response time perspective. One of the reasons
> may be that my performance requirements are mainly centered around
> throughput. But there will certainly be applications where response
> time will be important.
> This leads to another interesting part of my task. The library itself
> I am working with is layered. More elementary operations (like
> aromatic ring discovery) are used in throughput-minded batch-style
> jobs (like generating fingerprints during database import of compounds
> -- fingerprints include the number of aromatic rings). But the more
> elementary operations are also used in interactive GUI applications
> such as "aromatizing" a displayed chemical structure -- where
> response time has a clear priority over throughput.
> In addition to the response time vs. throughput dilemma, the question
> that really intrigues me about this layered aspect is the following.
> Batch-style jobs are coarsely grained operations and are relatively
> easy to make parallel. More elementary operations are finely grained
> and typically more difficult to make parallel. (The more elementary
> operations are more difficult to make parallel partly due to
> algorithmic constraints resulting from the nature of the problem they
> solve, partly because of their different scale of performance: their
> execution times are much more closely comparable to those of the
> elementary platform operations, hence they are more sensitive to the
> extra overhead associated with adding multithreaded capability.) The
> obvious approach to getting ready for "Amdahl's era of computing" is
> to start by making the batch category of operations parallel. But
> shouldn't I already prepare my multithreading infrastructure for 800
> systems? And if I should, what should the underlying thread
> management model look like? Should I use, for a highly layered
> library, the traditional "flat" model of side-by-side thread pools,
> with (internal) client code picking the thread pool best fitting its
> work category? Shouldn't there be "vertical" coordination between
> layers in terms of thread management/multithreading, for performance
> (or other?) reasons?
> In fact, in my particular case, some of the "more elementary
> operations" (such as a complex chemical reaction) can quickly scale up
> through a combination of increased "problem size" and increased
> problem complexity into a work unit which already warrants the
> introduction of multi-threaded capability into the more elementary
> operation (which is not so elementary any more) -- even on two- or
> four-way systems.
> I would like to give you a specific example of the layered structure I
> have to deal with, but I'd rather not for fear of breaking some
> confidentiality rule. The point is that I have at least 7 or 8 layers
> with components in each that call components in the layer below and
> (potentially) run multiple instances of the called components in
> separate (pooled) threads. I feel this structure calls for some
> vertical "coordination". Or am I overcomplicating things?
> And how do I deal with the other aspect your reply made me aware of:
> downstream layers should be sometimes optimized for response time
> (when called from a certain context of a GUI application) and
> sometimes for throughput (when called in a batch job)?
> Thanks,
> Peter
> On 2/16/07, David Holmes <dcholmes at optusnet.com.au> wrote:
> > Peter,
> >
> > > > - You might not be the only application trying to run on the system.
> > >
> > > I thought of that...but most modern operating systems will do a good
> > > job scheduling the running applications fairly. (Even with a single
> > > threaded application, I will not include sleeps in order to yield CPU
> > > time to other apps in the system.)
> >
> > Yes but the response time will suffer. Two systems "tuned" for 100%
> > utilization won't meet the throughput/response times they were tuned
> > for if they only get 50% of the expected CPU.
> >
> > > > - You might want to allow room for handling transient overload.
> > >
> > > I am not sure I understand this one. Please, could you elaborate?
> > > (Assume my target is full CPU utilization. Assume I also allow for
> > > a generous wait-time ratio. I will create a surplus of threads...
> > > Hmm... I'd instinctively think that just the opposite is true: the
> > > lower the CPU utilization target, the less able I am to handle
> > > transient overload. Isn't that so?)
> >
> > I'm thinking about this from a queuing theory perspective. You have
> > an expected load on your application due to the expected number of
> > tasks and the work those tasks perform. Assuming you can quantify
> > both, you can work out expected response times, average queue length,
> > etc. for a given number of worker threads and CPUs. If more work
> > comes in than expected - a transient overload - then the queue length
> > will grow and the response time will increase. If this queue growth
> > triggers growth of the pool (i.e. the blocking queue becomes full, so
> > the pool creates threads from the core size up to the maximum size),
> > then without spare CPU cycles you won't help response times (just
> > queue length). But if you've allowed for some spare CPU cycles, they
> > are available to assist in dealing with the transient overload.
> >
> > Or to put it more simply: your normal processing mode allows for
> > idle time on some CPUs and meets throughput goal X. Then, under
> > overload, you can utilize the idle time to try to meet a revised
> > throughput goal Y < X. Without the spare capacity, your throughput
> > under overload might drop to Z << X.
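The pool-growth behaviour described above can be seen directly with
ThreadPoolExecutor: extra threads beyond the core size are only created
once the bounded work queue is full. The sizes below are illustrative:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch of growth under overload: a ThreadPoolExecutor grows from its
// core size toward its maximum size only when the bounded queue fills.
public class OverloadDemo {
    public static void main(String[] args) throws Exception {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 4,                         // core size 2, maximum size 4
                60, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(2)); // bounded queue of 2 tasks

        CountDownLatch gate = new CountDownLatch(1);
        Runnable blocker = () -> {
            try { gate.await(); } catch (InterruptedException e) { }
        };
        // Two tasks occupy the core threads, two more fill the queue...
        for (int i = 0; i < 4; i++) pool.execute(blocker);
        // ...so the fifth forces the pool to grow toward maximumPoolSize.
        pool.execute(blocker);
        System.out.println(pool.getPoolSize()); // 3: two core plus one extra
        gate.countDown();                       // release all blocked tasks
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

As noted above, that extra thread only improves response times if there
are spare CPU cycles for it to run on.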
> >
> > If you are providing pools as part of a library you need to provide the
> > necessary tuning knobs as part of the API.
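One common shape for such a tuning knob is to let the caller supply the
executor, so a batch caller and a GUI caller can tune the same library
layer differently. The class, method names, and the stand-in computation
below are hypothetical, for illustration only:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical library layer that accepts its executor from the caller
// rather than owning one, so each calling context can tune for response
// time or for throughput.
class RingCounter {
    private final ExecutorService pool;

    RingCounter(ExecutorService pool) { this.pool = pool; }

    Future<Integer> countRings(String molecule) {
        return pool.submit(() -> molecule.length()); // stand-in computation
    }
}

public class TuningKnobDemo {
    public static void main(String[] args) throws Exception {
        // A batch caller passes a large throughput-oriented pool; a GUI
        // caller could pass a small response-time-oriented one instead.
        ExecutorService batchPool = Executors.newFixedThreadPool(4);
        RingCounter counter = new RingCounter(batchPool);
        System.out.println(counter.countRings("benzene").get());
        batchPool.shutdown();
    }
}
```

This also sidesteps part of the "vertical coordination" question: if
layers share injected executors instead of creating their own, the
application retains one place to budget threads across all layers.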
> >
> > Cheers,
> > David
> >
> >
> _______________________________________________
> Concurrency-interest mailing list
> Concurrency-interest at altair.cs.oswego.edu
> http://altair.cs.oswego.edu/mailman/listinfo/concurrency-interest

More information about the Concurrency-interest mailing list