[concurrency-interest] Target CPU utilization
peter.kovacs.1.0rc at gmail.com
Tue Feb 20 11:08:51 EST 2007
Thank you, David! I appreciate your feedback anyway. (I am aware that
the problem domain I've been trying to explore may be purely
imaginary. :-) )
On 2/20/07, David Holmes <dcholmes at optusnet.com.au> wrote:
> Sorry for the delay in responding. The issues you are wrestling with go
> beyond my experience in putting any of this into practice - sorry.
> > -----Original Message-----
> > From: concurrency-interest-bounces at cs.oswego.edu
> > [mailto:concurrency-interest-bounces at cs.oswego.edu]On Behalf Of Peter
> > Kovacs
> > Sent: Friday, 16 February 2007 8:40 PM
> > To: dholmes at ieee.org
> > Cc: concurrency-interest
> > Subject: Re: [concurrency-interest] Target CPU utilization
> > Thank you for your advice, David! Honestly, I haven't thought about
> > this problem from the response time perspective. One of the reasons
> > may be that my performance requirements are mainly centered around
> > throughput. But there will certainly be applications where response
> > time will be important.
> > This leads to another interesting part of my task. The library I am
> > working with is itself layered. More elementary operations (like
> > aromatic ring discovery) are used in throughput-minded batch-style
> > jobs (like generating fingerprints during database import of compounds
> > -- fingerprints include the number of aromatic rings). But the more
> > elementary operations are also used in interactive GUI applications,
> > such as "aromatizing" a displayed chemical structure -- where
> > response time has a clear priority over throughput.
> > In addition to the response time vs. throughput dilemma, the question
> > that really intrigues me about this layered aspect is the following.
> > Batch-style jobs are coarsely grained operations and are relatively
> > easy to make parallel. More elementary operations are finely grained
> > and typically more difficult to make parallel. (The more elementary
> > operations are more difficult to make parallel partly due to
> > algorithmic constraints resulting from the nature of the problem they
> > solve, partly because of their different scale of performance: their
> > execution times are much more closely comparable to those of the
> > elementary platform operations, hence they are more sensitive to the
> > extra overhead associated with adding multithreaded capability.) The
> > obvious approach to getting ready for "Amdahl's era of computing" is
> > to start by making parallel the batch category of operations. But
> > shouldn't I already prepare my multithreading infrastructure for
> > 800-way systems? And if I should, what should the underlying thread
> > management model look like? Should I use, for a highly layered library,
> > the traditional "flat" model of side-by-side thread pools, with
> > (internal) client code picking the thread pool best fitting its
> > work category? Shouldn't there be "vertical" coordination between
> > layers in terms of thread management/multithreading, for performance (or
> > other?) reasons?
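One alternative to side-by-side per-layer pools is a single shared work-stealing pool that every layer submits into, so nested parallelism cannot multiply threads across layers; this is the shape Doug Lea's fork/join framework (later java.util.concurrent.ForkJoinPool) was designed for. A sketch under hypothetical names (BatchImport and Fingerprint are invented for illustration, not from the thread):

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical two-layer structure: a batch import (outer layer) fans out
// over compounds, and each compound's fingerprint (inner layer) runs as a
// task in the SAME pool, so total parallelism stays bounded by the pool
// size rather than multiplying across layers.
public class LayeredTasks {
    static class Fingerprint extends RecursiveTask<Integer> {
        final int compoundId;
        Fingerprint(int id) { this.compoundId = id; }
        @Override protected Integer compute() {
            return compoundId % 7;   // stand-in for aromatic ring counting
        }
    }

    static class BatchImport extends RecursiveTask<Integer> {
        final int lo, hi;
        BatchImport(int lo, int hi) { this.lo = lo; this.hi = hi; }
        @Override protected Integer compute() {
            if (hi - lo == 1) return new Fingerprint(lo).compute();
            int mid = (lo + hi) >>> 1;
            BatchImport left = new BatchImport(lo, mid);
            left.fork();                        // inner work shares our pool
            int right = new BatchImport(mid, hi).compute();
            return right + left.join();
        }
    }

    public static void main(String[] args) {
        ForkJoinPool pool = new ForkJoinPool(); // sized to available CPUs
        System.out.println(pool.invoke(new BatchImport(0, 100)));
    }
}
```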
> > In fact, in my particular case, some of the "more elementary
> > operations" (such as a complex chemical reaction) can quickly scale up
> > through a combination of increased "problem size" and increased
> > problem complexity into a work unit which already warrants the
> > introduction of multi-threaded capability into the more elementary
> > operation (which is not so elementary any more) -- even on two- or
> > four-way systems.
> > I would like to give you a specific example of the layered structure I
> > have to deal with, but I'd rather not for fear of breaking some
> > confidentiality rule. The point is that I have at least 7 or 8 layers
> > with components in each that call components in the layer below and
> > (potentially) run multiple instances of the called components in
> > separate (pooled) threads. I feel this structure calls for some
> > vertical "coordination". Or am I overcomplicating things?
> > And how do I deal with the other aspect your reply made me aware of:
> > downstream layers should sometimes be optimized for response time
> > (when called from a certain context of a GUI application) and
> > sometimes for throughput (when called in a batch job)?
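A common answer in the "flat" model is to expose two pre-configured executors and let internal client code pick by work category: one tuned for throughput (threads bounded at the CPU count, deep queue) and one for response time (hand-off queue, so interactive tasks never wait behind batch work). A sketch; the class and field names are invented:

```java
import java.util.concurrent.*;

// Hypothetical facade letting internal client code pick the profile that
// matches its call site.
public class WorkCategories {
    static final int CPUS = Runtime.getRuntime().availableProcessors();

    // Throughput: exactly one thread per CPU, deep queue -- tasks may wait,
    // but the CPUs are never oversubscribed.
    static final ExecutorService BATCH =
            new ThreadPoolExecutor(CPUS, CPUS, 0L, TimeUnit.MILLISECONDS,
                    new LinkedBlockingQueue<>());

    // Response time: hand each task straight to a thread (growing the pool
    // if none is idle) rather than queueing it behind batch work.
    static final ExecutorService INTERACTIVE =
            new ThreadPoolExecutor(0, 2 * CPUS, 60L, TimeUnit.SECONDS,
                    new SynchronousQueue<>());

    public static void main(String[] args) throws Exception {
        Future<String> f = INTERACTIVE.submit(() -> "aromatized");
        System.out.println(f.get());
        BATCH.shutdown();
        INTERACTIVE.shutdown();
    }
}
```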
> > Thanks,
> > Peter
> > On 2/16/07, David Holmes <dcholmes at optusnet.com.au> wrote:
> > > Peter,
> > >
> > > > > - You might not be the only application trying to run on the system.
> > > >
> > > > I thought of that...but most modern operating systems will do a good
> > > > job scheduling the running applications fairly. (Even with a single
> > > > threaded application, I will not include sleeps in order to yield CPU
> > > > time to other apps in the system.)
> > >
> > > Yes but the response time will suffer. Two systems "tuned" for 100%
> > > utilization won't meet the throughput/response times they were tuned
> > > for if they only get 50% of the expected CPU.
> > >
> > > > > - You might want to allow room for handling transient overload.
> > > >
> > > > I am not sure I understand this one. Please, could you elaborate?
> > > > (Assume my target is full CPU utilization. Assume I also allow for a
> > > > copious wait-time ratio. I will create a superfluous number of
> > > > threads... Hmm... I'd instinctively think that just the opposite is
> > > > true: the lower the CPU utilization target, the less able I am to
> > > > handle a transient overload. Isn't that so?)
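The quantities being discussed here (target utilization, wait-time ratio, thread count) are tied together by the well-known pool-sizing heuristic from Java Concurrency in Practice: N_threads = N_cpus * U_target * (1 + W/C), where W/C is the wait-to-compute ratio. A minimal sketch; the CPU counts and ratios below are illustrative, not taken from the thread:

```java
public class PoolSizing {
    /** Classic sizing heuristic: N_threads = N_cpus * U_target * (1 + W/C). */
    static int poolSize(int cpus, double targetUtilization,
                        double waitToComputeRatio) {
        return (int) Math.ceil(cpus * targetUtilization
                * (1 + waitToComputeRatio));
    }

    public static void main(String[] args) {
        // Hypothetical workload: tasks wait (I/O, locks) half as long as
        // they compute, so W/C = 0.5.
        System.out.println(poolSize(8, 1.0, 0.5));  // full utilization: 12
        System.out.println(poolSize(8, 0.75, 0.5)); // 75% target leaves headroom: 9
    }
}
```

Note that lowering the utilization target shrinks the pool but leaves idle CPU cycles, which is exactly the headroom for transient overload David describes.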
> > >
> > > I'm thinking about this from a queuing theory perspective. You have an
> > > expected load on your application due to the expected number of tasks
> > > and the work those tasks perform. Assuming you can quantify both, then
> > > you can work out expected response times, average queue length etc. for
> > > a given number of worker threads and CPUs. If more work comes in than
> > > expected - a transient overload - then the queue length will grow and
> > > the response time will increase. If this queue growth triggers growth
> > > of the pool (ie the blocking queue becomes full so the pool creates
> > > threads from core size up to maximum size) then without spare CPU
> > > cycles you won't help with response times (just queue length). But if
> > > you've allowed for some spare CPU cycles then they are available to
> > > assist with dealing with the transient overload.
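The pool-growth behavior described here is how java.util.concurrent's ThreadPoolExecutor actually behaves: extra threads beyond the core size are created only once the bounded queue fills. A small demonstration with illustrative sizes (core 2, max 4, queue 10):

```java
import java.util.concurrent.*;

public class OverloadPool {
    // Runs a burst of short tasks through a bounded pool and reports how
    // large the pool actually grew.
    static int runBurst(int tasks) throws InterruptedException {
        // 2 core threads handle the expected load; the bounded queue absorbs
        // a transient burst; only when the queue is full does the pool grow
        // toward its maximum of 4 threads (consuming spare CPU headroom).
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 4,
                30, TimeUnit.SECONDS,          // extra threads die after 30s idle
                new ArrayBlockingQueue<>(10),  // bounded: overload is visible
                new ThreadPoolExecutor.CallerRunsPolicy()); // throttle when saturated
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                try { Thread.sleep(50); } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return pool.getLargestPoolSize();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("largest pool size: " + runBurst(20));
    }
}
```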
> > >
> > > Or to put it more simply: your normal processing mode allows for idle
> > > time on some CPUs and meets throughput goal X. Then under overload you
> > > can utilize the idle time to try to meet a revised throughput goal
> > > Y < X. Without the spare capacity, under overload your throughput
> > > might drop to Z << X.
> > >
> > > If you are providing pools as part of a library you need to provide the
> > > necessary tuning knobs as part of the API.
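Such tuning knobs might be surfaced as a plain configuration object rather than hard-wired constants; a hypothetical sketch (the names and default values are invented, not part of any real API):

```java
// Hypothetical configuration object a library could accept instead of
// hard-wiring pool parameters; the defaults deliberately leave some CPU
// headroom for transient overload.
public class PoolConfig {
    public final int coreThreads;
    public final int maxThreads;
    public final int queueCapacity;
    public final double targetUtilization;

    public PoolConfig(int core, int max, int queue, double util) {
        if (util <= 0 || util > 1)
            throw new IllegalArgumentException("util must be in (0, 1]");
        this.coreThreads = core;
        this.maxThreads = max;
        this.queueCapacity = queue;
        this.targetUtilization = util;
    }

    public static PoolConfig defaults(int cpus) {
        // 20% headroom for overload; pool may double under bursts.
        return new PoolConfig(cpus, 2 * cpus, 100, 0.8);
    }
}
```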
> > >
> > > Cheers,
> > > David
> > >
> > >
> > _______________________________________________
> > Concurrency-interest mailing list
> > Concurrency-interest at altair.cs.oswego.edu
> > http://altair.cs.oswego.edu/mailman/listinfo/concurrency-interest