[concurrency-interest] Design feedback on application using PooledExecutor to run threads that download webpages

Michael Mattox michael.mattox@verideon.com
Fri, 20 Jun 2003 18:43:18 +0200


I'm hoping to get some feedback on my design to make sure I'm doing things
in the most efficient way.  I'm having some performance problems and I'm not
sure if I've just reached the limit of my server (I doubt it) or if my code
is not as efficient as it could be.  My application is simple: it
just connects to websites, downloads the contents, and stores the status
in the database (the HTTP status code, for example).  I'm using the
PooledExecutor class for my thread pool, and the LinkedQueue as my queue.
For my stress testing I have 900 websites that are monitored, 500 of them
every 10 minutes and 400 every 5 minutes.  The code that downloads the
website (using Apache's HttpClient) blocks on the part that downloads the
page.  For this reason I'd like to have more threads running than
processors, so that when a thread blocks while waiting on the remote
server another thread can be run.  JProfiler reports that approximately 65%
of the CPU time is spent in the method that downloads the page.
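
A minimal sketch of the setup described above, for readers who want to
experiment with it.  It is not the actual application code: it assumes
java.util.concurrent's ThreadPoolExecutor and LinkedBlockingQueue as
stand-ins for PooledExecutor and LinkedQueue, and it simulates the blocking
download with Thread.sleep instead of calling HttpClient.  The class and
method names are hypothetical.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class MonitorPoolSketch {

    // Submits `sites` simulated checks to a pool of `threads` workers and
    // returns the highest number of checks observed running at once.
    static int runChecks(int threads, int sites, long blockMillis) {
        // With an unbounded queue, ThreadPoolExecutor never grows past its
        // core size, so core and maximum are both set to `threads`.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads, 60L, TimeUnit.SECONDS,
                new LinkedBlockingQueue<Runnable>());

        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();

        for (int i = 0; i < sites; i++) {
            pool.execute(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    // Assumption: stands in for the blocking page download.
                    Thread.sleep(blockMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                }
            });
        }

        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        // 900 sites and 32 threads, as in the stress test described above.
        System.out.println("peak concurrent checks: " + runChecks(32, 900, 20));
    }
}
```

Measuring the peak number of checks that actually run at once is one way to
confirm that the configured thread count is really being reached.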

I've deployed this to a 4 CPU Linux server.  I've been playing around with
the maximum number of threads to use.  So far I've been using 32.  That
means each CPU should be downloading 8 webpages at the same time, which is
reasonable for a server with plenty of bandwidth (burstable up to 100 megs
they tell me).  I've tried putting the threads up to 60 but the time it
takes to execute each one goes up, unsurprisingly.  Is this just a matter of
trial and error to determine the optimum settings?  I'm having performance
problems where the websites are not monitored at the 5-minute interval that
I expect (it is between 5 and 10 minutes most of the time), and I'm trying
to figure out how I can optimize this even further.  When I run 'top' on the
server it shows the CPU
utilization at 7% on average and I can't figure out why the CPUs aren't
being used 100%.  I guess it means they can't because the threads are
blocking, in which case, assuming I'm not maxing out the bandwidth, I'd
expect increasing the threads to 60 to use more of the CPU but it doesn't.
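
The load figures above also allow a rough back-of-the-envelope check before
resorting to trial and error.  The sketch below applies Little's law (busy
threads = checks per second x average seconds per check).  It assumes the
checks are spread evenly over each interval and that "seconds per check"
covers everything a worker thread does, including the database write.  The
class and method names are hypothetical.

```java
class MonitorBudget {

    // Steady-state checks per second needed to visit every site once per
    // interval: the sum of siteCounts[i] / intervalSeconds[i].
    static double requiredChecksPerSecond(int[] siteCounts, int[] intervalSeconds) {
        double rate = 0.0;
        for (int i = 0; i < siteCounts.length; i++) {
            rate += (double) siteCounts[i] / intervalSeconds[i];
        }
        return rate;
    }

    // Little's law rearranged: the longest average time a single check may
    // take if `threads` workers are to keep up with `checksPerSecond`.
    static double maxAverageSecondsPerCheck(int threads, double checksPerSecond) {
        return threads / checksPerSecond;
    }

    public static void main(String[] args) {
        // 500 sites every 10 minutes and 400 sites every 5 minutes.
        double rate = requiredChecksPerSecond(
                new int[] {500, 400}, new int[] {600, 300});
        System.out.printf("required rate: %.2f checks/s%n", rate);
        System.out.printf("budget with 32 threads: %.2f s per check%n",
                maxAverageSecondsPerCheck(32, rate));
        System.out.printf("budget with 60 threads: %.2f s per check%n",
                maxAverageSecondsPerCheck(60, rate));
    }
}
```

Under these assumptions the required rate is 13/6, or about 2.17 checks per
second, so 32 threads keep up as long as a check averages under roughly 14.8
seconds, and 60 threads raise that budget to roughly 27.7 seconds.  If the
measured time per check is well below the budget and sites are still checked
late, the delay is probably not explained by the thread count alone.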

I hope all this makes sense.  I'd be grateful for any input on this.

Thanks,
Michael