[concurrency-interest] Recursive Directory checker

Nathan Reynolds nathan.reynolds at oracle.com
Fri Feb 24 12:52:30 EST 2012


I would like to point out that hard disks perform best when accessed 
from a single thread.  If 2 threads make requests, the disk head has to 
swing back and forth between the 2 locations.  With only 1 thread, the 
disk head doesn't have to travel as much.  Flash disks (SSDs) are a 
different story.  We have seen optimal throughput when 16 threads hit 
the disk concurrently.  Your mileage will vary depending upon the SSD.  
So, on a hard disk, you may not get much better performance from your 
directory size counter by using multiple threads.
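
To try this on your own disk, here is a minimal sketch (not a tuned 
implementation) of a directory size counter built on RecursiveTask, with 
the pool parallelism passed in so that 1 thread can be compared against 
several.  The names DirSize and sizeOf are made up for illustration.  It 
assumes there are no symbolic link cycles under the start directory, and 
the parallelism value is only an approximate bound on the threads doing 
I/O, since ForkJoinPool may add compensating threads when workers block.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical helper: sums the sizes of all regular files under a directory.
class DirSize extends RecursiveTask<Long> {
    private final File dir;

    DirSize(File dir) {
        this.dir = dir;
    }

    @Override
    protected Long compute() {
        File[] entries = dir.listFiles();
        if (entries == null) {
            return 0L; // not a directory, or it could not be read
        }
        long total = 0;
        List<DirSize> subtasks = new ArrayList<>();
        for (File entry : entries) {
            if (entry.isDirectory()) {
                // Fork one subtask per subdirectory; idle workers can steal it.
                DirSize task = new DirSize(entry);
                task.fork();
                subtasks.add(task);
            } else {
                total += entry.length();
            }
        }
        // Collect the subdirectory totals; no shared AtomicLong is needed.
        for (DirSize task : subtasks) {
            total += task.join();
        }
        return total;
    }

    // Runs the count in a pool with the requested parallelism.
    static long sizeOf(File root, int parallelism) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            return pool.invoke(new DirSize(root));
        } finally {
            pool.shutdown();
        }
    }
}
```

Timing DirSize.sizeOf(new File("/home/user"), 1) against the same call 
with 2, 4 or 16 would show whether extra threads help on a given disk.  
Keep in mind that the first run warms the OS file system cache, so later 
runs will look faster regardless of the thread count.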

I have found on Windows that defragmenting the hard drive and placing 
all of the directory metadata together makes this kind of thing run 
really fast.  (See MyDefrag.)  The disk head simply has to sit on the 
directory metadata section of the hard disk.  I realize you aren't 
running on Windows.  But, you might consider something similar.

Nathan Reynolds 
<http://psr.us.oracle.com/wiki/index.php/User:Nathan_Reynolds> | 
Consulting Member of Technical Staff | 602.333.9091
Oracle PSR Engineering <http://psr.us.oracle.com/> | Server Technology

On 2/24/2012 8:59 AM, Aleksandar Lazic wrote:
> Dear list members,
>
> I'm about to write a directory size counter.
>
> I'm new to all this thread/fork stuff, so please accept my apologies
> for such a 'simple' question ;-)
>
> What is the 'best' class for such a program?
>
> ForkJoinTask
> RecursiveAction
> RecursiveTask
>
> This is what I plan to use for the main program.
>
> pseudocode
> ###
> main:
>
>  File startdir = new File("/home/user/");
>  File[] files = startdir.listFiles();
>
>  add directories to the Queue.
>
> -----
> I'm unsure which Queue is the best for this?
>
> http://gee.cs.oswego.edu/dl/jsr166/dist/docs/java/util/Queue.html
>
> I lean towards BlockingDeque
> -----
>
>   ForkJoinPool fjp = new ForkJoinPool(5);
>
>   foreach worker
>     get filesizes and $SummAtomicLong.addAndGet(filesizes);
>
> print "the directory and its subdirs have {} Mbytes", $SummAtomicLong
>
> ####
>
> Worker:
>
>   foreach directory
>     if directory is not in queue
>       add directory to the Queue.
>
>   foreach file
>     add filesize to $workerAtomicLong.addAndGet(file.size);
> ###
>
> I hope it is a little bit clear what I want to do ;-)
>
> No, this is not homework ;-)
>
> Should I use a global variable for the SummAtomicLong?
> Should I use a global variable for the DirectoryQueue?
>
> I expect that no more than 'ForkJoinPool(5)' threads/processes
> work on the disk at once, is that right?
>
> I have tried to understand some of the
>
> http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/test/loops/
>
> but I still have some questions.
>
> Many thanks for all your help.
>
> Cheers
> Aleks
> _______________________________________________
> Concurrency-interest mailing list
> Concurrency-interest at cs.oswego.edu
> http://cs.oswego.edu/mailman/listinfo/concurrency-interest