[concurrency-interest] Recursive Directory checker

Benedict Elliott Smith lists at laerad.com
Fri Feb 24 13:29:42 EST 2012


I hate to nitpick, but this is only true for sequential reads. As soon as
you devolve to random IO (and for large directory trees, metadata traversal
is unlikely at best to remain sequential, even if there are no other
competing IO requests), you are much better off with multiple ops in flight,
so the disk can choose the order in which it services them and, to some
degree, maximize throughput. When performance-testing new file servers I
have found single-threaded random IOPS are typically dreadful, even with
dozens of disks.

In my experience a multi-threaded directory traversal has usually been
considerably faster than single threaded.

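I don't think the choice of queue is likely to have a material impact on
the performance of this algorithm, Aleksandar; IO will be your bottleneck.
However, I think the use of a queue defeats the point of using the ForkJoin
framework: each ForkJoinPool worker already keeps its own work-stealing
deque, so forking one subtask per subdirectory does the job of your queue.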
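A minimal sketch of a ForkJoin version without an explicit queue, offered as an illustration rather than a definitive implementation: a RecursiveTask<Long> forks one subtask per subdirectory and returns the summed size, so neither a shared queue nor a global AtomicLong is needed. The class name DirSizeTask and the helper sizeOf are hypothetical; the sketch assumes java.io.File metadata calls are acceptable and, as an assumption, skips symbolic links to avoid cycles and double counting.

```java
import java.io.File;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example class: sums the sizes of non-directory entries under a directory.
// Each subdirectory becomes a forked subtask, so the pool's work-stealing
// deques play the role of the explicit directory queue.
public class DirSizeTask extends RecursiveTask<Long> {
    private final File dir;

    public DirSizeTask(File dir) {
        this.dir = dir;
    }

    @Override
    protected Long compute() {
        File[] entries = dir.listFiles();
        if (entries == null) {
            return 0L; // not a directory, or it could not be read
        }
        long total = 0L;
        List<DirSizeTask> subtasks = new ArrayList<>();
        for (File f : entries) {
            if (Files.isSymbolicLink(f.toPath())) {
                continue; // assumption: do not follow links (avoids cycles)
            }
            if (f.isDirectory()) {
                DirSizeTask sub = new DirSizeTask(f);
                sub.fork(); // make the subdirectory available to idle workers
                subtasks.add(sub);
            } else {
                total += f.length(); // size in bytes
            }
        }
        for (DirSizeTask sub : subtasks) {
            total += sub.join(); // wait for (or help with) each subtree
        }
        return total;
    }

    public static long sizeOf(File root, int parallelism) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            return pool.invoke(new DirSizeTask(root));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        File root = new File(args.length > 0 ? args[0] : ".");
        long bytes = sizeOf(root, 5);
        System.out.printf("%s: %d bytes (%.1f MB)%n", root, bytes, bytes / (1024.0 * 1024.0));
    }
}
```

Because the workers block inside filesystem calls, the parallelism argument roughly bounds how many metadata requests are outstanding at once; the right value for a given disk is a matter of measurement.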


On 24 February 2012 17:52, Nathan Reynolds <nathan.reynolds at oracle.com> wrote:

>  I would like to point out that hard disks perform best when accessed in a
> single threaded manner.  If you have 2 threads making requests, then the
> disk head will have to swing back and forth between the 2 locations.  With
> only 1 thread, the disk head doesn't have to travel as much.  Flash disks
> (SSDs) are a different story.  We have seen optimal throughput when 16
> threads hit the disk concurrently.  Your mileage will vary depending upon
> the SSD.  So, you may not get much better performance from your directory
> size counter by using multiple threads.
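Whether one thread really is as fast as several on a given machine is easiest to settle by timing a sequential baseline against any multi-threaded version. A minimal sketch using java.nio.file.Files.walkFileTree (Java 7+); the class name SequentialDirSize is hypothetical, and symbolic links are not followed (the walkFileTree default).

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical example class: single-threaded baseline with one metadata request in flight at a time.
public class SequentialDirSize {
    public static long sizeOf(Path root) throws IOException {
        final long[] total = {0L};
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (attrs.isRegularFile()) {
                    total[0] += attrs.size(); // size in bytes
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException exc) {
                return FileVisitResult.CONTINUE; // skip unreadable entries
            }
        });
        return total[0];
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        long start = System.nanoTime();
        long bytes = sizeOf(root);
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println(root + ": " + bytes + " bytes in " + ms + " ms");
    }
}
```

Running it with a cold cache (for example, after a reboot) gives a fairer comparison, since a warm OS cache hides the disk-head behaviour being discussed.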
>
> I have found on Windows that defragmenting the hard drive and placing all
> of the directory metadata together makes this kind of thing run really
> fast (see MyDefrag). The disk head simply has to sit on the directory
> metadata section of the hard disk. I realize you aren't running on
> Windows, but you might consider something similar.
>
> Nathan Reynolds <http://psr.us.oracle.com/wiki/index.php/User:Nathan_Reynolds> | Consulting Member of Technical Staff |
> 602.333.9091
> Oracle PSR Engineering <http://psr.us.oracle.com/> | Server Technology
>
> On 2/24/2012 8:59 AM, Aleksandar Lazic wrote:
>
> Dear list members,
>
> I'm in the process of writing a directory size counter.
>
> I'm new to all this thread/fork stuff, so please accept my apologies
> for such a 'simple' question ;-)
>
> What is the 'best' class for such a program?
>
> ForkJoinTask
> RecursiveAction
> RecursiveTask
>
> I plan to use the following for the main program:
>
> pseudocode
> ###
> main:
>
>  File startdir = new File("/home/user/");
>  File[] files = startdir.listFiles();
>
>  add directories to the Queue.
>
> -----
> I'm unsure which Queue is best for this.
>
> http://gee.cs.oswego.edu/dl/jsr166/dist/docs/java/util/Queue.html
>
> I'm leaning towards BlockingDeque.
> -----
>
>   ForkJoinPool fjp = new ForkJoinPool(5);
>
>   foreach worker
>     get filesizes and $SummAtomicLong.addAndGet(filesizes);
>
> print "the directory and its subdirectories contain {} MB", $SummAtomicLong / (1024 * 1024)
>
> ####
>
> Worker:
>
>   foreach directory
>     if directory is not in queue
>       add directory to the Queue.
>
>   foreach file
>     $workerAtomicLong.addAndGet(file.size);   // add this file's size
> ###
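For comparison, the queue-plus-AtomicLong design in the pseudocode above can be made runnable with plain threads; no ForkJoinPool is needed for this shape of solution. The sketch exposes the awkward part of a shared queue: an empty queue does not mean the traversal is finished, because another worker may be about to add subdirectories, so an extra AtomicInteger counts directories that are queued but not yet processed. All names (QueueDirSize, pending, done) are hypothetical, the 50 ms poll timeout is an arbitrary choice, and skipping symbolic links is an assumption; with links skipped, each directory is reached once on typical filesystems, so the "if directory is not in queue" check becomes unnecessary.

```java
import java.io.File;
import java.nio.file.Files;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical example class: shared directory queue, shared byte counter, N worker threads.
public class QueueDirSize {
    public static long sizeOf(File root, int nThreads) throws InterruptedException {
        BlockingQueue<File> queue = new LinkedBlockingQueue<>();
        AtomicLong total = new AtomicLong();          // plays the role of $SummAtomicLong
        AtomicInteger pending = new AtomicInteger(1); // directories queued but not yet finished
        CountDownLatch done = new CountDownLatch(1);  // released when pending reaches zero
        queue.add(root);

        Runnable worker = () -> {
            try {
                while (done.getCount() > 0) {
                    File dir = queue.poll(50, TimeUnit.MILLISECONDS);
                    if (dir == null) {
                        continue; // queue empty for now; loop re-checks whether we are done
                    }
                    File[] entries = dir.listFiles();
                    if (entries != null) {
                        for (File f : entries) {
                            if (Files.isSymbolicLink(f.toPath())) {
                                continue; // assumption: skip links to avoid cycles
                            }
                            if (f.isDirectory()) {
                                pending.incrementAndGet(); // count it before publishing it
                                queue.add(f);
                            } else {
                                total.addAndGet(f.length());
                            }
                        }
                    }
                    if (pending.decrementAndGet() == 0) {
                        done.countDown(); // that was the last outstanding directory
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };

        Thread[] threads = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            threads[i] = new Thread(worker, "dir-worker-" + i);
            threads[i].start();
        }
        done.await();
        for (Thread t : threads) {
            t.join();
        }
        return total.get();
    }

    public static void main(String[] args) throws InterruptedException {
        File root = new File(args.length > 0 ? args[0] : ".");
        long bytes = sizeOf(root, 5);
        System.out.println(root + ": " + (bytes / (1024 * 1024)) + " MB");
    }
}
```

The sketch uses LinkedBlockingQueue, but any thread-safe queue would do here; IO, not the queue, is expected to dominate.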
>
> I hope it is a little bit clear what I want to do ;-)
>
> No, this is not homework ;-)
>
> Should I use a global variable for the SummAtomicLong?
> Should I use a global variable for the DirectoryQueue?
>
> I expect that no more than 'ForkJoinPool(5)' threads/processes
> will work on the disk at the same time; is that right?
>
> I have tried to understand some of the code in
>
> http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/test/loops/
>
> but I still have some questions.
>
> Many thanks for all your help.
>
> Cheers
> Aleks
> _______________________________________________
> Concurrency-interest mailing list
> Concurrency-interest at cs.oswego.edu
> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
>