[concurrency-interest] Recursive Directory checker

Aleksandar Lazic al-javaconcurrencyinterest at none.at
Thu Mar 1 08:14:32 EST 2012


 

Dear list members,

I now have a partial solution:

http://www.none.at/NasChecker02.zip

It uses the following libraries:

http://pholser.github.com/jopt-simple/
http://logback.qos.ch/
http://www.slf4j.org/
http://jackson.codehaus.org/
http://gee.cs.oswego.edu/dl/jsr166/dist/jsr166.jar

I still have the question of which data structure (Queue,
CopyOnWriteArrayList, ...) I can use to handle the recursive directory
walk.

Do I need a 'HandlingClass'?


#### Code snippet main

...

File myFile = new File(maConfigReal.getStartDir());
// Arrays.toString (java.util.Arrays) logs the entries; File[].toString() would not
mainLogger.debug(" myFile getStartDir {}", Arrays.toString(myFile.listFiles()));

// Here I would add all dirs & files into a queue and iterate over that queue
File[] myFiles = myFile.listFiles();

...

for (File mF : myFiles) {
    mainLogger.debug(" mF {} {} ", mF, mF.isDirectory() ? "Dir" : "File");
    if (mF.isDirectory()) {
        toGoDirs.add(mF);
        // one task per top-level directory, remembered under its path
        myRT task = new myRT(mF, toGoDirs);
        sumHash.put(mF.toString(), task);
        mainPool.execute(task);
    }
}

....

####

### code myRT

...

protected File compute() {

...

myFJFiles = curDir.listFiles();

...

for (File myFJFile : myFJFiles) {
    if (!clqft.contains(myFJFile)) {
        if (myFJFile.isDirectory()) {
            myRTLogger.debug(" <<< ADD mF isDir {}", myFJFile);

            // >>>>> How can I put this DIR into the MAIN queue?
            // clqft.add(myFJFile);
        } else if (myFJFile.isFile()) {
            dirSizeHash.get(tmp).addAndGet(myFJFile.length());
            // myRTLogger.debug(" mF isFile {} size {}", myFJFile, myFJFile.length());
        } else {
            myRTLogger.info(" mF Unknown type {}", myFJFile);
        }
    }
}

...

###
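
For the "how can I put this DIR into the MAIN queue" question, one possible
direction (following the remark further down the thread that a queue defeats
the point of the ForkJoin framework) is to fork one subtask per subdirectory
instead of queueing it. The following is only a minimal sketch under that
assumption, not the code from NasChecker02.zip: the class name DirSizeTask is
hypothetical, the pool size of 5 is taken from the original pseudocode, and
symbolic-link cycles are not handled.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

/** Hypothetical task: sums the sizes of all regular files below a directory. */
class DirSizeTask extends RecursiveTask<Long> {
    private final File dir;

    DirSizeTask(File dir) {
        this.dir = dir;
    }

    @Override
    protected Long compute() {
        File[] entries = dir.listFiles();
        if (entries == null) {
            // listFiles() returns null if dir is not a directory or on an I/O error
            return 0L;
        }
        long size = 0;
        List<DirSizeTask> subTasks = new ArrayList<DirSizeTask>();
        for (File entry : entries) {
            if (entry.isDirectory()) {
                // No shared queue: the subdirectory becomes a forked subtask.
                // isDirectory() follows symbolic links; link cycles are not handled here.
                DirSizeTask sub = new DirSizeTask(entry);
                sub.fork();
                subTasks.add(sub);
            } else if (entry.isFile()) {
                size += entry.length();
            }
        }
        for (DirSizeTask sub : subTasks) {
            size += sub.join();   // wait for each subdirectory and add its total
        }
        return size;
    }

    public static void main(String[] args) {
        File start = new File(args.length > 0 ? args[0] : ".");
        ForkJoinPool pool = new ForkJoinPool(5);
        long total = pool.invoke(new DirSizeTask(start));   // blocks until the walk is done
        System.out.println(start + " and its subdirectories contain " + total + " bytes");
        pool.shutdown();
    }
}
```

Because every directory is visited exactly once by the task that discovered
it, this shape needs no "already scanned" check like clqft.contains(...).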


Do I need the

mainPool.awaitTermination(10, TimeUnit.SECONDS);

to be on the safe side that all tasks are done with their work?
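
As far as the java.util.concurrent API goes, awaitTermination only waits for
the pool to terminate after shutdown() has been called; without a prior
shutdown it simply blocks for the timeout and returns false. To wait for
specific tasks started with execute(), join() on the tasks themselves is
usually enough. A minimal sketch, assuming a hypothetical SleepTask as a
stand-in for a real task such as myRT:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.concurrent.TimeUnit;

class WaitForTasks {
    /** Hypothetical stand-in for a real task such as myRT. */
    static class SleepTask extends RecursiveTask<Integer> {
        private final int value;

        SleepTask(int value) {
            this.value = value;
        }

        @Override
        protected Integer compute() {
            try {
                Thread.sleep(50);   // simulate some I/O work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return value;
        }
    }

    static int runAndWait() throws InterruptedException {
        ForkJoinPool pool = new ForkJoinPool(2);
        SleepTask a = new SleepTask(1);
        SleepTask b = new SleepTask(2);
        pool.execute(a);   // asynchronous: returns immediately
        pool.execute(b);

        // join() blocks the calling thread until the task has completed
        int sum = a.join() + b.join();

        pool.shutdown();   // accept no new tasks
        // After shutdown(), this returns true once the pool has terminated
        boolean finished = pool.awaitTermination(10, TimeUnit.SECONDS);
        if (!finished) {
            throw new IllegalStateException("pool did not terminate within 10 seconds");
        }
        return sum;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("sum = " + runAndWait());
    }
}
```

If the whole walk is started with a single pool.invoke(rootTask), that call
already blocks until the result is available, and awaitTermination is only
needed as part of shutting the pool down.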

Many thanks for your help.

Best regards 

Aleks 

On 24-02-2012 20:21, Aleksandar Lazic wrote:

> Hi,
>
> we scan a NAS share (NFS, NetApp filer), so I don't think the
> low-level disk handling is in my hands.
>
> I currently use the treescan program from the IO::AIO Perl module
>
> http://cvs.schmorp.de/IO-AIO/bin/treescan?view=markup
>
> which uses 8 threads to collect the necessary data.
>
> The two links below show my description from the Perl point of view:
>
> http://lists.schmorp.de/pipermail/anyevent/2012q1/000227.html
> http://lists.schmorp.de/pipermail/anyevent/2012q1/000231.html
>
> The reason I want to switch to Java is that I need a solution which I
> 'just' need to extract and run, without installing a lot of modules
> for a dedicated scripting language.
>
> Can you please tell me what you suggest for handling the directories
> which are already scanned?
>
> Best regards
>
> Aleks
> 
> On 24-02-2012 19:29, Benedict Elliott Smith wrote:
>
>> I hate to nitpick, but this is only true for sequential reads; as
>> soon as you devolve to random IO (and for large directory trees
>> metadata traversal is unlikely at best to remain sequential, even if
>> there are no other competing IO requests) you are much better off
>> with multiple ops in flight so the disk can select the order it
>> services them and to some degree maximize throughput. When
>> performance testing new file servers I have found single-threaded
>> random IOPS are typically dreadful, even with dozens of disks.
>>
>> In my experience a multi-threaded directory traversal has usually
>> been considerably faster than single-threaded.
>>
>> I don't think the choice of queue is likely to have a material impact
>> on the performance of this algorithm, Aleksandar; IO will be your
>> bottleneck. However, I think the use of a queue defeats the point of
>> using the ForkJoin framework.
>> On 24 February 2012 17:52, Nathan Reynolds <nathan.reynolds at oracle.com [9]> wrote:
>>
>>> I would like to point out that hard disks perform best when accessed
>>> in a single-threaded manner. If you have 2 threads making requests,
>>> then the disk head will have to swing back and forth between the 2
>>> locations. With only 1 thread, the disk head doesn't have to travel
>>> as much. Flash disks (SSDs) are a different story. We have seen
>>> optimal throughput when 16 threads hit the disk concurrently. Your
>>> mileage will vary depending upon the SSD. So, you may not get much
>>> better performance from your directory size counter by using
>>> multiple threads.
>>>
>>> I have found on Windows that defragmenting the hard drive and placing
>>> all of the directory metadata together makes this kind of thing run
>>> really fast. (See MyDefrag.) The disk head simply has to sit on the
>>> directory metadata section of the hard disk. I realize you aren't
>>> running on Windows. But, you might consider something similar.
>>>
>>> Nathan Reynolds [5] | Consulting Member of Technical Staff | 602.333.9091
>>> Oracle PSR Engineering [6] | Server Technology
>>> 
>>> On 2/24/2012 8:59 AM, Aleksandar Lazic wrote:
>>>
>>>> Dear list members,
>>>>
>>>> I'm about to write a directory counter.
>>>>
>>>> I'm new to all this thread/fork stuff, so please accept my apologies
>>>> for such a 'simple' question ;-)
>>>>
>>>> What is the 'best' class for such a program?
>>>>
>>>> ForkJoinTask
>>>> RecursiveAction
>>>> RecursiveTask
>>>>
>>>> I plan to use the following for the main program.
>>>>
>>>> pseudocode
>>>> ###
>>>> main:
>>>>
>>>> File startdir = new File("/home/user/");
>>>> File[] files = startdir.listFiles();
>>>>
>>>> add directories to the Queue.
>>>>
>>>> -----
>>>> I'm unsure which Queue is the best for this?
>>>>
>>>> http://gee.cs.oswego.edu/dl/jsr166/dist/docs/java/util/Queue.html [1]
>>>>
>>>> I tend towards BlockingDeque
>>>> -----
>>>>
>>>> ForkJoinPool fjp = new ForkJoinPool(5);
>>>>
>>>> foreach worker
>>>>   get filesizes and $SummAtomicLong.addAndGet(filesizes);
>>>>
>>>> print "the directory and its subdirs have {} Mbytes", $SummAtomicLong
>>>>
>>>> ####
>>>>
>>>> Worker:
>>>>
>>>> foreach directory
>>>>   if directory is not in queue
>>>>     add directory to the Queue.
>>>>
>>>> foreach file
>>>>   add filesize to $workerAtomicLong.addAndGet(file.size);
>>>> ###
>>>>
>>>> I hope it is reasonably clear what I want to do ;-)
>>>>
>>>> No, this is not homework ;-)
>>>>
>>>> Should I use a global variable for the SummAtomicLong?
>>>> Should I use a global variable for the DirectoryQueue?
>>>>
>>>> I expect that no more than 'ForkJoinPool(5)' threads/processes work
>>>> on the disk, is that right?
>>>>
>>>> I have tried to understand some of the
>>>>
>>>> http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/test/loops/ [2]
>>>>
>>>> but I still have some questions.
>>>>
>>>> Many thanks for all your help.
>>>>
>>>> Cheers
>>>> Aleks
>>>> _______________________________________________
>>>> Concurrency-interest mailing list
>>>> Concurrency-interest at cs.oswego.edu [3]
>>>> http://cs.oswego.edu/mailman/listinfo/concurrency-interest [4]
>>>
>>> _______________________________________________
>>> Concurrency-interest mailing list
>>> Concurrency-interest at cs.oswego.edu [7]
>>> http://cs.oswego.edu/mailman/listinfo/concurrency-interest [8]
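
On the question in the quoted message about global counters: rather than a
global variable, the counters can live in one small object that is passed to
every task. The sketch below is only an illustration under that assumption;
the class name DirTotals and its methods are hypothetical. It keeps one
AtomicLong per directory path, and the putIfAbsent step is what makes a call
like dirSizeHash.get(tmp).addAndGet(...) safe when no counter exists yet for
that key.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical holder for per-directory byte counts, safe for concurrent use. */
class DirTotals {
    private final ConcurrentMap<String, AtomicLong> totals =
            new ConcurrentHashMap<String, AtomicLong>();

    /** Adds bytes to the counter for dir, creating the counter on first use. */
    void add(String dir, long bytes) {
        AtomicLong counter = totals.get(dir);
        if (counter == null) {
            AtomicLong fresh = new AtomicLong();
            // putIfAbsent returns the existing counter if another thread won the race
            counter = totals.putIfAbsent(dir, fresh);
            if (counter == null) {
                counter = fresh;
            }
        }
        counter.addAndGet(bytes);
    }

    /** Returns the bytes recorded for dir, or 0 if nothing was recorded. */
    long get(String dir) {
        AtomicLong counter = totals.get(dir);
        return counter == null ? 0L : counter.get();
    }

    public static void main(String[] args) {
        DirTotals totals = new DirTotals();
        totals.add("/home/user/a", 10);
        totals.add("/home/user/a", 5);
        System.out.println("/home/user/a = " + totals.get("/home/user/a"));
    }
}
```

A single instance can be handed to each task through its constructor, which
keeps the shared state explicit without a static field.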



Links:
------
[1] http://gee.cs.oswego.edu/dl/jsr166/dist/docs/java/util/Queue.html
[2] http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/test/loops/
[3] mailto:Concurrency-interest at cs.oswego.edu
[4] http://cs.oswego.edu/mailman/listinfo/concurrency-interest
[5] http://psr.us.oracle.com/wiki/index.php/User:Nathan_Reynolds
[6] http://psr.us.oracle.com/
[7] mailto:Concurrency-interest at cs.oswego.edu
[8] http://cs.oswego.edu/mailman/listinfo/concurrency-interest
[9] mailto:nathan.reynolds at oracle.com