using java for common scripting tasks:walking directories & processing files

I’m a fan of scripting languages. If you’ve read some of the stuff I wrote, you may already know that (if not, check the URL of this website). You can solve most, if not all, tasks using a scripting language, but sometimes you’d like to benefit from the speed that a statically typed language can give you. I think the most frequent task one has to do is processing files. But to do that, you first have to find them: scan directories, filter the entries, and store them in some collection. Only then can you start processing them.

One of the downsides is that code that takes a few lines in a language like, let’s say, Ruby usually becomes pretty verbose in Java. Here’s a sample:


require 'find'

Find.find(folder) do |file|
   puts file if File.directory?(file)
end

The code listed above prints all the directories (and subdirectories) contained in the folder parameter. Three lines, plus a require! Imagine how many you’d have to write in Java. You don’t have a built-in directory walker (at least not before Java 7’s java.nio.file package), so you’d need to roll your own methods/class.
Then you’d want your walker to be usable in all the tasks you may encounter, so that you don’t have to rewrite the whole thing when you need to enumerate all the zip archives inside a folder, or all the mp3s. Writing methods that accept callbacks is usually a great way to get that. In fact, the Ruby code above illustrates that concept exactly: the stuff between do and end is a block, Ruby’s way of implementing callbacks.
In Java, we don’t have first-class functions, but we can achieve kinda the same thing by using interfaces. How does the following Java code look?


File directory = new File("c:/");
FileWalker.walkDirectory(directory, new Callback<File>() {
        public void action(File t) {
             if(t.isDirectory())
                 System.out.println(t);
        }
});

I’d say it’s not that bad! You can live with that, right? What if you’d like to add them to a list and process them later? The following snippet shows you how to do that:


File directory = new File("c:/");
final ArrayList<File> results = new ArrayList<File>();
FileWalker.walkDirectory(directory, new Callback<File>() {
     public void action(File t) {
        if(t.isDirectory())
           results.add(t);
     }
});
for(File result : results)  {
    System.out.println(result);
}

I don’t know about you, but most (if not all) of the directory walking and file searching I do involves getting a list of files first and processing them later. So that I don’t have to write all that code each time, I created a selectFiles method which does exactly that: it returns a list of files matching a certain condition. Here’s an example of its usage:


File directory = new File("c:/");
ArrayList<File> results = FileWalker.selectFiles(directory, new Selector<File>(){
    public boolean accept(File t) {
       return t.isDirectory();
    }
});
for(File result : results)
{
    System.out.println(result);
}

It doesn’t save you that many keystrokes, but it saves you from declaring a list and doing everything inside a callback. Think of that Selector as a filter. Here’s the code behind all the classes shown here:
The Selector interface:


public interface Selector<T> {
    public boolean accept(T t);
}

The Callback interface:


public interface Callback<T> {
    public void action(T t);
}

And the FileWalker class:


import java.io.File;
import java.util.ArrayList;

public class FileWalker {

    /**
     * select all files from a folder, matching a condition
     * @param root folder
     * @param selector selector object
     * @return list of files matching criteria
     */
    public static ArrayList<File> selectFiles(final File root,final Selector<File> selector)
    {
        final ArrayList<File> files = new ArrayList<File>();
        FileWalker.walkDirectory(root, new Callback<File>()
        {

            public void action(File t) {
                boolean add = true;
                if(selector != null)
                {
                    add = selector.accept(t);
                }
                if(add) { files.add(t); }
            }

        });
        return files;
    }

    /**
     * walk a directory recursively
     * @param root folder
     * @param callback callback object
     */
    public static void walkDirectory(File root,Callback<File> callback)
    {
        File kids[] = root.listFiles();
        callback.action(root);
        if(kids != null)
        {
            for(File kid : kids)
            {
                walkDirectory(kid,callback);
            }
        }
    }

}
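
As an example of reusing the walker for the zip/mp3 case mentioned earlier, here’s a minimal sketch of a selector that matches files by extension. ExtensionSelector and ExtensionDemo are hypothetical names that aren’t part of the code above; the Selector, Callback and FileWalker definitions are repeated in condensed form only so the snippet compiles on its own.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Locale;

// Condensed copies of the types shown above, so this snippet is self-contained.
interface Selector<T> { boolean accept(T t); }
interface Callback<T> { void action(T t); }

class FileWalker {
    static ArrayList<File> selectFiles(File root, final Selector<File> selector) {
        final ArrayList<File> files = new ArrayList<File>();
        walkDirectory(root, new Callback<File>() {
            public void action(File t) {
                // A null selector means "accept everything".
                if (selector == null || selector.accept(t)) { files.add(t); }
            }
        });
        return files;
    }

    static void walkDirectory(File root, Callback<File> callback) {
        File[] kids = root.listFiles(); // null when root is not a directory
        callback.action(root);
        if (kids != null) {
            for (File kid : kids) { walkDirectory(kid, callback); }
        }
    }
}

// Hypothetical helper (not in the original code): accepts regular files
// whose name ends with the given extension, ignoring case.
class ExtensionSelector implements Selector<File> {
    private final String suffix;

    ExtensionSelector(String extension) {
        this.suffix = "." + extension.toLowerCase(Locale.ROOT);
    }

    public boolean accept(File f) {
        return f.isFile() && f.getName().toLowerCase(Locale.ROOT).endsWith(suffix);
    }
}

class ExtensionDemo {
    public static void main(String[] args) {
        // Walk the folder given on the command line, or the current one.
        File root = new File(args.length > 0 ? args[0] : ".");
        for (File zip : FileWalker.selectFiles(root, new ExtensionSelector("zip"))) {
            System.out.println(zip);
        }
    }
}
```

Swapping "zip" for "mp3" is the only change needed for the other case, which is pretty much the point of passing the condition in as an object.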

Whenever I get the chance, I do this sort of stuff in Ruby/Groovy/Python. I prefer development speed over execution speed. But sometimes you just can’t afford to go with a dynamic language, and using plain Java for these sorts of tasks looks like something you can cope with. Hang in there! 🙂

13 Responses to “using java for common scripting tasks:walking directories & processing files”


1 Josh
November 1, 2009 at 13:37

public void action(File t) {
    if(selector != null)
    {
        if (selector.accept(t)) {
            files.add(t);
        }
    }
    else
    {
        files.add(t);
    }
}

- 2 geo
  November 1, 2009 at 13:48
  
  I actually had null checks, but in order to keep the code displaying nicely on the page, I had to take them out 🙂
  
3 ben
November 1, 2009 at 14:29

You are missing more simplification steps. We use this extensively in our in-house framework:

Files.select(directory, new DirectorySelector(), new Closure() {
    public void execute(File f){ f.delete(); }
});
->
Files.select(directory, new RegexSelector("etc"), new DeleteFile());

List files = Files.select(directory, new RegexSelector(), new CollatingClosure()).list();

Java is pretty painless if you are good. Pretty horrible if you are bad unfortunately 🙂

- 4 geo
  November 1, 2009 at 16:32
  
  Nice, I hadn’t thought of combining the selector and the callback in one method. I’m using the code I illustrated here pretty extensively.
  
5 gaerfield
November 2, 2009 at 00:15

Hmm… Nice Code, but with Java 1.7 a little bit outdated (Link).

So forget about the java.io packages. The java.nio packages come with methods similar to the ones explained here (e.g. java.nio.file.Files.walkFileTree(…)), but should also provide more speed when processing the files. That’s done by using OS resources for managing the requested operations (less CPU utilization, because more work would be done by the DMA controller).
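
A minimal sketch of that walkFileTree approach, assuming Java 7 or later. NioDirectoryLister and listDirectories are made-up names for illustration; the method collects directories, much like the first FileWalker example in the post. Note that the default SimpleFileVisitor rethrows I/O errors, so an unreadable directory will likely abort the walk unless visitFileFailed is overridden.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class name; requires Java 7 or later.
class NioDirectoryLister {

    // Collects root and every directory below it, in visiting order.
    static List<Path> listDirectories(Path root) throws IOException {
        final List<Path> dirs = new ArrayList<Path>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
                dirs.add(dir);                   // called once per directory, before its entries
                return FileVisitResult.CONTINUE; // keep descending
            }
        });
        return dirs;
    }

    public static void main(String[] args) throws IOException {
        // Walk the folder given on the command line, or the current one.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path dir : listDirectories(root)) {
            System.out.println(dir);
        }
    }
}
```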

6 rs
November 2, 2009 at 07:06

There are some tasks where the VM’s JIT kicking in is a MAJOR help! Good post.

Just out of curiosity – how do you run your Java mini-apps (scripts)? From an IDE ? Ant ? I can see it getting quite cumbersome if I had to use javac && java every time.

- 7 geo
  November 2, 2009 at 11:19
  
  Once they get to a “stable” version, I usually make JAR’s out of them.
  
8 Epo
November 5, 2009 at 00:05

Pretty Cool,

I prefer to take a (very) little time to make my little utility and benefit from the Java world.

If you are on GNU/Linux, you can even compile them natively and add them to your path, with something like:

gcj Main.java -o main --main=Main

For a little program, it is enough.

Epo

- 9 geo
  November 5, 2009 at 13:04
  
  I never really took the time to explore gcj. I should look into it.
  
  Thanks for the heads up!
  
10 jg
November 16, 2009 at 10:48

I’m writing a log analyser program which is supposed to plot a graph of a search string against time after parsing log file(s). I’m currently clueless about how I would parse and store the log file so that it could be processed to create multiple outputs. Right now I’m parsing the entire log file every time I need to do this. Any suggestions?

- 11 geo
  November 16, 2009 at 11:31
  
  I think that’s a question worthy of stackoverflow.com