finding duplicate files in a folder

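A Ruby script to find duplicate files in a folder, subfolders included. It doubles as an example of the MD5 digest API: every file's contents are hashed with Digest::MD5, and files that share a digest are printed together.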

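require "digest/md5"

# The folder to scan is the first argument; Windows backslashes become forward slashes
abort "need folder" if ARGV.empty?
folder = ARGV.first.gsub(/\\/,"/")

# Every regular file under the folder, recursively
files = Dir["#{folder}/**/*"].select {|f| File.file?(f)}

# Map each MD5 digest to the list of paths whose contents produce it
file_hashes = files.inject(Hash.new {|h,k| h[k] = []}) do |h,element|
	puts "analyzing #{element}"
	# Digest::MD5.file reads in binary mode and in chunks, unlike File.read
	signature = Digest::MD5.file(element).hexdigest
	h[signature] << element
	h # inject needs the accumulator returned from the block
end

# Print every group of paths that share a digest
file_hashes.keys.select {|key| file_hashes[key].size > 1}.each {|k| puts file_hashes[k].inspect}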
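Matching MD5 digests are very strong evidence that two files are identical, but not proof: MD5 collisions can be constructed on purpose. If the output feeds anything destructive, such as deleting files, one option is to confirm each group with a byte-by-byte comparison. Below is a minimal sketch using the standard library's `FileUtils.compare_file`; the helper name `confirm_duplicates` is made up for illustration.

```ruby
require "fileutils"

# Hypothetical helper: takes groups of paths whose MD5 digests matched and
# splits each group into subgroups that are byte-for-byte identical,
# discarding any path that turns out to have no true duplicate.
def confirm_duplicates(groups)
  groups.flat_map do |paths|
    subgroups = []
    paths.each do |path|
      # Compare against one representative of each subgroup found so far
      match = subgroups.find { |sub| FileUtils.compare_file(sub.first, path) }
      if match
        match << path
      else
        subgroups << [path]
      end
    end
    # Only subgroups with at least two members are real duplicates
    subgroups.select { |sub| sub.size > 1 }
  end
end
```

With the script above, it could be applied as `confirm_duplicates(file_hashes.values.select { |paths| paths.size > 1 })`. In practice collisions between ordinary files are so unlikely that the extra pass rarely changes anything; it simply trades some extra reads for certainty.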


1 Response to “finding duplicate files in a folder”

  1. manveru
    November 2, 2010 at 09:18

    Just a little “improvement”, less iteration, nicer error message, etc. http://gist.github.com/659309
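The linked gist is not reproduced here, so the following is not its code, only a sketch of the same kind of change. It checks the argument up front and raises a descriptive error, builds the groups with `group_by` instead of `inject`, and, as an extra refinement not mentioned in the comment, hashes only files that share a size with some other file, since files of different sizes cannot be identical. The method name `duplicate_groups` and the command-line wrapper are hypothetical.

```ruby
require "digest/md5"

# Hypothetical helper: returns an array of path groups with identical contents.
def duplicate_groups(folder)
  raise ArgumentError, "#{folder.inspect} is not a directory" unless File.directory?(folder)

  files = Dir.glob(File.join(folder, "**", "*")).select { |f| File.file?(f) }

  # Files of different sizes cannot be identical, so bucket by size first
  same_size_groups = files.group_by { |f| File.size(f) }.values.select { |g| g.size > 1 }

  # Only files in a shared-size bucket are read and hashed
  same_size_groups.flat_map do |group|
    by_digest = group.group_by { |f| Digest::MD5.file(f).hexdigest }
    by_digest.values.select { |g| g.size > 1 }
  end
end

# Optional command-line use when a folder argument is given
if __FILE__ == $0 && ARGV.any?
  begin
    duplicate_groups(ARGV.first.tr("\\", "/")).each { |group| puts group.inspect }
  rescue ArgumentError => e
    abort "usage: ruby #{File.basename($0)} FOLDER (#{e.message})"
  end
end
```

Returning the groups instead of printing them keeps the search reusable, for example as input to a byte-by-byte check before anything gets deleted.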
