It seems there are quite a number of duplicate file finder tools, but we will try a console tool called fdupes. Typical usage of this program is as follows.
1. Install the program.
$ sudo apt-get install fdupes
2. Create sample testing files.
$ cd /tmp
$ wget http://upload.wikimedia.org/wikipedia/en/a/a9/Example.jpg -O a.jpg
$ cp a.jpg b.jpeg
$ touch c.jpg d.jpeg
3. Show all duplicate files.
$ fdupes -r .
./a.jpg
./b.jpeg

./c.jpg
./d.jpeg
4. Show all duplicate files, but omit the first file of each duplicate set.
$ fdupes -r -f .
./b.jpeg

./d.jpeg
5. Similar to step 4, but delete the duplicate files, keeping one copy of each.
$ fdupes -r -f . | grep -v '^$' | xargs rm -v
removed `./b.jpeg'
removed `./d.jpeg'
On a similar note, there is an interesting read on the optimized approach taken by Dupseek, an app that finds duplicate files. The main strategy is to group files by size, compare files within each same-size set, and ignore any set containing just one file.
Unfortunately, I had a hard time understanding the Perl code. The closest and simplest implementation I could find is the tiny find-duplicates Python script.