What's the best way to find and delete duplicate files?
It's a common scenario. You have a drive or directory hierarchy that you know contains duplicate files, but there are too many files for you to pick through each directory by hand. Plus, some of the files might share a name and length yet have significant internal differences.
The best way to deal with this situation is with a third-party utility. The good news is that there are scads of such tools out there. The bad news is that you'll have to narrow down the field to a few that work well and that handle common use cases.
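To make the "same name and length, different contents" problem concrete, here is a minimal Python sketch of one common way to confirm true duplicates: bucket files by size, then hash the contents of any files that share a size. It is an illustration only, not how any particular utility works, and the function names (file_digest, find_duplicates) are made up for this example.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return groups of paths under root whose contents are identical."""
    # Pass 1: bucket by size. Files of different sizes cannot be identical,
    # so this avoids reading most files at all.
    by_size = defaultdict(list)
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only the files that share a size with at least one other.
    by_digest = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue
        for path in paths:
            by_digest[(size, file_digest(path))].append(path)

    # Filenames play no part in the comparison; only size and content do.
    return [sorted(group) for group in by_digest.values() if len(group) > 1]
```

Calling find_duplicates on a folder returns a list of groups, each group being the paths that hold the same bytes. The sketch only reports duplicates; deciding which copy to delete is left to you, which is exactly the part a dedicated utility makes easier.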
The best all-around tool I've encountered is WinMerge, an open source program originally developed as a programmer's tool for detecting line-by-line changes between files. It still does that job, which by itself makes it tremendously useful, since it can generate diff or patch files. Seeing exactly what changed between two versions of a file also makes it easier to find and fix vulnerabilities that could be exploited.
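For readers who haven't seen a diff or patch file, the short Python sketch below produces one with the standard library's difflib module. It says nothing about how WinMerge generates its own output; the function name make_patch and the default file labels are assumptions for this example.

```python
import difflib


def make_patch(old_text, new_text, old_name="old.txt", new_name="new.txt"):
    """Return a unified diff describing how old_text becomes new_text."""
    diff_lines = difflib.unified_diff(
        old_text.splitlines(keepends=True),  # keep line endings so output joins cleanly
        new_text.splitlines(keepends=True),
        fromfile=old_name,
        tofile=new_name,
    )
    # An empty string means the two texts are line-for-line identical.
    return "".join(diff_lines)
```

In the result, lines that start with a minus sign were removed from the old version and lines that start with a plus sign were added in the new one. An empty result means the two texts match line for line.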
WinMerge can also be used to compare entire directory hierarchies and produce detailed reports about what's different. The results can then be merged or reconciled in a variety of ways: You can, for instance, create a .ZIP archive of only the files that are different.
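The idea of comparing two directory hierarchies can be sketched with Python's standard filecmp module. This is a rough illustration under stated assumptions, not a description of WinMerge's internals. The function name compare_trees is hypothetical, and the sketch reports only three things: files present on the left side only, files present on the right side only, and files present on both sides whose contents differ.

```python
import filecmp
import os


def compare_trees(left, right, prefix=""):
    """Return (only_left, only_right, different) as lists of relative paths."""
    # dircmp lists what each directory contains; by default it skips a few
    # version-control and cache names such as "CVS" and "__pycache__".
    listing = filecmp.dircmp(left, right)
    only_left = [os.path.join(prefix, name) for name in listing.left_only]
    only_right = [os.path.join(prefix, name) for name in listing.right_only]

    # shallow=False compares file contents instead of trusting matching
    # size and modification time.
    _same, differ, errors = filecmp.cmpfiles(
        left, right, listing.common_files, shallow=False
    )
    # Files that could not be compared are reported as different so they
    # get a manual look.
    different = [os.path.join(prefix, name) for name in differ + errors]

    # Recurse into subdirectories that exist on both sides.
    for name in listing.common_dirs:
        sub_left, sub_right, sub_diff = compare_trees(
            os.path.join(left, name),
            os.path.join(right, name),
            os.path.join(prefix, name),
        )
        only_left += sub_left
        only_right += sub_right
        different += sub_diff

    return only_left, only_right, different
```

A directory that exists on only one side is reported once, by name, rather than expanded file by file. The list of differing files is the natural input for a follow-up step such as copying or archiving just those files.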
You can also set the program to back up any changes made to files, so the resulting merge operations will be nondestructive. Plus, there's a plugin architecture that allows for special file-type handling -- for instance, a plugin exists that allows Microsoft Word or Excel documents to be compared.
Some file types require different handling. Audio files, for instance, need to be examined differently because the actual music data and the metadata (e.g., the ID3 tags) are stored together in the same file, so two copies of the same track with different tags are not byte-for-byte identical. A program like dupeGuru Music Edition can be used to sort through audio files and find duplicates based on the contents of the audio, not just the filenames or the file's metadata.
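To show why tags get in the way, here is a minimal Python sketch that skips the ID3 tags in an MP3 file and hashes only what is left. It makes several assumptions: the files are MP3s, any ID3v2 tag sits at the very start, and any ID3v1 tag is the final 128 bytes. It matches only copies whose encoded audio bytes are identical, so it will not catch the same song re-encoded at a different bitrate. It is not how dupeGuru works, and the names strip_id3 and audio_digest are hypothetical.

```python
import hashlib


def strip_id3(data):
    """Return MP3 bytes without a leading ID3v2 tag or a trailing ID3v1 tag."""
    start, end = 0, len(data)

    # ID3v2 header: "ID3", 2 version bytes, 1 flags byte, then a 4-byte
    # "synchsafe" size in which each byte carries only 7 bits.
    if end >= 10 and data[:3] == b"ID3":
        size_bytes = data[6:10]
        if all(byte < 0x80 for byte in size_bytes):
            size = (
                (size_bytes[0] << 21)
                | (size_bytes[1] << 14)
                | (size_bytes[2] << 7)
                | size_bytes[3]
            )
            start = 10 + size
            if data[5] & 0x10:  # footer flag: a 10-byte footer follows the tag
                start += 10

    # ID3v1: a fixed 128-byte block at the end of the file starting with "TAG".
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128

    return data[start:end]


def audio_digest(path):
    """Hash only the audio portion of an MP3 file (reads the file into memory)."""
    with open(path, "rb") as handle:
        return hashlib.sha256(strip_id3(handle.read())).hexdigest()
```

Two files with the same audio_digest value hold the same encoded audio even if their titles, artist names or album art differ. Spotting the same recording across different encodings takes the kind of content analysis that a dedicated tool provides.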
This was first published in February 2013