sandra_henrystocker
Unix Dweeb

Smart ways to compare files on Linux

How-To
Feb 16, 2021
5 mins
Linux

Many new tools for comparing files have emerged in Linux over the years, and in this post, we'll examine seven useful tools for doing that.

[Image: one yellow arrow moving opposite a stream of white arrows. Credit: Thinkstock]

Commands for comparing files have proliferated since the early days of Linux. In this post, we’ll look at a suite of commands available for comparing files and highlight the advantages that some of the newer ones provide.

diff

One of the oldest and still most popular commands for detecting and reporting on file differences is the diff command. When you compare two lists of meeting attendees, for example, diff shows you the differences simply and clearly.

$ diff attendance-2020 attendance-2021
10,12c10
< Monroe Landry
< Jonathon Moody
< ...
---
> Sandra Henry-Stocker

Only the lines that are different are displayed. The output precedes lines that are only in the first file with < and those only in the second file with >.

This output does not show the names of individuals who attended both meetings, only those who attended just the 2020 meeting or just the 2021 meeting. If you only want to know whether the files are different, you can add the -q option.

$ diff -q attendance-2020 attendance-2021
Files attendance-2020 and attendance-2021 differ

The diff command will not tell you anything if two files are the same. If you want confirmation that files are identical, you can add a -s argument.

$ diff attendance-2020 attendance-2021
$ diff -s attendance-2020 attendance-2021
Files attendance-2020 and attendance-2021 are identical
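
In scripts, the exit status of diff is usually more useful than its messages: 0 means the files are identical, 1 means they differ, and 2 means diff ran into trouble (a missing file, for example). The following is a minimal sketch, assuming two throwaway sample files in a temporary directory; the file names and the compare_files helper are made up for illustration.

```shell
#!/bin/sh
# Scratch directory with two small sample files (hypothetical data).
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf 'Alfreda Branch\nHans Burris\n' > "$tmpdir/list-a"
printf 'Alfreda Branch\nFelix Burt\n'  > "$tmpdir/list-b"

# compare_files FILE1 FILE2
# Prints "same", "different" or "error" based on diff's exit status.
compare_files() {
    status=0
    diff -q "$1" "$2" > /dev/null 2>&1 || status=$?
    case $status in
        0) echo "same" ;;
        1) echo "different" ;;
        *) echo "error" ;;
    esac
}

compare_files "$tmpdir/list-a" "$tmpdir/list-b"   # prints: different
compare_files "$tmpdir/list-a" "$tmpdir/list-a"   # prints: same
```

Capturing the status with `|| status=$?` keeps the helper usable in scripts that run with `set -e`, where a bare diff returning 1 would otherwise end the script.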

The diff command can also compare binary files (e.g., executables and images), but will only tell you if they are the same or different.

$ diff -s penguin.png penguin0.png
Files penguin.png and penguin0.png are identical

If you want to see a side-by-side comparison of two text files, you can use the -y argument and see output like this:

$ diff -y attendance-2020 attendance-2021
Alfreda Branch                                      Alfreda Branch
Hans Burris                                         Hans Burris
Felix Burt                                          Felix Burt
Ray Campos                                          Ray Campos
Juliet Chan                                         Juliet Chan
Denver Cunningham                                   Denver Cunningham
Tristan Day                                         Tristan Day
Kent Farmer                                         Kent Farmer
Terrie Harrington                                   Terrie Harrington
Monroe Landry                                     | Sandra Henry-Stocker
Jonathon Moody                                    <
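
With long files, most of a side-by-side listing is lines that match. A short sketch of trimming it down, assuming GNU diff (the --suppress-common-lines and -W options come from GNU diffutils) and two made-up sample files; the side_by_side_changes helper is hypothetical:

```shell
#!/bin/sh
# Scratch directory with two small sample files (hypothetical data).
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf 'Alfreda Branch\nHans Burris\nMonroe Landry\n'        > "$tmpdir/old"
printf 'Alfreda Branch\nHans Burris\nSandra Henry-Stocker\n' > "$tmpdir/new"

# side_by_side_changes FILE1 FILE2
#   -y                       side-by-side output
#   --suppress-common-lines  hide rows that are the same in both files
#   -W 60                    limit the total output width to 60 columns
# "|| true" keeps exit status 1 ("files differ") from stopping a script
# that runs with set -e.
side_by_side_changes() {
    diff -y --suppress-common-lines -W 60 "$1" "$2" || true
}

side_by_side_changes "$tmpdir/old" "$tmpdir/new"
```

Only the row pairing Monroe Landry with Sandra Henry-Stocker should be printed; the two names that appear in both files are left out.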

colordiff

The colordiff command enhances diff output by using colors to highlight the differences between two text files.

$ colordiff attendance-2020 attendance-2021
10,12c10
< Monroe Landry
< Jonathon Moody
< ...
---
> Sandra Henry-Stocker

If you add the -u option, the output switches to unified format, and the lines that are included in both files appear in your normal font color.
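
To see what the unified format looks like, here is a small sketch with made-up sample files. It calls plain diff so that it runs even where colordiff is not installed; colordiff is a wrapper around diff and accepts the same options, so substituting it in the hypothetical unified_diff helper should add the colors.

```shell
#!/bin/sh
# Scratch directory with two small sample files (hypothetical data).
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf 'Alfreda Branch\nHans Burris\nMonroe Landry\n'        > "$tmpdir/old"
printf 'Alfreda Branch\nHans Burris\nSandra Henry-Stocker\n' > "$tmpdir/new"

# unified_diff FILE1 FILE2
# -u prints a unified diff: lines only in FILE1 start with "-", lines only
# in FILE2 start with "+", and unchanged context lines start with a space.
# "|| true" keeps exit status 1 ("files differ") from stopping a script
# that runs with set -e.
unified_diff() {
    diff -u "$1" "$2" || true
}

unified_diff "$tmpdir/old" "$tmpdir/new"
```

The first two lines of the output are a header naming the two files, followed by the changed lines with a few lines of shared context around them.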

wdiff

The wdiff command uses a different strategy. It compares the files word by word and marks the differences with special characters. Text enclosed in [- and -] is only in the first file. Text enclosed in {+ and +} is only in the second file.

$ wdiff attendance-2020 attendance-2021
Alfreda Branch
Hans Burris
Felix Burt
Ray Campos
Juliet Chan
Denver Cunningham
Tristan Day
Kent Farmer
Terrie Harrington
[-Monroe Landry			-]			{+Sandra Henry-Stocker+}	

vimdiff

The vimdiff command takes an entirely different approach. It uses the vim editor to open the files in a side-by-side fashion. It then highlights the lines that are different using background colors and allows you to edit the two files and save each of them separately.

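Unlike the commands described above, it is interactive. Instead of printing its output and exiting, it opens a full-screen vim session in your terminal, or in a desktop window if you use the graphical gvimdiff variant.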

On Debian systems, you can install vimdiff with this command:

$ sudo apt install vim

kompare

The kompare command is a graphical tool that runs on your desktop. It displays the differences between files so that you can review and merge them, and programmers often use it to see and manage differences in their code. It can compare files or folders. It's also quite customizable.

Learn more at kde.org.

kdiff3

The kdiff3 tool allows you to compare up to three files and not only see the differences highlighted, but merge the files as you see fit. This tool is often used to manage changes and updates in program code.

Like kompare, kdiff3 runs on the desktop.

You can find more information on kdiff3 at SourceForge.

Using checksums

One easy way to find out if files are the same or different is to compute checksums. If the results are the same, the likelihood that the files are different is infinitesimally small.

One of the primary advantages of using checksums is that the files don't even need to be on the same system. Run the same checksum command on each system and compare the results. The disadvantage is that checksums won't tell you how the files differ or even how much they differ. If a single byte is different, the checksums will be dramatically different. That's the way they work. These two files have only one letter that is not the same, yet the checksums are dramatically different:

$ shasum words-1 words-2
36e191c4a932d239233ca8cced35f7689d070c0c  words-1
c09bb9b4b5f61a72a7ca6e933981e151cd35c9a7  words-2
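
Comparing checksums by eye gets tedious, so a script can do it for you. This is a minimal sketch, assuming sha256sum from GNU coreutils is installed (shasum -a 256 computes the same digests); the sample files, the digest_of helper and the SHA256SUMS file name are made up for illustration.

```shell
#!/bin/sh
# Scratch directory with two sample files that differ by one letter.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf 'apple\nbanana\n' > "$tmpdir/words-1"
printf 'apple\nbanane\n' > "$tmpdir/words-2"

# digest_of FILE
# Prints only the hex digest. Reading from stdin keeps the file's path
# out of the output, so the first field is all we need.
digest_of() {
    sha256sum < "$1" | cut -d ' ' -f 1
}

if [ "$(digest_of "$tmpdir/words-1")" = "$(digest_of "$tmpdir/words-2")" ]; then
    echo "identical"
else
    echo "different"
fi

# Save a checksum list that could be copied to another system, then
# verify the files against it with -c.
(cd "$tmpdir" && sha256sum words-1 words-2 > SHA256SUMS && sha256sum -c SHA256SUMS)
```

The -c check is the part that works across systems: copy the checksum list alongside the files and run the same verification on the other machine.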

Keep in mind that there are many commands for calculating checksums. A command like this should help you identify those that are installed on your system:

$ apropos checksum
cksum (1)            - checksum and count the bytes in a file
Dpkg::Checksums (3perl) - generate and manipulate file checksums
shasum (1)           - Print or Check SHA Checksums
sum (1)              - checksum and count the blocks in a file
tc-csum (8)          - checksum update action

Wrap-Up

While there are many choices for comparing files (not all covered in this post), the ones that work best for you will depend on whether you just want to know if files are different or you want to work with the differences.

Sandra Henry-Stocker has been administering Unix systems for more than 30 years. She describes herself as "USL" (Unix as a second language) but remembers enough English to write books and buy groceries. She lives in the mountains in Virginia where, when not working with or writing about Unix, she's chasing the bears away from her bird feeders.

The opinions expressed in this blog are those of Sandra Henry-Stocker and do not necessarily represent those of IDG Communications, Inc., its parent, subsidiary or affiliated companies.