sandra_henrystocker
Unix Dweeb

Using the comm command to compare files or directories on Linux

How-To
Sep 11, 2023
Linux

The Linux comm command makes it easy to compare a couple of text files and determine whether they contain the same lines, and it can do so even when the file contents aren’t sorted.


The comm command on Linux systems compares two files line by line and displays the results in a clear and useful way. With a little help, it can compare the contents of two directories as well. Think of “comm” as short for “common” rather than “compare,” since the command writes to standard output both the lines the two files have in common and the lines that are unique to each.

One key requirement when using comm is that the content to be compared must be in sorted order. However, there are ways to compare content that isn’t sorted, and this post presents some examples.

Comparing files

Normally, when using the comm command, you would compare two sorted text files to see their shared and unique lines. Here’s an example in which a list of friends and a list of neighbors are compared.

$ comm friends neighbors
Alice
Betty
Christopher
                Diane
George
                Patty
Ricky
Sam
                Tim
        Zelda

Notice that the output is displayed in three tab-indented columns. The first contains the names that appear only in the first file, the second shows the names that appear only in the second file, and the third shows the names common to both.
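
If you’d like to follow along, the sketch below recreates the two sample files. Their contents are an assumption inferred from the output above (friends holds nine names, neighbors holds four, and both lists are sorted). It works in a scratch directory created with mktemp so that no existing files are touched.

```shell
#!/bin/bash
# Work in a throwaway directory so existing files are untouched
cd "$(mktemp -d)" || exit 1

# Assumed contents, inferred from the comm output above; both lists are sorted
printf '%s\n' Alice Betty Christopher Diane George Patty Ricky Sam Tim > friends
printf '%s\n' Diane Patty Tim Zelda > neighbors

# Column 1: only in friends; column 2 (one tab): only in neighbors;
# column 3 (two tabs): in both files
comm friends neighbors
```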

NOTE: If one of the files were not sorted, you would see something like this:

$ comm friends neighbors
Alice
Betty
Christopher
                Diane
                Patty
comm: file 1 is not in sorted order        

You could, however, get around this issue without actually changing the sort order of the files themselves. Instead, you could sort the files when running the comm command as in this example:

$ comm <(sort friends) <(sort neighbors)
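
Another way to compare unsorted files without modifying them, and one that also works in plain POSIX shells that lack bash’s process substitution, is to write sorted copies to temporary files and compare those. Here’s a minimal sketch; the unsorted lists are hypothetical.

```shell
#!/bin/sh
# Scratch directory with hypothetical, unsorted input files
cd "$(mktemp -d)" || exit 1
printf '%s\n' Sam Alice Tim Diane > friends
printf '%s\n' Zelda Tim Diane > neighbors

# Sort copies into temporary files; the originals keep their order
sorted1=$(mktemp)
sorted2=$(mktemp)
sort friends > "$sorted1"
sort neighbors > "$sorted2"

# Compare the sorted copies and show the result
result=$(comm "$sorted1" "$sorted2")
printf '%s\n' "$result"

# Remove the temporary copies
rm -f "$sorted1" "$sorted2"
```

The original files are left exactly as they were; only the temporary copies are sorted.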

If you want to see only the contents that are common to the files being compared, you can suppress the display of the first two columns with a command like this one:

$ comm -12 friends neighbors
Diane
Patty
Tim

The "-12" means "suppress columns 1 and 2". Any of the columns can be suppressed in this way. In the command below, only the third column is suppressed. As a result, you see the names that are unique to each file, but not those included in both files.

$ comm -3 friends neighbors
Alice
Betty
Christopher
George
Ricky
Sam
        Zelda
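
Other combinations work the same way. For example, -23 leaves only the lines unique to the first file and -13 only those unique to the second. The sketch below assumes the same two sorted sample files, recreated in a scratch directory. Note that when column 1 is suppressed, column 2 entries are no longer indented.

```shell
#!/bin/bash
# Recreate the assumed sample files in a scratch directory
cd "$(mktemp -d)" || exit 1
printf '%s\n' Alice Betty Christopher Diane George Patty Ricky Sam Tim > friends
printf '%s\n' Diane Patty Tim Zelda > neighbors

# Suppress columns 2 and 3: names found only in friends
comm -23 friends neighbors

# Suppress columns 1 and 3: names found only in neighbors
comm -13 friends neighbors
```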

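If you want to compare files that may not be sorted, you can use the --nocheck-order option to suppress the comm command’s complaints. Keep in mind that this only silences the check; comm still expects sorted input, so its results can be wrong. In the output below, for example, Tim is listed as unique to each file even though both files contain that name: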

$ comm --nocheck-order friends neighbors
Alice
Betty
Christopher
                Diane
George
                Patty
Ricky
Sam
Tim
        Zelda
        Tim
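
Because --nocheck-order only hides the warning, a safer habit in scripts is to check the order yourself and sort when needed. The -C option of sort (part of POSIX) checks whether a file is already sorted and reports the answer only through its exit status. The data in this sketch is hypothetical: neighbors is deliberately out of order, which produces the same kind of misleading Tim entry shown above.

```shell
#!/bin/bash
# Scratch directory with hypothetical data; neighbors is deliberately unsorted
cd "$(mktemp -d)" || exit 1
printf '%s\n' Alice Betty Christopher Diane George Patty Ricky Sam Tim > friends
printf '%s\n' Diane Patty Zelda Tim > neighbors

# sort -C prints nothing but exits non-zero when its input is out of order
for f in friends neighbors; do
    sort -C "$f" || echo "warning: $f is not sorted" >&2
done

# Misleading: Tim is reported as unique to friends
comm --nocheck-order -23 friends neighbors

# Correct: sort both inputs on the fly before comparing
comm -23 <(sort friends) <(sort neighbors)
```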

To have the comm command count the number of lines in each column, add the --total option as shown below.

$ comm --total friends neighbors
Alice
Betty
Christopher
                Diane
George
                Patty
Ricky
Sam
                Tim
        Zelda
6       1       3       total
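If you only need the counts, you can keep just the summary line. With --total it is the last line comm prints, and its fields are separated by the same delimiter as the columns (a tab by default), so bash’s read builtin can split it into variables. This is a sketch assuming GNU comm and the same two sample files; the variable names are illustrative.

```shell
#!/bin/bash
# Recreate the assumed sample files in a scratch directory
cd "$(mktemp -d)" || exit 1
printf '%s\n' Alice Betty Christopher Diane George Patty Ricky Sam Tim > friends
printf '%s\n' Diane Patty Tim Zelda > neighbors

# Summary line layout: count1 <TAB> count2 <TAB> count3 <TAB> total
IFS=$'\t' read -r only_friends only_neighbors both _ < <(comm --total friends neighbors | tail -n 1)

echo "only in friends:   $only_friends"
echo "only in neighbors: $only_neighbors"
echo "in both:           $both"
```

If your version of comm lacks --total, piping comm -23, comm -13 or comm -12 into wc -l gives the same three counts.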

To use a delimiter other than the tabs that separate the columns by default, use the --output-delimiter option as shown in the example below. Lines with no “:” characters are first-column entries (only in the first file), lines starting with a single “:” are second-column entries (only in the second file), and lines starting with “::” are third-column entries (in both files). Output like this can be easier to import into a spreadsheet.

$ comm --output-delimiter=: friends neighbors
Alice
Betty
Christopher
::Diane
George
::Patty
Ricky
Sam
::Tim
:Zelda
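
Another route to spreadsheet-friendly output is to label each line with the column it came from. The sketch below is an illustration built on awk, not a comm feature: it splits comm’s default tab-separated output, so a line with one field came from column 1, two fields from column 2, and three from column 3. It assumes the same sample files and names that contain no tabs or commas.

```shell
#!/bin/bash
# Recreate the assumed sample files in a scratch directory
cd "$(mktemp -d)" || exit 1
printf '%s\n' Alice Betty Christopher Diane George Patty Ricky Sam Tim > friends
printf '%s\n' Diane Patty Tim Zelda > neighbors

# Leading tabs mark the column: 0 tabs = column 1, 1 tab = column 2, 2 tabs = column 3
comm friends neighbors | awk -F'\t' '
    NF == 1 { print "friends only," $1 }
    NF == 2 { print "neighbors only," $2 }
    NF == 3 { print "both," $3 }
' > names.csv

cat names.csv
```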

Comparing directories

To compare the contents of two directories, you can use a technique similar to the one shown earlier for unsorted files: generate each directory’s file listing on the fly and pass both listings to comm. In this example, the contents of the two directories are listed and then compared.

$ comm 

In the example above, the only file that is common to both directories is file3.
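
Here’s a sketch of that kind of directory comparison, using two hypothetical directories (dir1 and dir2) created just for the demonstration. Piping ls through sort keeps the listings in the order comm expects, and the process substitution requires bash or a similar shell. It also assumes file names without embedded newlines.

```shell
#!/bin/bash
# Build two hypothetical directories to compare
cd "$(mktemp -d)" || exit 1
mkdir dir1 dir2
touch dir1/file1 dir1/file2 dir1/file3
touch dir2/file3 dir2/file4

# Compare the sorted file listings; only the names are compared
comm <(ls dir1 | sort) <(ls dir2 | sort)
```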

Note, however, that the comm command just shown compares only the file names. It does not compare the files’ contents.
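
If you also want to know whether the files that share a name have the same contents, one option is to feed the output of comm -12 to cmp. This is a sketch using hypothetical directories and a hypothetical helper function, compare_common; it assumes both directories contain only regular files whose names have no embedded newlines.

```shell
#!/bin/bash
# Hypothetical directories: file3 matches, notes differs
cd "$(mktemp -d)" || exit 1
mkdir dir1 dir2
echo "same text" > dir1/file3
echo "same text" > dir2/file3
echo "version 1" > dir1/notes
echo "version 2" > dir2/notes
touch dir1/file1 dir2/file4

# For each name found in both listings, compare the two files byte by byte
compare_common() {
    comm -12 <(ls "$1" | sort) <(ls "$2" | sort) | while IFS= read -r name; do
        if cmp -s "$1/$name" "$2/$name"; then
            echo "identical: $name"
        else
            echo "different: $name"
        fi
    done
}

compare_common dir1 dir2
```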

Adding headings

If you want to add column headings to your comm output, you can put both an echo command and the comm command that you want to run into a script file. Though the headings won’t necessarily align precisely with the content being compared, they can still be useful. Here's an example script:

#!/bin/bash

# Print column headings; -e makes echo interpret \t as a tab
echo -e "friends\tneighbors both"
echo "======= ========= ===="
# Compare the two (sorted) files
comm friends neighbors

Here’s example output from the script:

$ compare
friends neighbors both
======= ========= ====
Alice
Betty
Christopher
                Diane
George
                Patty
Ricky
Sam
                Tim
        Zelda

Wrap-up

The comm command makes it quite easy to compare the contents of text files – even when they’re not sorted to begin with. It also allows you to compare the names of files in two directories. Check out the comm man page for additional options.


Sandra Henry-Stocker has been administering Unix systems for more than 30 years. She describes herself as "USL" (Unix as a second language) but remembers enough English to write books and buy groceries. She lives in the mountains in Virginia where, when not working with or writing about Unix, she's chasing the bears away from her bird feeders.

The opinions expressed in this blog are those of Sandra Henry-Stocker and do not necessarily represent those of IDG Communications, Inc., its parent, subsidiary or affiliated companies.