sandra_henrystocker
Unix Dweeb

Commands for finding out if compressed Linux files are the same

How-To
Nov 30, 2022
Linux

The zdiff and zcmp commands can tell you whether compressed Linux files differ and, if so, how.


Compressed Linux files are helpful because they save disk space, but what should you do when you have a series of compressed files and want to determine if any are duplicates? The zdiff and zcmp commands can help.

To begin, if a directory contains two compressed files like those shown below, the listing alone tells you that the .gz files are not identical, because their sizes differ slightly:

$ ls -l
total 200
-rw-r--r--. 1 shs shs 102178 Nov 22  2021 2021.gz
-rw-r--r--. 1 shs shs 102181 Nov 22 11:19 2022.gz

If you compare the files with the diff command, it will confirm that the files differ:

$ diff 2021.gz 2022.gz
Binary files 2021.gz and 2022.gz differ

What the diff command doesn’t tell you (because it compares the compressed bytes, not the data they decompress to) is that the content that was compressed to create these two files is actually identical. To determine that, you need to use the zdiff or zcmp command. If the content that was compressed into each file is identical, neither command produces any output.

$ zdiff 2021.gz 2022.gz
$
$ zcmp 2021.gz 2022.gz
$
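To see this behavior from scratch, here is a minimal sketch, assuming a Linux system with GNU gzip (which provides zcmp) and coreutils. It creates two files with the same content but different names and timestamps, compresses them, and then compares the results both as raw bytes (cmp) and as decompressed content (zcmp). The file names and sample data are made up for illustration.

```shell
#!/bin/sh
# Sketch: two .gz files made from identical data differ byte for byte,
# but zcmp compares the decompressed data and finds no difference.
# The temporary directory, file names and sample data are made up.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

seq 1 5000 > report-a              # sample content
cp report-a report-b               # same content under a different name
touch -t 202111221000 report-a     # ...and with a different timestamp
gzip report-a report-b             # gzip records each name and timestamp

# cmp -s and zcmp -s print nothing; only the exit status is used here.
if cmp -s report-a.gz report-b.gz; then raw=same; else raw=different; fi
if zcmp -s report-a.gz report-b.gz; then content=same; else content=different; fi
echo "compressed bytes: $raw; decompressed content: $content"
# prints: compressed bytes: different; decompressed content: same

cd / && rm -rf "$workdir"
```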

If you use gunzip to decompress the files, the resulting files are the same size, and the diff command confirms that their content is identical. Again, the absence of output from diff indicates that there are no differences.

$ gunzip 2021.gz
$ gunzip 2022.gz
$ ls -l
total 852
-rw-r--r--. 1 shs shs 383654 Nov 22  2021 2021
-rw-r--r--. 1 shs shs 383654 Nov 22 11:19 2022
$ diff 2021 2022
$
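Keep in mind that gunzip replaces each .gz file with its decompressed version. If you want to keep the compressed files, gzip -dc (the same thing zcat does) writes the decompressed data to standard output instead. A minimal sketch, assuming GNU gzip and made-up file names:

```shell
#!/bin/sh
# Sketch: decompress to new files with gzip -dc so the .gz files stay put,
# then compare the decompressed copies with diff. Names are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

seq 1 5000 > log-2021
cp log-2021 log-2022
gzip log-2021 log-2022                 # leaves log-2021.gz and log-2022.gz

gzip -dc log-2021.gz > log-2021.txt    # decompress to a new file...
gzip -dc log-2022.gz > log-2022.txt    # ...without touching the .gz file
if diff log-2021.txt log-2022.txt > /dev/null; then same=yes; else same=no; fi
if [ -f log-2021.gz ] && [ -f log-2022.gz ]; then kept=yes; else kept=no; fi
echo "content identical: $same; .gz files kept: $kept"

cd / && rm -rf "$workdir"
```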

Clearly, the file content is the same. Why, then, do the compressed versions differ? Because gzip stores the original file name and the file’s timestamp in the header of the compressed file. zdiff and zcmp compare only the decompressed content, so this header information doesn’t affect their results.
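If you want identical content to produce identical .gz files, gzip’s -n (--no-name) option leaves the original file name and timestamp out of the header. The sketch below, assuming GNU gzip and hypothetical file names, compresses the same data with and without -n and compares the results with cmp.

```shell
#!/bin/sh
# Sketch: with gzip -n, the name and timestamp are not stored, so the same
# data compressed with the same settings gives byte-identical .gz files.
# File names are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

seq 1 5000 > notes-2021
cp notes-2021 notes-2022
touch -t 202111221000 notes-2021       # different timestamps as well

# Default: name and timestamp are recorded in each header.
gzip -c notes-2021 > default-2021.gz
gzip -c notes-2022 > default-2022.gz
# With -n: neither is recorded.
gzip -n -c notes-2021 > noname-2021.gz
gzip -n -c notes-2022 > noname-2022.gz

if cmp -s default-2021.gz default-2022.gz; then with_name=same; else with_name=different; fi
if cmp -s noname-2021.gz noname-2022.gz; then without_name=same; else without_name=different; fi
echo "with name/timestamp: $with_name; with -n: $without_name"

cd / && rm -rf "$workdir"
```

Both copies here use gzip’s default compression level; compressing with different levels could change the compressed bytes even with -n.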

Comparing compressed and non-compressed files

Besides determining whether two compressed files hold the same content, zdiff and zcmp can compare a compressed file with an uncompressed one. If the uncompressed file contains the same content that was compressed, the commands again produce no output, confirming the match. (Because gunzip removed 2021.gz in the previous step, these examples assume a compressed copy has been recreated alongside the uncompressed files, for example with gzip -c 2021 > 2021.gz.)

$ zdiff 2021.gz 2022
$
$ zcmp 2021.gz 2022
$
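Because zcmp returns cmp’s exit status, a script can use it to check whether a compressed copy still matches its uncompressed source. This minimal sketch, assuming GNU gzip and a made-up file name, passes cmp’s -s option through zcmp to suppress output and relies on the exit status alone.

```shell
#!/bin/sh
# Sketch: compare a .gz file with an uncompressed file via zcmp's exit
# status (0 = same content, nonzero = different or error).
# The file name "inventory" is hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

seq 1 5000 > inventory
gzip -c inventory > inventory.gz     # compressed copy; inventory is kept

if zcmp -s inventory.gz inventory; then before=match; else before=differ; fi
echo "one more line" >> inventory    # now change the uncompressed file
if zcmp -s inventory.gz inventory; then after=match; else after=differ; fi
echo "before edit: $before; after edit: $after"

cd / && rm -rf "$workdir"
```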

In fact, zdiff and zcmp will even compare two uncompressed files, although there’s no benefit over using diff or cmp directly. The command below compares the two decompressed files:

$ zdiff 2021 2022
$

zdiff and zcmp differences

The main difference between the zdiff and zcmp commands is what they tell you when files differ. The zdiff command runs diff on the decompressed content, so it displays the lines that differ.

$ zdiff 2022.gz 2023.gz
6409c6409
        There may be only one active coprocess at a time!
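zdiff passes any options given before the file names on to diff, so you can ask for a different output format. The sketch below, with made-up sample data and assuming GNU gzip, requests a unified diff with -u. The header lines it prints may vary between systems; the changed lines and the exit status are what matter here.

```shell
#!/bin/sh
# Sketch: zdiff forwards options such as -u to diff, producing a unified
# diff of the decompressed content. The sample data is made up.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'alpha\nbeta\ngamma\n' | gzip > old.gz
printf 'alpha\nBETA\ngamma\n' | gzip > new.gz

changes=$(zdiff -u old.gz new.gz)
status=$?                  # diff's exit status: 0 = same, 1 = different
printf '%s\n' "$changes"   # includes the lines "-beta" and "+BETA"
echo "exit status: $status"

cd / && rm -rf "$workdir"
```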

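The zcmp command runs cmp instead, so it tells you only that the content differs and where the first difference occurs, by byte and line number. The /dev/fd/5 and - in its output stand for the decompressed data that zcmp hands to cmp, not for the original file names.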

$ zcmp 2022.gz 2023.gz
/dev/fd/5 - differ: byte 383573, line 6409
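Returning to the question of spotting duplicates among a set of compressed files, zcmp’s exit status (0 when the content matches, 1 when it differs, 2 on trouble such as a missing file) can drive a simple pairwise check. find_gz_dups below is a hypothetical helper written for this sketch, assuming POSIX sh and GNU gzip; the demo files are made up.

```shell
#!/bin/sh
# Sketch: report every pair of .gz files in a directory whose decompressed
# content is identical. find_gz_dups is a hypothetical helper name.
find_gz_dups() {
    dir=$1
    set -- "$dir"/*.gz                  # the function's own positional params
    [ -e "$1" ] || return 0             # no .gz files in the directory
    while [ "$#" -gt 1 ]; do
        first=$1
        shift
        for other in "$@"; do           # compare $first with each later file
            if zcmp -s "$first" "$other"; then
                echo "same content: $first $other"
            fi
        done
    done
}

# Demo with made-up files: a.gz and b.gz hold the same data but have
# different headers (b.gz stores its name and timestamp); c.gz differs.
workdir=$(mktemp -d)
seq 1 100 | gzip > "$workdir/a.gz"
seq 1 100 > "$workdir/b"
gzip "$workdir/b"
seq 1 200 | gzip > "$workdir/c.gz"

dups=$(find_gz_dups "$workdir")
printf '%s\n' "$dups"

rm -rf "$workdir"
```

Comparing every pair grows quickly with the number of files. For large sets, hashing each decompressed stream (for example, gzip -dc file | sha256sum) and looking for repeated hashes is a common alternative.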

Wrap-Up

The zdiff and zcmp commands allow you to compare the content of files compressed with gzip. Both commands produce no output when the content matches, but they report differences in different ways: zdiff shows the lines that differ, while zcmp reports where the first difference occurs. You can also use these commands to compare a gzip-compressed file with an uncompressed file to determine whether the original content is the same in both.


Sandra Henry-Stocker has been administering Unix systems for more than 30 years. She describes herself as "USL" (Unix as a second language) but remembers enough English to write books and buy groceries. She lives in the mountains in Virginia where, when not working with or writing about Unix, she's chasing the bears away from her bird feeders.

The opinions expressed in this blog are those of Sandra Henry-Stocker and do not necessarily represent those of IDG Communications, Inc., its parent, subsidiary or affiliated companies.