There are many ways to remove duplicate lines from a text file on Linux, but here are two that involve the awk and uniq commands and that offer slightly different results.

Remove duplicate lines with awk

The first command we’ll examine in this post is a very unusual awk command that systematically removes every repeat of a line it has already seen. It leaves the first instance of each line intact, but “remembers” it and removes any duplicates encountered afterwards. It works because x[$0]++ returns the number of times the current line ($0) has been seen so far and then increments that count. On a line's first appearance the value is 0, so !x[$0]++ is true and awk's default action prints the line. On every later appearance the value is nonzero, so the line is skipped.

Here’s an example. Initially, the file (named grouchy_princess) looks like this:

Once upon a time, there was a lovely princess with a foul temper.
Whenever she went for a walk, she left her castle smiling,
but if she ran into anyone frowning or arguing with someone else,
she stopped and made an angry face.
Continue reading
If the princess ran into a friend who didn't want to chat with her,
she stopped and made an angry face.
Continue reading

The awk command that does this work looks like this:

$ awk '!x[$0]++' grouchy_princess
Once upon a time, there was a lovely princess with a foul temper.
Whenever she went for a walk, she left her castle smiling,
but if she ran into anyone frowning or arguing with someone else,
she stopped and made an angry face.
Continue reading
If the princess ran into a friend who didn't want to chat with her,

Note that each of the duplicated lines is now displayed only once and in its initial position.

In fact, if you simply want to see the duplicated lines, you only need to change the command in a minor way. Just remove the exclamation point (signifying “not”) and you will see only the repeats, that is, every occurrence of a line after its first:

$ awk 'x[$0]++' grouchy_princess
she stopped and made an angry face.
Continue reading

The only problem with the awk '!x[$0]++' command is that it’s not all that easy to remember. On the other hand, it’s also not that hard to turn the command into a simple script.
Mine looks like this:

$ cat rmdups
#!/bin/bash
awk '!x[$0]++' "$1"

The awk command removes duplicate lines from whatever file is provided as an argument. If you want to save the output to a file instead of displaying it, make it look like this:

#!/bin/bash
awk '!x[$0]++' "$1" > "$1-new"

Once the script is executable and in your PATH, you can run it using a command like “rmdups addresses”. If you use the second version, a file with “-new” added to the original file name will contain the output.

Remove duplicate lines with uniq

If you don’t need to preserve the order of the lines in the file, using the sort and uniq commands will do what you need in a very straightforward way. The sort command sorts the lines in alphanumeric order (the exact order depends on your locale; in the output below, uppercase and lowercase letters are interleaved). The uniq command reduces each run of adjacent identical lines to a single line, which is why the file has to be sorted first.

$ sort grouchy_princess | uniq
but if she ran into anyone frowning or arguing with someone else,
Continue reading
If the princess ran into a friend who didn't want to chat with her,
Once upon a time, there was a lovely princess with a foul temper.
she stopped and made an angry face.
Whenever she went for a walk, she left her castle smiling,

If sorting the contents of your file is helpful in itself, this approach may be ideal. While this technique doesn’t work all that well with fairy tales, it works just fine for lists of meeting attendees, grocery shopping lists, etc.

Because the file name has to sit in the middle of this command (after sort and before the pipe to uniq), it can’t simply be turned into an alias, but it can be turned into a simple script like this:

#!/bin/bash
if [ $# -eq 1 ]; then
    if [ -f "$1" ]; then
        sort "$1" | uniq
    fi
fi

The script verifies that exactly one argument was provided and that it names an existing file before it sorts the file and sends the output to the uniq command.

Wrap-Up

Commands like those shown can be very helpful in cleaning up or verifying the content of text files, particularly lists in which you don’t want any line to show up multiple times.
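For the verifying case, the same sort and uniq pairing can report which lines repeat instead of removing them. The following is a minimal sketch, assuming bash and a uniq that supports the -c and -d options (GNU coreutils and BSD uniq both do). The function name checkdups and its return codes (0 for no repeats, 1 for repeats, 2 for bad usage) are hypothetical choices, not something from the article.

```shell
# checkdups: hypothetical helper, not from the article.
# Prints each repeated line once, prefixed by how many times it occurs.
# Returns 0 if no line repeats, 1 if some line repeats, 2 on bad usage.
checkdups() {
    if [ $# -ne 1 ] || [ ! -f "$1" ]; then
        echo "usage: checkdups FILE" >&2
        return 2
    fi

    local dups
    # sort brings identical lines together so uniq can see them;
    # -d keeps only lines that occur more than once, -c adds the count.
    dups=$(sort "$1" | uniq -cd)

    if [ -n "$dups" ]; then
        printf '%s\n' "$dups"
        return 1
    fi
    return 0
}
```

Running "checkdups grouchy_princess" should list the two repeated lines from the example file, each with a count of 2, and return status 1, which a calling script can test. Because this is a shell function (it could go in ~/.bashrc) rather than an alias, it can place its argument ($1) anywhere in its body.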
Turning the commands into a script makes it convenient to call on them whenever they might be helpful.