Americas

  • United States
sandra_henrystocker
Unix Dweeb

The many faces of awk

How-To
Aug 02, 20219 mins
Linux

The awk command provides a lot more than simply selecting fields from input strings, including pulling out columns of data, printing simple text, evaluating content – even doing math.

Colorful layered multi-cultural diverse faces
Credit: Thinkstock

If you only use awk when you need to select specific fields from lines of text, you might be missing out on a lot of other services that the command can provide. In this post, we’ll look at this simple use along with many other things that awk can do for you with enough examples to show you that the command is a lot more flexible than you might have imagined.

Plucking out columns of data

The easiest and most commonly used service that awk provides is selecting specific fields from files or from data that is piped to it. With the default of using white space as a field separator, this is very simple:

$ echo one two three four five | awk ‘{print $4}’

four
$ who | awk '{print $1}'
jdoe
fhenry

White space is any sequence of blanks and tabs. In the commands shown above, awk is extracting just the fourth and first fields from the data provided.

[Get regularly scheduled insights by signing up for Network World newsletters.]

Awk can also pull text from files by just adding the name of the file after the awk command.

$ awk '{print $1,$5,$NF}' HelenKellerQuote
The beautiful heart.

In this case, awk has picked out the first, fifth, and last words in the single line of text.

The $NF specification in the command picks the last piece of text on each line. That is because NF represents the number of fields in a line (23) while $NF represents the value of that last field (“heart.”). The period is included in the output because it’s part of the final string.

Fields can be printed in any order that you might find useful. In this example, we are rearranging the fields in date command output.

$ date | awk '{print $4,$3,$2}'
2019 Nov 22

If you omit the commas between the field designators in an awk command, the output will be pushed into a single string.

$ date | awk '{print $4 $3 $2}'
2021Aug2

If you replace the usual commas with hyphens, awk will attempt to subtract one field from another–probably not what you intended. It doesn’t take the hyphens as characters to be inserted into the print output. Instead, it puts some of its mathematical prowess into play.

$ date | awk '{print $4-$3-$2}'
2019

In this case, it’s subtracting 2 (the day of the month) from the year (2021) and simply ignoring “Aug”.

If you want your output to be separated by something other than white space, you can specify your output separator with OFS (output field separator) like this:

$ date | awk '{OFS="-"; print $4,$3,$2}'
2021-Aug-02

If your date command output looks like what you see below, you need to change the fields to $7,$2,$3 in the commands shown above since the year is listed last.

$ date
Mon Aug 2 02:26:34 PM EDT 2021

Printing simple text

You can also use awk to simply display some text. Of course, if all you want to do is print a line of text, you’d be better off using an echo command. On the other hand, as part of an awk script, printing some relevant text can be very useful. Here’s a practically useless example:

$ awk 'BEGIN {print "Hello, World" }'
Hello, World

Here’s a more sensible example in which adding a line of text to label your data can help identify what you’re looking at:

$ who | awk 'BEGIN {print "Current logins:"} {print $1}'
Current logins:
shs
nemo

Specifying a field separator

Not all input is going to be separated by white space. If your text is separated by some other character (e.g., commas, colons or semicolons), you can inform awk by using the -F (input separator) option as shown here:

$ cat testfile
a:b:c,d:e
$ awk -F : '{print $2,$3}' testfile
b c,d

Here’s a more useful example – pulling a field from the colon-separated /etc/passwd file:

$ awk -F: '{print $1}' /etc/passwd | head -11
root
bin
daemon
adm
lp
sync
shutdown
halt
mail
operator
games

Evaluating content

You can also evaluate fields using awk. If you, for example, want to list only user accounts in /etc/passwd, you can include a test for the 3rd field. Here we’re only going after UIDs that are 1000 and above:

$ awk -F":" ' $3 >= 1000 ' /etc/passwd
nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin
shs:x:1000:1000:Sandra Henry-Stocker,,,:/home/shs:/bin/bash
nemo:x:1001:1001:Nemo,,,:/home/nemo:/usr/bin/zsh
dory:x:1002:1002:Dory,,,:/home/dory:/bin/bash
...

If you want to add a title for your listing, you can add a BEGIN clause:

$ awk -F":" 'BEGIN {print "user accounts:"} $3 >= 1000 ' /etc/passwd
user accounts:
nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin
shs:x:1000:1000:Sandra Henry-Stocker,,,:/home/shs:/bin/bash
nemo:x:1001:1001:Nemo,,,:/home/nemo:/usr/bin/zsh
dory:x:1002:1002:Dory,,,:/home/dory:/bin/bash

If you want more than one line in your title, you can separate your intended output lines with “n” (newline characters).

$ awk -F":" 'BEGIN {print "user accountsn============="} $3 >= 1000 ' /
etc/passwd
user accounts
=============
nobody:x:65534:65534:Kernel Overflow User:/:/sbin/nologin
shs:x:1000:1000:Sandra Henry-Stocker:/home/shs:/bin/bash
nemo:x:1001:1001::/home/nemo:/bin/bash
bugfarm:x:1006:1006::/home/bugfarm:/bin/bash
dbell:x:1002:1002::/home/dbell:/bin/bash
dorothy:x:1008:1008::/home/dorothy:/bin/bash
eel:x:1005:1005::/home/eel:/bin/bash
gijoe:x:1017:1017::/home/gijoe:/bin/bash
gino:x:1016:1016::/home/gino:/bin/bash
lola:x:1007:1007::/home/lola:/bin/bash
myself:x:1013:1013::/home/myself:/bin/bash
newuser:x:1014:1014::/home/newuser:/bin/bash
shark:x:1003:1003::/home/shark:/bin/bash
tadpole:x:1004:1004::/home/tadpole:/bin/bash
jadep:x:1012:1012::/home/jadep:/bin/bash
snakey:x:1018:1018::/home/snakey:/bin/bash
monsterfromthedeepbluesea:x:1019:1019::/home/monsterfromthedeepbluesea:/bin/bash

Doing math with awk

awk provides a surprising mathematical ability and can calculate square roots, logs, tangents, etc.

Here are a couple examples:

$ awk 'BEGIN {print sqrt(2021)}'
44.9555
$ awk 'BEGIN {print log(2019)}'
7.61036

For more details on awk‘s mathematical skills, check out Doing math with awk.

Writing awk scripts

You can also write standalone scripts with awk. Here’s an example that mimics one of the examples provided earlier, but also counts the number of users with accounts on the system.

#!/usr/bin/awk -f
# This line is a comment

BEGIN {
    printf "%sn","User accounts:"
    print "=============="
    FS=":"
    n=0
}

# Now we'll run through the data
{
    if ($3 >= 1000) {
        print $1
        n ++
    }
}
 
END {
    print "=============="
    print n " accounts"
}

Notice how the BEGIN section, which is run only when the script starts, provides a heading, dictates the field separator, and sets up a counter to start with 0. The script also includes an END section which only runs after all the lines in the text provided to the script have been processed. It displays the final count of lines that meet the specification in the middle section (third field is 1,000 or larger).

Use the script like this:

$ ./list_users /etc/passwd
User accounts:
==============
nobody
shs
nemo
bugfarm
dbell
dorothy
eel
gijoe
gino
lola
myself
newuser
shark
tadpole
jadep
snakey
monsterfromthedeepbluesea
==============
17 accounts

Searching for text in a file

You can use awk to select lines from a file that contain a specified word or string. The examples below illustrate both with an allowance for a lowercase or uppercase “B” in the third.

$ awk '/harmony/ {print}' Happy_Quotes
"Happiness is when what you think, what you say, and what you do are in harmony." —Mahatma Gandhi
$ awk '/present moment/ {print}' Happy_Quotes
"The present moment is filled with joy and happiness. If you are attentive, you will see it." —Thich Nhat Hanh
$ awk '/[Bb]e happy/ {print}' Happy_Quotes
"Be happy for this moment. This moment is your life." — Omar Khayyam

Selecting a portion of a file

To select a section of text from a file, you need to include text from the start and end lines that identify the portion to be extracted. Here’s an example:

$ awk /LAST/,/END/ Happy_Quotes
LAST QUOTE
"Action may not always bring happiness, but there is no happiness without action." —William James
THE END

Replacing text in a file

To change text that’s in a file using awk, you can use syntax like that shown below. Just keep in mind that this changes the text that you see, but does not modify the file contents. To save the changes, redirect the output to a temporary file and then use it to replace the original.

$ awk '{gsub(/joy/,"fear"); print}' test11
"The present moment is filled with fear and happiness. If you are attentive, you will see it." —Thich Nhat Hanh

To replace multiple strings, use more than one gsub command:

$ awk '{gsub(/joy/,"fear"); gsub(/happiness/,"misery"); print}' test11
"The present moment is filled with fear and misery. If you are attentive, you will see it." —Thich Nhat Hanh

Counting lines or words in a file

To print the number of lines in a file using awk, do this:

$ awk 'END {print NR}' Happy_Quotes
12

The inclusion of END in the command means output is provided after lines have been processed. NR (number of records) represents the number of lines in the file.

To print the number of words (or strings) on each line of a file, you can use a command like this:

$ awk '{print NF}' Happy_Quotes
18
15
12
16
11
12
17
18
20
2
15
2

You can use a script like that shown below to count the words and provide just the total.

$ cat count_words
#!/usr/bin/awk -f

BEGIN { num=0 }
{
    print NF
    num=num + NF
}
END {print num}

This script runs through the target file a line at a time and adds the word count for each line to the total. This works because NF represents the number of fields in each line.

Alternately, you can use awk commands like these to get the overall and the per-line plus overall counts:

$ awk 'BEGIN {num=0}; {num=num+NF}; END {print num}' Happy_Quotes
158
$ awk 'BEGIN {num=0}; {print NF;num=num+NF}; END {print num}' Happy_Quotes
18
15
12
16
11
12
17
18
20
2
15
2
158
$ count_words Happy_Quotes
158

Determining the most heavily used commands

You can also use awk along with a number of other commands to view which commands you have used most frequently within the life span of your current history file.

$ history | awk '{print $2}' | sort | uniq -c | sort -nr | head -5
    191 sudo
     91 ls
     80 vi
     57 systemctl
     51 man

If you have added date and time fields to your history file, use $4 instead of $2 in the command above.

Wrap-Up

A long-standing Unix command, awk still provides very useful services and remains one of the reasons that I fell in love with Unix many decades ago. While some of awk‘s functions (like counting lines) can be more easily performed by other commands (like wc and wc -l), it’s still useful to know what awk can do, especially if you get into writing awk scripts and need to use many of its functions.

sandra_henrystocker
Unix Dweeb

Sandra Henry-Stocker has been administering Unix systems for more than 30 years. She describes herself as "USL" (Unix as a second language) but remembers enough English to write books and buy groceries. She lives in the mountains in Virginia where, when not working with or writing about Unix, she's chasing the bears away from her bird feeders.

The opinions expressed in this blog are those of Sandra Henry-Stocker and do not necessarily represent those of IDG Communications, Inc., its parent, subsidiary or affiliated companies.