Unix Dweeb

Counting individual characters on Linux

How-To

Oct 26, 2022 · 5 mins

If you need to count how many times each character appears in a file or phrase, there are some handy commands you can string together to do it, along with scripts and aliases that make the job easy.

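Determining how many characters are in a file is easy on the Linux command line: the ls -l command shows the file's size in bytes, which equals its character count as long as the file contains only single-byte (for example, plain ASCII) text.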
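The wc command offers a related check: wc -c reports the byte count (the same figure ls -l shows as the file size), while wc -m counts characters, which matters once a file contains multibyte characters such as accented letters in UTF-8 text. A minimal sketch, using a throwaway temporary file in place of your own:

```shell
# Write a small ASCII sample to a temporary file.
sample=$(mktemp)
printf 'hello world\n' > "$sample"

# Byte count: the same figure ls -l shows as the file size.
wc -c < "$sample"    # prints 12

# Character count: differs from the byte count only when the file holds
# multibyte characters (e.g., UTF-8 text read in a UTF-8 locale).
wc -m < "$sample"    # prints 12
```

For this ASCII sample the two numbers match; they diverge only for multibyte text.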

On the other hand, if you want to get a count of how many times each character appears in your file, you’re going to need a considerably more complicated command or a script. This post covers several different options.

Counting how many times each character appears in a file

To count how many of each character are included in a file, you need to string together a series of commands that split the file into individual characters, sort them so that identical characters end up next to each other, and then count each group.

To do that, you can use a command like this one:

$ cat myfile | sed 's/\(.\)/\n\1/g' | sort | uniq -c | column
     24              58 c           112 i           132 o             7 T
    254               2 C             3 I             2 O            30 u
      1 '            50 d             4 j            29 p            23 v
     25 ,           163 e             5 k             1 P             9 w
     20 .             2 E            60 l             2 q             4 x
    142 a            21 f            48 m            90 r            36 y
      5 A            16 g             2 M             1 R             3 z
     23 b             1 G           117 n           147 s
      1 B            51 h             1 N           119 t

The sed command splits the file into single-character chunks by inserting a newline before every character, so each character ends up on a line of its own. The sort command then groups identical characters together, uniq -c counts each group, and column arranges the results in multiple columns. Only characters that actually appear in the file are listed.

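Notice that the sort command puts the characters in alphabetical order, as defined by your locale (which is why uppercase and lowercase letters are interleaved). The first two entries appear blank because their characters are invisible: the first counts the empty lines that sed creates (one for each line of the file, so it is effectively a count of linefeeds), and the second counts spaces.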
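You can check where the blank linefeed entry comes from with a two-line sample. The sed expression inserts a newline before every character, so each input line yields exactly one empty line, and the number of empty lines matches the line count that wc -l reports. This sketch assumes GNU sed, which accepts \n in the replacement text:

```shell
# Two lines of sample input.
sample='ab
cd'

# Count the empty lines the sed step produces (one per input line).
printf '%s\n' "$sample" | sed 's/\(.\)/\n\1/g' | grep -c '^$'    # prints 2

# Compare with the number of lines in the input.
printf '%s\n' "$sample" | wc -l                                  # prints 2
```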

If you want to display the characters in frequency order instead, add a second sort command with the -g (general numeric) option.

$ cat myfile | sed 's/\(.\)/\n\1/g' | sort | uniq -c | sort -g | column
      1 '             2 O             9 w            30 u           117 n
      1 B             2 q            16 g            36 y           119 t
      1 G             3 I            20 .            48 m           132 o
      1 N             3 z            21 f            50 d           142 a
      1 P             4 j            23 b            51 h           147 s
      1 R             4 x            23 v            58 c           163 e
      2 C             5 A            24              60 l           254
      2 E             5 k            25 ,            90 r
      2 M             7 T            29 p           112 i

To reverse the listing to show the most frequently used characters first, add an r (reverse) option to that last sort command.

$ cat myfile | sed 's/\(.\)/\n\1/g' | sort | uniq -c | sort -gr | column
    254              60 l            24               5 A             2 C
    163 e            58 c            23 v             4 x             1 R
    147 s            51 h            23 b             4 j             1 P
    142 a            50 d            21 f             3 z             1 N
    132 o            48 m            20 .             3 I             1 G
    119 t            36 y            16 g             2 q             1 B
    117 n            30 u             9 w             2 O             1 '
    112 i            29 p             7 T             2 M
     90 r            25 ,             5 k             2 E

The character at the top of the list is, as you probably guessed, the space character. The second most frequently used character in the file is “e”, which is no surprise either. Capital letters, on the other hand, cluster near the bottom of the list because they appear far less often.
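If you only want the handful of most common characters, you can swap column for head. This is a sketch: the count of three is arbitrary, the sample phrase stands in for cat myfile, and like the commands above it relies on GNU sed accepting \n in a replacement.

```shell
# List the three most frequently used characters, most common first.
echo "Hello, World!" | sed 's/\(.\)/\n\1/g' | sort | uniq -c | sort -gr | head -n 3
```

The first two lines will be the counts for “l” and “o”; which of the many single-occurrence characters fills the third slot depends on how sort breaks the tie.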

Note that if you don’t want to distinguish between uppercase and lowercase letters, you can insert a tr (translate) command into the command string like this:

$ cat myfile | sed 's/\(.\)/\n\1/g' | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -gr | column
    254             115 i            36 y            21 f             3 z
    165 e            91 r            30 u            20 .             2 q
    147 s            60 l            30 p            17 g             1 '
    147 a            60 c            25 ,             9 w
    134 o            51 h            24 b             5 k
    126 t            50 m            24               4 x
    118 n            50 d            23 v             4 j

Switch the positions of the “upper” and “lower” arguments to display the results all in uppercase.

Counting character-by-character in a word or phrase

You can also use a command similar to those shown above to count how many times each letter appears in a single word or phrase. Here’s an example:

$ echo "Hello, World!" | sed 's/\(.\)/\n\1/g' | sort | uniq -c | sort -gr | column
      3 l             1 r             1 d             1
      2 o             1 H             1 ,             1
      1 W             1 e             1 !
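In this listing, the two entries with nothing after the count are invisible characters: the space and the empty line that the sed command produces. If you would rather leave newlines out of the counts entirely, one alternative (not the sed approach used above) is grep -o ., which prints each character it matches on a line of its own and never matches the newline itself. This sketch assumes GNU grep:

```shell
# grep -o prints every match on its own line; the pattern . matches any
# single character except the newline, so only visible characters and
# spaces are counted.
echo "Hello, World!" | grep -o . | sort | uniq -c | sort -gr
```

The result has the same counts as before minus the blank linefeed entry: ten distinct characters adding up to the 13 characters in the phrase.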

Using an alias

While the commands shown above are clever, they’re not easy to remember or type. Creating an alias can help with this. Once you decide what form of output you prefer, turn the command into an alias like this:

$ alias CountChars="sed 's/\(.\)/\n\1/g' | sort | uniq -c | sort -gr | column"

Save the alias in your .bashrc file so that you can use it as needed. Then use it in commands like these:

$ cat myfile | CountChars
    254              60 l            24               5 A             2 C
    163 e            58 c            23 v             4 x             1 R
    147 s            51 h            23 b             4 j             1 P
    142 a            50 d            21 f             3 z             1 N
    132 o            48 m            20 .             3 I             1 G
    119 t            36 y            16 g             2 q             1 B
    117 n            30 u             9 w             2 O             1 '
    112 i            29 p             7 T             2 M
     90 r            25 ,             5 k             2 E
$ echo "Hello, World!" | CountChars
      3 l             1 r             1 d             1
      2 o             1 H             1 ,             1
      1 W             1 e             1 !
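Because an alias is plain text substitution, anything typed after it is appended to the end of its text, so CountChars myfile would hand myfile to column rather than to sed. That is why the examples above pipe text into the alias. If you would prefer to pass a file name directly, a shell function is a common alternative. The sketch below uses the hypothetical name countchars, reads the named files or falls back to standard input, leaves out column so you can add it when you want it, and assumes GNU sed:

```shell
# Count each character in the named files, or in standard input if no
# files are given, most frequent first. Add "| column" for compact output.
countchars() {
    sed 's/\(.\)/\n\1/g' "$@" | sort | uniq -c | sort -gr
}

# Usage:
#   countchars myfile | column
#   echo "Hello, World!" | countchars
```

Like the alias, the function can be saved in your .bashrc.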

Using a script

If you want to see only alphabetic characters, you can use a script like the one shown below. It first changes all the letters to lowercase, then runs through the alphabet, using awk to count how many times each letter appears, and displays a count only if the letter appears at least once. It works only with the string provided as its argument.

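#!/bin/bash
# make argument all lowercase
string=$(printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]')
for char in {a..z}
do
    # split on the letter; fields minus one = times the letter appears
    # (-v FS is used because some awk versions treat -Ft as a tab)
    count=$(awk -v FS="$char" '{print NF-1}' <<< "$string")
    if [ "$count" -gt 0 ]; then
        echo "$char:$count"
    fi
done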

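Save the script as CountByChar in a directory on your search path, make it executable (chmod +x CountByChar), and run it like this: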

$ CountByChar "Hello, World!"
d:1
e:1
h:1
l:3
o:2
r:1
w:1

Note that characters will always be listed in alphabetical order. You can pipe the output to the column command if you want fewer lines of output.

$ CountByChar "Hello, World!" | column
d:1     e:1     h:1     l:3     o:2     r:1     w:1
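The script starts a separate awk process for every letter of the alphabet. If that bothers you, the same letter:count output can come from a single pipeline. This is a swapped-in alternative rather than the script above; count_letters is a hypothetical function name, and like the script it only considers the letters a through z:

```shell
# Print letter:count pairs for the letters a-z in the first argument,
# ignoring case and skipping letters that don't appear.
count_letters() {
    printf '%s\n' "$1" |
        tr '[:upper:]' '[:lower:]' |   # fold everything to lowercase
        grep -o '[a-z]' |              # one letter per line
        sort | uniq -c |               # count each letter
        awk '{ print $2 ":" $1 }'      # reformat as letter:count
}

count_letters "Hello, World!"
```

For “Hello, World!” this prints the same seven lines as the script, from d:1 through w:1, in alphabetical order.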

Wrap-up

Whether you’re looking for character counts in files or phrases, there are some handy options available. Turning the complex ones into aliases is probably the best way to make the task easy.

by Sandra Henry-Stocker


Sandra Henry-Stocker has been administering Unix systems for more than 30 years. She describes herself as "USL" (Unix as a second language) but remembers enough English to write books and buy groceries. She lives in the mountains in Virginia where, when not working with or writing about Unix, she's chasing the bears away from her bird feeders.

The opinions expressed in this blog are those of Sandra Henry-Stocker and do not necessarily represent those of IDG Communications, Inc., its parent, subsidiary or affiliated companies.
