by Sandra Henry-Stocker

Unix Dweeb

Removing duplicate characters from a string on Linux with awk

How-To

Jun 13, 2022 · 3 mins

Linux

A clever awk command can make it easy to remove duplicate characters from a string.


The awk command can make it easy to remove duplicate characters from a string even when those characters aren’t sequential, especially when the process is turned into a script.

First, the awk command that we’ll be using starts by running through each letter in the string. In a more common command, you might see awk doing something like this:

$ echo one:two:three | awk 'BEGIN { FS = ":" } ; { print $2 }'
two

The FS portion of that command specifies the field separator—the character that is used to separate the fields in the string so that they can be processed separately.
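As a side note, the separator does not have to be set in a BEGIN block. A minimal sketch of two equivalent forms, using awk's standard -F option and -v variable assignment:

```shell
# Both commands split on ":" and print the second field.
echo one:two:three | awk -F: '{ print $2 }'
echo one:two:three | awk -v FS=":" '{ print $2 }'
```

The -v FS=... form is the one the scripts in this post rely on.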

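What our script does, however, is use a field separator of “” (i.e., an empty string). In gawk, mawk and most other current awk implementations, an empty field separator means that every character is treated as if it is itself a field (POSIX leaves this behavior unspecified, so it is worth checking on an unfamiliar system). Here are a couple of examples: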

$ echo one:two:three | awk 'BEGIN { FS = "" } ; { print $2 }'
n
$ echo one:two:three | awk 'BEGIN { FS = "" } ; { print $4 }'
:

Note that the commands above end up displaying the second and fourth characters in the string, not the second and fourth “fields” and that no distinction is made between blanks, letters and various punctuation characters.
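To make the per-character splitting easier to see, here is a small sketch that numbers every character in a string. The function name list_chars is made up for illustration, and the code assumes an awk that splits on every character when FS is empty.

```shell
# Hypothetical helper: print each character of the first argument
# together with its field number.
list_chars() {
    printf '%s\n' "$1" | awk -v FS="" '{
        # NF is the number of fields, which here is the number of characters
        for (i = 1; i <= NF; i++) printf "%d:%s\n", i, $i
    }'
}

list_chars "a:b"
```

With the string "a:b", the colon shows up as field 2, just like any letter would.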

A bash script that uses awk to remove duplicate characters might look like this:

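#!/bin/bash

echo -n "Enter string: "
read -r string

awk -v FS="" '{
    for (i = 1; i <= NF; i++)
        if (!seen[$i]++) str = str $i   # append only the first occurrence
}
END { print str }' <<< "$string"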

That script prompts for a string and then uses awk to run through it one character at a time. It adds each successive character to the string (str) only if that character isn’t already included. The characters are otherwise left in their original positions, with no sorting or further processing. Here’s an example of running it:

$ ./rmdups
Enter string: Let's go fly a kite!
Let's goflyaki!

Notice that each character appears only once in the “Let’s goflyaki!” results. The final result of the process is displayed in the print statement in the END portion of the awk command.
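For use inside other scripts, the same idea can be wrapped in a shell function that takes the string as an argument instead of prompting for it. This is only a sketch: the function name rmdups_str and the array name seen are illustrative choices, and it assumes an awk (such as gawk or mawk) that splits on every character when FS is empty.

```shell
# Hypothetical helper: remove repeated characters from the first argument,
# keeping the first occurrence of each in its original position.
rmdups_str() {
    printf '%s\n' "$1" | awk -v FS="" '{
        for (i = 1; i <= NF; i++)
            if (!seen[$i]++) str = str $i   # true only on first sighting
    }
    END { print str }'
}

rmdups_str "Let's go fly a kite!"
```

The test !seen[$i]++ is true only the first time a character is encountered, which is what keeps later copies out of str. Upper- and lowercase letters count as different characters, which is why both "L" and "l" survive.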

If you want to see how the script works by viewing the string of characters growing as characters are added, you can use this version of the script instead:

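#!/bin/bash

echo -n "Enter string: "
read -r characters

awk -v FS="" '{
    for (i = 1; i <= NF; i++) {
        if (!seen[$i]++) str = str $i   # append only the first occurrence
        print str                       # show the string after each character
    }
}
END { print str }' <<< "$characters"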

Running the script with the extra print command, you would see output like this:

$ ./rmdups2
Enter string: Let's go fly a kite!
L
Le
Let
Let'
Let's
Let's
Let's g
Let's go
Let's go
Let's gof
Let's gofl
Let's gofly
Let's gofly
Let's goflya
Let's goflya
Let's goflyak
Let's goflyaki
Let's goflyaki
Let's goflyaki
Let's goflyaki!
Let's goflyaki!

Notice that the string grows only when the current character is not already included in the string.
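A variation on that trace, offered only as a sketch, labels each character as kept or skipped so the decision is visible even for blanks. The function name trace_dups and the keep/skip labels are invented for this example.

```shell
# Hypothetical helper: report, character by character, whether it was
# added to the result or skipped as a duplicate.
trace_dups() {
    printf '%s\n' "$1" | awk -v FS="" '{
        for (i = 1; i <= NF; i++) {
            if (seen[$i]++) {
                printf "skip [%s]\n", $i             # already in str
            } else {
                str = str $i
                printf "keep [%s] -> %s\n", $i, str  # str just grew
            }
        }
    }'
}

trace_dups "noon"
```

For "noon", the first "n" and "o" are kept and the second "o" and "n" are skipped.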

You could also implement the script simply as an awk script like this:

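awk -v FS="" '{
    for (i = 1; i <= NF; i++)
        if (!seen[$i]++) str = str $i   # append only the first occurrence
}
END { print str }'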

You could then run the awk script like this:

$ echo "Let's go fly a kite!" | ./rmdups.awk
Let's goflyaki!
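An alternative, sketched here as an assumption rather than the form used above, is to keep only awk code in the file and run it with awk -f. The temporary file and the seen array name are illustrative.

```shell
# Write a pure awk program to a temporary file (the path is arbitrary).
prog=$(mktemp)
cat > "$prog" <<'EOF'
BEGIN { FS = "" }                       # treat every character as a field
{
    for (i = 1; i <= NF; i++)
        if (!seen[$i]++) str = str $i   # keep first occurrences only
}
END { print str }
EOF

# Run the program on piped input and show the result.
result=$(echo "Let's go fly a kite!" | awk -f "$prog")
echo "$result"
rm -f "$prog"
```

Because str and seen are never reset, input with several lines is treated as one long string, so a character that appears on the first line is dropped from every later line.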

Wrap-Up

Whenever processing the same characters more than once would be a waste of processing power, an awk command like the one shown in this post can remove the duplicates quite easily.


Sandra Henry-Stocker has been administering Unix systems for more than 30 years. She describes herself as "USL" (Unix as a second language) but remembers enough English to write books and buy groceries. She lives in the mountains in Virginia where, when not working with or writing about Unix, she's chasing the bears away from her bird feeders.

The opinions expressed in this blog are those of Sandra Henry-Stocker and do not necessarily represent those of IDG Communications, Inc., its parent, subsidiary or affiliated companies.
