There are many ways to extract substrings from lines of text on Linux, and doing so can be extremely useful when preparing scripts that process large amounts of data. This post describes ways you can take advantage of the commands that make extracting substrings easy.

Using bash parameter expansion

When using bash parameter expansion, you specify the starting position of the text that you want to extract and the number of characters to take. For example, you can create a variable by assigning it a value and then use syntax like that shown below to select a portion of it.

$ string="Happy days are here again"
$ echo ${string:1:10}
appy days
$ echo ${string:0:10}
Happy days

Note that the examples above make it clear that this technique starts position numbering at 0, so an offset of 1 skips the first character. In the next example, the 7 represents the eighth character in the string and the -2 means to drop the last two characters (negative lengths require bash 4.2 or later). As a result, the first substring below has a single character and the second has all but the last two.

$ string="1234567890"
$ echo ${string:7:-2}
8
$ echo ${string:0:-2}
12345678

In this next example, we first use "set --" to assign a string to the first positional parameter ($1) and then use echo to display the eighth and ninth characters. In other words, the expansion starts with the eighth character (offset 7) and then displays two characters.

$ set -- 01234567890abcdef
$ echo ${1:7:2}
78

NOTE: You can display the string assigned with the set command by simply using the command "echo $1". This is what the "1" in the example above refers to.

$ set -- 01234567890abcdef
$ echo $1
01234567890abcdef

Using cut

The cut command can be used in several ways to yank substrings from text. The -c option allows you to select the character positions to be displayed. Unlike bash parameter expansion, cut numbers characters starting at 1.

$ echo "12345" | cut -c 1-3
123

In this next example, we select the last two words by character position. If you select more characters than are available, cut simply stops at the end of the line, so it doesn't affect the output.
$ echo "Have some fun" | cut -c 6-13
some fun
$ echo "Have some fun" | cut -c 6-20
some fun

In addition, you can pipe text to the cut command, as above, or use the cut command to work with text in a file. Just be sure that the character positions work for every line.

$ cat myfile
Have some fun
Grab your lunch
Take nice nap
$ cut -c 6-15 myfile
some fun
your lunch
nice nap

The cut command can also work with delimiters, and this often makes it a lot easier to use with files in which the words or fields don't line up precisely. To work with a file of mailing addresses, for example, you could do this to pull out the third field of the comma-separated addresses:

$ cat addresses
6803 Gravel Road,Hurlock,MD
121 Blueberry Drive,Outback,VA
1427 N 12th Street,Reading,PA
2001 Turtle Road,Baker,WV
264 Dakota Street,Groton,CT
111 Mindless Circle,Celery,TX
1089 Plymouth Drive,Rahway,NJ
949 Endless Lane,Hoboken,NJ
2001 Turtle Road,Outback,VA
$ cut -d, -f3 addresses
MD
VA
PA
WV
CT
TX
NJ
NJ
VA

You can select multiple fields by specifying a range (e.g., "2-3") or a comma-separated list (e.g., "2,3"). With this file, "cut -d, -f2-3 addresses" and "cut -d, -f2,3 addresses" display exactly the same output:

$ cut -d, -f2-3 addresses
Hurlock,MD
Outback,VA
Reading,PA
Baker,WV
Groton,CT
Celery,TX
Rahway,NJ
Hoboken,NJ
Outback,VA

Using awk

The awk command can also be used to extract substrings. Here's an example of pulling text from a supplied phrase with awk's substr() function, which takes the text, a starting position (counted from 1) and a length:

$ echo "Have some fun" | awk '{print substr($0,6,8)}'
some fun

The $0 represents the complete input line. To work with a file with delimited fields, use the -F (field separator) option. In this case, the delimiter is a comma. Use -F':' if the file is colon-delimited.

$ awk -F',' '{print $3}' addresses | sort | uniq
CT
MD
NJ
PA
TX
VA
WV

If your fields are separated with both a comma and a space, that is no problem for awk.
Just specify both characters as the separator in the command, like this:

$ awk -F', ' '{print $3}' addresses | sort | uniq
CT
MD
NJ
PA
TX
VA
WV

In fact, if you want the awk command to work regardless of whether fields are separated by just commas or by commas followed by blanks, you can use a regular expression as the separator. Here, ', ?' matches a comma followed by an optional space:

$ awk -F', ?' '{print $3}' addresses | sort | uniq
CT
MD
NJ
PA
TX
VA
WV

Using awk, you can also display two fields with syntax like this. The comma between $2 and $3 tells awk to separate them with the output field separator, which is a single space by default:

$ awk -F',' '{print $2,$3}' addresses | sort | uniq
Baker WV
Celery TX
Groton CT
Hoboken NJ
Hurlock MD
Outback VA
Rahway NJ
Reading PA

Using expr

To use the expr command, type "expr substr" followed by your string, the starting position (counted from 1) and the number of characters to extract. Note that substr is an extension provided by GNU expr, the version found on most Linux systems; it is not part of the POSIX standard, so it may not be available on other Unix systems.

$ expr substr "Have some fun" 6 8
some fun
$ str="Have some fun"
$ expr substr "$str" 6 8
some fun

Wrap-Up

There are lots of ways to extract substrings on Linux, but each of the commands you might use has its own quirks and its own advantages. For example, bash parameter expansion counts positions from 0, while cut, awk and expr all start counting at 1.
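Because the numbering rules differ from tool to tool, it can help to see the same substring pulled out with each command covered in this post, side by side. The script below is a minimal sketch, not one of the original examples: it assumes bash and GNU coreutils (for expr substr), and the variable names phrase, start and len are purely illustrative. It treats start as a 1-based position and subtracts 1 only for bash parameter expansion.

```bash
#!/usr/bin/env bash
# Sketch: extract the same substring with four different tools.
# Assumes bash and GNU coreutils (expr substr is a GNU extension).
# The names phrase, start and len are hypothetical, chosen for illustration.

phrase="Have some fun"
start=6   # 1-based position of the first character wanted (the "s")
len=8     # number of characters wanted

# bash parameter expansion counts from 0, so subtract 1 from the start.
# The offset and length are arithmetic expressions, so start-1 is evaluated.
echo "bash: ${phrase:start-1:len}"

# cut takes an inclusive 1-based range, so compute the end position.
echo "cut:  $(echo "$phrase" | cut -c "$start-$((start + len - 1))")"

# awk's substr() takes a 1-based start position and a length.
echo "awk:  $(echo "$phrase" | awk -v s="$start" -v n="$len" '{print substr($0, s, n)}')"

# GNU expr substr also takes a 1-based start position and a length.
echo "expr: $(expr substr "$phrase" "$start" "$len")"
```

Each of the four lines should end with "some fun". If you forget the 0-based offset for bash and write ${phrase:start:len} instead, the bash line would print "ome fun" instead, which is a quick way to spot the off-by-one difference.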