sandra_henrystocker
Unix Dweeb

Many ways to sort file content on Linux

How-To
Nov 23, 2020
Linux

The Linux sort command has an impressive number of ways to sort, from alphanumeric to random. Here's a look at some of the more useful ones.

The Linux sort command can arrange command output or file content in many more ways than you might realize. Sorting alphabetically, numerically, by month or at random are only some of the more interesting choices. In this post, we take a look at some of the more useful sorting options and explain how they differ.

The default

The default sort might seem fairly straightforward, but it depends on your locale. In a typical English UTF-8 locale such as en_US.UTF-8, digits come first, followed by letters, and for each letter the lowercase character precedes the uppercase one. You can expect to see this kind of ordering:

012345aAbBcCdDeE

ASCII order

Looking at the numeric byte values of these characters, you can see that the order above is not the “natural order” as far as ASCII is concerned.

$ echo 012345aAbBcCdDeE | od -bc
0000000 060 061 062 063 064 065 141 101 142 102 143 103 144 104 145 105
          0   1   2   3   4   5   a   A   b   B   c   C   d   D   e   E

As you’ll notice in this octal dump of the list of characters, uppercase letters have lower ASCII values than lowercase letters and would come before them if the list were sorted in ASCII order. To sort by byte value, prefix your sort command with LC_ALL=C, which makes sort use the C locale's plain byte comparison. For example, here’s a comparison of sorting in byte order with the default sort order:

$ LC_ALL=C sort file		$ sort file
0				0
1				1
2				2
3				3
4				4
5				5
A 
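The comparison above can be reproduced with a short shell sketch. It feeds the characters from the od example to sort one per line through a pipe instead of reading them from a file. The byte-order result is fixed, while the default result depends on your LANG and LC_* settings, so the second listing may not match the default column on your system.

```shell
#!/bin/sh
# Sample data: the characters from the od example, one per line.
sample='0 1 2 3 4 5 a A b B c C d D e E'

# Byte (ASCII) order: digits, then uppercase A-E, then lowercase a-e.
echo "LC_ALL=C sort:"
printf '%s\n' $sample | LC_ALL=C sort

# Locale order: depends on your locale settings; many English UTF-8
# locales interleave the cases (a A b B ...).
echo "default sort:"
printf '%s\n' $sample | sort
```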

 

Numeric order

To sort numerically, you need to use -n. Otherwise, numbers are compared character by character, and 100 would sort ahead of 2 because the character 1 comes before the character 2. Here's a comparison between a default sort and a numeric sort:

$ sort numbers			$ sort -n numbers
0                               0
1                               1
11                              4
4                               9
44                              11
9                               44
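To see the difference yourself, the sketch below pipes the same numbers into sort twice instead of using a numbers file. LC_ALL=C is set only so that the character-based result is the same on every system.

```shell
#!/bin/sh
nums='0 1 11 4 44 9'

# Character by character: "11" sorts before "4" because '1' < '4'.
printf '%s\n' $nums | LC_ALL=C sort

# Numeric: whole values are compared, so 4 and 9 come before 11.
printf '%s\n' $nums | LC_ALL=C sort -n
```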

You can also sort numerically using a "human-friendly" sort order, which understands numbers written with unit suffixes such as 5K or 5M. The option for this sort order is -h. When you use it, 5K is treated as larger than 500 and smaller than 5M. Here's a comparison of the default sort and a human-friendly sort:

$ sort numbers			$ sort -h numbers
0                               0
1                               1
11                              4
4                               9
44                              11
500                             44
5K                              500
5M                              5K
9                               5M
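Here is a minimal sketch of the human-friendly sort, assuming GNU coreutils sort (where -h is also spelled --human-numeric-sort). It orders values by suffix first (none, then K, then M and so on) and compares the numbers themselves only when the suffixes match.

```shell
#!/bin/sh
nums='0 1 11 4 44 500 5K 5M 9'

# Human-friendly numeric order: plain numbers, then K values, then M values.
printf '%s\n' $nums | LC_ALL=C sort -h

# With the same suffix, the numeric parts decide: 4.0K sorts before 12K.
printf '%s\n' 12K 4.0K | LC_ALL=C sort -h
```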

By Month

To sort by month name, you would use the -M option. Here's an example of a default sort and a sort by month:

$ sort months		            $ sort -M months
Apr                                 Jan
Aug                                 Feb
Dec                                 Mar
Feb                                 Apr
Jan                                 May
Jul                                 Jun
Jun                                 Jul
Mar                                 Aug
May                                 Sep
Nov                                 Oct
Oct                                 Nov
Sep                                 Dec

Notice that sorting by month works whether you spell out the names of the months or use abbreviations:

$ sort -M months2
Jan
Feb
March
Apr
May
June
Jul
August
Sep
October
November
Dec
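The sketch below mixes abbreviations and full names in a scrambled order. It assumes GNU sort, which matches the start of each line against the locale's month abbreviations, ignoring case and leading blanks; LC_ALL=C pins those abbreviations to the English JAN through DEC.

```shell
#!/bin/sh
# Scrambled mix of abbreviated and spelled-out month names.
months='October Jan June Dec March Sep August Feb November Apr Jul May'

# In the C locale, a name is recognized when it starts with one of the
# three-letter abbreviations, so "March" is treated as MAR.
printf '%s\n' $months | LC_ALL=C sort -M
```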

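Understand that a sort by month is not a sort by date. The -M option compares only the month names and ignores the day and year, so entries from different years are mixed together, as the Feb 12 2019 entry below shows. Lines with the same month are then ordered by an ordinary comparison of the whole line.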

$ sort events			    $ sort -M events
Feb 10 2020 20:06 SOMETHING         Jan 23 2020 10:42 SOMETHING
Feb 11 2020 20:06 SOMETHING         Jan 29 2020 09:17 SOMETHING
Feb 12 2019 11:11 SOMETHING         Feb 10 2020 20:06 SOMETHING
Feb 27 2020 23:05 SOMETHING         Feb 11 2020 20:06 SOMETHING
Jan 23 2020 10:42 SOMETHING         Feb 12 2019 11:11 SOMETHING
Jan 29 2020 09:17 SOMETHING         Feb 27 2020 23:05 SOMETHING
Jun 26 2019 09:09 SOMETHING         Jun 26 2019 09:09 SOMETHING
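If you do need chronological order, one approach is to give sort several keys with -k so that it compares the year first, then the month name, then the day. This sketch assumes GNU coreutils sort and the space-separated layout of the events data shown above (month, day, year, time); the key positions would have to change for any other layout.

```shell
#!/bin/sh
# Copy of the events data shown above, fed through a pipe.
# Keys: field 3 (year) numerically, field 1 (month) by name,
# field 2 (day) numerically, then field 4 (time) as plain text.
printf '%s\n' \
  'Feb 10 2020 20:06 SOMETHING' \
  'Feb 11 2020 20:06 SOMETHING' \
  'Feb 12 2019 11:11 SOMETHING' \
  'Feb 27 2020 23:05 SOMETHING' \
  'Jan 23 2020 10:42 SOMETHING' \
  'Jan 29 2020 09:17 SOMETHING' \
  'Jun 26 2019 09:09 SOMETHING' |
  LC_ALL=C sort -k3,3n -k1,1M -k2,2n -k4,4
```

With a file, the equivalent would be sort -k3,3n -k1,1M -k2,2n -k4,4 events. The two 2019 entries now come first, followed by the 2020 entries in date order.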

Reversing listings

To reverse the order of your sorted listings, add the -r option. Here's a reverse listing of the months and human-readable numbers files:

$ sort -Mr months                   $ sort -hr numbers
Dec                                 5M
Nov                                 5K
Oct                                 500
Sep                                 44
Aug                                 11
Jul                                 9
Jun                                 4
May                                 1
Apr                                 0
Mar
Feb
Jan                           
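The -r option doesn't define an order of its own; it reverses whichever order the other options select. A quick sketch with the same sample data, again piped in rather than read from the months and numbers files:

```shell
#!/bin/sh
# Reverse month order: Dec first, Jan last.
printf '%s\n' Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec | LC_ALL=C sort -Mr

# Reverse human-friendly order: 5M first, 0 last.
printf '%s\n' 0 1 11 4 44 500 5K 5M 9 | LC_ALL=C sort -hr
```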

Random sorting

To sort text in a pseudorandom fashion, use -R with your sort command. Here are some of the earlier sorts using the random option.

$ sort -R months		    $ sort -R numbers
Aug                                 500
Nov                                 4
Dec                                 44
Sep                                 5M
Apr                                 0
Jan                                 1
Jul                                 5K
Jun                                 11
May                                 9
Mar
Feb
Oct
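One detail worth knowing, assuming GNU sort: -R works by sorting on a random hash of each line, so it reorders distinct lines randomly but always keeps identical lines next to each other. The sketch below uses a small made-up list with duplicates to show this; the order changes from run to run, but counting the runs with uniq gives the same answer every time.

```shell
#!/bin/sh
# Made-up input with repeated lines.
fruit='apple pear apple plum pear apple'

# Random order, but the three apples stay together, as do the two pears.
printf '%s\n' $fruit | sort -R

# uniq collapses adjacent duplicates, so this always prints 3.
printf '%s\n' $fruit | sort -R | uniq | wc -l
```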

The other way to sort data randomly is to use the shuf (for "shuffle") command. Here are a couple of examples using data from earlier examples in this post:

$ shuf months			    $ shuf numbers
Nov                                 0
Jun                                 4
May                                 500
Aug                                 5K
Apr                                 11
Dec                                 44
Jul                                 1
Feb                                 9
Mar                                 5M
Oct
Sep
Jan 
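Unlike sort -R, shuf doesn't keep identical lines together, and it has a few options of its own. The sketch below assumes GNU coreutils shuf: -n limits the output to a random sample of that many lines, and -i shuffles a range of integers without any input at all.

```shell
#!/bin/sh
# Each line is placed independently, so the apples may end up separated.
printf '%s\n' apple pear apple plum pear apple | shuf

# Pick three random months.
printf '%s\n' Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec | shuf -n 3

# Shuffle the integers 1 through 10.
shuf -i 1-10
```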

Sorting Command Output

You can also pipe the output of other commands into sort and use any of the options shown. The command below might not be particularly useful, but it demonstrates the point and lists some other commands related to sorting.

$ apropos sort | sort -r
XConsortium (7)      - X Consortium information
versionsort (3)      - scan a directory for matching entries
tsort (1)            - perform topological sort
sort (1)             - sort lines of text files
qsort_r (3)          - sort an array
qsort (3)            - sort an array
comm (1)             - compare two sorted files line by line
bzip2 (1)            - a block-sorting file compressor, v1.0.8
bunzip2 (1)          - a block-sorting file compressor, v1.0.8
bsearch (3)          - binary search of a sorted array
apt-sortpkgs (1)     - Utility to sort package index files
alphasort (3)        - scan a directory for matching entries
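A more practical pipeline pairs -h with du -h, whose size suffixes are exactly what the human-friendly sort expects; for example, du -sh /var/* | sort -hr lists the largest entries first. The sketch below feeds made-up du-style lines through printf so that the result doesn't depend on what is on your disk.

```shell
#!/bin/sh
# Made-up "size<TAB>path" lines in the format du -sh prints.
printf '%s\t%s\n' 12K /etc 1.5G /var 200M /usr 4.0K /tmp |
  LC_ALL=C sort -hr    # largest first: /var, /usr, /etc, /tmp
```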


Sandra Henry-Stocker has been administering Unix systems for more than 30 years. She describes herself as "USL" (Unix as a second language) but remembers enough English to write books and buy groceries. She lives in the mountains in Virginia where, when not working with or writing about Unix, she's chasing the bears away from her bird feeders.

The opinions expressed in this blog are those of Sandra Henry-Stocker and do not necessarily represent those of IDG Communications, Inc., its parent, subsidiary or affiliated companies.