sandra_henrystocker
Unix Dweeb

Finding files on Linux with the longest names

How-To
Jul 05, 2022 · 4 mins
Linux

File names on Linux can be up to 255 characters long. Here's how to find the longest ones.

File names on most Linux file systems can be up to 255 bytes long, which means 255 characters when the names use plain ASCII. Determining which files in a directory have the longest names might not be the most exciting task at hand, but doing it with a script poses some interesting challenges that invite equally interesting solutions.
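The limit is set by the file system, not by Linux as a whole, and it is measured in bytes. One way to check it for a particular directory is getconf, which reports the NAME_MAX value for the file system holding a given path. The sketch below wraps that in a small helper; the name name_max is just an illustrative choice, and the value printed depends on the file system you point it at.

```shell
#!/usr/bin/env bash
# name_max: print NAME_MAX, the longest file name (in bytes) allowed by
# the file system that holds the given path (default: current directory).
name_max() {
    getconf NAME_MAX "${1:-.}"
}

name_max /tmp    # typically 255 on ext4, xfs, btrfs and tmpfs
```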

To start, consider piping the output of the ls command, which lists files, to the wc command with its -c option, which counts characters (strictly speaking, bytes), like this:

$ ls myreport.txt | wc -c
13

If you count the characters in “myreport.txt”, you'll find there are 12, not 13. The extra character is the newline that ls writes after each file name it lists. echo does the same thing, as the command below shows:

$ echo hello | wc -c
6

You can see this more clearly by passing the same output to the od -bc command, which shows each byte both as an octal value (-b) and as a character (-c). The trailing newline appears at the end as 012 and \n.

$ echo hello | od -bc
0000000 150 145 154 154 157 012
          h   e   l   l   o  \n
0000006

To avoid the extra character, add the -n option, which suppresses the trailing newline, to the echo command.

$ echo -n hello | wc -c
5
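Whether echo accepts -n depends on the shell (POSIX leaves echo's handling of -n implementation-defined), so printf is the more portable way to send a string without a trailing newline. A minimal sketch; the helper name name_len is hypothetical, not a standard command.

```shell
#!/usr/bin/env bash
# name_len: print the length of a string in bytes. printf '%s' writes the
# string exactly as given, with no trailing newline for wc to count.
name_len() {
    printf '%s' "$1" | wc -c
}

name_len hello           # prints 5
name_len myreport.txt    # prints 12
```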

If you tried a command like the one below to loop through the files in the current directory, you’d quickly see that the period is taken literally. The resulting “.” followed by a newline yields a length of 2.

$ for file in .
do
    echo $file | wc -c
done
2
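The loop runs only once because a plain “.” contains no pattern characters, so the shell passes it along as a one-character word. Characters such as * are what trigger file-name expansion. The sketch below counts the words the shell produces in each case; the scratch directory (made with mktemp) and the three file names are assumptions for the demo.

```shell
#!/usr/bin/env bash
# Compare the literal word "." with the glob "./*" as loop input.
demo=$(mktemp -d)                 # scratch directory for the demo
trap 'rm -rf "$demo"' EXIT        # clean up when the script exits
touch "$demo/a.txt" "$demo/b.txt" "$demo/c.txt"
cd "$demo" || exit 1

# count_words: print how many words the shell passed in
count_words() {
    n=0
    for word in "$@"; do
        n=$((n + 1))
    done
    echo "$n"
}

count_words .      # prints 1: "." is not a pattern, so it is not expanded
count_words ./*    # prints 3: the glob expands to one word per file
```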

The command below will generate a list of file names and lengths, but it has one serious problem: it breaks file names that contain blanks into separate words and reports the length of each part separately.

$ for file in `ls`
do
     echo -n "$file "
     echo -n $file | wc -c
done

Here’s an example:

$ for file in `ls Speed*`
do
     echo -n "$file "
     echo -n $file | wc -c
done
Speeding 8
up 2
scripts 7
using 5
parallelization 15
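The split happens because the unquoted output of a command substitution (the backquoted ls, or its $(ls) equivalent) is broken into words at every blank, while a glob like * expands to one word per file name. This sketch recreates the example in a scratch directory, an assumption for the demo, using the file name from the output above, and counts the loop iterations each form produces.

```shell
#!/usr/bin/env bash
# Count loop iterations from $(ls) versus * for one file whose name has blanks.
demo=$(mktemp -d)                 # scratch directory for the demo
trap 'rm -rf "$demo"' EXIT
touch "$demo/Speeding up scripts using parallelization"
cd "$demo" || exit 1

from_ls=0
for file in $(ls); do             # unquoted: split into words at blanks
    from_ls=$((from_ls + 1))
done

from_glob=0
for file in *; do                 # glob: one word per file name
    from_glob=$((from_glob + 1))
done

echo "ls: $from_ls, glob: $from_glob"    # prints ls: 5, glob: 1
```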

In contrast, the command below lists every file in the current directory followed by its length, because the shell expands * into one word per file name, blanks and all.

$ for file in *
do
    echo -n "$file "
    echo -n "$file" | wc -c
done

The blank at the end of the first echo command leaves a space between each file name and its length, producing lines like this:

hello 5
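Bash can also measure a string directly: ${#var} expands to the length of a variable's value, with no pipe, no extra process, and no newline to correct for. One caveat: ${#var} counts characters in the current locale, while wc -c counts bytes, so the two can disagree for names containing non-ASCII characters (wc -m counts characters instead). A minimal variant of the loop above; the function name list_name_lengths is hypothetical.

```shell
#!/usr/bin/env bash
# list_name_lengths: print each file name in the current directory
# followed by its length, measured with ${#file} rather than wc -c.
list_name_lengths() {
    for file in *; do
        # In an empty directory the pattern stays a literal "*"; skip it.
        [ -e "$file" ] || [ -L "$file" ] || continue
        echo "$file ${#file}"
    done
}

list_name_lengths
```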

With a few small changes, the loop prints each length before its file name, and piping the results to sort -n lists the files in order of name length.

$ for file in *; do len=`echo -n "$file" | wc -c`; echo $len "$file"; done | sort -n

Adding tail -1 to the end of the pipeline displays only the last line, which holds the length and name of the file with the longest name.

$ for file in *; do len=`echo -n "$file" | wc -c`; echo $len "$file"; done | sort -n | tail -1
41 Speeding up scripts using parallelization
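awk can do the measuring as well. In this sketch, printf '%s\n' * writes one file name per line and awk's length() supplies the count, so no loop is needed. Like the loop, it assumes no file name contains a newline, and tail -1 still shows only one name if several are tied for the longest. The function name longest_in_cwd is hypothetical.

```shell
#!/usr/bin/env bash
# longest_in_cwd: print "<length> <name>" for the longest file name in the
# current directory. Assumes no file name contains a newline.
longest_in_cwd() {
    printf '%s\n' * |                        # one name per line
        awk '{ print length($0), $0 }' |     # prefix each name with its length
        sort -n |                            # shortest first
        tail -1                              # keep the longest
}

longest_in_cwd
```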

The script below prompts for the directory to be examined and then displays only the file with the longest name. As it loops through the files, it remembers the longest name seen so far and replaces it whenever it finds a longer one. Looping with “for file in $dir/*” steps through the files without breaking file names apart on blanks.

To make sure the length counts only the file name, the line following the “for file” command uses sed to strip the directory name and the “/” that follows it, reducing the string to just the file name. Commas serve as the sed delimiter because sed's usual delimiter, the forward slash, would collide with the slashes in directory paths such as ./bin.
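The sed approach has a couple of rough edges: the directory name is treated as a regular expression, so a “.” in it matches any character, and a comma in it would break the s command. The shell's own ${var##*/} expansion does the same job by deleting the longest prefix that ends in “/”. A small comparison, using made-up sample values for dir and file:

```shell
#!/usr/bin/env bash
# Strip the directory part from a path produced by a "$dir"/* loop,
# first with sed (as in the script) and then with parameter expansion.
dir=./bin                              # sample values for illustration
file=./bin/loop-days-of-week

via_sed=$(echo "$file" | sed "s,$dir/,,")   # delete "./bin/" from the path
via_expansion=${file##*/}                   # delete everything up to the last "/"

echo "$via_sed"          # prints loop-days-of-week
echo "$via_expansion"    # prints loop-days-of-week
```

basename "$file" gives the same result, at the cost of running an external command for every file.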

#!/bin/bash
# find file with longest filename

echo -n "dir> "
read -r dir
longestname=0

for file in "$dir"/*; do
    file=`echo "$file" | sed "s,$dir/,,"`   # strip the directory prefix
    sz=`echo -n "$file" | wc -c`            # get filename length (no newline)
    if [ "$sz" -gt "$longestname" ]; then
        longestname=$sz
        longname=$file
    fi
done

echo "$longestname: $longname"

Running this script should look something like this:

$ ./LongFname
dir> .
41: Speeding up scripts using parallelization

$ ./LongFname
dir> ./bin
17: loop-days-of-week
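To reuse the same keep-the-longest-so-far logic inside other scripts, the directory can be passed as an argument instead of being typed at a prompt. This is a sketch rather than a drop-in replacement: the function name longest_name is hypothetical, it uses ${file##*/} and ${#name} in place of sed and wc (so it counts characters rather than bytes), and on a tie it keeps the first name it finds. Like the script, it skips hidden files, since * does not match names that start with a dot.

```shell
#!/usr/bin/env bash
# longest_name DIR: print "<length>: <name>" for the longest file name in DIR
# (default: the current directory). On a tie, the first name found is kept.
longest_name() {
    local dir=${1:-.} file name longest=0 longname=
    for file in "$dir"/*; do
        # With no matches the pattern stays literal; skip it.
        [ -e "$file" ] || [ -L "$file" ] || continue
        name=${file##*/}                       # drop the directory part
        if [ "${#name}" -gt "$longest" ]; then
            longest=${#name}
            longname=$name
        fi
    done
    echo "$longest: $longname"
}

longest_name .
```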

Wrap-Up

Looping through a list of files to find those with the longest filenames requires a good understanding of how loops work and how blanks in filenames can complicate the required commands.


Sandra Henry-Stocker has been administering Unix systems for more than 30 years. She describes herself as "USL" (Unix as a second language) but remembers enough English to write books and buy groceries. She lives in the mountains in Virginia where, when not working with or writing about Unix, she's chasing the bears away from her bird feeders.

The opinions expressed in this blog are those of Sandra Henry-Stocker and do not necessarily represent those of IDG Communications, Inc., its parent, subsidiary or affiliated companies.