sandra_henrystocker
Unix Dweeb

Unix: How to select every 1,000th line from a file

Analysis
Oct 07, 2014 (2 mins)
Big Data, Data Center, Open Source

Log files on Unix systems can easily grow to hundreds of thousands or even millions of lines. Here's a simple way to pluck out every Nth line.

Head and tail are great commands when you want to look only at the beginning or the ending of files. Getting a feel for how the lines in a file are changing over time, on the other hand, can take a lot of time if you’ve got to scan through thousands of lines.

What if you could look at every 100th, 1,000th or 10,000th line? That’s surprisingly easy with a particular sed command, and you can modify the command to change the frequency. The command for picking out every 1,000th line is sed -n '0~1000p'. Changing “1000” to a smaller number displays lines more often; changing it to a larger number displays them less often. Note that the first~step address form is a GNU sed extension, so it may not be available in other sed implementations.

$ sed -n '0~1000p' /var/log/syslog

This command will display every 1,000th line. The -n tells sed to suppress its automatic display of every line it reads, so only the lines you explicitly print appear. The '0~1000' address selects each 1,000th line from the target file, and the trailing p prints it. The 0 tells sed to start with line 0 which, of course, doesn’t exist, and the 1000 is the step between selected lines, so the first line printed is line 1,000, followed by lines 2,000, 3,000 and so on. Using a sed command like this, you can also display every 10th, 100th, etc. line of piped input. The last command shown below, piped to sed, will display every 250th login from the /var/log/wtmp file.

$ last | sed -n '0~250p'
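If you want to see exactly which lines an address picks before pointing it at a real log, a minimal sketch like the one below may help. It assumes GNU sed and uses seq as a stand-in for a log file, so that line N simply contains the number N; the variable names are only for illustration. The awk line is a different technique, not part of the sed command above, included as a possible fallback where the GNU address form is unavailable.

```shell
# seq stands in for a log file: line N contains the number N.
every_1000th=$(seq 1 5000 | sed -n '0~1000p')
printf '%s\n' "$every_1000th"
# prints 1000, 2000, 3000, 4000 and 5000, one per line

# Alternative for systems without GNU sed: awk prints a line whenever
# its record number (NR) is an exact multiple of 1000.
awk_every_1000th=$(seq 1 5000 | awk 'NR % 1000 == 0')
printf '%s\n' "$awk_every_1000th"
```

Both commands should select the same lines, which is an easy way to confirm the step value does what you expect.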

You don’t have to start at the beginning of the file if you don’t want to. In the command below, you would start with line 500 and then print every 25th line from that point on:

$ sed -n '500~25p' /var/log/syslog
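To check how the starting line and the step interact, here is a small sketch under the same assumptions as before: GNU sed, with seq generating numbered test input in place of a real file. The variable names are hypothetical.

```shell
# Start at line 500, then print every 25th line after it.
from_500=$(seq 1 600 | sed -n '500~25p')
printf '%s\n' "$from_500"
# prints 500, 525, 550, 575 and 600, one per line

# The starting line does not have to be a multiple of the step.
offset_3=$(seq 1 60 | sed -n '3~25p')
printf '%s\n' "$offset_3"
# prints 3, 28 and 53, one per line
```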

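The numbers you select are up to you. Keep in mind that sed does not number the lines it prints. If you number the input first (for example, by piping it through cat -n), each line of output carries its original line number. You can use those numbers to verify that your command is working as expected before you put it into use, and to get an idea of where you are in the file or command output you are examining.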

   500  shs      pts/4        2013-08-03 12:24 (207.111.99.25)
   525  shs      pts/4        2013-08-21 12:37 (pool-123-45-67-890.bltmmd.fios.verizon.net)
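One way to get line numbers into the output is to number the input before sed sees it. The sketch below assumes GNU sed and uses cat -n for the numbering; seq again stands in for real input, and the variable name is only illustrative.

```shell
# cat -n prefixes each line with its line number, so the number
# travels with the line through the sed filter.
numbered=$(seq 101 200 | cat -n | sed -n '0~25p')
printf '%s\n' "$numbered"
# each output line holds a line number (25, 50, 75, 100)
# followed by that line's content (125, 150, 175, 200)
```

Because the numbering happens before the filtering, the numbers shown are positions in the original input, not positions in the filtered output.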

This is a useful sed command for scanning files at whatever granularity works for you.

Sandra Henry-Stocker has been administering Unix systems for more than 30 years. She describes herself as "USL" (Unix as a second language) but remembers enough English to write books and buy groceries. She lives in the mountains in Virginia where, when not working with or writing about Unix, she's chasing the bears away from her bird feeders.

The opinions expressed in this blog are those of Sandra Henry-Stocker and do not necessarily represent those of IDG Communications, Inc., its parent, subsidiary or affiliated companies.