Unix Dweeb

Open-sourced tool speeds up Linux scripts via parallelization

Jun 28, 2022 | 4 mins

The pa.sh tool finds sections of code that can run independently, then runs them in parallel to save time.

Researchers have open-sourced pa.sh (also called pash), a tool that can dramatically speed up Linux scripts by running parts of them in parallel, saving time without the risk of introducing errors.

Parallelization starts by examining a script for code that can be run separately and independently. Not every script contains such code, so not all scripts can benefit from the tool. But when pa.sh does find portions that can run independently, it runs them in parallel on separate CPUs. It also uses other techniques to get the code to run faster.
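To make the idea of "portions that can run independently" more concrete, here is a hand-written sketch of the general technique: splitting an input file into chunks, processing the chunks as background jobs and then combining the results. This is only an illustration of what parallelizing a task by hand can look like, not a description of how pa.sh works internally. The function names, the search string and the default chunk count are all made up for the example, and it assumes GNU split.

```shell
#!/bin/bash
# Hypothetical illustration: count lines containing "error" in a file,
# first in a single pass, then by processing chunks of the file in parallel.

count_serial() {
  # One grep process reads the entire input.
  grep -c 'error' "$1"
}

count_parallel() {
  local input=$1 chunks=${2:-4}
  local tmpdir piece
  tmpdir=$(mktemp -d)

  # split -n l/N divides the file into N pieces without breaking lines
  # (a GNU coreutils feature).
  split -n "l/$chunks" "$input" "$tmpdir/chunk."

  # No chunk depends on any other, so each grep can run in the background.
  for piece in "$tmpdir"/chunk.*; do
    grep -c 'error' "$piece" > "$piece.count" &
  done
  wait   # block until every background job has finished

  # Add the per-chunk counts together to get the overall total.
  cat "$tmpdir"/chunk.*.count | awk '{ total += $1 } END { print total + 0 }'
  rm -rf "$tmpdir"
}
```

Both functions should print the same number for the same file. The parallel version only pays off when the input is large enough to outweigh the cost of splitting it and starting the extra processes.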

Below is a demonstration I ran on my home Fedora box, first running a script on its own and then again using pa.sh. Note that this script was provided with the pa.sh tool and lends itself to parallelization. It’s not nearly as demanding as scripts that might process gigabytes of data in a scientific or artificial-intelligence lab, so the results are not dramatic.

Running the script on the command line

I used the time command to gauge the performance of the hello-world.sh script.

$ time ./evaluation/intro/hello-world.sh
2176

real    0m55.077s
user    0m54.815s
sys     0m0.062s

NOTE: The “2176” on the second line is the script’s output.

Running the script using pa.sh

In the next command, I ran the same script through pa.sh.

$ time ./pa.sh ./evaluation/intro/hello-world.sh
2176

real    0m19.216s
user    0m37.509s
sys     0m0.255s
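For anyone who wants to put numbers on the difference, the short sketch below does the arithmetic on the timings from the two runs shown above. The variable names are my own. It uses awk because bash by itself cannot do floating-point division.

```shell
#!/bin/bash
# Timings copied from the two runs shown above (in seconds).
direct_real=55.077
pash_real=19.216
pash_user=37.509

# Speedup: how many times faster the pa.sh run finished.
speedup=$(awk -v a="$direct_real" -v b="$pash_real" 'BEGIN { printf "%.2f", a / b }')

# Fraction of the original wall-clock time that the pa.sh run needed.
fraction=$(awk -v a="$direct_real" -v b="$pash_real" 'BEGIN { printf "%.2f", b / a }')

# user time is summed across CPUs, so a user/real ratio above 1 suggests
# the work was spread over more than one CPU.
cpu_ratio=$(awk -v u="$pash_user" -v r="$pash_real" 'BEGIN { printf "%.2f", u / r }')

echo "speedup: ${speedup}x, fraction of original time: $fraction, user/real: $cpu_ratio"
```

With these figures the script reports a speedup of 2.87x, a fraction of 0.35 and a user/real ratio of 1.95. In the direct run, user and real time were nearly equal, which is what you would expect from a script running on a single CPU.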

Notice that when run using pa.sh, the script took little more than a third of the real (wall-clock) time that it took when run directly. On the other hand, if I run a script that simply loops from 1 to 1,000 and displays every 100th number, it takes significantly longer to run using pa.sh. That’s because the script doesn’t benefit from parallelization, but pa.sh still has to analyze it:

$ time ./count_to_10000         $ time pa.sh ./count_to_10000
100                             100
200                             200
300                             300
400                             400
500                             500
600                             600
700                             700
800                             800
900                             900
1000                            1000
real    0m0.010s                real    0m59.121s
user    0m0.007s                user    0m41.386s
sys     0m0.003s                sys     0m19.263s

The script runs a single loop, which provides no opportunity for parallelization. It looks like this:

#!/bin/bash
# Print every number from 1 to 1000 that ends in "00"
for num in {1..1000}
do
  if [[ "$num" == *"00" ]]; then
    echo "$num"
  fi
done

For complex scripts that can benefit from parallelization, however, pa.sh can make a tremendous difference in how long they take to run. All you have to do is invoke your scripts using pa.sh. And, as already noted, pa.sh does this without introducing errors, so you can be confident that you will get the expected results, just a whole lot faster. If you are using scripts that need to process a large amount of data, this can save a lot of time.
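If you would like to check for yourself that a script produces the same results both ways, a small helper along these lines can do the comparison. This is a minimal sketch, not part of pa.sh. The function name same_output is hypothetical, and it simply runs two command lines and compares what they print.

```shell
#!/bin/bash
# same_output: run two command lines and report whether they print the same thing.
# Hypothetical helper for this sketch; it is not part of pa.sh.
same_output() {
  local out_a out_b status=0
  out_a=$(mktemp)
  out_b=$(mktemp)

  # Each argument is a complete command line, run in its own bash process.
  bash -c "$1" > "$out_a"
  bash -c "$2" > "$out_b"

  # cmp -s compares the two files byte by byte without printing anything.
  if cmp -s "$out_a" "$out_b"; then
    echo "outputs match"
  else
    echo "outputs differ"
    status=1
  fi
  rm -f "$out_a" "$out_b"
  return "$status"
}

# Example call, reusing the commands from the timing runs earlier:
#   same_output "./evaluation/intro/hello-world.sh" \
#               "./pa.sh ./evaluation/intro/hello-world.sh"
```

The function returns 0 when the outputs match and 1 when they differ, so it can also be used in an if statement or a test script.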

Installing and using pa.sh

You will need to have tools like sudo, wget, and curl, but these tools are likely already available on your Linux system.

Once pa.sh is installed, you will need to export the PASH_TOP variable so that it points to the top of the directory where pa.sh is installed. For example:

$ export PASH_TOP=/opt/pash
$ echo $PASH_TOP
/opt/pash
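Before running anything, it can be worth confirming that the variable points where you think it does. The sketch below is one way to do that. The check_pash_top function is hypothetical, and it assumes that the pa.sh launcher sits directly in the $PASH_TOP directory, so adjust it if your installation is laid out differently.

```shell
#!/bin/bash
# check_pash_top: sanity-check the PASH_TOP setting before running scripts.
# Hypothetical helper; it assumes the pa.sh launcher sits directly in $PASH_TOP.
check_pash_top() {
  if [ -z "${PASH_TOP:-}" ]; then
    echo "PASH_TOP is not set" >&2
    return 1
  fi
  if [ ! -x "$PASH_TOP/pa.sh" ]; then
    echo "no executable pa.sh found in $PASH_TOP" >&2
    return 1
  fi
  echo "PASH_TOP looks usable: $PASH_TOP"
}

# To keep the setting across logins, bash users commonly add the export
# line to ~/.bashrc (the path is the example location used above):
#   echo 'export PASH_TOP=/opt/pash' >> ~/.bashrc
```

Calling check_pash_top after the export prints a confirmation on success, or an explanation on standard error along with a non-zero exit status if something is missing.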

Wrap-Up

From everything I’ve seen and read, pa.sh can provide a dramatic performance improvement to complex and data-hungry scripts. If you or your organization might benefit from this kind of tool, it is well worth looking into.

pa.sh is hosted by the Linux Foundation and steered by a committee of researchers and practitioners who have been working on the system for more than two years.

The tool, as well as the example code, is open source, and pa.sh is available on GitHub. There is no man page, but help is available when you use the pa.sh --help command. A technical paper explaining pa.sh has been posted by Nikos Vasilakis, a research scientist at MIT’s Computer Science & Artificial Intelligence Laboratory (CSAIL) who chairs the committee working on the tool. MIT announced pa.sh earlier this month. Stevens Institute of Technology is also involved in its development.

by Sandra Henry-Stocker

Sandra Henry-Stocker has been administering Unix systems for more than 30 years. She describes herself as "USL" (Unix as a second language) but remembers enough English to write books and buy groceries. She lives in the mountains in Virginia where, when not working with or writing about Unix, she's chasing the bears away from her bird feeders.

The opinions expressed in this blog are those of Sandra Henry-Stocker and do not necessarily represent those of IDG Communications, Inc., its parent, subsidiary or affiliated companies.
