
Open-source containers move toward high-performance computing

Analysis
Dec 18, 2018 | 4 mins
Linux, Open Source, Technology Industry

Thanks to the efforts of Sylabs, open-source containers are starting to focus on high-performance computing—providing new ways of working for enterprise IT organizations.

Open-source containers are moving in a direction that many of us never anticipated.

Containers have long been recognized as an effective way to package applications with all of their required components, and some container technologies are now tackling one of the most challenging areas in the compute world today: high-performance computing (HPC). While containers can bring a new level of efficiency to HPC, they are also presenting new ways of working for enterprise IT organizations that run HPC-like jobs.

How containers work

Containers offer many advantages to organizations seeking to distribute applications. By incorporating an application's many dependencies (libraries and the like) into self-contained images, they avoid a lot of installation problems. Differences between OS distributions have no impact, so separate versions of applications don't have to be prepared and maintained, making developers' work considerably easier.
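A quick sketch makes the point. The commands below use Docker and the public centos:7 image purely as an illustration; the container brings its own userland with it, so the host's distribution makes no difference to what runs inside:

    # The host can be any distribution -- here, say, an Ubuntu system.
    grep PRETTY_NAME /etc/os-release

    # The container carries its own CentOS 7 userland and libraries,
    # so the same image behaves the same way on any host that can run it.
    docker run --rm centos:7 grep PRETTY_NAME /etc/os-release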

The challenge of HPC

Until quite recently, the high-performance computing market, with its emphasis on big data and supercomputing, paid little attention to containers. This was largely because the tightly coupled technology model of supercomputing didn't fit well into the loosely coupled microservices world that containers generally serve. There were security concerns as well. For example, Docker containers often bestow root privileges on those running them, something that doesn't sit well in the supercomputing world, where security is exceedingly important.
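To see why this worries HPC administrators, consider a well-known sketch of the problem (the image used here is illustrative): any user who can talk to the Docker daemon can, in effect, become root on the host.

    # A user in the "docker" group can bind-mount the host's root filesystem
    # into a container and modify it -- effectively root access to the host.
    docker run --rm -it -v /:/host alpine chroot /host /bin/sh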

A significant change came about when Singularity — a container system with a focus on high-performance computing — became available. Now provided by Sylabs, Singularity began as an open-source project at Lawrence Berkeley National Laboratory in 2015.

Singularity was born because there was a lot of interest in containers for compute, but the commonly used container technology of the time (Docker) did not support compute-focused, HPC-type use cases. Scientists were using containers and sharing their work on Docker Hub, but because Docker could not be supported on HPC systems, Singularity was created in response to user demand for a compute-focused technology.

Singularity is now also available in a commercial version called Singularity Pro, offered for a modest per-host fee with optional support.

How Singularity fits in

Singularity is a system built with the goal of running containers on HPC systems, and at this point it probably claims 90 to 95 percent of the market share in this domain. This is largely because it began life with a clear focus on HPC. Given its birth at Lawrence Berkeley National Laboratory, its focus on big data, scientific computing, and security could have been predicted, though its quick rise to prominence has taken much of the container industry by surprise.

Singularity is also remarkably easy to work with. It can import Docker images even if you don't have Docker installed and without requiring superuser privileges. In fact, Singularity containers can package entire scientific workflows along with software, libraries, and data.
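Here's a rough sketch of what that workflow looks like in practice (the image name is illustrative, and the exact filename of the pulled image varies with the Singularity version):

    # Pull an image straight from Docker Hub -- no Docker installation
    # and no root privileges needed; Singularity converts it locally.
    singularity pull docker://python:3.7-slim

    # Run a command inside the resulting image as an ordinary user.
    singularity exec python_3.7-slim.sif python3 --version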

Singularity’s notable presence at SC18 led to it winning three HPCwire awards, including recognition among the Top 5 New Products or Technologies to Watch for the third straight year.

Integration with Kubernetes

Sylabs is also working toward integration with Kubernetes, the open-source system for automating deployment, scaling, and management of containerized applications. The project was announced at SC18 just last month, and the current proof-of-concept release invites developers to contribute.

The integration of Singularity with Kubernetes is in response to a pressing need from companies running service-based compute jobs that involve streaming data and real-time analytics — jobs that require an orchestrator.

This integration means that users will have access to both an orchestrator and a scheduler within Singularity. While many people use Kubernetes for batch jobs, it lacks the features of even the simplest HPC resource managers and schedulers. This may change in time, but for now, traditional batch-based compute is best served by HPC-focused resource managers such as those Singularity also supports.
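As a sketch of what that looks like today, here is how a containerized job might be submitted through a traditional HPC scheduler such as Slurm. The script, image, and module names below are illustrative only, not taken from Sylabs' documentation:

    #!/bin/bash
    #SBATCH --job-name=container-test
    #SBATCH --nodes=1
    #SBATCH --time=00:30:00

    # Make the singularity command available; many sites do this with
    # environment modules, but the details are site-specific.
    module load singularity

    # The resource manager handles scheduling; the container simply runs
    # as an ordinary process within the allocated resources.
    singularity exec analysis.sif ./run_analysis.sh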

Sandra Henry-Stocker
Unix Dweeb

Sandra Henry-Stocker has been administering Unix systems for more than 30 years. She describes herself as "USL" (Unix as a second language) but remembers enough English to write books and buy groceries. She lives in the mountains in Virginia where, when not working with or writing about Unix, she's chasing the bears away from her bird feeders.

The opinions expressed in this blog are those of Sandra Henry-Stocker and do not necessarily represent those of IDG Communications, Inc., its parent, subsidiary or affiliated companies.