SambaNova DataScale servers can perform both AI training and inference, which eliminates expensive data movement.
SambaNova Systems is now shipping the second-generation of its DataScale systems specifically built for AI and machine learning.
You may not have heard of SambaNova, a startup led by ex-Oracle/Sun hardware executives and Stanford professors, but its work is likely familiar. The Lawrence Livermore National Laboratory was an early adopter of DataScale and used the systems in its COVID-19 antiviral compound and therapeutic research in 2020.
“Our systems were deployed in supercomputers at the Lawrence Livermore National Laboratory, which were then used by various parties for the research and development of COVID-19 antiviral compound and therapeutics,” said Marshall Choy, SambaNova’s senior vice president for products. “So, yes, they were a small part of that. As bad as the pandemic was, at least we got to do something good through it.”
SambaNova actually started out as a software company, as part of a DARPA-funded research project. Choy said the company’s early mission was to build a software stack which would create greater ease of use and flexibility for developers to develop data flow applications, such as machine-learning workloads. But the company was unhappy with the hardware on the market and decided to make its own.
The DataScale SN30 is a complete hardware and software stack in a 2U shell that plugs into a standard data center rack. The server is powered by the Cardinal SN30 RDU (Reconfigurable Data Unit) processor, SambaNova’s own homebrewed chip and made by TSMC.
The Cardinal SN30 RDU contains 86 billion transistors and is capable of 688 teraflops at bfloat16 precision. SambaNova wasn’t happy with the performance and power draw of CPUs and GPUs and felt that they were not best suited for neural networks.
“The rate of change in neural networks is such that any sort of fixed function processor would be obsolete by the time it was taped out and delivered. You need a flexible silicon substrate, and that’s what we’ve built – an architecture that can be reconfigured at each clock cycle to the needs of the underlying operators that are being executed from the software,” said Choy.
That sounds like a FPGA, but that’s not exactly what it is. Choy called the chip a CGRA, or coarse-grained reconfigurable architecture. FPGAs are very flexible but are pretty difficult to program. SambaNova designed the chip to be more high level for machine learning frameworks and to be not as complicated as FPGAs can be.
Along with the hardware comes the SambaFlow Linux-based software stack, with enhancements around enterprise integration, such as native Kubernetes support for the orchestration of containerized and virtualized models.
According to SambaNova, when training a 13-billion parameter GPT-3 model, the new DataScale SN30 system ran six times faster than an eight-socket Nvidia DGX A100 system. And Choy said the systems are capable of doing both the training and inference parts of AI, which are usually done by two separate systems.
“Traditionally, with CPUs and GPUs, you would do your training on a GPU, and then you do your inference on the CPU. That incurs a lot of data movement back and forth between systems. With SambaNova, we have a single system image that can do both training and inference. And so you see the elimination of that expensive data movement,” he said.
DataScale systems are available for on-premises deployment and on-demand through cloud service provider partners.