Nvidia, whose heritage lies in making chips for gamers, has announced its first new GPU architecture in three years, and it's clearly designed to efficiently support the various computing needs of artificial intelligence and machine learning. The architecture, called Ampere, and its first iteration, the A100 processor, supersede Nvidia's current Volta architecture, whose V100 chip appeared in 94 of the top 500 supercomputers last November.

The A100 has an incredible 54 billion transistors, 2.5 times as many as the V100. Tensor performance, so vital in AI and machine learning, has been significantly improved: FP16 floating-point calculations run almost 2.5 times as fast as on the V100, and Nvidia introduced a new math mode called TF32, which the company claims can provide up to 10-fold speedups over single-precision floating-point math on Volta GPUs.

This matters because FP16 is useful for training, the compute-intensive part of machine learning, but overkill for inference, where trained models are used to infer an outcome or result. So Nvidia added INT8 and INT4 support to the A100 to handle the simpler inference work while drawing less power. That means best-case performance for both training and inference from a single chip. (A short sketch of how software selects among these precision modes appears at the end of this article.)

Memory performance is also significantly improved, thanks to 40GB of HBM2 memory delivering a total of 1.6TB/sec of bandwidth. And from the looks of the A100 package, Nvidia did what Fujitsu has done with its A64FX processor and put the HBM2 right next to the processor die.

The A100 also sports a new feature called Multi-Instance GPU (MIG), where a single A100 can be partitioned into up to seven virtual GPUs, each of which gets its own dedicated allocation of cores, L2 cache, and memory controllers. Think of it as virtualization for a GPU. (The second sketch at the end of this article shows how those partitions look from software.)

Finally, Ampere comes with a new version of Nvidia's high-speed interconnect, NVLink. The third generation nearly doubles the per-lane signaling rate, from 25.78Gbps on NVLink 2 to 50Gbps on NVLink 3. That lets Nvidia halve the number of lanes needed to reach a given speed or, equivalently, double the throughput over the same number of lanes. (The arithmetic is worked through in the last sketch at the end of this article.)

Nvidia CEO Jensen Huang made the Ampere announcement via video from his kitchen during the virtual GPU Technology Conference (GTC).

New cards and servers are ready

Nvidia is wasting no time bringing the A100 to market. The company says the A100 is in production, and it announced the DGX A100 system alongside the chip. The box comes with eight A100 accelerators, 15TB of storage, a pair of AMD Epyc 7742 CPUs with 64 cores each (you didn't think they were going to use Intel processors, did you?), 1TB of RAM, and Mellanox HDR InfiniBand controllers. The DGX A100 will set you back $199,000, but it packs 5 petaflops into a box the size of a small refrigerator, all dedicated to AI and machine learning.

Nvidia's $7 billion acquisition of Mellanox is also already bearing fruit in the form of the EGX A100 card, which combines an Ampere-based A100 GPU package with a Mellanox ConnectX-6 Dx NIC on a single card. That gives the A100 200Gbps of networking without requiring any CPU processing, and it lets A100 GPUs talk to each other directly rather than going through the CPU, which adds steps and thus latency. The card can connect to either InfiniBand or Ethernet fabrics. GPU-to-GPU communication over InfiniBand means HPC is about to see a major jump in performance.
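Here is a minimal sketch of what choosing those precision modes looks like in practice, assuming PyTorch on CUDA 11-era software running on an Ampere GPU. The allow_tf32 flags and the amp autocast API are PyTorch's knobs for these modes, not anything from Nvidia's announcement, and the tiny linear model is just a placeholder:

```python
# A minimal sketch of choosing precision per phase on an Ampere GPU.
# Assumes PyTorch >= 1.7 with CUDA 11; the model and data are placeholders.
import torch
import torch.nn as nn

# TF32: with these flags set, Ampere tensor cores run FP32 matmuls in TF32.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

model = nn.Linear(1024, 1024).cuda()          # stand-in for a real network
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler()          # loss scaling for FP16 training

x = torch.randn(64, 1024, device="cuda")
target = torch.randn(64, 1024, device="cuda")

# Training step: FP16 via automatic mixed precision.
optimizer.zero_grad()
with torch.cuda.amp.autocast():
    loss = nn.functional.mse_loss(model(x), target)
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()

# Inference: no gradients, and lower precision is typically enough.
# (INT8/INT4 inference paths usually go through TensorRT rather than
# plain PyTorch.)
with torch.no_grad(), torch.cuda.amp.autocast():
    prediction = model(x)
```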
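And here is how MIG partitions look from software, sketched with the nvidia-ml-py (pynvml) bindings. This assumes an A100 with MIG already enabled by an administrator via nvidia-smi; the sketch only inspects the instances rather than creating them:

```python
# A sketch of inspecting MIG partitions from Python, assuming the
# nvidia-ml-py package (pynvml) and an A100 with MIG already enabled.
import pynvml

pynvml.nvmlInit()
try:
    gpu = pynvml.nvmlDeviceGetHandleByIndex(0)
    current, pending = pynvml.nvmlDeviceGetMigMode(gpu)
    print("MIG enabled:", current == pynvml.NVML_DEVICE_MIG_ENABLE)

    # Each MIG instance appears as its own device with its own memory slice.
    for i in range(pynvml.nvmlDeviceGetMaxMigDeviceCount(gpu)):
        try:
            mig = pynvml.nvmlDeviceGetMigDeviceHandleByIndex(gpu, i)
        except pynvml.NVMLError:
            continue  # this slot is not populated
        mem = pynvml.nvmlDeviceGetMemoryInfo(mig)
        print(f"MIG device {i}: {mem.total / 2**30:.1f} GiB dedicated memory")
finally:
    pynvml.nvmlShutdown()
```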
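Finally, the NVLink lane arithmetic, worked through. The per-link lane counts used here (eight per direction for NVLink 2, four for NVLink 3) are Nvidia's published figures rather than numbers from this announcement:

```python
# Back-of-the-envelope NVLink arithmetic from the signaling rates above.
def link_bandwidth_gbs(signaling_gbps: float, lanes_per_direction: int) -> float:
    """Per-link bandwidth in GB/s per direction (8 bits per byte)."""
    return signaling_gbps * lanes_per_direction / 8

nvlink2 = link_bandwidth_gbs(25.78, 8)      # ~25.8 GB/s per direction per link
nvlink3 = link_bandwidth_gbs(50.0, 4)       # 25 GB/s: same speed, half the lanes

# Hold the lane count constant instead and per-link throughput doubles:
nvlink3_full = link_bandwidth_gbs(50.0, 8)  # ~50 GB/s per direction

print(f"NVLink 2 link: {nvlink2:.1f} GB/s per direction")
print(f"NVLink 3 link, 4 lanes: {nvlink3:.1f} GB/s per direction")
print(f"NVLink 3 link, 8 lanes: {nvlink3_full:.1f} GB/s per direction")
```

Multiplied out across the A100's 12 links, that works out to Nvidia's published total of 600GB/sec of NVLink bandwidth per GPU, double the V100's 300GB/sec.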