The next-gen Blackwell architecture will offer a 4x performance boost over the current Hopper lineup, Nvidia claims. Credit: Nvidia

Nvidia kicked off its GTC 2024 conference with the formal launch of Blackwell, its next-generation GPU architecture due at the end of the year. Blackwell uses a chiplet design, to a point. Whereas AMD’s designs have several chiplets, Blackwell has two very large dies that are tied together as one GPU with a high-speed interlink that operates at 10 terabytes per second, according to Ian Buck, vice president of HPC at Nvidia.

Nvidia will deliver three new Blackwell data center and AI GPUs: the B100, B200, and GB200. The B100 has a single processor, the B200 has two GPUs interconnected, and the GB200 pairs two GPUs with a Grace CPU. Buck says the GB200 will deliver inference performance seven times greater than the Hopper GH200 can. It also delivers four times the AI training performance of Hopper, 30 times better inference performance overall, and 25 times better energy efficiency, Buck claimed. “This will expand AI data center scale to beyond 100,000 GPUs,” he said on a press call ahead of the announcement.

Blackwell has 192GB of HBM3e memory with more than 8TB/sec of bandwidth and a 1.8TB/sec secondary link. Blackwell also supports the company’s second-generation transformer engine, which tracks the accuracy and dynamic range of every layer of every tensor and the entire neural network as it proceeds in computing.

Blackwell has 20 petaflops of FP4 AI performance on a single GPU. FP4, with four bits of floating-point precision per operation, is new to the Blackwell processor; Hopper topped out at FP8. The fewer bits in a floating-point value, the faster it can be processed, which is why performance is roughly cut in half with each step up in precision: FP8, FP16, FP32, and FP64. Hopper has 4 petaflops of FP8 AI performance, less than half of Blackwell's FP8 figure.
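To make the precision trade-off concrete, here is a minimal Python sketch (not Nvidia code; the value set below assumes the common E2M1 layout for a 4-bit float, which is a reasonable but unconfirmed guess at Blackwell's FP4 encoding). With only 16 code points, FP4 quantizes coarsely, which is the price paid for executing more operations per cycle:

```python
# Hedged illustration: an E2M1-style FP4 format can represent only these
# eight non-negative magnitudes (plus their negatives). Nvidia's actual
# FP4 encoding and scaling details are hardware specifics not shown here.
FP4_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_fp4(x: float) -> float:
    """Round x to the nearest representable FP4 (E2M1) value."""
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), 6.0)  # clamp to FP4's maximum magnitude
    return sign * min(FP4_VALUES, key=lambda v: abs(v - mag))

for x in [0.7, 1.2, 2.4, 5.2]:
    q = quantize_fp4(x)
    print(f"{x:4.1f} -> {q:4.1f} (error {abs(x - q):.2f})")
```

The rounding error per value is large, which is why, as described below, weights that need accuracy are kept at higher precision.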
Blackwell's new transformer engine automatically detects what precision each layer of the model can tolerate, ranging from FP4 to FP64. The higher the precision, the longer it takes to process and the more energy it uses, so the engine switches to lower or higher precision as needed. Previous generations required programming the processor to switch math precision. “Our big innovation here is you don’t need to hand code that as a user. You can let the system take care of that for you,” said Charlie Boyle, vice president of DGX systems at Nvidia. “And it does it safely, meaning it stores the weights at higher precision than it needs to to maintain that accuracy, and in areas where you don’t need that level of precision to get the same amount of accuracy.”

The high-speed interconnect, NVLink, is as significant as the GPU technology itself. This fifth generation of NVLink is designed to provide efficient scaling for trillion-parameter mixture-of-experts models, said Buck, and allows Blackwell to deliver 18 times faster throughput in multi-node interconnects.

In addition to the new GPUs, Nvidia announced its next-generation InfiniBand platform, the Quantum-X800, an AI-dedicated infrastructure with advanced features crucial for multi-tenant generative AI clouds and large enterprises. The X800 includes the Nvidia Quantum Q3400 switch and the Nvidia ConnectX-8 SuperNIC, which together achieve end-to-end throughput of 800Gb/s. That is five times the bandwidth of the previous generation and a nine-fold increase in in-network computing, to 14.4 Tflops.

Blackwell products are planned for release later this year, while Quantum-X800 and Spectrum-X800 will be available next year. GTC runs this week in San Jose, Calif.
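Boyle's description of automatic precision selection can be pictured with a small, purely conceptual Python sketch. The format names, range limits, and selection rule below are illustrative assumptions, not Nvidia's actual transformer-engine logic:

```python
# Hedged conceptual sketch -- not the transformer engine itself, whose
# heuristics Nvidia has not published. It only illustrates the idea:
# inspect each layer's numeric range and pick the cheapest precision
# whose range still covers it, falling back to wider formats as needed.

# Approximate maximum representable magnitudes (assumed formats:
# FP4 E2M1, FP8 E4M3, FP16, FP32).
PRECISION_MAX = [
    ("fp4", 6.0),
    ("fp8", 448.0),
    ("fp16", 65504.0),
    ("fp32", 3.4e38),
]

def pick_precision(values):
    """Return the narrowest format whose range covers the layer's values."""
    peak = max(abs(v) for v in values)
    for name, max_mag in PRECISION_MAX:
        if peak <= max_mag:
            return name
    return "fp64"  # last resort for extreme dynamic ranges

print(pick_precision([0.1, -2.5, 5.9]))  # small range -> "fp4"
print(pick_precision([120.0, -300.0]))   # -> "fp8"
print(pick_precision([5.0e4]))           # -> "fp16"
```

A real implementation would also weigh accuracy requirements, not just range, which is why Boyle notes that weights are stored at higher precision where accuracy demands it.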