Google says its TPU v4 supercomputer is more powerful and efficient than ever thanks to its optical circuit switching technology and architecture, and presents it as a challenge to Nvidia.

A new white paper from Google details the company’s use of optical circuit switches in its machine learning training supercomputer, saying that the TPU v4 model with those switches in place offers better performance and greater energy efficiency than general-purpose processors.

Google’s Tensor Processing Units, the basic building blocks of the company’s AI supercomputing systems, are essentially application-specific integrated circuits (ASICs), meaning that their functionality is built in at the hardware level, as opposed to the general-purpose CPUs and GPUs used in many AI training systems. The white paper details how, by interconnecting more than 4,000 TPUs through optical circuit switching, Google has been able to achieve speeds 10 times faster than previous models while consuming less than half as much energy.

Aiming for AI performance, price breakthroughs

The key, according to the white paper, is the way optical circuit switching (performed here by switches of Google’s own design) enables dynamic changes to the interconnect topology of the system. Compared with a system like InfiniBand, which is commonly used in other HPC areas, Google says that its system is cheaper, faster and considerably more energy efficient.

“Two major architectural features of TPU v4 have small cost but outsized advantages,” the paper said.
“The SparseCore [data flow processors] accelerates embeddings of [deep learning] models by 5x-7x by providing a dataflow sea-of-cores architecture that allows embeddings to be placed anywhere in the 128 TiB physical memory of the TPU v4 supercomputer.”

According to Peter Rutten, research vice president at IDC, the efficiencies described in Google’s paper are in large part due to the inherent characteristics of the hardware being used: well-designed ASICs are almost by definition better suited to their specific task than general-purpose processors trying to do the same thing.

“ASICs are very performant and energy efficient,” he said. “If you hook them up to optical circuit switches where you can dynamically configure the network topology, you have a very fast system.”

While the system described in the white paper is only for Google’s internal use at this point, Rutten noted that the lessons of the technology involved could have broad applicability for machine learning training.

“I would say it has implications in the sense that it offers them a sort of best practices scenario,” he said. “It’s an alternative to GPUs, so in that sense it’s definitely an interesting piece of work.”

Google-Nvidia comparison is unclear

While Google also compared TPU v4’s performance to systems using Nvidia’s A100 GPUs, which are common HPC components, Rutten noted that Nvidia has since released much faster H100 processors, which may shrink any performance difference between the systems.

“They’re comparing it to an older-gen GPU,” he said. “But in the end it doesn’t really matter, because it’s Google’s internal process for developing AI models, and it works for them.”