The Japanese electronics giant is making bold performance claims about its supercomputer processor.

Credit: RIKEN Advanced Institute for Computational Science

Arm processors in servers have gone from false starts (Calxeda) to modest successes (ThunderX2) to real contenders (ThunderX3, Ampere). Now details have emerged about Japanese IT giant Fujitsu's Arm processor, which the company claims will offer better HPC performance than Nvidia GPUs at a lower power cost.

Fujitsu is developing the A64FX, a 48-core Armv8 derivative engineered specifically for high-performance computing (HPC). Rather than build general-purpose compute cores, Fujitsu added compute engines tailored to artificial intelligence, machine learning, and other workloads typical of HPC. The chip will power a new supercomputer called Fugaku, previously known as Post-K. The name refers to the K computer, at one time the fastest supercomputer in the world, which ran on custom Sparc chips before RIKEN, the lab where it was installed, pulled the plug.

Fujitsu has revealed some new details, and they are impressive. The A64FX is a major departure from conventional CPU design. Instead of the chiplet approach used in AMD's Epyc and some Xeons, it is a single monolithic die. More important, four stacks of High Bandwidth Memory 2 (HBM2), an expensive but very fast memory found only in high-end systems, are connected directly to the CPU, two 8GB stacks on each side of the die.

Prototype A64FX motherboards have no DIMM sockets at all. An Intel or AMD motherboard can carry up to a dozen DIMM sockets per CPU; the A64FX board has none, because its 32GB of HBM2 sits on the processor package itself.

In HPC, memory bandwidth has long been the bottleneck, and data-intensive workloads such as analytics, simulations, and machine learning only make it worse. Moving data around a system can also consume far more power than actually processing it, by some estimates up to 100 times as much, so energy efficiency demands that data move as little as possible.

That is why the A64FX looks so different from a standard Arm or x86 chip: no separate system memory, just 32GB per processor of extremely fast memory attached directly to the chip over a high-speed interface rather than a much slower memory bus. That greatly reduces latency between CPU and memory and cuts power, since data no longer has to be shuttled in and out of socketed DIMMs.

Like a GPU's cores, the A64FX's 48 cores are tied together by a very fast interconnect called Tofu, which debuted in the K computer and has been enhanced for the A64FX. Tofu is designed for energy efficiency and low latency. Fujitsu says the A64FX delivers roughly 3 teraflops of peak performance while being up to 10 times more power efficient than an x86 processor. A Fugaku prototype already took the number-one spot on the Green500, the list of the most energy-efficient supercomputers published by the same group behind the Top500, and that was a prototype, not the finished design.

In early benchmarks, Fujitsu claims the chip trounces Intel's top-of-the-line Xeon Platinum and is competitive with Nvidia's Volta line of HPC GPUs. That's not final silicon, however, and I always wait for third-party benchmarks.

So why should you care? Because Fujitsu has struck a deal with Cray to build HPC servers based on the A64FX, sold under the Cray brand name.
Cray has since been acquired by Hewlett Packard Enterprise, so HPE will be selling not one but two Arm-based server lines: its more mainstream Project Moonshot servers and the A64FX-based Cray systems.

There is a long history of technologies starting in HPC and slowly going mainstream, from GPU computing to liquid cooling to modular server design. There's no reason the A64FX can't follow the same path and bring AI, machine learning, and other high-performance workloads to more than just supercomputing facilities. The HBM2-only, no-DIMM approach is a major twist on system memory, and I am really curious to see whether Intel and AMD follow.
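To see why the bandwidth argument matters, here is a minimal STREAM-style "triad" microbenchmark, a generic sketch in plain C rather than anything Fujitsu ships; the array size, file name, and scalar are arbitrary illustrative choices. Kernels like this perform so little arithmetic per byte moved that their throughput is set almost entirely by how fast memory can feed the cores, which is exactly the problem that putting HBM2 on the package is meant to attack.

```c
/*
 * Minimal STREAM-style "triad" sketch in plain C. A generic illustration
 * of a bandwidth-bound kernel, not Fujitsu or A64FX code; the array size
 * and scalar value are arbitrary choices.
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1UL << 25)   /* 32M doubles per array, roughly 256 MB each */

int main(void)
{
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    if (!a || !b || !c)
        return 1;

    for (size_t i = 0; i < N; i++) {
        b[i] = 1.0;
        c[i] = 2.0;
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    /* Triad: one multiply and one add per 24 bytes moved, so memory
     * bandwidth, not arithmetic throughput, sets the pace. */
    const double scalar = 3.0;
    for (size_t i = 0; i < N; i++)
        a[i] = b[i] + scalar * c[i];

    clock_gettime(CLOCK_MONOTONIC, &t1);

    double sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    double gb  = 3.0 * N * sizeof(double) / 1e9;  /* two reads + one write */
    printf("Triad: %.1f GB/s (a[0] = %.1f)\n", gb / sec, a[0]);

    free(a);
    free(b);
    free(c);
    return 0;
}
```

Compiled with ordinary optimization (for example, cc -O2 stream_triad.c), the number this prints tracks the machine's memory bandwidth far more closely than its core count or clock speed.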