Can Fujitsu beat Nvidia in the HPC race?

News Analysis
Apr 05, 2020
4 mins
Data Center

The Japanese electronics giant is making bold performance claims about its supercomputer processor.

K Computer supercomputer
Credit: Riken Advanced Institute for Computational Science

Arm processors on servers have gone from failed starts (Calxeda) to modest successes (ThunderX2) to real contenders (ThunderX3, Ampere). Now, details have emerged about Japanese IT giant Fujitsu’s Arm processor, which it claims will offer better HPC performance than Nvidia GPUs but at a lower power cost.

Fujitsu is developing the A64FX, a 48-core Armv8 derivative specifically engineered for high-performance computing (HPC). Rather than design general-purpose compute cores, Fujitsu has added compute engines aimed at artificial intelligence, machine learning, and other workloads common in HPC.

It will go in a new supercomputer called Fugaku, or Post-K. Post-K is a reference to the K supercomputer, at one time the fastest supercomputer in the world, that ran on custom Sparc chips before RIKEN Lab, where it was installed, pulled the plug.

Fujitsu has revealed some new details, and they are impressive. The design of the A64FX is a major departure from traditional design. Instead of the chiplet design of the AMD Epyc and some Xeons, it is a single monolithic design. More important, there are four chips of High Bandwidth Memory 2 (HBM2), an expensive but very fast memory used only in high-end systems, connected to the CPU. Two 8GB modules are placed on each side of the CPU.

Prototypes of the A64FX motherboard reveal it has no RAM DIMM sockets. An Intel or AMD motherboard will show up to a dozen memory DIMM sockets for each CPU, but the A64FX motherboard has none. That’s because the A64FX carries its HBM2 memory in the same package as the processor, for 32GB per CPU.
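The memory figures above reduce to simple arithmetic, sketched here in Python. The module count, module size, and core count come from the article; the function names are made up for illustration, and the per-core figure is just a derived average, not a Fujitsu specification.

```python
# Figures quoted in the article: four HBM2 modules of 8 GB each, 48 cores.
HBM2_MODULES = 4
GB_PER_MODULE = 8
CORES = 48


def memory_per_cpu_gb(modules=HBM2_MODULES, gb_per_module=GB_PER_MODULE):
    """Total HBM2 capacity attached to one processor, in GB."""
    return modules * gb_per_module


def memory_per_core_gb(total_gb, cores=CORES):
    """Average capacity per core, in GB (a derived figure, not a spec)."""
    return total_gb / cores


total = memory_per_cpu_gb()
print(total)                                # 32
print(round(memory_per_core_gb(total), 2))  # 0.67
```

With no DIMM sockets to expand into, that 32GB is the whole memory budget for each processor.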

In HPC, memory bandwidth has been the bottleneck, and data-intensive workloads like analytics, simulations, and machine learning make it worse. Much more power (up to 100 times as much) is used in moving data around in HPC than in actually processing it. So to achieve energy efficiency, data needs to move as little as possible.
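To see why that ratio matters, here is a toy energy model in Python. It assumes, purely for illustration, that every operation costs one unit of energy and every data move costs a fixed multiple of that; the default of 100 is the article's upper bound, and the function name is hypothetical.

```python
def data_movement_share(moves, ops, move_cost_ratio=100.0):
    """Fraction of total energy spent moving data.

    Toy model: one compute operation costs 1 unit of energy and one
    data move costs `move_cost_ratio` units. The default of 100 is the
    article's "up to 100 times" figure, not a measured value.
    """
    if moves < 0 or ops < 0:
        raise ValueError("counts must be non-negative")
    move_energy = moves * move_cost_ratio
    compute_energy = float(ops)
    total = move_energy + compute_energy
    if total == 0:
        return 0.0
    return move_energy / total


# One move per operation: almost all energy goes to data movement.
print(data_movement_share(moves=1, ops=1))
# One move per 100 operations: movement and compute split the budget evenly.
print(data_movement_share(moves=1, ops=100))
```

Under these assumptions, a workload has to do about 100 operations per data move before compute accounts for even half the energy, which is the argument for keeping memory as close to the cores as possible.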

So A64FX has a totally different design than your standard Arm or x86 chip. No system memory, just 32GB per processor of extremely fast memory directly connected to the chip via a high-speed interconnect instead of through a much slower memory bus. This will greatly reduce latency between CPU and memory and also reduce power because data doesn’t have to be moved in and out of memory sockets.

The 48 cores of the A64FX function like a GPU in that they are connected by a very fast interconnect called Tofu, which was first used in the K supercomputer and has been advanced in the A64FX. Tofu is designed for energy efficiency and low latency. The A64FX is capable of 3 Tflops of peak performance while being 10 times more power efficient than an x86 processor.

A Fugaku prototype took the number-one spot on the Green500 list, the ranking of the most energy-efficient supercomputers published by the same group that compiles the Top500 supercomputer list. And that was a prototype, not a finished design.

In early benchmarks, Fujitsu claims the A64FX trounces the Xeon Platinum, Intel’s top of the line, and is competitive with Nvidia’s Volta line of HPC GPUs. However, that’s not final silicon, and I always wait for third-party benchmarks.

So why should you care? Because Fujitsu struck a deal with Cray to build HPC servers that use the A64FX and are sold under the Cray brand name. Cray has since been bought by Hewlett Packard Enterprise, so HPE will be selling not one but two Arm-based server lines: its more mainstream Project Moonshot servers and the A64FX systems.

And there is a long history of technologies starting in HPC and slowly going mainstream, from GPU computing to liquid cooling to modular server design. There’s no reason the A64FX can’t go mainstream too and bring AI, ML, and other high-performance tasks to more than just supercomputing facilities.

Using HBM2 with no DIMMs is a massive twist on system memory, and I am really curious to see if Intel and AMD follow.

Andy Patrizio is a freelance journalist based in southern California who has covered the computer industry for 20 years and has built every x86 PC he’s ever owned, laptops not included.

The opinions expressed in this blog are those of the author and do not necessarily represent those of ITworld, Network World, its parent, subsidiary or affiliated companies.