by Andy Patrizio

Inside Nvidia’s new AI supercomputer

News Analysis

May 30, 2023 · 4 mins

Nvidia's Grace Hopper CPU/GPU combo underpins a supercomputer that the company claims can crank out nearly an exaFLOP of AI performance.

Nvidia has introduced a supercomputer for AI processing that is powered by a CPU/GPU combination, with the company's Arm-based Grace processor at its core.

The new system, the DGX GH200 supercomputer, was formally introduced at the Computex tech conference in Taipei. It is powered by 256 Grace Hopper Superchips, each of which combines Nvidia's Grace CPU, a 72-core Arm processor designed for high-performance computing, with the Hopper GPU. The two are connected by Nvidia's proprietary NVLink-C2C high-speed interconnect.
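The per-superchip figures above scale linearly across the full system. The short sketch below simply multiplies them out; it assumes one Grace CPU and one Hopper GPU per superchip, as the article describes, and the function name `system_totals` is purely illustrative.

```python
# Figures from the article: 256 Grace Hopper Superchips per DGX GH200,
# each pairing one 72-core Grace CPU with one Hopper GPU.
SUPERCHIPS = 256
GRACE_CORES_PER_CHIP = 72
GPUS_PER_SUPERCHIP = 1


def system_totals(superchips: int = SUPERCHIPS) -> dict:
    """Return total Arm CPU cores and Hopper GPUs for a given superchip count."""
    return {
        "arm_cores": superchips * GRACE_CORES_PER_CHIP,
        "hopper_gpus": superchips * GPUS_PER_SUPERCHIP,
    }


print(system_totals())  # {'arm_cores': 18432, 'hopper_gpus': 256}
```

At full size, that works out to 18,432 Arm cores alongside 256 Hopper GPUs.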

The DGX GH200 features a massive shared memory space of more than 144TB, pooling the Grace CPUs' LPDDR5X memory and the Hopper GPUs' HBM3 memory, linked through Nvidia's NVLink interconnect technology. The system has a simplified design, and its software sees its processors as one giant GPU with one giant memory pool, said Ian Buck, vice president and general manager of Nvidia's hyperscale and HPC business unit.
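To make the "one giant memory pool" idea concrete, here is a hypothetical sketch of a flat address space spread across many chips. The article does not describe how the DGX GH200 actually maps addresses. This example assumes the pool is 144 TiB split into equal, contiguous slices, one per superchip, and the `locate` helper is invented for illustration.

```python
# Hypothetical illustration only: the real DGX GH200 address mapping is not
# described in the article. Assumes 144 TiB split evenly and contiguously
# across 256 superchips.
TIB = 1024 ** 4
GIB = 1024 ** 3

TOTAL_POOL_BYTES = 144 * TIB
SUPERCHIPS = 256
SLICE_BYTES = TOTAL_POOL_BYTES // SUPERCHIPS  # 576 GiB per superchip under this assumption


def locate(global_addr: int) -> tuple[int, int]:
    """Map an address in the shared pool to (owning superchip, local offset)."""
    if not 0 <= global_addr < TOTAL_POOL_BYTES:
        raise ValueError("address outside the shared memory pool")
    # Contiguous slices: the quotient picks the superchip, the remainder is the offset.
    return divmod(global_addr, SLICE_BYTES)


print(SLICE_BYTES // GIB)          # 576
print(locate(0))                   # (0, 0)
print(locate(SLICE_BYTES + 4096))  # (1, 4096)
```

Software addresses one range, and the hardware decides which chip's memory serves each access. A real system may interleave addresses rather than use contiguous slices.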

He said customers can deploy the system, with Nvidia's help, to train AI models that require more memory than a single GPU supports. "We need a completely new system architecture that can break through one terabyte of memory in order to train these giant models," he said.
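Buck's one-terabyte threshold can be put in perspective with a common rule of thumb. This is an assumption, not something from the article: mixed-precision training with the Adam optimizer needs roughly 16 bytes per parameter for weights, gradients, and optimizer state, before counting activations. The helper names below are illustrative.

```python
# Rough training-memory estimate, assuming mixed-precision training with the
# Adam optimizer at ~16 bytes per parameter: 2 (BF16 weights) + 2 (BF16
# gradients) + 4 (FP32 master weights) + 4 + 4 (FP32 Adam moments).
# Activations and framework overhead are ignored, so real needs are higher.
BYTES_PER_PARAM = 2 + 2 + 4 + 4 + 4  # = 16
TB = 10 ** 12


def training_state_bytes(num_params: int) -> int:
    """Bytes needed just for weights, gradients, and optimizer state."""
    return num_params * BYTES_PER_PARAM


def exceeds_one_terabyte(num_params: int) -> bool:
    """True if model state alone would pass the one-terabyte mark."""
    return training_state_bytes(num_params) > 1 * TB


for params in (7_000_000_000, 70_000_000_000, 175_000_000_000):
    print(f"{params / 1e9:.0f}B params -> {training_state_bytes(params) / TB:.2f} TB,"
          f" over 1 TB: {exceeds_one_terabyte(params)}")
```

Under these assumptions, a 70-billion-parameter model already needs about 1.12TB for its training state alone, far beyond the memory attached to any single GPU.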

Nvidia claims an exaFLOP of performance, but that figure is based on eight-bit FP8 processing. Today, most AI processing is done with 16-bit BFloat16 instructions, which would take twice as long. One way of looking at it is that you could have a supercomputer that ranks in the top 10 of the TOP500 supercomputer list while occupying a comparatively modest space.
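The FP8-versus-BF16 point is simple arithmetic. The sketch below uses the article's premise that 16-bit work runs at half the FP8 rate, together with a hypothetical workload size chosen only for illustration. It assumes ideal 100% utilization, which real jobs never reach.

```python
# Scale Nvidia's FP8 headline figure to BF16, using the article's premise
# that 16-bit work runs at half the FP8 rate.
EXAFLOP = 10 ** 18
FP8_PEAK_FLOPS = 1 * EXAFLOP  # Nvidia's claimed FP8 peak for the full system
BF16_SLOWDOWN = 2             # "would take twice as long"

bf16_peak_flops = FP8_PEAK_FLOPS / BF16_SLOWDOWN


def runtime_seconds(total_flops: float, peak_flops: float) -> float:
    """Ideal runtime at 100% of peak; real utilization is always lower."""
    return total_flops / peak_flops


job = 1e21  # hypothetical workload: 10^21 floating-point operations
print(bf16_peak_flops / EXAFLOP)              # 0.5
print(runtime_seconds(job, FP8_PEAK_FLOPS))   # 1000.0
print(runtime_seconds(job, bf16_peak_flops))  # 2000.0
```

In other words, the headline exaFLOP becomes roughly half an exaFLOP for BF16 workloads.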

Because the CPU and GPU are linked by NVLink rather than a standard PCI Express interconnect, the bandwidth between them is seven times higher, and the link requires a fifth of the interconnect power.
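The "seven times" figure lines up with commonly cited link speeds. The values below are assumptions that do not appear in the article: about 450 GB/s per direction for NVLink-C2C (900 GB/s total) and about 64 GB/s per direction for a PCIe Gen5 x16 link (128 GB/s total). The 480GB buffer is a hypothetical size chosen only for illustration.

```python
# Compare CPU-GPU link bandwidth per direction. Assumed figures (not from the
# article): NVLink-C2C ~450 GB/s each way, PCIe Gen5 x16 ~64 GB/s each way.
NVLINK_C2C_GBPS_PER_DIR = 450
PCIE_GEN5_X16_GBPS_PER_DIR = 64


def transfer_seconds(gigabytes: float, link_gbps: float) -> float:
    """Ideal one-way copy time, ignoring protocol overhead and latency."""
    return gigabytes / link_gbps


ratio = NVLINK_C2C_GBPS_PER_DIR / PCIE_GEN5_X16_GBPS_PER_DIR
print(f"NVLink-C2C vs PCIe Gen5 x16: {ratio:.1f}x")

buffer_gb = 480  # hypothetical buffer moved from CPU memory to the GPU
print(f"NVLink-C2C: {transfer_seconds(buffer_gb, NVLINK_C2C_GBPS_PER_DIR):.2f} s")
print(f"PCIe Gen5 x16: {transfer_seconds(buffer_gb, PCIE_GEN5_X16_GBPS_PER_DIR):.2f} s")
```

Because both links are compared in the same direction, the ratio (about 7x) is the same whether per-direction or total bandwidth is used.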

Google Cloud, Meta, and Microsoft are among the first expected to gain access to the DGX GH200 to explore its capabilities for generative AI workloads. Nvidia also intends to provide the DGX GH200 design as a blueprint to cloud service providers and other hyperscalers so they can further customize it for their infrastructure. Nvidia DGX GH200 supercomputers are expected to be available by the end of the year.

Software is included.

These supercomputers come with Nvidia software preinstalled to provide a turnkey product. It includes Nvidia AI Enterprise, the primary software layer of its AI platform, featuring frameworks, pretrained models, and development tools; and Base Command, for enterprise-level cluster management.

DGX GH200 is the first supercomputer to pair Grace Hopper Superchips with Nvidia's NVLink Switch System, the interconnect that enables the GPUs in the system to work together as one. The previous-generation system maxed out at eight GPUs working in tandem.

Reaching the full-sized system still requires significant data-center real estate. Each 15-rack-unit chassis holds eight compute nodes, and there are two chassis per rack (or pod, in Nvidia parlance), along with NVSwitch, Ethernet, and IP connectivity. Up to 16 of the pods can be linked for the full 256 processors.

The system is air-cooled even though each Hopper GPU draws up to 700 watts, which generates considerable heat. Nvidia said it is developing liquid-cooled systems internally and discussing them with customers and partners, but for now the DGX GH200 is cooled by fans.
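A quick estimate shows why cooling matters at this scale. The sketch counts only the 256 Hopper GPUs at the article's 700-watt figure. CPUs, memory, switches, and fans are deliberately excluded, so actual system draw would be higher. It uses the standard conversion of about 3.412 BTU per hour per watt.

```python
# Back-of-the-envelope GPU power for a full DGX GH200, using the article's
# figures: 256 Hopper GPUs at up to 700 W each. Everything else is excluded.
GPUS = 256
WATTS_PER_GPU = 700
BTU_PER_HOUR_PER_WATT = 3.412  # standard watts-to-BTU/h conversion

gpu_watts = GPUS * WATTS_PER_GPU
heat_btu_per_hour = gpu_watts * BTU_PER_HOUR_PER_WATT

print(f"GPU draw alone: {gpu_watts / 1000:.1f} kW")
print(f"Heat to remove: {heat_btu_per_hour:,.0f} BTU/h")
```

Nearly all of that electrical power ends up as heat that the fans must move out of the racks.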

So far, potential users of the system aren't ready for liquid cooling, said Charlie Boyle, vice president of DGX systems at Nvidia. "There will be points in the future where we'll have designs that have to be liquid cooled, but we were able to keep this one on air," he said.

Nvidia announced at Computex that the Grace Hopper Superchip is in full production. Systems from OEM partners are expected to be delivered later this year.

Andy Patrizio is a freelance journalist based in southern California who has covered the computer industry for 20 years and has built every x86 PC he’s ever owned, laptops not included.

The opinions expressed in this blog are those of the author and do not necessarily represent those of ITworld, Network World, its parent, subsidiary or affiliated companies.
