Michael Cooney
Senior Editor

Arista lays out AI networking plans

News
Apr 09, 2024 | 5 mins
Networking

Arista’s Etherlink technology will be supported across a range of products, including 800G systems and line cards, and will be compatible with specifications from the Ultra Ethernet Consortium.

[Image: data center / enterprise networking. Credit: Timofeev Vladimir / Shutterstock]

Arista Networks has offered a look at how it expects to roll out Ethernet technology that will underpin the networks required to handle the demands of AI-based workloads.

The new Arista Etherlink platform will include a broad range of 800G systems and line cards based on the company’s EOS operating system, and it will ultimately add enhanced Ethernet features compatible with specifications from the Ultra Ethernet Consortium (UEC), according to a recent blog post by Arista CEO Jayshree Ullal. “As the UEC completes its extensions to improve Ethernet for AI workloads, Arista assures customers that we can offer UEC-compatible products, easily upgradable to the standards as UEC firms up in 2025,” Ullal wrote.

The UEC was founded last year by AMD, Arista, Broadcom, Cisco, Eviden, HPE, Intel, Meta and Microsoft, and it now includes more than 50 vendors. The consortium is developing technologies aimed at increasing the scale, stability, and reliability of Ethernet networks to satisfy AI’s high-performance networking requirements. Later this year, it plans to release official specifications that will focus on a variety of scalable Ethernet improvements, including better multi-path and packet delivery options as well as modern congestion and telemetry features.

Across the Arista Etherlink portfolio, UEC-compatible features would include dynamic load balancing, congestion control, and reliable packet delivery, Ullal stated.

“AI workloads push the ‘collective’ operation, where allreduce and all-to-all are the dominant collective types. Today’s models are already moving from billions to one trillion parameters with GPT-4. Of course, we have others such as Google Gemini, open source Llama and xAI’s Grok,” Ullal wrote. “During the compute-exchange-reduce cycle, the volume of data exchanged is so significant that any slowdown due to a poor network can critically impact the AI application performance. The Arista Etherlink AI topology will allow every flow to simultaneously access all paths to the destination with dynamic load balancing at multi-terabit speeds.”
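
For a concrete sense of why that exchange volume is so punishing, here is a minimal ring-allreduce sketch in Python. This is the generic textbook algorithm, not anything Arista-specific: each of n workers puts roughly 2(n-1)/n of the full gradient on the wire every training step, so one slow path stalls the entire compute-exchange-reduce cycle.

```python
import random

def ring_allreduce(grads):
    """Simulate ring allreduce over n workers; grads[i] is worker i's
    gradient vector, split into n chunks (here, one element per chunk)."""
    n = len(grads)
    chunks = [list(g) for g in grads]

    # Phase 1: reduce-scatter. At step s, worker i sends chunk (i - s) % n
    # to neighbor (i + 1) % n, which accumulates it. After n - 1 steps,
    # worker i owns the fully reduced chunk (i + 1) % n.
    for s in range(n - 1):
        for i in range(n):
            c = (i - s) % n
            chunks[(i + 1) % n][c] += chunks[i][c]

    # Phase 2: allgather. Each worker forwards its completed chunk around
    # the ring; receivers overwrite. After n - 1 steps, all workers agree.
    for s in range(n - 1):
        for i in range(n):
            c = (i + 1 - s) % n
            chunks[(i + 1) % n][c] = chunks[i][c]
    return chunks

# Four workers, four-element gradients: every worker ends with the
# elementwise sum, at a wire cost of ~2 * (n - 1) / n of the gradient
# size per worker -- nearly the whole model, every iteration.
workers = [[random.random() for _ in range(4)] for _ in range(4)]
expected = [sum(col) for col in zip(*workers)]
for result in ring_allreduce(workers):
    assert all(abs(a - b) < 1e-9 for a, b in zip(result, expected))
```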

“Arista Etherlink supports a radix from 1,000 to 100,000 GPU nodes today, which will go to more than one million GPUs in the future,” Ullal added.
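
Those radix figures track with standard Clos-fabric arithmetic. As an illustrative check using hypothetical parameters rather than Arista's published designs: radix-512 switches in a two-tier non-blocking leaf-spine support 512 × 512 / 2 = 131,072 end ports, and a three-tier fat-tree reaches 512³ / 4, well past a million.

```python
# Illustrative Clos-fabric arithmetic (hypothetical radix choices, not
# Arista's published design): end ports supported by radix-k switches.

def fat_tree_hosts(k: int, tiers: int) -> int:
    """Max hosts in a non-blocking fabric built from radix-k switches."""
    if tiers == 2:
        return k * k // 2      # k leaves, each with k/2 host-facing ports
    if tiers == 3:
        return k ** 3 // 4     # classic three-tier fat-tree bound
    raise ValueError("sketch covers 2 or 3 tiers only")

for k in (64, 256, 512):
    print(f"radix {k}: 2-tier {fat_tree_hosts(k, 2):,} ports, "
          f"3-tier {fat_tree_hosts(k, 3):,} ports")
# radix 512: 2-tier 131,072 ports, 3-tier 33,554,432 ports
```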

According to Ullal, two additional key features of Arista’s Etherlink platforms are:

  • Predictable latency: “Rapid and reliable bulk transfer from source to destination is key to all AI job completion. Per-packet latency is important, but the AI workload is most dependent on the timely completion of an entire processing step. In other words, the latency of the whole message is critical. Flexible ordering mechanisms use all Etherlink paths from the NIC to the switch to guarantee end-to-end predictable communication.”
  • Congestion management: “Managing AI network congestion is a common ‘incast’ problem. It can occur on the last link of the AI receiver when multiple uncoordinated senders simultaneously send traffic to it. To avoid hotspots or flow collisions across expensive GPU clusters, algorithms are being defined to throttle, notify, and evenly spread the load across multipaths, improving the utilization and TCO of these expensive GPUs with a VoQ fabric,” Ullal wrote. The Arista Virtual Output Queuing (VoQ) fabric features a distributed scheduling mechanism that guarantees traffic flow delivery across congested switch ports; a toy model of the VoQ idea follows this list.
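
The queuing mechanism behind that guarantee is straightforward to model. The toy switch below is the generic textbook form of virtual output queuing, not Arista's fabric: each ingress keeps a separate queue per egress, so an incast hotspot on one egress port cannot head-of-line block traffic bound for other ports.

```python
from collections import deque

class VoqSwitch:
    """Toy virtual-output-queued switch (generic textbook model, not
    Arista's fabric). Each ingress keeps one queue per egress, so a hot
    egress cannot head-of-line block traffic bound for other egresses."""

    def __init__(self, n_ports: int):
        self.n = n_ports
        # voq[i][e]: packets waiting at ingress i for egress e
        self.voq = [[deque() for _ in range(n_ports)] for _ in range(n_ports)]
        self.rr = [0] * n_ports          # per-egress round-robin pointer

    def enqueue(self, ingress: int, egress: int, pkt: str):
        self.voq[ingress][egress].append(pkt)

    def schedule_cycle(self):
        """One scheduling round: each egress grants one ingress, chosen
        round-robin, delivering at most one packet per egress per cycle."""
        delivered = []
        for e in range(self.n):
            for off in range(self.n):
                i = (self.rr[e] + off) % self.n
                if self.voq[i][e]:
                    delivered.append((i, e, self.voq[i][e].popleft()))
                    self.rr[e] = (i + 1) % self.n
                    break
        return delivered

# Incast demo: all four ingresses target egress 0, plus one flow to egress 1.
sw = VoqSwitch(4)
for i in range(4):
    sw.enqueue(i, 0, f"to0-from{i}")
sw.enqueue(2, 1, "to1-from2")
print(sw.schedule_cycle())   # the flow to egress 1 drains immediately,
print(sw.schedule_cycle())   # unblocked by the incast queued for egress 0
```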

Arista AI networking also depends on a combination of the vendor’s core EOS operating system and its natural-language, generative AI-based Autonomous Virtual Assist (AVA) system for delivering network insights, Ullal wrote.   

“Arista AVA imitates human expertise at cloud scale through an AI-based expert system that automates complex tasks like troubleshooting, root cause analysis, and securing from cyber threats,” Ullal wrote. “It starts with real-time, ground-truth data about the network devices’ state and, if required, the raw packets. AVA combines our vast expertise in networking with an ensemble of AI/ML techniques, including supervised and unsupervised ML and NLP (Natural Language Processing). Applying AVA to AI networking increases the fidelity and security of the network with autonomous network detection and response and real-time observability.”

Regarding Arista’s EOS software stack, Ullal said it can help customers build resilient AI clusters. “EOS offers improved load balancing algorithms and hashing mechanisms that map traffic from ingress host ports to the uplinks so that flows are automatically re-balanced when a link fails,” Ullal wrote. “Our customers can now pick and choose packet header fields for better entropy and efficient load-balancing of AI workloads.”
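
The field-selectable hashing Ullal describes can be sketched generically. The snippet below is illustrative ECMP-style logic, not EOS code, and the field names and structures are hypothetical: a flow hash over operator-chosen header fields picks an uplink, and flows re-map automatically when a link leaves the live set.

```python
import zlib

# Illustrative ECMP-style flow hashing (generic sketch, not EOS code;
# field names are hypothetical). The operator chooses which header fields
# feed the hash; more fields means more entropy and a more even spread
# of flows across the uplinks.

HASH_FIELDS = ("src_ip", "dst_ip", "src_port", "dst_port")   # configurable

def pick_uplink(pkt: dict, uplinks: list) -> str:
    live = [u for u in uplinks if u["up"]]             # dead links drop out
    key = "|".join(str(pkt[f]) for f in HASH_FIELDS)   # per-flow hash key
    return live[zlib.crc32(key.encode()) % len(live)]["name"]

uplinks = [{"name": f"Ethernet{i}", "up": True} for i in range(1, 5)]
flow = {"src_ip": "10.0.0.7", "dst_ip": "10.0.1.9",
        "src_port": 49152, "dst_port": 4791}   # 4791 = RoCEv2's UDP port
print(pick_uplink(flow, uplinks))   # stable while the topology is stable
uplinks[1]["up"] = False
print(pick_uplink(flow, uplinks))   # flows re-balance after a link failure
```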

AI network visibility is another critical aspect in the training phase for large datasets used to improve the accuracy of LLMs, according to Ullal. “In addition to the EOS-based Latency Analyzer that monitors buffer utilization, Arista’s AI Analyzer monitors and reports traffic counters at microsecond-level windows. This is instrumental in detecting and addressing microbursts which are difficult to catch at intervals of seconds,” Ullal wrote. 
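
The windowing idea is simple to illustrate. The sketch below is a generic microburst detector, not Arista's AI Analyzer: byte counters are bucketed into 100-microsecond windows, and any window whose implied rate nears line rate is flagged, even when the one-second average looks idle.

```python
# Generic microburst detector (a sketch, not Arista's AI Analyzer).
# Byte counters are bucketed into 100-microsecond windows; any window
# whose implied rate exceeds the threshold is flagged, even when the
# one-second average looks healthy.

WINDOW_US = 100                      # window size, microseconds
LINE_RATE_BPS = 800e9                # assume an 800G port
THRESHOLD_BPS = 0.9 * LINE_RATE_BPS  # flag above 90% of line rate

def microbursts(samples):
    """samples: (timestamp_us, bytes) pairs. Yields (window_start_us,
    rate_bps) for every window that crosses the threshold."""
    buckets = {}
    for ts_us, nbytes in samples:
        win = ts_us // WINDOW_US
        buckets[win] = buckets.get(win, 0) + nbytes
    for win in sorted(buckets):
        rate_bps = buckets[win] * 8 / (WINDOW_US * 1e-6)
        if rate_bps > THRESHOLD_BPS:
            yield win * WINDOW_US, rate_bps

# A 200-microsecond line-rate burst inside an otherwise idle second:
# 1 MB every 10 us is 800 Gbps, yet averages to only ~160 Mbps over 1 s,
# so second-granularity counters would miss it entirely.
burst = [(500_000 + t, 1_000_000) for t in range(0, 200, 10)]
for start, rate in microbursts(burst):
    print(f"microburst at t={start}us: {rate / 1e9:.0f} Gbps")
```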

In general, AI training clusters require a fundamentally new approach to building networks, “given the massively parallelized workloads” that can cause congestion, according to Ullal. “Traffic congestion in any single flow can lead to a ripple effect slowing down the entire AI cluster, as the workload must wait for that delayed transmission to complete. AI clusters must be architected with massive capacity to accommodate these traffic patterns from distributed GPUs, with deterministic latency and lossless deep buffer fabrics designed to eliminate unwanted congestion,” she wrote.