Compressing files using the zip command on Linux Mon, 13 May 2024 16:14:13 +0000

Zipping files allows you to save a compressed version of a file that might serve as a backup of the original. It also allows you to group a collection of related files into a similarly reduced size file for safekeeping.

Zipping a single file

If you want to zip a single file, you can use a command like the second of the two commands shown below. The file to be zipped (tips.html) is listed by the first command.

$ ls -l tips.html
-rw-r--r--. 1 shs shs 79873 Sep 19  2023 tips.html
$ zip tips tips.html
  adding: tips.html (deflated 73%)

Notice that the file is deflated by 73%. List the files again and you’ll see how much smaller the zip file is than the original, which is still on the system. Note that the file extension “zip” will be added automatically if you don’t include it as the extension for your first argument, as in the “zip tips” command above.

$ ls -l tips.*
-rw-r--r--. 1 shs shs 21713 May  7 10:19 tips.zip
-rw-r--r--. 1 shs shs 79873 Sep 19  2023 tips.html

The zip command does not remove the original file; it remains on the system.
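
If you do want the original removed once it has been archived, zip’s -m (move) option deletes the source file after adding it to the archive. Here is a minimal example, reusing the file from above (use it with care, since the original is gone as soon as the command completes):

$ zip -m tips tips.html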

Zipping a series of files

You can zip a group of files into a single zip file as a way to back them up in a compressed format. Note that the compression ratio for each file is displayed in the process.

$ zip bin bin/*
  adding: bin/FindFiles (deflated 54%)
  adding: bin/shapes (deflated 63%)
  adding: bin/shapes2 (deflated 62%)
  adding: bin/shapes3 (deflated 45%)
$ ls -l bin.zip
-rw-r--r--. 1 shs shs 1765 May  7 10:56 bin.zip

Use the -q argument if you prefer to not see the details listed for the files as they are added to the zip file.

$ zip -q bin bin/*
$

If you are zipping a directory that contains subdirectories, those subdirectories, but not their contents, will be added to the zip file unless you add the -r (recursive) option. Here’s an example:

$ zip bin bin/*
updating: bin/FindFiles (deflated 54%)
updating: bin/shapes (deflated 63%)
updating: bin/shapes2 (deflated 62%)
updating: bin/shapes3 (deflated 45%)
updating: bin/NOTES/ (stored 0%)

Here’s an example that adds -r and, as a result, includes the NOTES subdirectory’s files in the bin.zip file it is creating.

$ zip -r bin bin/*
updating: bin/FindFiles (deflated 54%)
updating: bin/shapes (deflated 63%)
updating: bin/shapes2 (deflated 62%)
updating: bin/shapes3 (deflated 45%)
updating: bin/NOTES/ (stored 0%)
  adding: bin/NOTES/finding_files (deflated 5%)
  adding: bin/NOTES/shapes_scripts (deflated 35%)

Using encryption passwords

To add a password that will need to be used to extract the contents of a zip file, use a command like the one shown below. Notice that it prompts twice for the password, though it does not display it.

$ zip -e -r bin bin/*
Enter password:
Verify password:
updating: bin/FindFiles (deflated 54%)
updating: bin/shapes (deflated 63%)
updating: bin/shapes2 (deflated 62%)
updating: bin/shapes3 (deflated 45%)
updating: bin/NOTES/ (stored 0%)
updating: bin/NOTES/finding_files (deflated 5%)
updating: bin/NOTES/shapes_scripts (deflated 35%)

Extracting files from a zip file

To extract the contents of a zip file, you would use the unzip command. Notice that, because the zip file below was encrypted with a password, that password needs to be supplied to extract the contents.

$ unzip bin.zip
Archive:  bin.zip
[bin.zip] bin/FindFiles password:
  inflating: bin/FindFiles
  inflating: bin/shapes
  inflating: bin/shapes2
  inflating: bin/shapes3
   creating: bin/NOTES/
  inflating: bin/NOTES/finding_files
  inflating: bin/NOTES/shapes_scripts

If you want to extract the contents of a zip file to a different directory, you don’t need to cd to that directory first. Instead, you can simply add the -d option followed by the target directory to specify the new location.

$ unzip bin.zip -d /tmp
Archive:  bin.zip
[bin.zip] bin/FindFiles password:
  inflating: /tmp/bin/FindFiles
  inflating: /tmp/bin/shapes
  inflating: /tmp/bin/shapes2
  inflating: /tmp/bin/shapes3
   creating: /tmp/bin/NOTES/
  inflating: /tmp/bin/NOTES/finding_files
  inflating: /tmp/bin/NOTES/shapes_scripts

You can extract a single file from a zip file if you specify its name as it is listed in the zip file. Here’s an example command in which the original file (maybe it’s been damaged in some way) is replaced after you confirm that this is what you want.

$ unzip bin.zip 'bin/shapes3'
Archive:  bin.zip
replace bin/shapes3? [y]es, [n]o, [A]ll, [N]one, [r]ename: y
  inflating: bin/shapes3
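
If you’re not sure how a file’s name is recorded in the archive, you can list the contents first and copy the path exactly as shown. The -l option prints the stored paths, along with sizes and dates, without extracting anything:

$ unzip -l bin.zip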

Wrap-up

Zipping files to preserve or back them up, extracting their contents, and requiring passwords for extraction are all important skills to have when working with the zip command.

Linux
https://www.networkworld.com/article/2104534/compressing-files-using-the-zip-command-on-linux.html
High-bandwidth memory nearly sold out until 2026 Mon, 13 May 2024 15:35:22 +0000

South Korean memory manufacturer SK Hynix has announced that its supply of high-bandwidth memory (HBM) has been sold out for 2024 and for most of 2025. Basically, this means that demand for HBM exceeds supply for at least a year, and any orders placed now won’t be filled until 2026.

The news comes after similar comments were made in March by the CEO of Micron, who said that the company’s HBM production had been sold out through late 2025.

HBM memory is used in GPUs to provide extremely fast memory access, much faster than standard DRAM. It is key to the performance of AI processing. No HBM, no GPU cards.

Bottom line: Expect a new supply-chain headache thanks to HBM being unavailable until at least 2026. It doesn’t matter how many GPUs TSMC and Intel make – those cards are going nowhere without memory.

Hynix is the leader in the HBM space with about 49% market share, according to TrendForce. Micron’s presence is more meager, at about 4% to 6%. The rest is primarily supplied by Samsung, which has not made any statement as to availability. But chances are, HBM demand has consumed everything Samsung can make as well.

HBM is more expensive, more difficult, and slower to make than standard DRAM. Building out HBM fabrication capacity, as with a CPU fab, takes time, and the three HBM makers couldn’t keep up with the explosive demand.

While it is easy to blame Nvidia for this shortage, it’s not alone in driving high-performance computing and the memory needed to go with it. AMD is making a run, Intel is trying, and many major cloud service providers are building their own processors. This includes Amazon, Facebook, Google, and Microsoft. All of them are making their own custom silicon, and all need HBM memory.

That leaves the smaller players on the outside looking in, says Jim Handy, principal analyst with Objective Analysis. “It’s a much bigger challenge for the smaller companies. In chip shortages the suppliers usually satisfy their biggest customers’ orders and send their regrets to the smaller companies. This would include companies like SambaNova, a start-up with an HBM-based AI processor,” he said.

DRAM fabs can be rapidly shifted from one product to another, as long as all products use the exact same process. This means that they can move easily from DDR4 to DDR5, or from DDR to LPDDR or GDDR used on graphics cards. 

That’s not the case with HBM, because HBM relies on a complex and highly technical manufacturing process called through-silicon vias (TSV) that is not used anywhere else. Also, the wafers need to be modified in a manner different from standard DRAM, which can make shifting manufacturing priorities very difficult, said Handy.

So if you recently placed an order for an HPC GPU, you may have to wait. Up to 18 months.

CPUs and Processors, Data Center, High-Performance Computing
https://www.networkworld.com/article/2104516/high-bandwidth-memory-nearly-sold-out-until-2026.html
NSA, FBI warn of email spoofing threat Mon, 13 May 2024 15:01:41 +0000

Spoofed email – email that appears to come from a legitimate source but is not – is becoming an increasingly worrisome threat. It’s so serious that the NSA and FBI have joined forces in releasing the following warning about spoofed email from senders in North Korea:

“The National Security Agency (NSA) joins the Federal Bureau of Investigation (FBI) and the U.S. Department of State in releasing the Cybersecurity Advisory (CSA) ‘North Korean Actors Exploit Weak DMARC Security Policies to Mask Spearphishing Efforts’ to protect against Democratic People’s Republic of Korea (DPRK, aka North Korea) techniques that allow emails to appear to be from legitimate journalists, academics, or other experts in East Asian affairs.”

To fully grasp what is happening, read this explanation from Al Iverson, industry research and community engagement lead for Valimail, which provides email authentication and anti-impersonation software:

“North Korea found a way to exploit something that security and deliverability experts have been worried about over these past few months; there’s a whole bunch of domain owners out there who are not necessarily security savvy, and perhaps focused more on email marketing efforts. Those domain owners (and there are more than a million of them out there) were quick to implement a bare minimum DMARC policy to comply with new mailbox provider sender requirements. What they didn’t realize is that this can leave the domain unprotected against phishing and spoofing.

People must protect their domain by fully implementing DMARC properly to ensure that bad guys find no phishing or spoofing success when they work their way down the list of domains … to yours.

The NSA, the FBI and the U.S. Department of State have identified this as an issue already, and Valimail is fully aligned with the advisory… they issued at the end of the week.”

DMARC stands for “Domain-based Message Authentication, Reporting and Conformance.” It’s an email authentication protocol designed to give email domain owners the ability to protect their domain from unauthorized use; in other words, it tries to prevent email spoofing. It also tells receiving servers what to do when a message fails authentication tests, that is, when the receiving server cannot verify that the message’s sender is who they claim to be.
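
As a quick check of your own domain, you can look up its published DMARC record with an ordinary DNS query. The domain and record below are purely illustrative; a policy of “p=none” only reports on failures, while “p=quarantine” or “p=reject” actually instructs receivers to act on them:

$ dig +short TXT _dmarc.example.com    # domain and output are illustrative
"v=DMARC1; p=reject; rua=mailto:dmarc-reports@example.com"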

Iverson also pointed out the following:

  • North Korean cyber actors are actively searching for and exploiting domains with weak DMARC policies.
  • Even the largest companies in the hospitality, retail, education, financial sectors, and more, which we often assume to be secure, are at risk due to weak DMARC policies.
  • Bad actors can just take the list of most popular companies and work their way down to see who is spoofable.
  • An improperly configured DMARC policy is just as bad (just as insecure) as not having DMARC in place at all.
  • Are you protected? Don’t assume that you’re not a worthy target; just because you haven’t been attacked today, doesn’t mean you won’t be spoofed or phished tomorrow.
  • Valimail data shows more than 1.3 million domains currently publish a “p=none” DMARC policy!

You can find out more about DMARC here.

Linux
https://www.networkworld.com/article/2104470/nsa-fbi-warn-of-email-spoofing-threat.html
Download our SASE and SSE enterprise buyer’s guide Mon, 13 May 2024 15:00:00 +0000

These two related technologies — Secure Access Service Edge (SASE) and Secure Service Edge (SSE) — address a new set of challenges that enterprise IT has faced as employees have shifted to remote work and applications have migrated to the cloud.

Enterprise Buyer’s Guides, Network Security, Remote Access Security, SASE
https://us.resources.networkworld.com/resources/download-our-sase-and-sse-enterprise-buyers-guide/
Frontier retains top spot among world’s fastest supercomputers Mon, 13 May 2024 09:00:00 +0000

Frontier held onto its No. 1 ranking in the 63rd edition of the TOP500, but with the second-place Aurora system breaking the exascale barrier, the end of Frontier’s reign could be in sight.

The Frontier system at Oak Ridge National Laboratory (ORNL) in Tenn. maintained its leading position with an HPL score of 1.206 EFlop/s. With a total of 8,699,904 combined CPU and GPU cores, the Frontier system has an HPE Cray EX architecture that combines 3rd Gen AMD EPYC CPUs optimized for HPC and AI with AMD Instinct MI250X accelerators. The system relies on Cray’s Slingshot 11 network for data transfer, and the machine has a power efficiency rating of 52.93 GFlops/Watt, which also puts Frontier at the No. 13 spot on the GREEN500.

Staying in line with the last list, the Aurora system at the Argonne Leadership Computing Facility in Ill. is ranked second on the TOP500. Aurora is also now the second machine to break the exascale barrier, with an HPL score of 1.012 EFlop/s – an improvement over its 585.34 PFlop/s score on the previous list. The Aurora system is based on the HPE Cray EX – Intel Exascale Compute Blade and uses Intel Xeon CPU Max series processors, Intel Data Center GPU Max Series accelerators, and a Slingshot-11 interconnect. TOP500 notes that Aurora achieved this rank while still being commissioned and not yet fully complete.

The systems rounding out the top five also remained consistent. A system called Eagle disrupted the list when it debuted in late 2023, and it maintained its No. 3 position in this ranking. Installed on the Microsoft Azure cloud in the U.S., Eagle continues to be the highest-ranking cloud system on the TOP500. This Microsoft NDv5 system has an HPL score of 561.2 PFlop/s and is based on Intel Xeon Platinum 8480C processors and Nvidia H100 accelerators.

The Supercomputer Fugaku and LUMI systems also retained their No. 4 and No. 5 positions, respectively. Based in Kobe, Japan, Fugaku has an HPL score of 442 PFlop/s and continues to be the highest-ranked system outside of the U.S. The LUMI system at EuroHPC/CSC in Finland – the largest system in Europe – also stayed put in its No. 5 spot with an HPL score of 380 PFlop/s.

One newcomer to the list, the Alps machine from the Swiss National Supercomputing Centre (CSCS) in Switzerland, achieved an HPL score of 270 PFlop/s, landing at No. 6 in this edition’s ranking. Sierra, the system installed at the Lawrence Livermore National Laboratory in Calif., dropped out of the top 10 this time; it was ranked No. 10 in November 2023.

Here is a breakdown of specific details for the 10 overall fastest supercomputer systems on the TOP500 list for May 2024:

#1: Frontier

This HPE Cray EX system is the first U.S. system with a performance exceeding one Exaflop/s. It is installed at the ORNL in Tenn., where it is operated for the Department of Energy (DOE). 

  • Cores: 8,699,904
  • Rmax (PFLOPS): 1,206.00
  • Rpeak (PFLOPS): 1,714.81
  • Power (kW): 22,786

#2: Aurora

The Aurora system is installed at the Argonne Leadership Computing Facility, Illinois, USA, where it is also operated for the DOE and holds a preliminary HPL score of 1.012 Exaflop/s. 

  • Cores: 9,264,128
  • Rmax (PFLOPS): 1,012.00
  • Rpeak (PFLOPS): 1,980.01
  • Power (kW): 38,698

#3: Eagle

The No. 3 system is installed by Microsoft in its Azure cloud. This Microsoft NDv5 system is based on Xeon Platinum 8480C processors and Nvidia H100 accelerators and achieved an HPL score of 561 Pflop/s.

  • Cores: 2,073,600
  • Rmax (PFLOPS): 561.20
  • Rpeak (PFLOPS): 846.84

#4: Supercomputer Fugaku

This system is installed at the RIKEN Center for Computational Science (R-CCS) in Kobe, Japan. It has 7,630,848 cores which allowed it to achieve an HPL benchmark score of 442 Pflop/s.

  • Cores: 7,630,848
  • Rmax (PFLOPS): 442.01
  • Rpeak (PFLOPS): 537.21
  • Power (kW): 29,899

#5: LUMI

The LUMI system is located in CSC’s data center in Kajaani, Finland. Through it, the European High-Performance Computing Joint Undertaking (EuroHPC JU) is pooling European resources to develop top-of-the-range exascale supercomputers for processing big data.

  • Cores: 2,752,704
  • Rmax (PFLOPS): 379.70
  • Rpeak (PFLOPS): 531.51
  • Power (kW): 7,107

#6: Alps (new to the list)

This system is an HPE Cray EX254n system with Nvidia Grace 72C and Nvidia GH200 Superchip and a Slingshot-11 interconnect, achieving 270 PFlop/s.

  • Cores: 1,305,600
  • Rmax (PFLOPS): 270.00
  • Rpeak (PFLOPS): 353.75
  • Power (kW): 5,194

#7: Leonardo (previously #6)

The Leonardo system is installed at another EuroHPC site in CINECA, Italy. It is an Atos BullSequana XH2000 system with Xeon Platinum 8358 32C 2.6GHz as main processors, NVIDIA A100 SXM4 40 GB as accelerators, and Quad-rail NVIDIA HDR100 Infiniband as interconnect.

  • Cores: 1,824,768
  • Rmax (PFLOPS): 241.20
  • Rpeak (PFLOPS): 306.31
  • Power (kW): 7,494

#8: MareNostrum 5 ACC

The MareNostrum 5 ACC system was remeasured and jumped in the ranking over the Summit system. It is now at No. 8 and installed at the EuroHPC/Barcelona Supercomputing Center in Spain.

  • Cores: 663,040
  • Rmax (PFLOPS): 175.30
  • Rpeak (PFLOPS): 249.44
  • Power (kW): 4,159

#9: Summit (previously #7)

Housed at the ORNL in Tenn., the IBM-built Summit system has 4,356 nodes, each one housing two POWER9 CPUs with 22 cores each and six NVIDIA Tesla V100 GPUs each with 80 streaming multiprocessors (SM). 

  • Cores: 2,414,592
  • Rmax (PFLOPS): 148.60
  • Rpeak (PFLOPS): 200.79
  • Power (kW): 10,096

#10: Eos NVIDIA DGX SuperPOD (previously #9)

This system is based on the NVIDIA DGX H100 with Xeon Platinum 8480C processors, NVIDIA H100 accelerators, and Infiniband NDR400 and it achieves 121.4 PFlop/s.

  • Cores: 485,888
  • Rmax (PFLOPS): 121.40
  • Rpeak (PFLOPS): 188.65
CPUs and Processors, Data Center, Supercomputers
https://www.networkworld.com/article/2100462/frontier-retains-top-spot-among-worlds-fastest-supercomputers.html
Nvidia teases quantum accelerated supercomputers Mon, 13 May 2024 06:30:00 +0000

At ISC High Performance 2024 in Hamburg, Germany, Nvidia today announced that nine new supercomputers worldwide are using its Grace Hopper Superchips to deliver a combined 200 exaflops (200 quintillion calculations per second) of computing power with, it said, twice the energy efficiency of an x86 system plus GPU.

Grace Hopper accounts for 80% of Hopper sales, said Dion Harris, Nvidia’s director, accelerated data center GTM, during a media briefing. “The reason why that’s exciting is that it leverages this novel sort of architecture of this tightly coupled CPU and GPU architecture to deliver great performance for HPC and AI.”

The first European Grace Hopper supercomputer to come online is Alps at the Swiss National Supercomputing Centre, which was built by Hewlett Packard Enterprise (HPE) and offers 20 exaflops of AI computing driven by 10,000 Grace Hopper superchips. Its role is to advance weather and climate modeling, and material science.

Nvidia also announced that national supercomputing centers worldwide will soon receive a performance boost via the open-source Nvidia CUDA-Q platform. The company revealed that sites in Germany, Japan, and Poland will use the platform to power quantum processing units (QPU) in their high performance computing systems.

“Quantum accelerated supercomputing, in which quantum processors are integrated into accelerated supercomputers, represents a tremendous opportunity to solve scientific challenges that may otherwise be out of reach,” said Tim Costa, director, Quantum and HPC at Nvidia. “But there are a number of challenges between us, today, and useful quantum accelerated supercomputing. Today’s qubits are noisy and error prone. Integration with HPC systems remains unaddressed. Error correction algorithms and infrastructure need to be developed. And algorithms with exponential speed up actually need to be invented, among many other challenges.”

To address these issues, he said, more than 25 national quantum initiatives have been launched. There are more than 350 quantum startups, over 70% of the Fortune 500 have some sort of quantum program, and more than 48,000 quantum research papers have been published.

“But another open frontier in quantum remains,” Costa said. “And that’s the deployment of quantum accelerated supercomputers – accelerated supercomputers that integrate a quantum processor to perform certain tasks that are best suited to quantum in collaboration with and supported by AI supercomputing. We’re really excited to announce today the world’s first quantum accelerated supercomputers.”

These machines will be at AIST in Japan, Jülich in Germany, and PSNC in Poland (which has installed two QPUs).

“The integration of not one but four quantum processing units with three supercomputers opens the door to the next wave of quantum innovation,” said Heather West, research manager, quantum computing, infrastructure systems, platforms, and technology group, at IDC. “Researchers have always expected that quantum computing would accelerate scientific advantage. However, the symbiotic relationship between quantum-classical compute technologies will also help to accelerate the development of quantum systems themselves, paving the way for useful, error-corrected, quantum-centric supercomputers and the era of quantum utility, a long awaited destination for both quantum researchers and quantum end users.”

However, said Harris, this will need the application of AI models to succeed. “We  don’t think that there will be a successfully deployed fault-tolerant system that doesn’t use AI models to do large scale, real-time error correction to calibrate these devices. Right now, it’s an incredibly human-time-intensive task for physicists to calibrate and keep up a quantum device. And it’s only going to get harder and harder as the number of qubits goes up. And so we have to automate that and apply the best technology and AI in order to do those tasks.”

CPUs and Processors, Data Center, Supercomputers
https://www.networkworld.com/article/2102436/nvidia-teases-quantum-accelerated-supercomputers.html
Cisco adds AI features to AppDynamics On-Premises Fri, 10 May 2024 18:28:37 +0000

Cisco has added AI features to its AppDynamics observability platform that promise to help customers more quickly detect anomalies, identify performance problems, and resolve issues across the enterprise. The new features will be implemented in a virtual appliance, available this month, for Cisco AppDynamics On-Premises, which gives customers the ability to see and manage their entire application stack, including application code, runtime, infrastructure (servers, databases, networks, VMs, containers), and user experience.

AppDynamics On-Premises includes integration with Cisco’s Secure Application package, which can monitor for application vulnerabilities and threats across services, workloads, pods, containers, and business transactions. This allows for real-time identification and blocking of attacks, according to Cisco.

The new virtual appliance includes an AI-based detection and remediation capability that learns and detects anomalies and can determine root causes in application performance issues, wrote Aaron Schifman, senior technical product marketing manager at Cisco AppDynamics, in a blog about the news. The package combines threat detection, threat intelligence and business impact to create a composite risk score that identifies which threats must be addressed first based on likely business impact, Schifman stated.

AppDynamics On-Premises also works with the recently released Smart Agent for Cisco AppDynamics, which can help customers spot and update out-of-date software agents as well as on-board and manage new agents through a centralized user interface, Cisco stated.

Agents are key to tracking application status, security and performance monitoring, but as applications become widely distributed via multiclouds, branch offices and private locations, the task of handling agents can become complex and tedious, Cisco stated.

“Customers can now use this virtual appliance together with our Smart Agent capability to deploy new innovations faster and simplify lifecycle operations,” said Ronak Desai, senior vice president and general manager of Cisco AppDynamics and Full-Stack Observability, in a statement.

The new on-premises deployment is packaged with all necessary services for deployment in a single VMware vSphere Open Virtual Appliance (OVA), and support for other virtualization platforms, such as AMI and VHD, is coming soon, Schifman stated.

In other developments, Cisco said AppDynamics On-Premises can now be hosted on Amazon Web Services (AWS) and Microsoft Azure.

“In addition to on-premises deployments, customers can manage their own observability deployments in AWS or Microsoft Azure by using the Amazon Machine Instance (AMI) or Virtual Hard Disk (VHD) images of the virtual appliance,” Schifman stated. “This is valuable when a SaaS instance is not available in the country where a sensitive workload needs to be monitored, or when a customer wants to retain full control of the observability solution.”

In addition to the new virtual offering, AppDynamics added full-stack observability for on-premises SAP and non-SAP environments, which promises to let customers address performance issues within SAP deployments before they impact the business.

“Cisco brings resiliency into the SAP landscape with application performance, augmented by AI-powered intelligence for the Java stack, enabling SAP developers and BASIS admins to ensure service availability, align performance with SAP business outcomes, and discover SAP related security vulnerabilities to mitigate risk,” Schifman stated.

The AppDynamics platform can now correlate metrics across SAP and non-SAP environments as well as monitor SAP systems and processes with over 30 pre-built dashboards. Customers can also build their own customized dashboards with a dashboard generator, Cisco stated. 

“Correlate real-time visibility of ABAP, SAP’s proprietary language, down to the code level, with the broader landscape stack to understand how performance impacts the business and revenue streams,” Cisco stated.

Cisco and SAP have had a long-standing strategic partnership offering all manner of collaboration to support and manage hybrid cloud environments.

Network Management Software, Network Monitoring
https://www.networkworld.com/article/2099747/cisco-adds-ai-features-to-appdynamics-on-premises.html
CHIPS Act to fund $285 million for semiconductor digital twins Fri, 10 May 2024 14:53:43 +0000

The Biden administration has proposed allocating another $285 million in CHIPS and Science Act funding for semiconductor development in the U.S., with the creation of a chip manufacturing institute and support for digital twins.

The CHIPS for America Program is proposing a first-of-its-kind institute focused on the development, validation, and use of digital twins for semiconductor manufacturing, advanced packaging, assembly, and test processes.

The CHIPS Manufacturing USA institute aims to establish regional networks to share resources with companies developing and manufacturing both physical semiconductors and digital twins.

Digital twins are virtual representations of physical chips that mimic how the real versions will function. They offer a faster way to develop, test, and revise chips without having to make physical versions of them. It’s much easier to simulate a chip than to spin out silicon, and simulation helps researchers test new processors before putting them into production.

“Digital twin technology can help to spark innovation in research, development, and manufacturing of semiconductors across the country — but only if we invest in America’s understanding and ability of this new technology,” Commerce Secretary Gina Raimondo said in a statement. “This new Manufacturing USA institute will not only help to make America a leader in developing this new technology for the semiconductor industry, it will also help train the next generation of American workers and researchers to use digital twins for future advances in R&D and production of chips.”

Congress passed the CHIPS Act in 2022, and President Biden signed it into law in an effort to boost semiconductor manufacturing in the United States, which has a very meager share of global semiconductor manufacturing; much of it takes place in Taiwan or South Korea.

The Commerce Department has provided almost $33 billion in preliminary grants to chipmakers, in many cases to giant companies like Intel and Micron. Intel announced plans to build massive fabrication plants in Ohio, but they have since been delayed for a year due to economic conditions.

Biden administration officials have scheduled briefings on May 16 where interested parties can speak with the government officials about the funding opportunities. The government will fund the operational activities of the institute, research around digital twins, physical and digital facilities, and workforce training.

The CHIPS Manufacturing USA institute is expected to use integrated physical and digital assets to tackle important semiconductor-industry manufacturing challenges. The institute hopes to foster a collaborative environment that significantly expands innovation and brings benefits to both large and small to mid-sized manufacturers.

CPUs and Processors, Data Center
https://www.networkworld.com/article/2100411/chips-act-to-fund-285-million-for-semiconductor-digital-twins.html
Microsoft’s AI ambitions fuel $3.3 billion bet on Wisconsin data center Fri, 10 May 2024 09:35:55 +0000

Microsoft is betting big on AI, investing $3.3 billion in a new AI data center in Wisconsin as part of a growing wave of investment in the technology.

US President Joe Biden visited the site in Mount Pleasant, Racine County, on Wednesday to announce the news. The data center is set to come online by 2026. As part of the project, Microsoft said it is co-funding a new solar energy project that will generate 250MW of power.

“The announcement reflects an investment in AI’s broader potential to transform businesses and manufacturing,” said University of Pennsylvania engineering professor Benjamin C. Lee. “Such transformation needs more than just data centers. It needs people who are skilled in operating those data centers and people who are skilled in connecting AI capabilities to unique challenges and opportunities in existing businesses and communities. Much of this investment focuses on the people.”

Expanding cloud and AI infrastructure

Microsoft broke ground on the facility in September 2022 and said at the time that the project would cost $1 billion. However, the company now plans to invest $3.3 billion in the site to “expand its national cloud and AI infrastructure capacity.”

It has not released details of the hardware it is installing at the data center but said it will “help enable companies in Wisconsin and across the country to develop, deploy and use the world’s most advanced cloud services and AI applications to grow, modernize and improve their products and enterprises.”

A Microsoft spokesperson said the company could not comment further on the announcement.

Microsoft said in a news release that the project will create 2,300 construction jobs. The company is also partnering with Wisconsin’s Gateway Technical College to build a data center academy, which it said will “train and certify more than 1,000 students in five years to work in the new data center and IT sector jobs created in the area.”

To offset the site’s power consumption, Microsoft said it is working with National Grid to co-fund a 250MW solar energy project in Wisconsin. This part of the project is expected to be up and running by 2027.

“Any user or developer looking to build large-scale data centers needed to accommodate AI is scouring the country for extensive power infrastructure that can accommodate up to — and sometimes more than — a gigawatt of power,” said Andy Cvengros, managing director and US data center markets co-lead at JLL. Wisconsin has substantial power infrastructure due to previous investments made within the state.

The Mount Pleasant site was initially earmarked for a manufacturing plant operated by electronics giant Foxconn. However, a planned $10 billion investment, announced in 2017 and championed by former President Donald Trump, never fully materialized. Foxconn did open a data center on the site in 2021, but plans to manufacture LCD screens there were shelved, and now Microsoft is building on the land instead.

Unlocking advanced AI applications

Microsoft’s new investments will greatly improve AI applications, moving from systems that simply find and show existing information to ones that can create new content, said Andrés Diana, chief innovation officer for Accrete AI. He said that the additional capacity will enable more sophisticated cloud services, machine learning models, and real-time AI analytics.

“Specific technologies that could be developed include more advanced generative AI applications that could transform content creation, programming, design, and other creative fields,” he added. “Perhaps the most exciting area additional capacity will unlock is the unfettered use and insights that can stem from AI agents running 24/7 conducting research, synthesizing data, and producing predictive insights, recommendations, and work-product at a rate and quality that we cannot fathom today.”

Hyperscalers are excited about emerging capabilities, but there is significant uncertainty about their computational demands, Lee said.

“Data center capacity is growing to support training for larger models and datasets and to support serving these models as businesses and users discover new applications for these models,” he added. “Investments in capacity reflect the belief that more and more people and organizations will use larger and larger models.”

Skill gaps and access to economical labor are vital to scaling AI capabilities, said Jason Carolan, chief innovation officer at data center provider Flexential.

“With their proximity to Chicago, hubs like the greater Milwaukee area and Madison make this a logical choice,” he added. “This also leverages the history of Wisconsin and Illinois in innovation — such as Cray computing in Northern Wisconsin and Minnesota, and Mosaic at Urbana-Champaign.”

The state-of-the-art data center campus Microsoft is building will provide the computing power and high-speed connectivity required to develop and train cutting-edge AI systems, said Bill Long, chief product officer of Zayo, a global telecom infrastructure company.

“Microsoft’s new Co-Innovation Lab partnering with businesses will also enable companies to directly tap its AI expertise to design custom AI solutions to enhance their products and operations,” he added. “Other tech companies need to follow in Microsoft’s footsteps — these huge giants are already planning for the future, and other organizations need to as well to ride the AI demand wave.”

Adnan Masood, UST’s chief AI architect, said hyperscalers are moving fast because analysts project the AI infrastructure market will take off in the coming years. “Companies that can provide the best AI services will dominate the cloud market and shape the course of technological progress,” he added. “It goes beyond raw capacity. Cloud providers are also racing to make their operations sustainable. Microsoft’s partnership with National Grid on solar energy and its focus on water conservation reflect this. Going green is a necessity in the age of climate change.”

Data Center
https://www.networkworld.com/article/2099921/microsofts-ai-ambitions-fuel-3-3-billion-bet-on-wisconsin-data-center.html
Red Hat unveils image mode for its Linux distro Thu, 09 May 2024 13:34:30 +0000

At the Red Hat Summit this week, the company unveiled a new container image deployment method for Red Hat Enterprise Linux. The new option is designed to streamline operations, enhance consistency across hybrid cloud environments, and accelerate the adoption of cutting-edge technologies like AI and machine learning.

Typically, containers trim down operating systems as much as possible because they run within a host OS, says Bradley Shimmin, chief analyst for AI platforms, analytics, and data management at Omdia. Alternatively, Linux is run within virtual machines, which themselves run on top of an underlying operating system. Either way, this creates management complexity.

“Red Hat is using the Open Container image standard to create bootable images, which look and work just like it were the actual OS running on bare metal,” says Shimmin.

Now enterprises can use all the tools that they already have in place for managing containers. “Containerization is the current paradigm that everyone accepts and values, frankly, for deploying software for any kind and any sort,” Shimmin says.

(Related news from Red Hat Summit: Red Hat extends Lightspeed genAI tool to OpenShift and introduces ‘policy as code’ for Ansible)

The idea of putting Linux in a container isn’t new. There are community projects that deliver bootable containers, including Bluefin and Fedora. Even Red Hat’s Linux has been available as an image previously, such as the Red Hat Universal Base Image. In addition, Red Hat also has a container-based operating system for Red Hat OpenShift in Red Hat Enterprise Linux CoreOS.

“But image mode for RHEL is one of, if not the, first enterprise Linux platform to offer it,” says Ben Breard, senior principal product manager at the Red Hat Enterprise Linux business unit.

“The entire operating system will be delivered as a bootable container,” Breard says. “Universal Base Image still needed to be run on a host operating system, for example – it was a container on a host.”

The advantage of doing this is that it can help enterprises streamline operations and management and maintain a consistent, reliable infrastructure, whether on bare metal, on virtual machines, or in public clouds.

Companies rarely use an out-of-the-box operating system, Breard says. Instead, they build a standard operating environment by layering in hundreds, or thousands, of additional packages to meet their specific needs. “Problems arise when patches, updates and upgrades have to be pushed out,” he says. “Making changes to the underlying image can be incredibly tedious, time consuming and complex.”

Old-school monolithic applications had the same problem, he says. “But along came containers, which enabled discrete pieces of the app to be packaged and updated individually. So, applying this type of methodology to gold images would be a huge timesaver and innovation driver for enterprise IT. This is the initial problem that image mode solves.”

Being able to make these changes quickly is even more crucial for artificial intelligence, he says. “Patches and upgrades need to be pushed – and work – immediately,” he says. “If you can’t go fast, you can’t reap the benefits of AI workloads.”

One new capability that image mode makes possible, which will accelerate things even further, is that operations teams will now be able to use the same container tooling and workflows as developers, he says.

“This means that changes can be pushed to standard operating environments much faster, and it enables technology organizations to standardize on tooling across both the operations and developer teams,” he says. “And it makes it even easier to push updates out to customers with vast IT estates – a single patch or driver update can be pushed from a single console via a container, and everything is installed and updated via container magic.”
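
As a rough sketch of what that workflow looks like in practice, the example below builds a customized bootable image from a Containerfile using ordinary container tooling. The base image path, registry name, and package choice here are illustrative assumptions rather than Red Hat’s documented defaults:

$ cat Containerfile
# Illustrative only: base image path and added package are assumptions
FROM registry.redhat.io/rhel9/rhel-bootc:latest
RUN dnf -y install httpd && dnf clean all
$ podman build -t registry.example.com/custom-rhel:latest .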

Users will also be able to view and update image mode deployments directly from Red Hat Insights. That will give companies a smarter approach to risk management, says Gunnar Hellekson, vice president and general manager of the Red Hat Enterprise Linux business unit.

“This is the proactive advice feature of the RHEL subscription,” Hellekson says. “We’re able to tell people ahead of time what their CVE exposure might be when they put together a certain kind of image. This is part of an overall move that we have to improve the amount of intelligent coaching and help that we can offer customers during the build and construction of RHEL, as opposed to after RHEL is deployed.”

Additional security benefits come from the fact that security teams will now be able to apply container security tools such as scanning, validation, cryptography and attestation to the base elements of the operating system.

Linux, Networking
https://www.networkworld.com/article/2099690/red-hat-unveils-image-mode-for-its-linux-distro.html
Insecure protocols leave networks vulnerable: report Thu, 09 May 2024 13:09:13 +0000

Enterprise IT managers prove to be too trusting of internal network protocols, as many organizations do not encrypt their WAN traffic, according to a new security threat report.

Secure Access Service Edge (SASE) provider Cato Networks this week released the results of its Cato CTRL SASE Threat Report for Q1 2024 at the RSA Conference in San Francisco. The report summarizes findings gathered from Cato traffic flows across more than 2,200 customers during the first quarter, adding up to 1.26 trillion network flows analyzed.

According to the report, many enterprises continue to run unsecured protocols across their WANs, which means when a bad actor penetrates the networks, they have fewer obstacles preventing them from seeing and compromising critical data in transit across the network.

“As threat actors constantly introduce new tools, techniques, and procedures targeting organizations across all industries, cyber threat intelligence remains fragmented and isolated to point solutions,” said Etay Maor, chief security strategist at Cato Networks and founding member of Cato CTRL, in a statement. “Cato CTRL is filling the gap to provide a holistic view of enterprise threats. As the global network, Cato has granular data on every traffic flow from every endpoint communication across the Cato SASE Cloud Platform.”

Hackers exploit internal network protocols

Unencrypted data traversing internal networks using certain network protocols isn’t necessarily secure just because it resides within the network perimeter. Bad actors can leverage less secure protocols to scan environments and identify vulnerabilities to exploit.

For instance, Cato’s analysis found that 62% of environments run HTTP, a non-encrypted protocol. In addition, the report shows that while the Secure Shell (SSH) protocol is the most secure option for accessing remote services, 54% of organizations run Telnet internally. Telnet connections are not encrypted and leave data unprotected.

Nearly half (46%) use Server Message Block (SMB) v1 or v2. The SMB protocol used for file sharing and other purposes has been updated in SMB v3 to protect against vulnerabilities. Still, Cato found that many organizations continue to rely on SMB v1 and SMB v2 despite known vulnerabilities such as EternalBlue and denial of service (DoS) attacks. SMB v3 also enforces the robust AES-128-GCM encryption standard, according to the report.
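
As a practical starting point, administrators can spot-check their own network segments for the legacy services the report calls out. The sketch below uses nmap; the port list and address range are illustrative:

$ nmap -p 23,80,445 --open 10.0.0.0/24    # Telnet, HTTP and SMB ports; adjust the range to your own network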

“The HTTP traffic analysis clearly shows that many organizations do not encrypt their WAN traffic,” the report states. “This means that if an adversary is already inside the organization’s network, they can eavesdrop on unencrypted communications that may include personally identifiable information (PII) or sensitive information such as credentials.” Access to such data could help bad actors with lateral movement, which involves methods to explore and find vulnerabilities within already penetrated networks. The lateral movement across network devices and applications can go undetected until hackers reach their ultimate target.

“To stop cyberattacks, enterprises should be using in-house machine learning models based on company data and threat intelligence feeds. They also need to be careful of compromised systems within their organizations. Threat actors are leveraging them to scan (mainly SMB scanning) the network for vulnerabilities,” the report states.

Separately, Cato’s traffic analysis report uncovered the most frequently spoofed shopping sites, which are often used in phishing and spoofing attempts so hackers can get access to personal information.

These cybersquatting efforts, also known as domain squatting, use a domain name to capitalize on the reputation and recognition of a brand that belongs to someone else. By incorporating common typos or slight word differences into domain names, bad actors can pose as legitimate sites and gain access to users who mistakenly entered the typo.

According to the report, Booking, Amazon, and eBay are the top three well-known brands involved in spoofing attempts. Other commonly spoofed brands include Pinterest, Google, Apple, Netflix, Microsoft, Instagram, and YouTube.

Network Security, Networking, SASE
https://www.networkworld.com/article/2098984/insecure-protocols-leave-networks-vulnerable-report.html
What is a digital twin and why is it important to IoT? Thu, 09 May 2024 10:00:00 +0000

The use of digital twins – digital representations that mimic the structure and behavior of physical objects or systems – is on the rise. Digital twin technology has moved beyond manufacturing, where it got its start, and into many other industries, driven by advances in sensor technologies, artificial intelligence and data analytics.

In the world of IT, enterprises can use digital twins to replicate their IT environments, including infrastructure, network equipment, and Internet of Things (IoT) devices, and then run simulations or what-if scenarios to test the impact of changes and to optimize performance. They can be used to validate the current state of a network, for example, and test configuration changes, firmware updates, or adjustments to security policies.

What is a digital twin?

A digital twin is a digital representation of a physical object or system. In essence, a digital twin is a computer program that takes real-world data about a physical object or system as inputs and produces as outputs predictions or simulations of how that physical object or system will be affected by those inputs.

The digital twin concept first arose at NASA: full-scale mockups of early space capsules, used on the ground to mirror and diagnose problems in orbit, eventually gave way to fully digital simulations.

The technology behind digital twins has expanded to include buildings, factories and even cities, and some have argued that even people and processes can have digital twins, expanding the concept even further.

The term really took off after Gartner named digital twins as one of its top 10 strategic technology trends for 2017, saying that within three to five years, “billions of things will be represented by digital twins, a dynamic software model of a physical thing or system.” 

Today, digital twin technologies continue to gain traction because of their potential to bridge the gap between physical and virtual worlds, according to Grand View Research, which says the global digital-twin market is forecast to expand at a compound annual growth rate (CAGR) of 38% from 2023 to 2030. Incorporating technologies such as artificial intelligence (AI), cloud computing and IoT into digital twin systems is expected to boost market growth in the forecast period, Grand View says.

How does a digital twin work?

A digital twin begins life as a model built by specialists, often experts in data science or applied mathematics. These developers research the physics that underlies the physical object or system being mimicked and use that data to develop a mathematical model that simulates the real-world original in digital space.

The twin is constructed so that it can receive input from sensors gathering data from a real-world counterpart. This allows the twin to simulate the physical object in real time, in the process offering insights into performance and potential problems. The twin could also be designed based on a prototype of its physical counterpart, in which case the twin can provide feedback as the product is refined; a twin could even serve as a prototype itself before any physical version is built.

Digital twin vs. simulation

The terms simulation and digital twin are often used interchangeably, but they are different things. A simulation is designed with a CAD system or similar platform, and can be put through its simulated paces, but may not have a one-to-one analog with a real physical object. A digital twin, by contrast, is built out of input from IoT sensors on real equipment, which means it replicates a real-world system and changes with that system over time. Simulations tend to be used during the design phase of a product’s lifecycle, trying to forecast how a future product will work, whereas a digital twin provides all parts of the business insight into how some product or system they’re already using is working now.

Digital twin use cases

Potential use cases for digital twins are expansive. Objects such as aircraft engines, trains, offshore oil platforms, and turbines can be designed and tested digitally before being physically produced. These digital twins could also be used to help with maintenance operations. For example, technicians could use a digital twin to test that a proposed fix for a piece of equipment works before applying the fix.

Manufacturing is the area where rollouts of digital twins are probably the furthest along, with factories already using digital twins to simulate their processes. Automotive digital twins are made possible because cars are already fitted with telemetry sensors, but refining the technology will become more important as more autonomous vehicles hit the road. Healthcare is the sector that could produce digital twins of people; tiny sensors could send health information back to a digital twin used to monitor and predict a patient’s well-being.

What kind of value can digital twins bring to an organization?

Just as digital twins serve different purposes in different industries, the value of digital twins differs depending on the application.

In the world of manufacturing, for example, a digital twin can enable product designers to try out prototypes before settling on a final design. It’s a way to use digital resources to develop and refine products instead of tapping physical engineering resources. With a digital replica of a product that simulates the real thing in a virtual space, designers can rapidly generate new iterations, optimize their product designs, and improve product quality along the way.

In the semiconductor industry, digital twins can exist in the cloud and replace physical research models. In May 2024, the Biden Administration announced plans to fund up to $285 million to create a CHIPS Manufacturing USA institute focused on digital twins for the semiconductor industry: Digital twin-based research can leverage AI “to help accelerate the design of new U.S. chip development and manufacturing concepts and significantly reduce costs by improving capacity planning, production optimization, facility upgrades, and real-time process adjustments,” the US Department of Commerce said in the announcement.

Digital twins have had appeal in certain industries – manufacturing, oil and gas, utilities, mining – “basically physical, high-capital, asset-intensive verticals,” said Jonathan Lang, research director, worldwide IT/OT convergence strategies, at research firm IDC, in an interview with Network World.

In these settings, the rationale for digital twins has been clear, thanks to potential benefits that include better visibility into the health of assets, improved reliability, cost savings, and the ability to ensure stable operations, Lang says. “IT environments such as infrastructure, network equipment, connected devices, etc., have the same value drivers,” he says.

What kinds of digital twins are there?

IBM offers a categorization scheme based not on specific industries but on the complexity of what’s being twinned. This provides a useful way to think about the needs in specific use cases and gives a look at the broad spectrum of what digital twins can do:

  • Component or part twins simulate the smallest example of a functioning component.
  • Asset twins simulate two or more components working together and let you study the interactions between them.
  • System or unit twins let you see how multiple assets work together as a system, simulating an entire production line, for instance.
  • Process twins take the absolute top-level view of systems working together, letting you figure out how an entire factory might operate.

It’s worth noting that adding more components to the mix adds complexity. In particular, mixing and matching components from different manufacturers can be difficult because you’d need everyone’s intellectual property to play nice together within the world of your digital twin.

Advantages and benefits of digital twins

In the world of IT, digital twins that simulate IT infrastructure can:

  • Strengthen security: “Network digital twins offer noteworthy security benefits, including critical vulnerability identification and prioritized remediation plans specific to individual device configurations and features in use,” said Chiara Regale, senior vice president, product and user experience at Forward Networks, in an interview with Network World about reasons to consider a network digital twin.
  • Improve documentation: “Every enterprise is terrible at documentation, due to priorities around delivery, lack of standards on how to record infrastructure changes, and sprawl,” Michael Wynston, director of network architecture and automation at financial services firm Fiserv, told Network World. Digital twin technology can provide insights into the infrastructure beyond just configurations, including what the environment is doing at any given time. This is essential for successful documentation.
  • Boost efficiency: Digital twins enable simulation of data across multiple business systems. IDC research has shown that IT organizations are losing lots of time searching for necessary information to perform a job function. “By unifying the data in a single interface, as well as performing analysis across multiple data sets, digital twins improve worker efficiency and the quality and accuracy of analytical outputs,” said IDC analyst Jonathan Lang.
  • Create a better digital experience: A company can create a digital twin to help enhance digital experience, which is the sum of a user’s digital-based interactions with a product, service, device, etc. “Digital experience twins are a new concept that virtualizes an end user, application, or IoT device to validate the network experience and predict problems before they impact user experience,” says Bob Friday, chief AI officer at Juniper Networks.

How can a digital twin affect an organization’s environmental sustainability?

The proportion of companies implementing a data center infrastructure sustainability program will rise from about 5% in 2022 all the way to 75% by 2027, as sustainability becomes an increasingly central consideration for cost optimization and risk management, according to data from Gartner Research. Industry watchers have made the case that a digital twin can aid in companies’ sustainability efforts. Some potential tie-ins between digital twins and sustainability in the IT arena include:

  • Improved systems management, which can lead to lower downtime and more advanced power-management capabilities.
  • Better inventory and asset-management practices, which allow IT to maximize enterprise server and storage deployments and identify idle capacity.
  • Easier identification of older, energy-hogging gear, which can be replaced with newer generation hardware with better energy performance.

Dr. Mano Rao, IT director for global manufacturing at General Motors, wrote this in a blog about GM’s work with GE Digital to pursue model-based system engineering: “At GM, we developed a Virtual Factory Testbed to provide the tools and environment needed to test all manufacturing process variations that are necessary to support build-to-order manufacturing, as well as all permutations of outcomes that can result from each operation. We employ a process digital twin to mimic plant-floor behavior and test the integration of OT and IT systems—without requiring the physical lines to be deployed, and without requiring physical products flowing down the line. Not only does this help GM’s competitive advantage, but it brings us closer to our sustainability commitments.”

Applications of digital twin in various industries

Digital twins can be deployed in many industries. Here are some examples:

  • Enterprise IT: A digital twin can replicate an IT environment, including infrastructure, network equipment, and IoT devices. IT teams can use the digital twin to test configuration changes or adjust security policies, for example (see the sketch after this list).
  • Semiconductor industry: Digital twins can enable more collaborative design among engineers and researchers, speeding the exchange of ideas and reducing the cost of research and development.
  • Manufacturing operations: In the manufacturing industry, a digital twin can streamline design processes, improve collaboration among designers, and help to reduce the material used in a product’s design.
  • Healthcare services: In the world of healthcare, a digital twin can improve health monitoring and diagnostic capabilities.
  • Automotive industry: Automotive designers and manufacturers use digital twins to shorten time to market, improve safety procedures, monitor product performance, and identify potential maintenance issues.
  • Power-generation and utilities: Energy and utility companies can create a digital twin, or a virtual model, of a power plant or distribution network and use it to streamline operations and identify opportunities to improve performance.
  • Urban planning: For urban planning and infrastructure projects, a digital twin can allow city planners to run simulations of new designs – trying out scenarios that could impact traffic congestion or air pollution, for example.
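
To make the enterprise IT use case above concrete, here is a minimal, illustrative sketch in Python: a toy in-memory model of a small network serves as the “twin,” and a proposed change is rehearsed against it before it touches production. The topology, device names, and the change itself are hypothetical assumptions for demonstration, not any vendor’s API.

# Illustrative sketch only: a toy in-memory model of a small network acts as the
# "twin," and a proposed change is rehearsed against it before touching production.
# The topology, device names, and the change itself are hypothetical.

from collections import deque

topology = {
    "core-sw1": {"dist-sw1", "dist-sw2"},
    "dist-sw1": {"core-sw1", "access-sw1"},
    "dist-sw2": {"core-sw1", "access-sw2"},
    "access-sw1": {"dist-sw1"},
    "access-sw2": {"dist-sw2"},
}

def reachable(graph, src, dst):
    """Breadth-first search: can traffic from src still reach dst?"""
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False

def apply_change(graph, removed_link):
    """Return a copy of the model with one link taken out of service."""
    a, b = removed_link
    twin = {node: set(neighbors) for node, neighbors in graph.items()}
    twin[a].discard(b)
    twin[b].discard(a)
    return twin

# Rehearse a planned maintenance change on the twin, not on the live network.
proposed = apply_change(topology, ("core-sw1", "dist-sw2"))
print("access-sw1 -> access-sw2 today:       ", reachable(topology, "access-sw1", "access-sw2"))  # True
print("access-sw1 -> access-sw2 after change:", reachable(proposed, "access-sw1", "access-sw2"))  # False

Running the check flags that the proposed change would strand access-sw2, and it does so before anyone touches the live network.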

Digital twins and IoT

The explosion of IoT sensors is part of what makes digital twins possible. And as IoT devices are refined, digital-twin scenarios can include smaller and less complex objects, giving additional benefits to companies.

Digital twins can be used to predict different outcomes based on variable data. With additional software and data analytics, digital twins can often optimize an IoT deployment for maximum efficiency, as well as help designers figure out where things should go or how they operate before they are physically deployed.
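
As a rough illustration of that prediction idea, the sketch below uses a toy digital twin of a single IoT sensor to estimate battery life under different reporting intervals. The capacity and power figures are made-up assumptions chosen only for demonstration, not real device data.

# Illustrative sketch only: a toy digital twin of an IoT sensor predicts battery
# life under different reporting intervals. All figures are assumed values.

import random

BATTERY_MAH = 2400          # assumed battery capacity
IDLE_MA = 0.02              # assumed sleep current (mA)
TX_MAH_PER_REPORT = 0.05    # assumed energy cost of one transmission (mAh)

def simulate_days(report_interval_minutes, trials=1000):
    """Monte Carlo estimate of battery life (days) for a reporting interval."""
    results = []
    for _ in range(trials):
        reports_per_day = 24 * 60 / report_interval_minutes
        # Variable data: each report's cost fluctuates with radio conditions.
        daily_drain = IDLE_MA * 24 + sum(
            TX_MAH_PER_REPORT * random.uniform(0.8, 1.6)
            for _ in range(int(reports_per_day))
        )
        results.append(BATTERY_MAH / daily_drain)
    return sum(results) / len(results)

# Compare candidate configurations on the twin before rolling them out.
for interval in (5, 15, 60):
    print(f"report every {interval:>2} min -> ~{simulate_days(interval):.0f} days of battery")

Comparing the candidate intervals on the twin suggests which configuration meets a battery-life target before any change is pushed to real devices.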

Digital twin vendors

Building a digital twin is complex, and there is as yet no standardized platform for doing so.

One group that’s working to increase awareness, adoption, interoperability, and development of digital twin technology is the Digital Twin Consortium. It’s a global ecosystem of users – including industry, academia, and government members – who are driving best practices for digital twin usage and defining requirements for new digital twin standards.

In contrast with many emerging technologies that are driven by startups, commercial digital-twin offerings are coming from some of the largest companies in the field. For instance, GE, which developed digital-twin technology internally as part of its jet-engine manufacturing process, is now offering its expertise to customers, as is Siemens, another industrial giant heavily involved in manufacturing. Not to be outdone by these factory-floor suppliers, IBM is marketing digital twins as part of its IoT push, and Microsoft is offering its own digital-twin platform under the Azure umbrella.

Digital twin news in the enterprise IT world

Although digital twins have been around for some time, the technology is still in the early-adopter stage. But the number of vendors offering digital twin solutions is growing, and recent upgrades to digital twin offerings in the enterprise IT industry include:

  • Forward Networks launched AI Assist, a generative AI feature built into its Forward Enterprise digital twin platform. The addition is designed to give network and security operations professionals comprehensive insights into network performance via natural language prompts. With AI Assist, network engineers of varying skill levels can conduct sophisticated network queries, so they can quickly assess network behavior and identify potential issues.
  • Juniper Networks introduced Marvis Minis, an AI-native networking digital experience twin that uses the company’s Mist AI technology to proactively simulate user connections. That way it can instantly validate network configurations and detect problems without users being present. The Minis product simulates end-user, client, device and application traffic to learn the network configuration through unsupervised ML, and to proactively highlight network issues. Data from Minis is continuously fed back into Mist AI, providing an additional source of insight for the best responses.
  • Nokia extended the capabilities of its existing Nokia Network Digital Twin to include all Android devices, the company announced late last year. Coverage and performance data for Wi-Fi, private and public cellular networks can be automatically collected in real time and processed on Nokia’s edge platform to give enterprises a view of how changes in their operations impact network performance.

Challenges of digital twins

Digital twins offer a real-time look at what’s happening with physical assets, which can radically alleviate maintenance burdens. But keep in mind that Gartner warns that digital twins aren’t always called for and can unnecessarily increase complexity. “[Digital twins] could be technology overkill for a particular business problem. There are also concerns about cost, security, privacy, and integration.”

Potential challenges in developing and deploying a digital twin can include:

  • Data management: Data ownership can become a problematic point if not addressed, particularly if an organization is partnering with other entities to run its digital twins, said Kayne McGladrey, a senior member of the Institute of Electrical and Electronics Engineers (IEEE), a nonprofit professional association, and field CISO at Hyperproof, in an interview with CSO.
  • Security: If proper cybersecurity controls aren’t put in place, digital twins can expand a company’s attack surface, give threat actors access to previously inaccessible control systems, and expose preexisting vulnerabilities, CSO notes in a recent feature.
  • Supplier collaboration: A digital twin might have multiple engineers and designers collaborating on a model, so it’s important to create and adhere to well-documented practices for constructing and modifying the models.
  • Complexity: A digital twin may only model a single physical asset, but in more complex environments with multiple stakeholders and complicated processes, companies may need to invest in more advanced digital twins, which in turn require advanced skillsets.
  • Privacy: Legal and regulatory issues come into play with digital twins, too, McGladrey told CSO. The primary concerns are around whether the operators of digital twins can ensure that the data being used in their digital twins is handled in ways that meet regulatory requirements around privacy, confidentiality, and even the geography where data can be housed.

Digital twin skills

Interested in becoming a digital twin pro? The skill sets are demanding and require specialized expertise in machine learning, artificial intelligence, predictive analytics, and other data-science capabilities. That’s part of the reason large vendors are offering digital twin services: smaller organizations may find it more practical to hire a consulting team than to upskill their in-house staff.

2024 global network outage report and internet health check Thu, 09 May 2024 02:15:23 +0000

The reliability of services delivered by ISPs, cloud providers and conferencing services is critical for enterprise organizations. ThousandEyes, a Cisco company, monitors how providers are handling any performance challenges and provides Network World with a weekly roundup of events that impact service delivery. Read on to see the latest analysis, and stop back next week for another update.

(Note: We have archived prior-year updates, including the 2023 outage report and our coverage during the Covid-19 years, when we began tracking the performance of cloud providers and ISPs.)

Internet report for April 29-May 5, 2024

ThousandEyes reported 151 global network outage events across ISPs, cloud service provider networks, collaboration app networks and edge networks (including DNS, content delivery networks, and security as a service) during the week of April 29-May 5. That’s down slightly (3%) from 156 outages the week prior. Specific to the U.S., there were 50 outages, which is down 7% from 54 outages the week prior. Here’s a breakdown by category, followed by a quick sketch of the percent-change arithmetic behind these figures:

ISP outages: Globally, the number of ISP outages increased from 104 to 113 outages, a 9% increase compared to the week prior. In the U.S., the number of ISP outages increased 5% from 37 to 39 outages.

Public cloud network outages: Globally, cloud provider network outages decreased from 22 to 15 outages. In the U.S., cloud provider network outages decreased from six to two.

Collaboration app network outages: Globally, collaboration app network outages increased from eight to nine outages. In the U.S., collaboration app network outages fell from four to three outages.
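
For reference, the week-over-week percentages quoted throughout these reports are simple percent changes; the short sketch below reproduces the figures above.

# Percent-change arithmetic behind the week-over-week figures quoted above.
def pct_change(current, prior):
    """Signed percent change from the prior week to the current week."""
    return (current - prior) / prior * 100

print(f"Global outages: {pct_change(151, 156):+.1f}%")  # -3.2%, reported as "down slightly (3%)"
print(f"U.S. outages:   {pct_change(50, 54):+.1f}%")    # -7.4%, reported as "down 7%"
print(f"Global ISP:     {pct_change(113, 104):+.1f}%")  # +8.7%, reported as "a 9% increase"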

Two notable outages

On April 29, NTT America, a global Tier 1 ISP and subsidiary of NTT Global, experienced an outage that impacted some of its customers and downstream partners in multiple regions, including the U.S., Japan, South Korea, China, Taiwan, Singapore, the Netherlands, Hungary, Turkey, Brazil, India, Argentina, Australia, the U.K., Thailand, Malaysia, Mexico, and Canada. The outage, lasting 24 minutes, was first observed around 2:40 PM EDT and appeared to initially center on NTT nodes located in San Jose, CA. Around five minutes into the outage, the nodes located in San Jose, CA, appeared to clear, and were replaced by nodes located in Tokyo, Japan, in exhibiting outage conditions. Ten minutes after first being observed, the nodes located in Tokyo, Japan, were joined by nodes located in Osaka, Japan, Singapore, Dallas, TX, and Los Angeles, CA, in exhibiting outage conditions. The outage was cleared around 3:05 PM EDT. Click here for an interactive view.

On April 29, Cogent Communications, a multinational transit provider based in the US, experienced an outage that impacted multiple downstream providers and customers across various regions, including the U.S., Brazil, the U.K., Canada, Chile, Mexico, Japan, Germany, Spain, and France. The outage, lasting for a total of one hour and 12 minutes, was divided into two occurrences over a period of 35 minutes. The first occurrence was observed around 2:45 AM EDT and initially seemed to be centered on Cogent nodes located in Ashburn, VA, and Washington, D.C. Five minutes into the outage, the nodes located in Ashburn, VA, appeared to clear and were replaced by nodes located in Baltimore, MD, New York, NY, and Phoenix, AZ, along with nodes located in Washington, D.C., in exhibiting outage conditions. This increase in nodes exhibiting outage conditions also appeared to coincide with an increase in the number of downstream customers, partners, and regions impacted. Twenty minutes after appearing to clear, nodes located in New York, NY, and Washington, D.C., were joined by nodes located in Houston, TX, in exhibiting outage conditions. The outage was cleared around 3:20 AM EDT. Click here for an interactive view.

Internet report for April 22-28, 2024

ThousandEyes reported 156 global network outage events across ISPs, cloud service provider networks, collaboration app networks and edge networks (including DNS, content delivery networks, and security as a service) during the week of April 22-28. That’s down 8% from 170 outages the week prior. Specific to the U.S., there were 54 outages, which is down 36% from 85 outages the week prior. Here’s a breakdown by category:

ISP outages: Globally, the number of ISP outages increased from 99 to 104 outages, a 5% increase compared to the week prior. In the U.S., the number of ISP outages decreased 31% from 54 to 37 outages.

Public cloud network outages: Globally, cloud provider network outages remained the same as the week prior, recording 22 outages. In the U.S., cloud provider network outages decreased from 10 to six.

Collaboration app network outages: Globally, collaboration app network outages decreased from nine to eight outages. In the U.S., collaboration app network outages stayed at the same level as the week before: four outages.

Two notable outages

On April 26, Time Warner Cable, a U.S. based ISP, experienced a disruption that impacted a number of customers and partners across the U.S. The outage, distributed across two occurrences over a twenty-five-minute period, was first observed at around 7:45 PM EDT and appeared to center on Time Warner Cable nodes located in New York, NY. Ten minutes after first being observed, the number of nodes located in New York, NY, exhibiting outage conditions increased. The outage lasted a total of 17 minutes and was cleared at around 8:10 PM EDT. Click here for an interactive view.

On April 24, NTT America, a global Tier 1 ISP and subsidiary of NTT Global, experienced an outage that impacted some of its customers and downstream partners in multiple regions, including the U.S., Germany, India, China, Hong Kong, Canada, and Japan. The outage, lasting 9 minutes, was first observed around 7:15 AM EDT and appeared to initially center on NTT nodes located in San Jose, CA. Around five minutes into the outage, the nodes located in San Jose, CA, were joined by nodes located in Dallas, TX, in exhibiting outage conditions. The outage was cleared around 7:25 AM EDT. Click here for an interactive view.

Internet report for April 15-21, 2024

ThousandEyes reported 170 global network outage events across ISPs, cloud service provider networks, collaboration app networks and edge networks (including DNS, content delivery networks, and security as a service) during the week of April 15-21. That’s up 6% from 161 outages the week prior. Specific to the U.S., there were 85 outages, which is up 18% from 72 outages the week prior. Here’s a breakdown by category:

ISP outages: Globally, the number of ISP outages decreased from 107 to 99 outages, a 7% decline compared to the week prior. In the U.S., the number of ISP outages climbed 6% from 51 to 54 outages.

Public cloud network outages: Both globally (22) and in the U.S. (10), cloud provider network outages remained the same as the week prior.

Collaboration app network outages: Globally, collaboration app network outages increased from five to nine outages. In the U.S., collaboration app network outages increased from two to four outages.

Two notable outages

On April 20, Cogent Communications, a multinational transit provider based in the US, experienced an outage that impacted multiple downstream providers and its own customers across various regions, including the US, Canada, Portugal, Germany, the Netherlands, Luxembourg, South Africa, Hong Kong, Singapore, the U.K., Italy, France, and Spain. The outage, lasting a total of one hour and 32 minutes, was divided into a series of occurrences over a period of two hours and 28 minutes. The first occurrence was observed around 10:55 PM EDT and initially seemed to be centered on Cogent nodes located in Seattle, WA, Portland, OR, and Hong Kong. Five minutes into the outage, the nodes located in Hong Kong appeared to clear and were replaced by nodes located in Minneapolis, MN, and Cleveland, OH, in exhibiting outage conditions. Thirty-five minutes into the first occurrence, the number of nodes exhibiting outage conditions increased to include nodes located in Seattle, WA, Washington, D.C., Minneapolis, MN, Cleveland, OH, Boston, MA, and Bilbao, Spain. This increase in nodes exhibiting outage conditions also appeared to coincide with an increase in the number of downstream customers, partners, and regions impacted. A second occurrence was observed around five minutes after the issue initially appeared to have cleared. This second occurrence lasted approximately fourteen minutes and seemed to initially be centered around nodes located in Cleveland, OH. Around five minutes into the second occurrence, nodes located in Cleveland, OH, appeared to be temporarily replaced by nodes located in Seattle, WA, and Chicago, IL, before they themselves were replaced once again by nodes located in Cleveland, OH. Around 15 minutes after appearing to clear, a third occurrence was observed, this time appearing to be centered around nodes located in Bilbao, Spain, and Cleveland, OH. The outage was cleared around 1:25 AM EDT. Click here for an interactive view.

On April 17, NTT America, a global Tier 1 ISP and subsidiary of NTT Global, experienced an outage that impacted some of its customers and downstream partners across multiple regions including the U.S., the U.K., the Netherlands, and Germany. The outage, lasting 17 minutes, was first observed around 2:55 AM EDT and appeared to initially center on NTT nodes located in Seattle, WA. Five minutes into the outage, nodes located in Seattle, WA, were joined by nodes located in Dallas, TX, in exhibiting outage conditions. The outage was cleared around 3:15 AM EDT. Click here for an interactive view.

Internet report for April 8-14, 2024

ThousandEyes reported 161 global network outage events across ISPs, cloud service provider networks, collaboration app networks and edge networks (including DNS, content delivery networks, and security as a service) during the week of April 8-14. That’s up 11% from 145 outages the week prior. Specific to the U.S., there were 72 outages, which is up 4% from 69 outages the week prior. Here’s a breakdown by category:

ISP outages: Globally, the number of ISP outages increased from 97 to 107 outages, a 10% increase compared to the week prior. In the U.S., the number of ISP outages climbed 13% from 45 to 51 outages.

Public cloud network outages: Globally, cloud provider network outages increased from 16 to 22 outages. In the U.S., cloud provider network outages decreased from 14 to 10 outages.

Collaboration app network outages: Globally, collaboration app network outages remained at the same level as the week prior, recording 5 outages. In the U.S., collaboration app network outages decreased from three to two outages.

Two notable outages

On April 8, Rackspace Technology, a U.S. managed cloud computing provider headquartered in San Antonio, Texas, experienced an outage that impacted multiple downstream providers, as well as Rackspace customers within multiple regions including the U.S., Japan, Vietnam, Spain, Canada, Germany, Singapore, France, the Netherlands, the U.K., Brazil, and South Africa. The outage, lasting a total of 14 minutes, was first observed around 9:00 AM EDT and appeared to center on Rackspace nodes located in Chicago, IL. Around five minutes into the outage, the number of nodes located in Chicago, IL, exhibiting outage conditions appeared to decrease. This decrease in impacted nodes appeared to coincide with a reduction of impacted regions. The outage was cleared around 9:15 AM EDT. Click here for an interactive view.

On April 10, GTT Communications, a Tier 1 ISP headquartered in Tysons, VA, experienced an outage that impacted some of its partners and customers across multiple regions, including the U.S., the U.K., Brazil, and Canada. The outage, lasting 9 minutes, was first observed around 8:10 AM EDT and appeared to initially be centered on GTT nodes located in Los Angeles, CA. Around five minutes into the outage, the nodes located in Los Angeles appeared to clear and were replaced by nodes located in New York, NY, in exhibiting outage conditions. This change in location of nodes exhibiting outage conditions appeared to coincide with an increase in the number of impacted regions, downstream partners and customers. The outage was cleared around 8:20 AM EDT. Click here for an interactive view.

Internet report for April 1-7, 2024

ThousandEyes reported 145 global network outage events across ISPs, cloud service provider networks, collaboration app networks and edge networks (including DNS, content delivery networks, and security as a service) during the week of April 1-7. That’s up 23% from 118 outages the week prior. Specific to the U.S., there were 69 outages, which is up 21% from 57 outages the week prior. Here’s a breakdown by category:

ISP outages: Both globally and in the U.S., ISP outages increased by 45% compared to the week prior. Globally, the number of ISP outages climbed from 67 to 97. In the U.S., the number of ISP outages jumped from 31 to 45 outages.

Public cloud network outages: Globally, cloud provider network outages declined slightly from 17 to 16 outages. In the U.S., they remained at the same level (14) as the previous week.

Collaboration app network outages: Globally, collaboration app network outages fell from 13 to five outages. In the U.S., collaboration app network outages dropped from eight to three outages.

Two notable outages

On April 2, Hurricane Electric, a network transit provider headquartered in Fremont, CA, experienced an outage that impacted customers and downstream partners across multiple regions, including the U.S., Taiwan, Australia, Germany, Japan, the U.K., Ireland, India, and China. The outage, lasting 12 minutes, was first observed around 12:40 PM EDT and initially appeared to center on Hurricane Electric nodes located in New York, NY, Los Angeles, CA, and San Jose, CA. Five minutes into the outage, the nodes exhibiting outage conditions expanded to include nodes located in Chicago, IL, and Ashburn, VA. This coincided with an increase in the number of downstream partners and countries impacted. The outage was cleared around 12:55 PM EDT. Click here for an interactive view.

On April 2, BT, a multinational Tier 1 ISP headquartered in London, U.K., experienced an outage on their European backbone that impacted customers and downstream partners across multiple regions, including the U.S., the U.K., Switzerland, Spain, and Germany. The disruption, lasting 24 minutes, was first observed around 7:20 PM EDT and appeared to center on nodes located in London, England. Click here for an interactive view.

Internet report for March 25-31, 2024

ThousandEyes reported 118 global network outage events across ISPs, cloud service provider networks, collaboration app networks and edge networks (including DNS, content delivery networks, and security as a service) during the week of March 25-31. That’s down 28% from 164 outages the week prior. Specific to the U.S., there were 57 outages, which is down slightly from 58 outages the week prior. Here’s a breakdown by category:

ISP outages: Globally, the number of ISP outages was nearly cut in half, falling from 128 to 67 outages, a 48% decrease compared to the week prior. In the U.S., the number of ISP outages fell from 43 to 31 outages, a decline of 28%.

Public cloud network outages: Globally, total cloud provider network outages nearly tripled, jumping from six to 17 outages. In the U.S., cloud provider network outages jumped from three to 14 outages.

Collaboration app network outages: Globally, collaboration app network outages more than doubled, increasing from six to 13. Similarly, in the U.S., collaboration app network outages doubled from four to eight.

Two notable outages

On March 29, Arelion (formerly known as Telia Carrier), a global Tier 1 ISP headquartered in Stockholm, Sweden, experienced an outage that impacted customers and downstream partners across multiple regions, including the U.S., the Netherlands, and Japan. The disruption, lasting a total of 8 minutes, was first observed around 5:26 AM EDT and appeared to center on nodes located in Phoenix, AZ. The outage was cleared around 5:35 AM EDT. Click here for an interactive view.

On March 29, Cogent Communications, a U.S. based multinational transit provider, experienced an outage that impacted multiple downstream providers as well as Cogent customers across multiple regions, including the U.S., Canada, Germany, and Japan. The outage, lasting 9 minutes, was first observed around 1:45 AM EDT and appeared to initially center on Cogent nodes located in San Francisco, CA, Salt Lake City, UT, and Seattle, WA. Five minutes after first being observed, the nodes located in San Francisco, CA, Salt Lake City, UT, and Seattle, WA, appeared to recover and were replaced by nodes located in Kansas City, MO, in exhibiting outage conditions. As a result, the number of customers and providers impacted was reduced. The outage was cleared around 1:55 AM EDT. Click here for an interactive view.

Internet report for March 18-24, 2024

After a spike the week before, global outages decreased last week. ThousandEyes reported 164 global network outage events across ISPs, cloud service provider networks, collaboration app networks and edge networks (including DNS, content delivery networks, and security as a service) during the week of March 18-24. That’s down 20% from 206 outages the week prior. Specific to the U.S., there were 58 outages, which is down 33% from 87 outages the week prior. Here’s a breakdown by category:

ISP outages: Globally, the number of ISP outages fell slightly from 131 to 128 outages, a 2% decrease compared to the week prior. In the U.S., the number of ISP outages fell slightly from 46 to 43 outages.

Public cloud network outages: Globally, cloud provider network outages decreased from 10 to six outages. In the U.S., they decreased from six to three outages.

Collaboration app network outages: Globally, collaboration app network outages fell dramatically from 34 to six outages. In the U.S., collaboration app network outages dropped from 28 to four outages.

Two notable outages

On March 20, Cogent Communications, a multinational transit provider based in the US, experienced an outage that impacted multiple downstream providers and its own customers across various regions, including the US, Italy, Saudi Arabia, France, Germany, Canada, Hong Kong, Luxembourg, Chile, Brazil, Kenya, Singapore, Mexico, Switzerland, Spain, Australia, Finland, Japan, Ireland, and Norway. The outage occurred for a total of 24 minutes, divided into three occurrences over a period of one hour and fifteen minutes. The first occurrence of the outage was observed around 12:50 AM EDT and initially seemed to be centered on Cogent nodes located in Frankfurt, Munich, and Hamburg in Germany, in Paris, France, and Kyiv, Ukraine. A second occurrence was observed around fifteen minutes after the issue initially appeared to have cleared. This second occurrence lasted approximately eighteen minutes and seemed to be centered around nodes located in Frankfurt, Munich, and Hamburg, Germany. Around ten minutes into the second occurrence, nodes located in Frankfurt, Munich and Hamburg, Germany, were joined by nodes located in Nuremberg, Germany, San Francisco, CA, San Jose, CA, Zurich, Switzerland, Amsterdam, the Netherlands, and Paris, France in exhibiting outage conditions. Around 30 minutes after appearing to clear, a third occurrence was observed, this time appearing to be centered around nodes located in Toronto, Canada. The outage was cleared around 2:05 AM EDT. Click here for an interactive view.

On March 24, Hurricane Electric, a network transit provider headquartered in Fremont, CA, experienced an outage that impacted customers and downstream partners across multiple regions, including the U.S., China, Australia, Germany, the U.K., and Japan. The outage, first observed around 1:10 PM EDT, lasted 7 minutes in total and appeared to center on Hurricane Electric nodes located in New York, NY, and San Jose, CA. The outage was cleared at around 1:20 PM EDT. Click here for an interactive view.

Internet report for March 11-17, 2024

After weeks of decreasing, global outages increased significantly last week. ThousandEyes reported 206 global network outage events across ISPs, cloud service provider networks, collaboration app networks and edge networks (including DNS, content delivery networks, and security as a service) during the week of March 11-17. That’s up 45% from 142 outages the week prior. Specific to the U.S., there were 87 outages, which is up 38% from 63 outages the week prior. Here’s a breakdown by category:

ISP outages: Globally, the number of ISP outages increased from 91 to 131 outages, a 44% increase compared to the week prior. In the U.S., the number of ISP outages climbed slightly from 44 to 46 outages.

Public cloud network outages: Globally, cloud provider network outages increased from six to 10 outages. In the U.S., they increased from four to six outages.

Collaboration app network outages: Globally, collaboration app network outages spiked from six to 34 outages. In the U.S., collaboration app network outages jumped from 3 to 28 outages.

Two notable outages

On March 16, Cogent Communications, a U.S. based multinational transit provider, experienced an outage that impacted multiple downstream providers as well as Cogent customers across multiple regions, including the U.S., Ireland, the U.K., Sweden, Austria, Germany, and Italy. The outage, lasting a total of 12 minutes, was divided into two occurrences over a one-hour and ten-minute period. The first occurrence was observed at around 6:30 PM EDT and appeared to initially be centered on Cogent nodes located in Baltimore, MD and New York, NY. Five minutes into the first occurrence, the nodes located in New York, NY, were replaced by nodes located in Philadelphia, PA, in exhibiting outage conditions. One hour after the issue initially appeared to have cleared, a second occurrence was observed. This second occurrence lasted approximately four minutes and appeared to be centered around nodes located in Baltimore, MD, Philadelphia, PA, New York, NY, and Newark, NJ. The outage was cleared around 7:45 PM EDT. Click here for an interactive view.

On March 12, Hurricane Electric, a network transit provider headquartered in Fremont, CA, experienced an outage that impacted customers and downstream partners across the U.S. and Canada. The outage, first observed around 2:00 AM EDT, lasted 7 minutes in total and was divided into two occurrences over a thirty-minute period. The first occurrence appeared to initially center on Hurricane Electric nodes located in Chicago, IL. Twenty minutes after appearing to clear, the nodes located in Chicago, IL, were joined by nodes located in Seattle, WA in exhibiting outage conditions. This increase in impacted nodes appeared to coincide with an increase in the number of impacted downstream customers and partners. The outage was cleared at around 2:30 AM EDT. Click here for an interactive view.

Additional details from ThousandEyes are available here.

Internet report for March 4-10, 2024

ThousandEyes reported 142 global network outage events across ISPs, cloud service provider networks, collaboration app networks and edge networks (including DNS, content delivery networks, and security as a service) during the week of March 4-10. That’s down 8% from 155 outages the week prior. Specific to the U.S., there were 63 outages, which is down 10% from 70 outages the week prior. Here’s a breakdown by category:

ISP outages: Globally, the number of ISP outages decreased from 95 to 91 outages, a 4% decrease compared to the week prior. In the U.S., the number of ISP outages stayed the same at 44 outages.

Public cloud network outages: Globally, cloud provider network outages fell from 13 to six outages. In the U.S., they decreased from seven to four outages.

Collaboration app network outages: Globally, collaboration app network outages decreased from eight outages to six. In the U.S., collaboration app network outages stayed at the same level as the week before: three outages.

Three notable outages

On March 5, several Meta services, including Facebook and Instagram, experienced a disruption that impacted users attempting to log in, preventing them from accessing those applications. The disruption was first observed around 10:00 AM EST. During the disruption, Meta’s web servers remained reachable, with network paths to Meta services showing no significant error conditions, suggesting that a backend service, such as authentication, was the cause of the issue. The service was fully restored around 11:40 AM EST. More detailed analysis here.
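
The reasoning above (servers reachable and network paths clean, so the fault is likely in a backend service) can be illustrated with a rough sketch like the one below. The target host is a placeholder, and this is only a simplified stand-in for the kind of layered checks a monitoring platform performs, not ThousandEyes’ actual methodology.

# Illustrative sketch: distinguishing a network-path problem from an
# application/backend problem. The host below is a placeholder.

import socket
import http.client

HOST = "www.example.com"   # placeholder target, not a real monitoring endpoint

def network_reachable(host, port=443, timeout=5):
    """Layer-3/4 check: can we complete a TCP handshake at all?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def application_healthy(host, path="/", timeout=5):
    """Layer-7 check: does the service answer with a non-error status?"""
    try:
        conn = http.client.HTTPSConnection(host, timeout=timeout)
        conn.request("GET", path)
        status = conn.getresponse().status
        conn.close()
        return status < 500
    except (OSError, http.client.HTTPException):
        return False

net_ok, app_ok = network_reachable(HOST), application_healthy(HOST)
if net_ok and not app_ok:
    print("Servers reachable but returning errors: likely a backend issue.")
elif not net_ok:
    print("No TCP connectivity: likely a network-path issue.")
else:
    print("Both network path and application respond normally.")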

On March 5, Comcast Communications experienced an outage that impacted a number of downstream partners and customers as well as the reachability of many applications and services, including Webex, Salesforce, and AWS. The outage, lasting 1 hour and 48 minutes, was first observed around 2:45 PM EST and appeared to impact traffic as it traversed Comcast’s network backbone in Texas, with Comcast nodes located in Dallas, TX, and Houston, TX, exhibiting outage conditions. The outage was completely cleared around 4:40 PM EST. More detailed analysis here.

On March 6, LinkedIn experienced a service disruption that impacted its mobile and desktop global user base. The disruption was first observed around 3:45 PM EST, with users experiencing service unavailable error messages. The major portion of the disruption lasted around one hour, during which time no network issues were observed connecting to LinkedIn web servers, further indicating the issue was application related. At around 4:38 PM EST, the service started to recover and was totally clear for all users around 4:50 PM EST. More detailed analysis here.

Additional details from ThousandEyes are available here.

Internet report for February 26-March 3, 2024

ThousandEyes reported 155 global network outage events across ISPs, cloud service provider networks, collaboration app networks and edge networks (including DNS, content delivery networks, and security as a service) during the week of February 26-March 3. That’s down 6% from 165 outages the week prior. Specific to the U.S., there were 70 outages, which is up 19% from 59 outages the week prior. Here’s a breakdown by category:

ISP outages: Globally, the number of ISP outages decreased from 111 to 95 outages, a 14% decrease compared to the week prior. In the U.S., ISP outages increased 10%, climbing from 40 to 44 outages.

Public cloud network outages: After weeks of decreasing, cloud provider network outages began increasing again last week. Globally, cloud provider network outages climbed from eight to 13 outages. In the U.S., they increased from four to seven outages.

Collaboration app network outages: Globally, collaboration app network outages increased from five outages to eight. In the U.S., collaboration app network outages rose from two to three outages.

Two notable outages

On February 27, Level 3 Communications, a U.S. based Tier 1 carrier acquired by Lumen, experienced an outage that impacted multiple downstream partners and customers across the U.S. The outage, lasting a total of 18 minutes over a twenty-five-minute period, was first observed around 2:25 AM EST and appeared to be centered on Level 3 nodes located in Cleveland, OH. The outage was cleared around 2:50 AM EST. Click here for an interactive view.

On February 28, Time Warner Cable, a U.S. based ISP, experienced a disruption that impacted a number of customers and partners across the U.S. The outage was first observed at around 2:00 PM EST and appeared to center on Time Warner Cable nodes located in New York, NY.  Five minutes into the outage, the number of nodes located in New York, NY, exhibiting outage conditions increased. The outage lasted 14 minutes and was cleared at around 2:15 PM EST. Click here for an interactive view.

Additional details from ThousandEyes are available here.

Internet report for February 19-25, 2024

ThousandEyes reported 165 global network outage events across ISPs, cloud service provider networks, collaboration app networks and edge networks (including DNS, content delivery networks, and security as a service) during the week of February 19-25. That’s down significantly from 243 outages in the week prior – a decrease of 32%. Specific to the U.S., there were 59 outages, which is down 34% from 90 outages the week prior. Here’s a breakdown by category:

ISP outages: Globally, the number of ISP outages decreased from 121 to 111 outages, an 8% decrease compared to the week prior. In the U.S., ISP outages decreased from 48 to 40 outages, a 17% decrease compared to the previous week.

Public cloud network outages: Globally, cloud provider network outages decreased significantly from 42 to eight outages, an 81% decrease compared to the week prior. In the U.S., they fell from eight to four outages.

Collaboration app network outages: Globally, collaboration app network outages decreased from seven outages to five. In the U.S., collaboration app network outages remained at the same level as the week prior: two outages.

Two notable outages

On February 22, Hurricane Electric, a network transit provider headquartered in Fremont, CA, experienced an outage that impacted customers and downstream partners across multiple regions, including the U.S., Australia, China, the U.K., Japan, Singapore, India, France, and Canada. The outage, first observed around 9:10 AM EST, lasted 32 minutes in total and was divided into two occurrences over a forty-five-minute period. The first occurrence appeared to initially center on Hurricane Electric nodes located in New York, NY, Phoenix, AZ and Indianapolis, IN. Ten minutes after appearing to clear, the nodes located in New York, NY, were joined by nodes located in San Jose, CA in exhibiting outage conditions. Five minutes into the second occurrence, the disruption appeared to radiate out, and the nodes located in New York, NY, Phoenix, AZ and Indianapolis, IN, were joined by nodes located in Seattle, WA, Denver, CO, Ashburn, VA, Kansas City, MO and Omaha, NE in exhibiting outage conditions. This increase in impacted nodes appeared to coincide with an increase in the number of impacted downstream customers and partners. The outage was cleared at around 9:55 AM EST. Click here for an interactive view.

On February 21, Time Warner Cable, a U.S. based ISP, experienced a disruption that impacted a number of customers and partners across the U.S. The outage was first observed at around 2:45 PM EST and appeared to center on Time Warner Cable nodes located in New York, NY.  Fifteen minutes into the outage, the number of nodes located in New York, NY, exhibiting outage conditions increased. The outage lasted 23 minutes and was cleared at around 3:10 PM EST. Click here for an interactive view.

Additional details from ThousandEyes are available here.

Internet report for February 12-18, 2024

ThousandEyes reported 243 global network outage events across ISPs, cloud service provider networks, collaboration app networks and edge networks (including DNS, content delivery networks, and security as a service) during the week of February 12-18. That’s down from 319 outages in the week prior – a decrease of 24%. Specific to the U.S., there were 90 outages, which is down slightly from 91 the week prior. Here’s a breakdown by category:

ISP outages: Globally, the number of ISP outages decreased from 134 to 121 outages, a 10% decrease compared to the week prior. In the U.S., ISP outages decreased from 60 to 48 outages, a 20% decrease compared to the previous week.

Public cloud network outages: Globally, cloud provider network outages decreased significantly from 107 to 42 outages, a 61% decrease compared to the week prior. In the U.S., they doubled from four to eight outages.

Collaboration app network outages: Globally, collaboration app network outages decreased from 11 outages to seven. In the U.S., collaboration app network outages decreased from 5 to 2 outages.

Two notable outages

On February 16, Hurricane Electric, a network transit provider headquartered in Fremont, CA, experienced an outage that impacted customers and downstream partners across multiple regions, including the U.S., Egypt, Sweden, the U.K., Japan, Mexico, Australia, Argentina, the Netherlands, Belgium, and Canada. The outage, first observed around 8:25 AM EST, lasted 23 minutes in total and was divided into two occurrences over a thirty-minute period. The first occurrence appeared to initially center on Hurricane Electric nodes located in New York, NY. Fifteen minutes into the first occurrence, the nodes located in New York, NY, were joined by nodes located in Paris, France, and Amsterdam, the Netherlands, in exhibiting outage conditions. Five minutes after appearing to clear, nodes located in New York, NY, once again began exhibiting outage conditions. The outage was cleared at around 8:55 AM EST. Click here for an interactive view.

On February 17, AT&T experienced an outage on their network that impacted AT&T customers and partners across the U.S. The outage, lasting around 14 minutes, was first observed around 3:40 PM EST, appearing to center on AT&T nodes located in Little Rock, AR. Five minutes after first being observed, the number of nodes exhibiting outage conditions located in Little Rock, AR, appeared to rise. This increase in nodes exhibiting outage conditions appeared to coincide with a rise in the number of impacted partners and customers. The outage was cleared at around 3:55 PM EST. Click here for an interactive view.

Additional details from ThousandEyes are available here.

Internet report for February 5-11, 2024

ThousandEyes reported 319 global network outage events across ISPs, cloud service provider networks, collaboration app networks and edge networks (including DNS, content delivery networks, and security as a service) during the week of February 5-11. That’s up from 265 outages in the week prior – an increase of 20%. Specific to the U.S., there were 91 outages. That’s up from 45 outages the week prior, an increase of 102%. Here’s a breakdown by category:

ISP outages: Globally, the number of ISP outages increased from 106 to 134 outages, a 26% increase compared to the week prior. In the U.S., ISP outages more than doubled from 28 to 60 outages, a 114% increase compared to the previous week.

Public cloud network outages: Globally, cloud provider network outages decreased slightly from 117 to 107, a 9% decrease compared to the week prior. In the U.S., they decreased from five to four outages.

Collaboration app network outages: Globally, collaboration app network outages climbed from three outages to 11. In the U.S., there were five collaboration app network outages, up from zero the week prior.

Two notable outages

On February 7, Time Warner Cable, a U.S. based ISP, experienced a disruption that impacted a number of customers and partners across multiple regions, including the U.S., Ireland, the U.K., Canada, India, Australia, Singapore, Japan, the Netherlands, France, Germany, Indonesia, Hong Kong, South Korea, China, and Brazil. The outage was observed across a series of occurrences over the course of forty-five minutes. First observed at around 4:50 PM EST, the outage, consisting of five equally spaced four-minute periods, appeared to initially center on Time Warner Cable nodes in New York, NY. Five minutes after appearing to clear, nodes located in New York, NY, were again observed exhibiting outage conditions, joined by nodes located in San Jose, CA. By the third period, the nodes located in San Jose, CA, had appeared to clear and were instead replaced by nodes located in Los Angeles, CA, in exhibiting outage conditions, in addition to nodes located in New York, NY. The outage lasted a total of 20 minutes and was cleared at around 5:35 PM EST. Click here for an interactive view.

On February 6, NTT America, a global Tier 1 ISP and subsidiary of NTT Global, experienced an outage that impacted some of its customers and downstream partners in multiple regions, including the U.S., Germany, the U.K., the Netherlands, and Hong Kong. The outage, lasting 24 minutes, was first observed around 8:10 PM EST and appeared to initially center on NTT nodes located in Chicago, IL, and Dallas, TX. Around five minutes into the outage, the nodes located in Chicago, IL, and Dallas, TX, were joined by nodes located in Newark, NJ, in exhibiting outage conditions. The apparent increase of nodes exhibiting outage conditions appeared to coincide with an increase in the number of impacted downstream customers and partners. The outage was cleared around 8:35 PM EST. Click here for an interactive view.

Additional details from ThousandEyes are available here.

Internet report for January 29-February 4, 2024

ThousandEyes reported 265 global network outage events across ISPs, cloud service provider networks, collaboration app networks and edge networks (including DNS, content delivery networks, and security as a service) during the week of January 29-February 4. That’s more than double the number of outages in the week prior (126). Specific to the U.S., there were 45 outages. That’s down from 55 outages the week prior, a decrease of 18%. Here’s a breakdown by category:

ISP outages: Globally, the number of ISP outages was 106, an increase of 15% compared to 92 outages the previous week. In the U.S., ISP outages decreased by 28%, dropping from 39 to 28 outages.

Public cloud network outages: Globally, cloud provider network outages skyrocketed from five to 117 last week (the increase appeared to be a result of an increase in outages in the APJC region). In the U.S., they increased from two to five outages.

Collaboration app network outages: Globally, collaboration app network outages decreased from five outages to three. In the U.S., collaboration app network outages decreased from one outage to zero.

Two notable outages

On January 31, Comcast Communications experienced an outage that impacted a number of downstream partners and customers across multiple regions including the U.S., Malaysia, Singapore, Hong Kong, Canada, Germany, South Korea, Japan, and Australia. The outage, lasting 18 minutes, was first observed around 8:00 PM EST and appeared to be centered on Comcast nodes located in Ashburn, VA. Ten minutes into the outage, the nodes exhibiting outage conditions, located in Ashburn, VA, appeared to increase. The apparent increase of nodes exhibiting outage conditions appeared to coincide with an increase in the number of impacted downstream customers and partners. The outage was cleared around 8:20 PM EST. Click here for an interactive view.

On February 2, NTT America, a global Tier 1 ISP and subsidiary of NTT Global, experienced an outage that impacted some of its customers and downstream partners in multiple regions, including the U.S., Germany, the Netherlands, and the U.K. The outage, lasting 23 minutes, was first observed around 1:25 PM EST and appeared to center on NTT nodes located in Dallas, TX and Chicago, IL. The outage was cleared around 1:50 PM EST. Click here for an interactive view.

Additional details from ThousandEyes are available here.

Internet report for January 22-28, 2024

ThousandEyes reported 126 global network outage events across ISPs, cloud service provider networks, collaboration app networks and edge networks (including DNS, content delivery networks, and security as a service) during the week of January 22-28. That’s down from 156 the week prior, a decrease of 19%. Specific to the U.S., there were 55 outages. That’s down from 91 outages the week prior, a decrease of 40%. Here’s a breakdown by category:

ISP outages: Globally, the number of ISP outages was 92, a decrease of 14% compared to 107 outages the previous week. In the U.S., ISP outages decreased by 35%, dropping from 60 to 39 outages.

Public cloud network outages: Globally, cloud provider network outages dropped from 14 to five last week. In the U.S., they decreased from seven to two outages.

Collaboration app network outages: Globally, collaboration app network outages remained the same as the week prior: five outages. In the U.S., collaboration app network outages decreased from four outages to one.

Three notable outages

On January 26, Microsoft experienced an issue that affected its customers in various regions around the globe. The outage was first observed around 11:00 AM EST and seemed to cause service failures in Microsoft Teams, which affected the usability of the application for users across the globe. While there was no packet loss when connecting to the Microsoft Teams edge servers, the failures were consistent with reported issues within Microsoft’s network that may have prevented the edge servers from reaching the application components on the backend. The incident was resolved for many users by 6:10 PM EST. Click here for an interactive view.

On January 24, Akamai experienced an outage on its network that impacted content delivery connectivity for customers and partners using Akamai Edge delivery services in the Washington D.C. area. The outage was first observed around 12:10 PM EST and appeared to center on Akamai nodes located in Washington D.C. The outage lasted a total of 24 minutes. Akamai announced that normal operations had resumed at 1:00 PM EST. Click here for an interactive view.

On January 23, Internap, a U.S. based cloud service provider, experienced an outage that impacted many of its downstream partners and customers in multiple regions, including the U.S. and Singapore. The outage, which was first observed around 2:30 AM EST, lasted 18 minutes in total and appeared to be centered on Internap nodes located in Boston, MA. The outage was at its peak around fifteen minutes after being observed, with the highest number of impacted regions, partners, and customers. The outage was cleared around 2:55 AM EST. Click here for an interactive view.

Additional details from ThousandEyes are available here.

Internet report for January 15-21, 2024

ThousandEyes reported 156 global network outage events across ISPs, cloud service provider networks, collaboration app networks and edge networks (including DNS, content delivery networks, and security as a service) during the week of January 15-21. That’s up from 151 the week prior, an increase of 3%. Specific to the U.S., there were 91 outages. That’s up significantly from 63 outages the week prior, an increase of 44%. Here’s a breakdown by category:

ISP outages: Globally, the number of ISP outages was 107, an increase of 29% compared to 83 outages the previous week, and in the U.S. ISP outages increased by 58%, climbing from 38 to 60 outages.

Public cloud network outages: Globally, cloud provider network outages dropped from 30 to 14 last week. In the U.S., they increased from six to seven outages.

Collaboration app network outages: Globally, collaboration app network outages decreased from seven to five outages. In the U.S., collaboration app network outages stayed at the same level: four outages.

Two notable outages

On January 16, Oracle experienced an outage on its network that impacted Oracle customers and downstream partners interacting with Oracle Cloud services in multiple regions, including the U.S., Canada, China, Panama, Norway, the Netherlands, India, Germany, Malaysia, Sweden, and the Czech Republic. The outage was first observed around 8:45 AM EST and appeared to center on Oracle nodes located in various regions worldwide, including Ashburn, VA, Tokyo, Japan, San Jose, CA, Melbourne, Australia, Cardiff, Wales, London, England, Amsterdam, the Netherlands, Frankfurt, Germany, Slough, England, Phoenix, AZ, San Francisco, CA, Atlanta, GA, Washington D.C., Richmond, VA, Sydney, Australia, New York, NY, Osaka, Japan, and Chicago, IL. Thirty-five minutes after first being observed, all the nodes exhibiting outage conditions appeared to clear. A further ten minutes later, nodes located in Toronto, Canada, Phoenix, AZ, Frankfurt, Germany, Cleveland, OH, Slough, England, Ashburn, VA, Washington, D.C., Cardiff, Wales, Amsterdam, the Netherlands, Montreal, Canada, London, England, Sydney, Australia, and Melbourne, Australia, began exhibiting outage conditions again. The outage lasted 40 minutes in total and was cleared at around 9:50 AM EST. Click here for an interactive view.

On January 20, Hurricane Electric, a network transit provider headquartered in Fremont, CA, experienced an outage that impacted customers and downstream partners across multiple regions, including the U.S., Thailand, Hong Kong, India, Japan, and Australia. The outage, first observed around 7:15 PM EST, lasted 11 minutes in total and was divided into two occurrences over a one-hour five-minute period. The first occurrence appeared to center on Hurricane Electric nodes located in Los Angeles, CA. Fifty minutes after the first occurrence appeared to clear, the second occurrence was observed. Lasting 8 minutes, the outage initially appeared to center on nodes located in Los Angeles, CA. Around five minutes into the second occurrence, the nodes in Los Angeles, CA were joined by nodes located in San Jose, CA, in exhibiting outage conditions. The outage was cleared at around 8:20 PM EST. Click here for an interactive view.

Additional details from ThousandEyes are available here.

Internet report for January 8-14, 2024

ThousandEyes reported 151 global network outage events across ISPs, cloud service provider networks, collaboration app networks and edge networks (including DNS, content delivery networks, and security as a service) during the week of January 8-14. That’s up from 122 the week prior, an increase of 24%. Specific to the U.S., there were 63 outages. That’s up from 58 outages the week prior, an increase of 9%. Here’s a breakdown by category:

ISP outages: Globally, the number of ISP outages was 83, an increase of 8% compared to the previous week, and in the U.S. they increased by 6%, climbing from 36 to 38 outages.

Public cloud network outages: Globally, cloud provider network outages jumped from 19 to 30 last week. In the U.S., they decreased from 10 to six outages.

Collaboration app network outages: Globally, collaboration app network outages increased from five to seven outages. In the U.S., collaboration app network outages increased from one to four outages.

Two notable outages

On January 14, Zayo Group, a U.S. based Tier 1 carrier headquartered in Boulder, Colorado, experienced an outage that impacted some of its partners and customers across multiple regions including the U.S., Canada, Sweden, and Germany. The outage lasted around 14 minutes, was first observed around 7:10 PM EST, and appeared to initially center on Zayo Group nodes located in Houston, TX. Ten minutes after first being observed, nodes located in Houston, TX, were joined by nodes located in Amsterdam, the Netherlands, in exhibiting outage conditions. This rise in the number of nodes exhibiting outage conditions appeared to coincide with an increase in the number of impacted downstream partners and customers. The outage was cleared around 7:25 PM EST. Click here for an interactive view.

On January 13, Time Warner Cable, a U.S. based ISP, experienced a disruption that impacted a number of customers and partners across the U.S. The outage was first observed at around 12:45 PM EST and appeared to center on Time Warner Cable nodes located in New York, NY.  Fifteen minutes into the outage, the number of nodes located in New York, NY, exhibiting outage conditions increased. The outage lasted 19 minutes and was cleared at around 1:05 PM EST. Click here for an interactive view.

Additional details from ThousandEyes are available here.

Internet report for January 1-7, 2024

ThousandEyes reported 122 global network outage events across ISPs, cloud service provider networks, collaboration app networks and edge networks (including DNS, content delivery networks, and security as a service) during the week of January 1-7. Over the prior three weeks, outages in every category fell for two consecutive weeks before rising again in the final week. Specific to the U.S., there were 58 outages. Here’s a breakdown by category:

ISP outages: Globally, the number of ISP outages was 77, an increase of 43% compared to the previous week, and in the U.S. they nearly doubled from 20 to 36.

Public cloud network outages: Globally, cloud provider network outages increased from 13 to 19 last week. In the U.S., they increased from 6 to 10.

Collaboration app network outages: Globally, collaboration app network outages increased from one to five outages. In the U.S., collaboration app network outages increased from zero to one. 

Two notable outages

On January 4, Time Warner Cable, a U.S. based ISP, experienced a disruption that impacted a number of customers and partners across the U.S. The outage was first observed at around 10:45 AM EST and appeared to center on Time Warner Cable nodes located in New York, NY.  Five minutes into the outage, the number of nodes located in New York, NY, exhibiting outage conditions increased. The outage lasted 13 minutes and was cleared at around 11:00 AM EST. Click here for an interactive view.

On January 4, Telecom Italia Sparkle, a Tier 1 provider headquartered in Rome, Italy, and part of the Italian-owned Telecom Italia, experienced an outage that impacted many of its downstream partners and customers in multiple regions, including the U.S., Argentina, Brazil, and Chile. The outage lasted 28 minutes in total and was divided into two episodes over a 35-minute period. It was first observed around 4:00 AM EST. The first period of the outage, lasting around 24 minutes, appeared to be centered on Telecom Italia Sparkle nodes located in Miami, FL. Five minutes after appearing to clear, nodes located in Miami, FL, again exhibited outage conditions. The outage was cleared around 4:35 AM EST. Click here for an interactive view.

Additional details from ThousandEyes are available here.

Cloud Computing, Internet Service Providers, Network Management Software, Networking
https://www.networkworld.com/article/2071380/2024-global-network-outage-report-and-internet-health-check.html 2071380
Google Cloud issue blamed for UniSuper week-long service disruption Wed, 08 May 2024 16:39:30 +0000

An Australian pension fund provider has attributed a week-long service outage to “an unprecedented occurrence” related to provisioning by Google Cloud, its cloud service provider (CSP).

A disruption in UniSuper services that caused members to lose access to online services and mobile apps happened because of “a combination of rare issues at Google Cloud that resulted in an inadvertent misconfiguration during the provisioning of UniSuper’s Private Cloud,” according to an email from Google published in a blog post online by UniSuper.

UniSuper is a financial planning and retirement account provider serving Australia; the outage affected the Australian pension fund for the education and research sectors.

The issues triggered a previously unknown software bug that impacted UniSuper’s systems, causing a service outage that began about a week ago and will not begin to be remediated until Thursday, according to the post.

Account access to be restored

By mid-afternoon Australia time Thursday, members will be able to log in to their accounts; however, account balances will not yet be updated. Investment and trading activity has continued as normal throughout the outage, and members’ funds were not affected.

UniSuper’s CEO Peter Chun also sent an email to clients Wednesday that was posted online to assure them of the safety of their accounts and the continuity of investment activity during the outage. “This usual investment activity will be reflected in your balance once our systems are fully restored,” according to his email. “As investments have been unaffected by the outage, we have up-to-date investment options performance information on our website for members.”

Calling the problem “an isolated incident,” Google also assured UniSuper members that the outage was not due to a cyber-attack and thus their sensitive data was not exposed to unauthorized entities.

What happened?

The provisioning issue caused a deletion of UniSuper’s Private Cloud subscription, which deleted the cloud in two geographies, one of which was aimed at providing protection against outages and loss, according to Google.

“Restoring UniSuper’s Private Cloud instance has called for an incredible amount of focus, effort, and partnership between our teams to enable an extensive recovery of all the core systems,” according to the email.

UniSuper also had backups in place with an additional service provider, which minimized the loss and is helping the companies during the restoration process.

“Google Cloud sincerely apologizes for the inconvenience this has caused, and we continue to work around the clock with UniSuper to fully remediate the situation, with the goal of progressively restoring services as soon as possible,” the email said.

Outages can cause reputational damage

Cloud and other network outages happen, with the major service providers – including Amazon Web Services, Microsoft Azure and others – all having experienced them at one time or another. For instance, in June 2023, AWS experienced a more than two-hour incident that impacted a number of services on the US East Coast. Microsoft Azure also had a data center outage in Australia in September of last year that prevented users from accessing Azure, Microsoft 365, and Power Platform services for more than 24 hours.

Typically these issues are resolved somewhat quickly, with the UniSuper outage standing out as an exception in terms of its duration, noted Pareekh Jain, CEO of EEIRTrend and Pareekh Consulting. This could harm Google from a reputational standpoint and cause customers to have a lack of trust in the company as a CSP. “The current UniSuper cloud outage on Google Cloud in Australia is taking an unusually long time to resolve, which negatively impacts Google Cloud’s reputation in the region,” he noted.

Such outages also can lead to business disruptions and data loss for clients, which is why many favor a multi-cloud strategy for risk management, Jain added. UniSuper used to split its workloads between Azure and two data centers of its own, but moved a large amount of its workloads to Google Cloud Platform last year.

Cloud Computing, Data Center
https://www.networkworld.com/article/2099457/google-cloud-issue-blamed-for-unisuper-week-long-service-disruption.html 2099457
IBM Power server targets AI workloads at the edge Wed, 08 May 2024 16:26:59 +0000

IBM has filled out the low end of its Power server portfolio with a 2U rack-mounted server designed for running AI inferencing workloads in remote office or edge locations outside of corporate data centers.

The 1-socket, half-wide, Power10 processor-based system promises a threefold performance increase per core compared to the Power S812 it basically replaces, IBM stated. Running IBM AIX, IBM i, Linux, or VIOS operating systems, the S1012 supports in-core AI inferencing and machine learning with a Matrix Math Accelerator (MMA) feature.

MMA is a feature of Power10-based servers that handles matrix multiplication operations in hardware, rather than relying solely on software routines; it offers four times better performance per core for matrix multiplication kernels at the same frequency, according to IBM. Each Power S1012 includes four MMAs per core to support AI inferencing.
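To make that concrete, the kernels MMA accelerates are ordinary dense matrix multiplications, the operation that dominates neural-network inference. A minimal NumPy sketch of such a kernel (illustrative only; it runs on any CPU and does not itself invoke MMA):

import numpy as np

# A single dense layer in an inference pass is, at its core, a matrix multiply:
# activations (batch x features) times weights (features x outputs).
batch, features, outputs = 32, 512, 256
activations = np.random.rand(batch, features).astype(np.float32)
weights = np.random.rand(features, outputs).astype(np.float32)

# This matmul is the kind of operation a hardware matrix engine offloads
# from general-purpose software routines.
result = activations @ weights
print(result.shape)  # (32, 256)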

Also featured in the new servers is 256 GB of system memory distributed across four DDR4 Industry Standard Dual In-Line Memory Modules (ISDIMM) slots.

“By deploying Power S1012 at the edge, clients can run AI inferencing at the point of data, thus eliminating data transfers,” wrote Steve Sibley, vice president of IBM’s Power product management group, in a blog about the new servers.

Sibley cited analysis from Gartner related to AI workloads at the edge: As more organizations embrace AI to further drive business value, Gartner finds that clients in industries such as retail, manufacturing, healthcare, and transportation are deploying workloads at the edge to capitalize on data where it originates.

“By placing data, data management capabilities and analytic workloads at optimal points, ranging all the way to endpoint devices, enterprises can enable more real-time use cases. In addition, the flexibility to move data management workloads up and down the continuum from centralized data centers or from the cloud-to-edge devices will enable greater optimization of resources,” Gartner notes in its March 2024 Market Guide for Edge Computing.

Securing those workloads is also a key feature of the S1012. “To ensure insights remain a competitive advantage and don’t fall into the wrong hands, transparent memory encryption with Power10 secures data in and out of AI models running locally addressing data leaks,” Sibley stated.

The servers feature advanced remote management capabilities to let organizations efficiently manage and monitor their IT environments remotely, to enhance responsiveness and minimize downtime, Sibley stated. “High-availability features such as redundant hardware and failover mechanisms can help ensure continuous operation, all within a compact physical footprint,” Sibley stated.

The S1012 is aimed at small-to-medium users and offers a lower entry point for customers to get into the IBM Power lineup, which includes the high-end 240-core, 64TB Power E1080 and the E1050, which is aimed at memory-intensive workloads and supports up to 48 cores and 16TB of memory.

The IBM Power S1012 will be generally available in a 2U and a tower model from IBM and certified business partners on June 14, 2024.

Data Center, Edge Computing, Servers
https://www.networkworld.com/article/2099390/ibm-power-server-targets-ai-workloads-at-the-edge.html 2099390
HPE Aruba looks to fight AI threats with AI weapons Tue, 07 May 2024 19:45:42 +0000

HPE Aruba continues to infuse its management software with AI features, this time adding network security controls to help IT teams protect AI assets such as large language models from unmanaged device access.

Specifically, HPE will build new AI-powered security observability and monitoring features into its core HPE Aruba Networking Central management platform to help customers protect both AI-based and traditional resources from IoT security risks. The goal is to enhance visibility and identification of devices connected to the network and provide continuous monitoring for unusual or rogue behavior, the vendor stated. In addition, HPE is adding firewall-as-a-service (FWaaS) support to its Aruba security service edge (SSE) package.

Customers will be able to fight AI and other security threats with AI tools and security controls and protect the AI-based resources many enterprises are accumulating, said Jeff Olson, director of product and technical marketing for HPE Aruba. 

“If customers have a number of data scientists building out AI models, and they come to the network with all of this data, and they need to move it or store it in the cloud, and they need to bring some devices with them to do that – they are focused on the problems they are trying to solve with AI, not necessarily the security of the data or the network,” Olson said. 

“We are providing AIOps tools that let the security and networking teams detect anomalies and control security around these AI resources,” Olson said.

On top of that, much AI training data comes from unmanaged IoT devices, which are prone to web-based threats when they communicate with cloud services for updates, telemetry, or other purposes, wrote Jon Green, HPE Aruba’s chief security officer, in a blog about the HPE tools. “In addition, BYOD and line-of-business devices often appear on the network outside the purview of the IT organization and can become compromised without any alert or signal, which can result in entry points for attack and AI poisoning from corrupted or manipulated data,” Green wrote.

New AI support is built into HPE Aruba Networking Central, which uses machine learning models to analyze dynamic device attributes, including traffic patterns and behavioral characteristics such as connection state and network residency, to accurately categorize and identify IoT and traditional devices, Green stated.  

“HPE Aruba Networking Central AIOps has a long history of building automated network activity baselines for troubleshooting and remediation, and now we are using AI to extend that capability to individual devices,” Green stated. “This enables not only more precise, automated fingerprinting to support Zero Trust Security, but also the ability to use behavior baselines to spot anomalies that can indicate compromise and attack.”
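As a generic illustration of the behavior-baseline idea Green describes (a sketch only, not HPE Aruba’s implementation), a device’s normal traffic profile can be summarized statistically and new observations flagged when they deviate sharply from that baseline:

import statistics

def build_baseline(samples):
    # Summarize a device's historical metric (e.g., bytes per minute) as mean and stdev.
    return statistics.mean(samples), statistics.stdev(samples)

def is_anomalous(value, baseline, threshold=3.0):
    # Flag observations more than `threshold` standard deviations from the baseline.
    mean, stdev = baseline
    return abs(value - mean) > threshold * stdev

# Hypothetical per-minute traffic history for an IoT camera.
history = [120, 130, 110, 125, 118, 122, 127, 115]
baseline = build_baseline(history)

print(is_anomalous(124, baseline))   # False: within the device's normal range
print(is_anomalous(5000, baseline))  # True: possible compromise or exfiltration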

In addition to the AI-powered profiling, HPE Aruba is adding other AI-driven capabilities to improve security. For example, HPE Aruba Networking uses AIOps and machine learning models to intelligently hibernate APs during periods of low activity, eliminating potential entry points for malicious activity and reducing attack surface, Green noted.

On the SSE front, Aruba is integrating technology from its 2023 purchase of Axis Security into its SSE, SD-WAN and SASE offerings. According to Gartner, SSE combines several key security functions – including a cloud-access security broker (CASB), secure web gateway, zero-trust network access (ZTNA), and a next-generation firewall – into a cloud-based service to streamline management.

The new Firewall-as-a-Service (FWaaS) fills out HPE Aruba’s SSE package, which already includes ZTNA, CASB and other key SSE components. The FWaaS is tied to a variety of components within the HPE Networking SSE service so security teams can secure and manage networked resources from a single UI and set global policies centrally, Green stated.

In addition to the FWaaS, Aruba added dashboards within HPE Aruba Networking SSE to enhance visibility into an organization’s security status. Dashboards include views into applications in use, user activity, security events, and ZTNA adoption. Security personnel can use this information to identify shadow IT applications and reduce the associated risk of unauthorized access.

New FWaaS capabilities within HPE Aruba Networking SSE extend protection to wherever data and devices are, without the complexity of an appliance. Joining on-premises security controls delivered by built-in firewalls in HPE Aruba Networking switches, wireless access points, gateways, and WAN appliances, FWaaS completes edge-to-cloud firewall protection by providing policy enforcement in the cloud. And since FWaaS capabilities are integrated with ZTNA, CASB, SWG, and DEM in the HPE Aruba Networking SSE service, security teams can manage all SSE services using a single UI and global policy.

IoT Security, Network Security
https://www.networkworld.com/article/2098979/hpe-aruba-looks-to-fight-ai-threats-with-ai-weapons.html 2098979
AI features boost Cisco’s Panoptica application security software Tue, 07 May 2024 16:17:28 +0000

Cisco has added a variety of new AI-based security features to its cloud-native security platform that promise to help customers more quickly spot and remediate threats. The features extend the vendor’s Panoptica platform, which is designed to secure cloud applications from development to deployment with a focus on protecting containerized, microservice applications running on platforms such as Kubernetes. 

Panoptica lets customers define and enforce security policies through tools like Terraform, and it monitors application behavior to detect and prevent threats in real time. This includes features found in intrusion detection and prevention systems, but designed specifically for cloud-native environments, Cisco says.

A recently added AI Assistant understands plain, everyday language and offers custom assistance in prioritizing, investigating, and remediating a customer’s specific security issues. For example, administrators can ask questions such as “What are my most important vulnerabilities?” and “Help me understand this attack path and how to fix it.” The assistant has awareness and intelligence about an enterprise’s live environment, including all the data Panoptica tracks about its security posture, vulnerabilities, and attack paths, according to Vijoy Pandey, senior vice president of Cisco’s Outshift advanced development group. 

Adding to Panoptica’s current level of AI support, Cisco integrated OpenAI’s large language model GPT-4 in a feature called GenAI Dynamic Remediation. With this support, Panoptica can derive targeted remediations based on the security risk context presented by the system’s Attack Path Analysis engine. It “provides step-by-step instructions on how to apply the controls using CLI, code snippets, and Terraform tailored to the unique characteristics of each attack path,” Pandey wrote in a blog about the new features.

“Panoptica integrated GPT-4 with our graph engine, enabling it to present users with in-depth, tailored remediations for each detected attack path, including remediation guidance tailored to each of the critical points of infiltration: network exposure, workload at risk, and identity exposure,” Pandey wrote. “This rapidly decreases response time by giving teams sample code that gets right to the source of the issue. No more wasted time figuring out how to solve the problem; a simple code sample shows you exactly how you can fix it right now.”

Another new AI-based feature, Smart Cloud Detection & Response (CDR), offers security teams a head start in detecting attacks, continuously monitoring security events as they occur, and correlating them with insights and information so that they can respond, Pandey stated. Based on Cisco internal research, Smart CDR provides forensic information about the attack. “Every bad actor has an intent, and our job is to help describe what’s going on by painting a picture of the attack story,” Pandey wrote.

Smart CDR detects threats in real time and promptly notifies security teams, Pandey stated. “Most competitors stop at threat detection, but we go further, stitching these threats together to describe the attacker’s intent,” Pandey wrote. “Our approach involves generating synthetic attack simulations to train our ML models to detect attacks like ransomware, data exfiltration, crypto-jacking, container escape, and data destruction.”

Lastly, Cisco added the ability to more easily create, manage, and enforce security policies across a multicloud environment via a new feature called Security Graph Query. The feature integrates with the system’s policies engine to let customers enforce security policies directly from the Security Graph Query Builder and Query Library, Pandey stated.

The Security Graph Query Builder lets users build customized queries that combine data and insights from Panoptica’s different security modules, such as cloud security posture visibility, runtime workload protection, and Attack Path Analysis for analyzing potential attack vectors, according to Cisco. The idea is to offer a unified view of an organization’s cloud assets, security posture, vulnerabilities, and threats across its entire cloud-native application stack. This lets security teams identify risks, investigate issues, and take appropriate actions, according to Cisco.

“The feature is a comprehensive search and visualization tool that aggregates data across multiple cloud providers, code repositories, APIs, SaaS applications, and Kubernetes clusters,” Pandey stated.

“It utilizes queries crafted for assets and their relationships and security insights such as attack paths, risk findings, and vulnerabilities,” he wrote. “The goal is to streamline policy creation, improve security compliance, and make policy management more efficient and data-driven.”
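As a rough illustration of what querying a security graph involves (a hypothetical sketch, not Panoptica’s actual data model or query language), assets and their relationships can be modeled as a directed graph and searched for paths from internet-exposed entry points to assets with critical findings:

# Hypothetical asset graph: each asset maps to the assets it can reach.
graph = {
    "internet": ["load_balancer"],
    "load_balancer": ["web_pod"],
    "web_pod": ["api_pod", "cache"],
    "api_pod": ["db"],
    "cache": [],
    "db": [],
}

# Hypothetical findings: assets carrying critical vulnerabilities.
critical_vulns = {"api_pod", "db"}

def attack_paths(graph, source, targets):
    # Depth-first search for paths from an exposed entry point to risky assets.
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node in targets:
            yield path
        for nxt in graph.get(node, []):
            if nxt not in path:  # avoid revisiting nodes on the same path
                stack.append((nxt, path + [nxt]))

for path in attack_paths(graph, "internet", critical_vulns):
    print(" -> ".join(path))
# internet -> load_balancer -> web_pod -> api_pod
# internet -> load_balancer -> web_pod -> api_pod -> db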

Pandey listed a few use cases, including:

  • Proactive threat hunting: search for signs of compromise and emerging threats by constructing custom queries that indicate potential security risks.
  • Contextual analysis: understanding the context of an event or entity within the graph allows security teams to make more informed decisions.
  • Resource optimization: security teams can use the insights from the graph to optimize resource allocation, focusing efforts on areas of the network that are most vulnerable or frequently targeted.

The Panoptica announcement was timed with the ongoing RSA Conference 2024, where Cisco also announced plans to integrate Splunk’s enterprise security technology (gained in its recent $28 billion Splunk acquisition) with Cisco’s extended detection and response (XDR) service.

Cloud Computing, Network Security
]]>
https://www.networkworld.com/article/2098930/ai-features-boost-ciscos-panoptica-application-security-software.html 2098930
Red Hat extends Lightspeed generative AI tool to OpenShift and Enterprise Linux Tue, 07 May 2024 14:04:56 +0000

Red Hat’s generative AI-powered Lightspeed tool was first announced last year for the Red Hat Ansible automation platform. This morning, as the Red Hat Summit kicks off in Denver, the company announced that it will be extended to Red Hat Enterprise Linux and Red Hat OpenShift.

OpenShift, Red Hat’s Kubernetes-powered hybrid cloud application platform, will be getting it late this year. Red Hat Enterprise Linux Lightspeed is now in its planning phase, with more information coming soon. (At the Summit, Red Hat also announced a new ‘policy as code’ capability for Ansible.)

“This will bring similar genAI capabilities to both of those platforms across the hybrid cloud,” says Chuck Dubuque, senior director for product marketing at Red Hat OpenShift. Users will be able to ask questions in simple English and get usable code as a result, or suggestions for specific actions, he says, and the tool is designed to address skills gaps and the increasing complexities in enterprise IT.

“More seasoned IT pros can use Red Hat Lightspeed to extend their skills by using Red Hat Lightspeed as a force multiplier,” Dubuque says. “It can help quickly generate potential answers to niche questions or handle otherwise tedious tasks at scale. It helps IT organizations innovate and build a stronger skilled core while helping further drive innovation.”

The vision is that Red Hat Lightspeed will help companies address this skills gap and put more power in the hands of organizations that want to use Linux, automation, and hybrid clouds but don’t have the skills in house, he says, “or endless funds to enlist said skills.”

Other generative AI platforms can also answer questions and write code, but those are general-purpose LLMs, he says. “We built a purpose-driven model to solve unique challenges for IT,” he says. “The skill sets required for programming and development haven’t always been widely accessible to the entire talent pool or businesses with limited resources.”

Red Hat didn’t create the core foundation model, however.

Take, for example, Ansible Lightspeed, which became generally available last November. Ansible Lightspeed is based on IBM’s WatsonX Code Assistant, which, in turn, is powered by the IBM Granite foundation models, according to Sathish Balakrishnan, vice president and general manager at the Red Hat Ansible Business Unit.

It is then trained on data from Ansible Galaxy, an open-source repository of Ansible content covering a variety of use cases, he says, and further fine-tuned with additional expertise from Red Hat and IBM.

For example, to create and edit an Ansible Playbook and rules, users can type in a question and get an output that’s translated into YAML content. That streamlines role and playbook creation, Balakrishnan says. This helps companies translate subject matter expertise into best practices that can scale across teams, standardize and improve quality, and adhere to industry standards.
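For readers unfamiliar with the output format, the YAML content in question is an ordinary Ansible playbook. The sketch below (using PyYAML purely for illustration) shows the kind of structure a request such as “install and start nginx on the web servers” might correspond to; it is hypothetical and not Lightspeed’s actual output:

import yaml  # PyYAML

# A hypothetical, minimal playbook for "install and start nginx on the web servers".
playbook = [{
    "name": "Install and start nginx",
    "hosts": "webservers",
    "become": True,
    "tasks": [
        {"name": "Install nginx",
         "ansible.builtin.package": {"name": "nginx", "state": "present"}},
        {"name": "Start and enable nginx",
         "ansible.builtin.service": {"name": "nginx", "state": "started", "enabled": True}},
    ],
}]

print(yaml.safe_dump(playbook, sort_keys=False))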

“The service also helps safeguard private data through data isolation, so sensitive customer information remains untouched and possible data leaks are minimized,” he says.

Hundreds of customers are already using Ansible Lightspeed to generate tasks, says Dubuque. “And we’re expanding it to build full playbooks,” he says. “But Red Hat Lightspeed is bigger than just Ansible. We’re infusing generative AI into all our platforms.”

So, for example, with OpenShift Lightspeed, users will have an assistant integrated right into the OpenShift console so they can ask questions in plain English about the product or get help with troubleshooting. “Our goal is to increase productivity and efficiency,” he says.

However, we’re still in the early days of generative AI and AI assistants, says IDC analyst Stephen Elliot, so companies do need to be careful about how they use the technology. “But it’s a safe assumption that most of these models are going to get better and smarter,” Elliot says.

Linux, Network Management Software, Networking, Servers
]]>
https://www.networkworld.com/article/2098808/red-hat-extends-lightspeed-generative-ai-tool-to-openshift-and-enterprise-linux.html 2098808
Red Hat introduces ‘policy as code’ for Ansible Tue, 07 May 2024 14:03:06 +0000

Red Hat Ansible’s new “policy as code” capabilities will help users of the infrastructure automation platform to increase efficiency, reduce human error and improve the ability to meet governance, compliance, security and cost objectives, the company announced this morning at the Red Hat Summit in Denver. And it will help position Red Hat for an increasingly AI-dominated future.

The new capability, a tech preview of which is slated for availability “in the coming months,” will help enforce policies and compliance across hybrid cloud estates that increasingly include a varied and growing number of AI applications, the company announced. It enables high-level strategies for automation maturity “to better prepare organizations for sprawling infrastructure in support of scaling AI workloads,” Red Hat says.

The problem, says Sathish Balakrishnan, vice president and general manager for Red Hat’s Ansible business unit, is that as AI scales the capabilities of individual systems beyond what humans can manage, the challenge of maintaining IT infrastructure grows exponentially.

And even before infrastructure is focused on AI workloads and services, mission critical systems are still impacted by compliance mandates for security, performance, and auditability, he says. Even today, implementing these policies requires time, attention, and cross-functional team collaboration and documentation. “Mistakes can be costly,” he adds.

According to Balakrishnan, the enterprises who will benefit the most from the new capability are those looking to take advantage of AI. “In many ways, AI is the final stage of the automation adoption journey,” he says. “In the context of enterprise IT ops, AI means machines automating processes, machines connecting infrastructure and tools to make them more efficient, and machines making decisions to improve resiliency and reduce costs.”

That’s a shift in how IT ops is conducted, he says, and nearly every tool or platform in the environment is already introducing new AI capabilities. “This will provide you with vast new amounts of data, insights and intelligence,” he says. “But to make it all actionable, you need to be able to orchestrate it all – and you need to be able to harness the new intelligence to optimize your stack.”

So, for any organization looking to leverage AI, automation is mission-critical, he says.

Say, for example, a company is using the AI-powered Ansible Lightspeed service to accelerate automation development. If “policy as code” is infused from the start, content creators can write code that automatically maintains mandated compliance requirements, he says, “greatly reducing the impact of skills gaps and human error in IT operations.”
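As a generic illustration of the policy-as-code idea (a minimal sketch under assumed rules, not Red Hat’s implementation), compliance requirements become executable checks that run against configuration or automation content before anything is deployed:

# Hypothetical compliance policies expressed as code rather than as written mandates.
POLICIES = [
    ("no_plaintext_secrets", lambda cfg: not cfg.get("password")),
    ("encryption_required",  lambda cfg: cfg.get("encrypt_at_rest") is True),
    ("approved_region_only", lambda cfg: cfg.get("region") in {"us-east-1", "eu-west-1"}),
]

def evaluate(config):
    # Return the names of the policies this configuration violates.
    return [name for name, check in POLICIES if not check(config)]

# Hypothetical resource configuration submitted by an automation job.
config = {"region": "ap-south-2", "encrypt_at_rest": False, "password": "hunter2"}

violations = evaluate(config)
if violations:
    print("Blocked by policy:", ", ".join(violations))
else:
    print("Compliant: deployment may proceed")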

At British banking and insurance company NatWest Group, for example, adding policy capability to automation will allow for increased compliance and adherence to regulations, says Baljinder Kang, the company’s director of enterprise engineering, in a statement. “We think this is necessary as we look to add AI capabilities to continue to enhance our tooling to drive increased value in our solutions to meet our customer needs,” he says.

The big public clouds – hyperscalers like AWS, Microsoft Azure, and Google Cloud Platform – are also moving in the direction of policy as code, says IDC analyst Stephen Elliot.

“They have lots of point tools,” Elliot says. “Typically, customers tell us that they have CloudWatch, they have CloudTrail, they have AWS Systems Manager. The tools are either free or almost free. They’re good enough, cheap enough. They need them for visibility into the AWS environment.”

But once enterprises move outside of a particular cloud, they need to use other tools, such as Ansible or Terraform. “Red Hat is all about supporting all the clouds,” Elliot says. “Ansible is all about automating things across all the clouds — part of the value is multi-cloud support. Most of the public cloud providers only care about their own cloud, and for very good reasons.”

However, enterprises are still far from being able to rely fully on automation for policy enforcement, he says. With AI-powered automation, in particular, “the hype is way ahead of what we’re capable of,” Elliot says. “There’s a lot of experimentation this year, and there’s a lot of money put into these AI experiments and AI use cases, but this is going to take a little time.”

Still, companies should be looking at this now, he says. “You’ve got to start somewhere,” he says. “You have to think about how you would define a use case for policy as code.”

Because Red Hat Ansible has so many users and companies on board, many organizations will be able to figure out how to drive efficiencies and cost improvements.

“Speed is the ultimate competitive weapon,” Elliot says. “If you’re not talking about it, you’re already losing. And if you are talking about it, make some decisions about your investments.”

Linux, Network Management Software
]]>
https://www.networkworld.com/article/2098823/red-hat-introduces-policy-as-code-for-ansible.html 2098823
Riverbed launches AI-powered observability platform Tue, 07 May 2024 13:52:20 +0000

Riverbed today announced its Riverbed Platform that includes observability capabilities to provide enterprise network managers with visibility into blind spots around public cloud, zero trust, and SD-WAN architectures as well as remote work environments.

The Riverbed Platform enables IT organizations to collect, analyze, automate, and report on data aggregated across complex environments to optimize digital experiences for end users and customers. Using AIOps (artificial intelligence for IT operations) to conduct correlation and analysis, the Riverbed Platform can conduct root-cause identification and kick off automated remediations.

Riverbed launched the platform with about 35 pre-built application and software integrations, which the company says reflects the open platform and will more quickly enable IT teams to gain value in their existing environments.

“Customers are challenged with improving the digital experience, simplifying their management environment and implementing AI that works and scales,” said Dave Donatelli, CEO at Riverbed, in a statement. “To address this, Riverbed has invested in our core competencies of data collection and AI, and today we’re launching the most advanced AI-powered observability platform to optimize digital experiences, along with solutions that provide new levels of visibility into network blind spots and enterprise-owned mobile devices.”

Among the updates, Riverbed announced:

  • Riverbed Aternity Mobile: A mobile monitoring tool that increases employee productivity by proactively identifying performance issues on enterprise-provided mobile devices and taking remediation actions.
  • Riverbed NPM+: The first in a series of SaaS-delivered network performance management (NPM) services that overcome traditional network blind spots by extending packet visibility to network locations where monitoring was previously not possible.
  • Riverbed NetProfiler: A tool that provides real-time visibility into network traffic and application performance that now is also able to monitor SD-WAN health and performance.

Riverbed also updated its SaaS-based AIOps service, Riverbed IQ 2.0, which uses AI-powered automation to contextualize and correlate real data across IT to prevent, identify, and resolve performance and other issues. Riverbed IQ 2.0 filters out noise and reduces alerts to only those most relevant to IT by using Riverbed Data Store, which connects data sources into a data repository, and Topology Viewer, which generates a dynamic map of connected devices and dependencies in an environment.
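As a generic illustration of that noise-reduction idea (a sketch only, not Riverbed’s algorithm), alerts can be deduplicated and rolled up to the upstream component they depend on, so operators see one correlated incident instead of a flood of symptoms:

from collections import defaultdict

# Hypothetical dependency map: each component points to the upstream component it relies on.
depends_on = {
    "app-server-1": "core-switch-3",
    "app-server-2": "core-switch-3",
    "voip-gateway": "core-switch-3",
    "core-switch-3": None,
}

# Hypothetical raw alert stream, including a duplicate.
alerts = [
    {"source": "app-server-1", "msg": "latency high"},
    {"source": "app-server-2", "msg": "latency high"},
    {"source": "app-server-2", "msg": "latency high"},  # duplicate
    {"source": "voip-gateway", "msg": "packet loss"},
    {"source": "core-switch-3", "msg": "interface down"},
]

def correlate(alerts, depends_on):
    # Group deduplicated alerts under the root upstream component they trace back to.
    grouped = defaultdict(set)
    for alert in alerts:
        root = alert["source"]
        while depends_on.get(root):  # walk up the dependency chain
            root = depends_on[root]
        grouped[root].add((alert["source"], alert["msg"]))
    return grouped

for root, symptoms in correlate(alerts, depends_on).items():
    print(f"Incident at {root}: {len(symptoms)} correlated alerts")
# One incident rooted at core-switch-3 instead of five raw alerts.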

With the observability enhancements, Riverbed unveiled its Riverbed Unified Agent, which promises to streamline the deployment and management of observability products with a single software agent. The unified agent will enable enterprise network managers to adopt Riverbed modules without deploying more software or having to maintain an additional component to collect and report on data across the environment.

“The universal agent reduces the amount of administrative overhead customers would have if they were using both NPM+ and Aternity,” says Shamus McGillicuddy, research director for the network management practice at Enterprise Management Associates (EMA).

“Right now, the main alternative to NPM+ is a synthetic network monitoring solution, such as those offered by Cisco ThousandEyes, Broadcom AppNeta, and Catchpoint. Let’s say Riverbed Aternity customers want to adopt one of these competitors for a synthetic solution. To get the client-side perspective from those alternative solutions, customers would have to deploy a second agent on devices,” McGillicuddy says.

“At some point, you want to limit the number of agents you’re running on a device to mitigate management complexity, but also to avoid overtaxing resources of client devices. The Universal Agent streamlines agent management and agent overhead.”

With a unified agent, Riverbed is reducing the overhead needed to gain visibility across myriad devices while also reporting on data needed to improve digital experiences. The Riverbed Unified Agent installs on managed devices, becomes an enabler for module features, and can be automatically updated without human intervention.

By providing visibility into an extended environment, Riverbed is giving network managers a means to better manage cloud applications and services as well as remote workers.

“Riverbed is making improvements to existing products that customers are already using (like the NetProfiler NPM solution and the AIOps solution (IQ). It’s also introducing a new SaaS-based product (NPM+) that addresses a critical need for organizations that support cloud apps and remote workers. It also helps with ZTNA observability,” EMA’s McGillicuddy explains. “NPM+ provides a compelling agent-based digital experience management capability.”

These capabilities align with what enterprise IT teams say they need and plan to invest in, according to McGillicuddy. “Last year my research found that 87% of NetOps teams have allocated budget to improve how they manage experience for remote workers. Also, the updates to Riverbed IQ improve how customers can create and run automated workflows in response to AI-derived insights. Last year, 57% of NetOps teams told me they want no-code interfaces such as this for creating runbook automation on their AIOps platforms.”

Riverbed’s news is promising for customers, as it is providing real data that can provide insights into the performance of network components, applications, and services. Still, considering the complex environments network teams are tasked to monitor and optimize, Riverbed and others could be doing more to support customers. For instance, there’s opportunity for Riverbed and its competitors to add integrations and support for the technologies organizations are investing in now, such as secure access service edge (SASE):

“If it works as advertised, NPM+ should produce quality telemetry. It’s passively monitoring real network connections, which means IT will get data about network and application sessions,” McGillicuddy says. “Riverbed should expand the number of SD-WAN solutions supported by NetProfiler. Also, they should position NPM+ next to NetProfiler as a solution that provides SD-WAN and SASE observability. NetOps teams struggle with visibility into SASE points of presence, and NPM+ has the potential to solve that issue. A combined NetProfiler/NPM+ solution could be a powerful toolset for SD-WAN/SASE operations.”

The Riverbed Platform, NPM+, NetProfiler, and all the capabilities announced with this launch are generally available now.

Network Management Software, Network Monitoring
https://www.networkworld.com/article/2098781/riverbed-launches-ai-powered-observability-platform.html 2098781