Hybrid cloud demands new tools for performance monitoring

Platforms that draw performance data from multiple devices on multiple networks will make for better hybrid-cloud performance and troubleshooting.

Credit: Frame Stock Footage / Shutterstock

Network performance monitoring has become more complex now that companies run more workloads in the cloud, and network teams are finding that their visibility into the cloud isn’t on par with their visibility into on-prem resources.

Migration to the cloud introduced infrastructure that isn’t owned by the organization, and a pandemic-driven surge in remote work is accelerating the shift to the cloud and an associated increase in off-premises environments. Container-based applications deployed on cloud-native architectures further complicate network visibility. For these reasons and more, enterprises need tools that can monitor not only the data center and WAN but also the internet, SaaS applications and multiple providers’ public cloud operations.

“Only 36% of network operations professionals believe that their network management tools are as good at managing cloud networks as they are at managing on-prem networks,” says Shamus McGillicuddy, a vice president of research at Enterprise Management Associates (EMA). “At the same time, the average enterprise can attribute about 40% of its network traffic to the cloud at this point. So that’s a huge disadvantage.”

How did these visibility gaps arise? Network teams were often on the sidelines as enterprises began deploying workloads to the cloud.

“One of the problems is that the network infrastructure team doesn’t always have the same authority over the cloud environment as it does over the on-prem network,” McGillicuddy says. “A lot of times the cloud adoption was led by an application team or a line of business, and they looked at the cloud as an alternative to IT, not necessarily an extension of it.”

“The teams that do have more authority in the cloud don’t always think it’s important to have network monitoring. They’re more interested in application performance monitoring,” McGillicuddy says. “They don’t see the point in devoting their budget to stuff that they consider to be like old-world infrastructure monitoring.”

How companies view the role of network engineers in the cloud makes all the difference, says Dan Rohan, product manager at network visibility and performance management vendor Kentik.

“When we started talking about monitoring the cloud two or three years ago, I don’t think very many network engineers cared,” Rohan says. As cloud deployments started to mature, and companies took a hard look at cloud costs, performance, and controls, they realized they needed to put some structure back into place, Rohan says, “and then suddenly, network engineers had a role to play again.”

What today’s network performance management tools can do

Typical cloud vendor networks are incredibly complex. “It’s not uncommon today to have 15 hops between you and the cloud provider across your ISP, maybe a local carrier, and then maybe a Tier 1 carrier. And then you’ll go through another 30 hops inside the cloud provider,” says Matt Stevens, president and CEO of AppNeta. “So the days of 10 to 20 hops total have now exploded to 40 or 50 Layer 3 network hops. Each one does its own thing to your performance.”

As network complexity goes up, so does the potential for problems, Stevens says. “When you have multiple employees running multiple applications, and those applications are hosted from multiple sources, whether it be your private data center, a virtual data center that your organization is trying to run as a cloud, a fully public cloud, or something in between—the very definition of hybrid IT—every time you add one more variable, the complexity goes up [exponentially].”

Network teams are turning to vendors for help. According to EMA, 57% of network teams have acquired specialized tools to close gaps in cloud-networking visibility. The research firm expects network-performance management tools to provide cloud monitoring through some combination of:

Collecting metrics from virtual network elements deployed in the cloud
Collecting flow logs and other telemetry offered by cloud providers
Collecting network traffic data in the cloud, such as packet flows
Analyzing synthetic traffic directed at SaaS services

Traditional network-management tools were designed to monitor the health of routers and switches in a data center or an on-prem network, but the cloud poses different challenges, Rohan says. “Network engineers don’t have a picture of [the cloud infrastructure] in their head because it’s growing fast, and it wasn’t built by them, and it’s changing all the time, because it’s the cloud. So they’re starting off with that kind of handicap,” he says.

They need different tools to solve the problems that crop up—an application team that can’t get its new cloud application to talk to an on-prem database, or integrate with another cloud app, for instance.

“Network teams would turn to these tools that were just pulling data out of AWS’s API, or any one of the cloud provider’s APIs. But that doesn’t tell me about connectivity failures. It doesn’t tell me why things aren’t working. And so we started there,” Rohan says. “We think the thing that really helps network people in the cloud today is helping to answer those connectivity questions across complex topologies.”

Kentik’s tool can provide network engineers with a picture of the current network, “the thing that they inherited,” Rohan says. “That helps them visualize the flows – the good and the bad. And they can say, ‘Okay, if we install a transit gateway here, and a peering connection here…’ and use their network skills, and they can actually use our tool to wrest control of their networks.”

Network metrics for cloud visibility

Telemetry data that can reveal the state of hybrid-cloud networks comes from all kinds of networks—data center, WAN, internet, cloud, mobile, edge—and from all types of network elements, including physical and virtual appliances, and dedicated or cloud-native devices.

Data is pulled from data-center components, cloud infrastructure (such as service meshes, transit and ingress gateways), internet infrastructure, campus devices, traditional WAN routers and switches, SD-WAN gateways, and IoT endpoints, to name a few. Telemetry types can include flow data exported from network devices (flow collection standards such as NetFlow, J-Flow, sFlow, the IETF’s IPFIX); cloud providers’ virtual private cloud flow logs; SNMP-based device telemetry; and event notifications sent via syslog or SNMP trap.
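As a rough illustration of how flow telemetry from several of these sources might be brought into one view, the Python sketch below reduces each record to a few common fields and totals bytes per destination. The `FlowRecord` type, its field names, and the sample source labels are hypothetical simplifications; real NetFlow, sFlow, IPFIX and cloud flow-log records carry many more fields and differ from one another in format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    """Hypothetical, simplified flow record shared by all telemetry sources."""
    source: str      # label for where the record came from, e.g. "netflow"
    src_ip: str
    dst_ip: str
    bytes_sent: int


def bytes_by_destination(records):
    """Sum bytes per destination IP across every telemetry source."""
    totals = defaultdict(int)
    for record in records:
        totals[record.dst_ip] += record.bytes_sent
    return dict(totals)


def top_talkers(records, n=3):
    """Return the n destinations receiving the most bytes.

    Ties are broken by destination IP string so the order is deterministic.
    """
    totals = bytes_by_destination(records)
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))[:n]
```

Merging records before ranking means a destination reached through both a data-center router and a cloud gateway is counted once, which is the kind of unified view the tools described here aim for.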

Along with passive monitoring data, such as network flows and packets, network teams are increasingly turning to active monitoring techniques, such as basic ping tests and Layer 7 synthetic monitoring, to augment traditional infrastructure and traffic monitoring metrics, according to EMA. The research firm finds that 21% of network teams are using synthetic traffic tools for sustained network-availability and performance monitoring.
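To make the idea of active monitoring concrete, here is a minimal Python sketch of a synthetic probe. It is not an ICMP ping, which typically needs raw-socket privileges; as a stand-in it times how long a TCP connection takes to open. The function names, the default timeout and the probe count are illustrative assumptions, not taken from any vendor's product.

```python
import socket
import time


def tcp_connect_time(host, port, timeout=2.0):
    """Return the seconds taken to open a TCP connection, or None on failure."""
    start = time.perf_counter()
    try:
        # create_connection resolves the host and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return time.perf_counter() - start
    except OSError:
        # Covers refused connections, timeouts and resolution errors.
        return None


def run_probes(host, port, count=5, timeout=2.0):
    """Send several probes in a row; failed probes are recorded as None."""
    return [tcp_connect_time(host, port, timeout) for _ in range(count)]
```

A monitoring agent could call `run_probes` on a schedule against a SaaS endpoint and keep the results as a time series, so a slow drift in connection time becomes visible before users report it.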

It’s not that enterprises haven’t monitored these networks and devices before; rather, the goal is to provide coordinated monitoring across a variety of networks, a unified view of the results, and the ability to integrate analytic findings with automated workflows. Tooling is going beyond core infrastructure monitoring to provide more application-level views and insight into the application performance that end users are experiencing.

Who’s selling network-performance management tools?

The product landscape for network performance management is crowded. Vendors include Accedian, AppNeta, Cisco-ThousandEyes, cPacket Networks, Kentik, LogicMonitor, ManageEngine, Riverbed and SolarWinds. There’s no one vendor that covers all the bases, and many of the tools are complementary rather than competitive—a typical IT organization uses between four and 10 tools to monitor and troubleshoot its network, EMA finds.

Research firm Gartner, in its Market Guide for Network Performance Monitoring, says tools that are ideal for on-premises environments become less effective as organizations become increasingly hybrid. While some vendors can provide visibility across both on-premises and cloud environments, that is challenging due to data-transport requirements and differing networks, which can’t always be viewed through the same lens, Gartner says.

Among its recommendations for enterprises seeking network performance-management tools, Gartner advises companies to “resist the desire to use the same monitoring approach in the cloud as your on-premises environment, especially when it comes to packet capture and analysis. Focus on vendors that provide support for cloud-native functions, such as APIs or true network-flow data.”

Adding AI to network troubleshooting

There’s no lack of telemetry data to analyze. What distinguishes modern network monitoring tools is their ability to measure performance and put the findings in a context that answers the questions network teams are being asked.

“This move to hybrid cloud, it’s not really about ‘is it working or not working? Is it up or is it down?’ It’s the idea that ‘slow’ is the new ‘down,’” says AppNeta’s Stevens. Users aren’t calling to say they can’t connect to Salesforce, for example. They’re complaining that a script in Salesforce is running slow and impacting their ability to do their job, he says.

“Regardless of the architecture being deployed, we’re going to give the business the visibility to understand, ‘Here’s the performance I need. Here’s the performance I’m getting. Is that gap so big that I need to take action, or can I set it to the side and go work on another problem?’” he says.
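The comparison Stevens describes, performance needed versus performance delivered, can be reduced to simple arithmetic. The Python sketch below is one hypothetical way to express it; the 20% tolerance is an arbitrary placeholder, and nothing here reflects how AppNeta's product makes the decision.

```python
def performance_gap(target_ms, measured_ms, tolerance_pct=20.0):
    """Compare measured response time against a target.

    Returns (gap_pct, needs_action), where gap_pct is how far the measured
    value exceeds the target as a percentage of the target. A negative gap
    means the service is faster than required.
    """
    if target_ms <= 0:
        raise ValueError("target_ms must be positive")
    gap_pct = 100.0 * (measured_ms - target_ms) / target_ms
    return gap_pct, gap_pct > tolerance_pct
```

With a 200 ms target, a measured 300 ms is a 50% gap and would be flagged, while 210 ms is a 5% gap that could be set aside for now.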

That’s where artificial intelligence comes into play. Tools increasingly support AI-based diagnostics that are designed to find patterns in network data and draw conclusions from them based on historical anomaly detection and root-cause analysis.

“We don’t just tell you there’s a problem, we tell you where it is. We tell you why it is, we give a remediation suggestion, and we also give you a confidence score” that quantifies how likely it is that the proposed remediation will work, Stevens says.

Having tooling that can give network teams the confidence to understand the issues and prioritize remediation gives IT credibility at a time when companies are undertaking major business transformation projects, Stevens says. “These are big projects that touch a lot of people, and IT is being asked to be a business partner.”

Scott Bulger, a senior systems network engineer who spent more than 30 years of his career working with network vendors and corporate IT networks, has spent the last three years working with AppNeta’s technology at two large enterprises.

“Visibility into the cloud infrastructure is minimal, and so the ability to track end-to-end packet loss, jitter and latency, into the service provider cloud and back, gives you the autonomy and the validity to say to the cloud provider, ‘we have packet loss.’ You have hard, substantial evidence, and it’s irrefutable,” Bulger says.

For Bulger, the metric he’s most concerned about is packet loss. While TCP/IP-based networks were designed to accommodate loss, “there’s a point—above 4% or 5%, depending on your topology—at which loss starts to become noticeable and impactful to the end users. So some loss is acceptable, but a significant loss, or loss for extended periods of time, is impactful,” he says.
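The three metrics Bulger mentions can all be derived from a series of probe results. The Python sketch below assumes round-trip times in milliseconds, with `None` marking a lost probe. The 4% threshold is taken from the low end of the range he cites, and jitter is computed as the mean absolute difference between consecutive received samples, which is a simplification rather than the smoothed estimator some standards define.

```python
from statistics import mean

# Assumption: lower end of the 4% to 5% range quoted above; tune per topology.
LOSS_THRESHOLD_PCT = 4.0


def summarize(samples_ms, loss_threshold_pct=LOSS_THRESHOLD_PCT):
    """Summarize probe results.

    samples_ms is a list of round-trip times in milliseconds, with None
    standing in for each probe that received no reply.
    """
    if not samples_ms:
        raise ValueError("at least one sample is required")
    received = [s for s in samples_ms if s is not None]
    loss_pct = 100.0 * (len(samples_ms) - len(received)) / len(samples_ms)
    # Differences between consecutive received samples (lost probes skipped).
    deltas = [abs(b - a) for a, b in zip(received, received[1:])]
    return {
        "loss_pct": loss_pct,
        "latency_ms": mean(received) if received else None,
        "jitter_ms": mean(deltas) if deltas else None,
        "loss_exceeds_threshold": loss_pct > loss_threshold_pct,
    }
```

Keeping these summaries over time is what turns an anecdotal complaint into the kind of evidence Bulger describes taking to a cloud provider.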

In the big picture, network visibility tools can not only help identify problems but also help avert performance problems altogether. “These platforms give you visibility to problems before they impact your customer,” Bulger says.

However, moving from a reactive to proactive posture isn’t easy. “If your DevOps or help-desk model is saturated supporting immediate problems, you don’t get much bandwidth for people saying, ‘here’s something that’s a little bit broken, but it could be a lot broken if we don’t do something about it,’” Bulger says.

“We need a culture that prioritizes proactive remediation,” he says. “The managers who get it are completely on board and never hesitate to fund it.”
