Compressing files using the zip command on Linux Mon, 13 May 2024 16:14:13 +0000

Zipping files allows you to save a compressed version of a file that might serve as a backup of the original. It also allows you to group a collection of related files into a similarly reduced size file for safekeeping.

Zipping a single file

If you want to zip a single file, you can use a command like the second of the two commands shown below. The file to be zipped (tips.html) is listed by the first command.

$ ls -l tips.html
-rw-r--r--. 1 shs shs 79873 Sep 19  2023 tips.html
$ zip tips tips.html
  adding: tips.html (deflated 73%)

Notice that the file is deflated by 73%. List the files again and you’ll see how much smaller the zip file is than the original, which is still on the system. Note that the file extension “zip” will be added automatically if you don’t include it as the extension for your first argument, as in the “zip tips” command above.

$ ls -l tips.*
-rw-r--r--. 1 shs shs 21713 May  7 10:19 tips.zip
-rw-r--r--. 1 shs shs 79873 Sep 19  2023 tips.html

The zip command does not remove the original file; it remains on the system.
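
If you do want the original removed once it has been archived, zip’s -m (move) option deletes the source file after adding it to the archive. Here is a minimal example, reusing the file from above (use it with care, since the original is gone as soon as the command completes):

$ zip -m tips tips.html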

Zipping a series of files

You can zip a group of files into a single zip file as a way to back them up in a compressed format. Note that the compression ratio for each file is displayed in the process.

$ zip bin bin/*
  adding: bin/FindFiles (deflated 54%)
  adding: bin/shapes (deflated 63%)
  adding: bin/shapes2 (deflated 62%)
  adding: bin/shapes3 (deflated 45%)
$ ls -l bin.zip
-rw-r--r--. 1 shs shs 1765 May  7 10:56 bin.zip

Use the -q argument if you prefer to not see the details listed for the files as they are added to the zip file.

$ zip -q bin bin/*
$

If you are zipping a directory that contains subdirectories, those subdirectories, but not their contents, will be added to the zip file unless you add the -r (recursive) option. Here’s an example:

$ zip bin bin/*
updating: bin/FindFiles (deflated 54%)
updating: bin/shapes (deflated 63%)
updating: bin/shapes2 (deflated 62%)
updating: bin/shapes3 (deflated 45%)
updating: bin/NOTES/ (stored 0%)

Here’s an example that adds -r and, as a result, includes the NOTES subdirectory’s files in the bin.zip file it is creating.

$ zip -r bin bin/*
updating: bin/FindFiles (deflated 54%)
updating: bin/shapes (deflated 63%)
updating: bin/shapes2 (deflated 62%)
updating: bin/shapes3 (deflated 45%)
updating: bin/NOTES/ (stored 0%)
  adding: bin/NOTES/finding_files (deflated 5%)
  adding: bin/NOTES/shapes_scripts (deflated 35%)

Using encryption passwords

To add a password that will need to be used to extract the contents of a zip file, use a command like the one shown below. Notice that it prompts twice for the password, though it does not display it.

$ zip -e -r bin bin/*
Enter password:
Verify password:
updating: bin/FindFiles (deflated 54%)
updating: bin/shapes (deflated 63%)
updating: bin/shapes2 (deflated 62%)
updating: bin/shapes3 (deflated 45%)
updating: bin/NOTES/ (stored 0%)
updating: bin/NOTES/finding_files (deflated 5%)
updating: bin/NOTES/shapes_scripts (deflated 35%)

Extracting files from a zip file

To extract the contents of a zip file, you would use the unzip command. Notice that, because the zip file below was encrypted with a password, that password needs to be supplied to extract the contents.

$ unzip bin.zip
Archive:  bin.zip
[bin.zip] bin/FindFiles password:
  inflating: bin/FindFiles
  inflating: bin/shapes
  inflating: bin/shapes2
  inflating: bin/shapes3
   creating: bin/NOTES/
  inflating: bin/NOTES/finding_files
  inflating: bin/NOTES/shapes_scripts

If you want to extract the contents of a zip file to a different directory, you don’t need to cd to that directory first. Instead, you can simply add the -d option followed by the target directory to specify the new location.

$ unzip bin.zip -d /tmp
Archive:  bin.zip
[bin.zip] bin/FindFiles password:
  inflating: /tmp/bin/FindFiles
  inflating: /tmp/bin/shapes
  inflating: /tmp/bin/shapes2
  inflating: /tmp/bin/shapes3
   creating: /tmp/bin/NOTES/
  inflating: /tmp/bin/NOTES/finding_files
  inflating: /tmp/bin/NOTES/shapes_scripts

You can extract a single file from a zip file if you specify its name as it is listed in the zip file. Here’s an example command in which the original file (maybe it’s been damaged in some way) is replaced after you confirm that this is what you want.

$ unzip bin.zip 'bin/shapes3'
Archive:  bin.zip
replace bin/shapes3? [y]es, [n]o, [A]ll, [N]one, [r]ename: y
  inflating: bin/shapes3
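
If you’re not sure how a file’s name is recorded in the archive, you can list the contents first and copy the path exactly as shown. The -l option prints the stored paths, along with sizes and dates, without extracting anything:

$ unzip -l bin.zip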

Wrap-up

Zipping files to preserve or back them up, extracting their contents, and requiring passwords for extraction are all important skills to have when working with the zip command.

Linux
https://www.networkworld.com/article/2104534/compressing-files-using-the-zip-command-on-linux.html
High-bandwidth memory nearly sold out until 2026 Mon, 13 May 2024 15:35:22 +0000

South Korean memory manufacturer SK Hynix has announced that its supply of high-bandwidth memory (HBM) has been sold out for 2024 and for most of 2025. Basically, this means that demand for HBM exceeds supply for at least a year, and any orders placed now won’t be filled until 2026.

The news comes after similar comments were made in March by the CEO of Micron, who said that the company’s HBM production had been sold out through late 2025.

HBM memory is used in GPUs to provide extremely fast memory access, much faster than standard DRAM. It is key to the performance of AI processing. No HBM, no GPU cards.

Bottom line: Expect a new supply-chain headache thanks to HBM being unavailable until at least 2026. It doesn’t matter how many GPUs TSMC and Intel make – those cards are going nowhere without memory.

Hynix is the leader in the HBM space with about 49% market share, according to TrendForce. Micron’s presence is more meager, at about 4% to 6%. The rest is primarily supplied by Samsung, which has not made any statement as to availability. But chances are, HBM demand has consumed everything Samsung can make as well.

HBM is more expensive, more difficult, and slower to make than standard DRAM. Building out HBM fabrication capacity, as with a CPU fab, takes time, and the three HBM makers couldn’t keep up with the explosive demand.

While it is easy to blame Nvidia for this shortage, it’s not alone in driving high-performance computing and the memory needed to go with it. AMD is making a run, Intel is trying, and many major cloud service providers are building their own processors. This includes Amazon, Facebook, Google, and Microsoft. All of them are making their own custom silicon, and all need HBM memory.

That leaves the smaller players on the outside looking in, says Jim Handy, principal analyst with Objective Analysis. “It’s a much bigger challenge for the smaller companies. In chip shortages the suppliers usually satisfy their biggest customers’ orders and send their regrets to the smaller companies. This would include companies like SambaNova, a start-up with an HBM-based AI processor,” he said.

DRAM fabs can be rapidly shifted from one product to another, as long as all products use the exact same process. This means that they can move easily from DDR4 to DDR5, or from DDR to LPDDR or GDDR used on graphics cards. 

That’s not the case with HBM, because HBM relies on a complex and highly technical manufacturing process called through-silicon vias (TSV) that is not used anywhere else. Also, the wafers need to be modified in a manner different from standard DRAM, which can make shifting manufacturing priorities very difficult, said Handy.

So if you recently placed an order for an HPC GPU, you may have to wait. Up to 18 months.

CPUs and Processors, Data Center, High-Performance Computing
https://www.networkworld.com/article/2104516/high-bandwidth-memory-nearly-sold-out-until-2026.html
NSA, FBI warn of email spoofing threat Mon, 13 May 2024 15:01:41 +0000

Spoofed email – email that appears to come from a legitimate source but is not – is becoming an increasingly worrisome threat. It’s so serious that the NSA and FBI have joined forces in releasing the following warning about spoofed email from senders in North Korea:

“The National Security Agency (NSA) joins the Federal Bureau of Investigation (FBI) and the U.S. Department of State in releasing the Cybersecurity Advisory (CSA) ‘North Korean Actors Exploit Weak DMARC Security Policies to Mask Spearphishing Efforts’ to protect against Democratic People’s Republic of Korea (DPRK, aka North Korea) techniques that allow emails to appear to be from legitimate journalists, academics, or other experts in East Asian affairs.”

To fully grasp what is happening, read this explanation from Al Iverson, industry research and community engagement lead for Valimail, which provides email authentication and anti-impersonation software:

“North Korea found a way to exploit something that security and deliverability experts have been worried about over these past few months; there’s a whole bunch of domain owners out there who are not necessarily security savvy, and perhaps focused more on email marketing efforts. Those domain owners (and there are more than a million of them out there) were quick to implement a bare minimum DMARC policy to comply with new mailbox provider sender requirements. What they didn’t realize is that this can leave the domain unprotected against phishing and spoofing.

People must protect their domain by fully implementing DMARC properly to ensure that bad guys find no phishing or spoofing success when they work their way down the list of domains … to yours.

The NSA, the FBI and the U.S. Department of State have identified this as an issue already, and Valimail is fully aligned with the advisory… they issued at the end of the week.”

DMARC stands for “Domain-based Message Authentication, Reporting and Conformance.” It’s an email authentication protocol designed to give email domain owners the ability to protect their domain from unauthorized use; in other words, it tries to prevent email spoofing. It also tells receiving servers what to do when a message fails authentication tests, that is, when the receiving server cannot verify that the message’s sender is who they claim to be.
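
As a quick check of your own domain, you can look up its published DMARC record with an ordinary DNS query. The domain and record below are purely illustrative; a policy of “p=none” only reports on failures, while “p=quarantine” or “p=reject” actually instructs receivers to act on them:

$ dig +short TXT _dmarc.example.com    # domain and output are illustrative
"v=DMARC1; p=reject; rua=mailto:dmarc-reports@example.com"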

Iverson also pointed out the following:

  • North Korean cyber actors are actively searching for and exploiting domains with weak DMARC policies.
  • Even the largest companies in the hospitality, retail, education, financial sectors, and more, which we often assume to be secure, are at risk due to weak DMARC policies.
  • Bad actors can just take the list of most popular companies and work their way down to see who is spoofable.
  • An improperly configured DMARC policy is just as bad (just as insecure) as not having DMARC in place at all.
  • Are you protected? Don’t assume that you’re not a worthy target; just because you haven’t been attacked today, doesn’t mean you won’t be spoofed or phished tomorrow.
  • Valimail data shows more than 1.3 million domains currently publish a “p=none” DMARC policy!

You can find out more about DMARC here.

Linux
https://www.networkworld.com/article/2104470/nsa-fbi-warn-of-email-spoofing-threat.html
Download our SASE and SSE enterprise buyer’s guide Mon, 13 May 2024 15:00:00 +0000

These two related technologies — Secure Access Service Edge (SASE) and Secure Service Edge (SSE) — address a new set of challenges that enterprise IT has faced as employees have shifted to remote work and applications have migrated to the cloud.

Enterprise Buyer’s Guides, Network Security, Remote Access Security, SASE
https://us.resources.networkworld.com/resources/download-our-sase-and-sse-enterprise-buyers-guide/
Frontier retains top spot among world’s fastest supercomputers Mon, 13 May 2024 09:00:00 +0000

Frontier held onto its No. 1 ranking in the 63rd edition of the TOP500, but with the second-place Aurora system breaking the exascale barrier, the end of Frontier’s reign could be in sight.

The Frontier system at Oak Ridge National Laboratory (ORNL) in Tenn. maintained its leading position with an HPL score of 1.206 EFlop/s. With a total of 8,699,904 combined CPU and GPU cores, the Frontier system has an HPE Cray EX architecture that combines 3rd Gen AMD EPYC CPUs optimized for HPC and AI with AMD Instinct MI250X accelerators. The system relies on Cray’s Slingshot 11 network for data transfer, and the machine has a power efficiency rating of 52.93 GFlops/Watt, which also puts Frontier at the No. 13 spot on the GREEN500.

Staying in line with the last list, the Aurora system at the Argonne Leadership Computing Facility in Ill. is ranked second on the TOP500. Aurora is also now the second machine to break the exascale barrier, with an HPL score of 1.012 EFlop/s – an improvement over its 585.34 PFlop/s score on the previous list. The Aurora system is based on the HPE Cray EX – Intel Exascale Compute Blade and uses Intel Xeon CPU Max series processors, Intel Data Center GPU Max Series accelerators, and a Slingshot-11 interconnect. TOP500 notes that Aurora achieved this rank while still being commissioned and not yet fully complete.

The systems rounding out the top five also remained consistent. A system called Eagle disrupted the list when it debuted in late 2023, and it maintained its No. 3 position in this ranking. Installed on the Microsoft Azure cloud in the U.S., Eagle continues to be the highest-ranking cloud system on the TOP500. This Microsoft NDv5 system has an HPL score of 561.2 PFlop/s and is based on Intel Xeon Platinum 8480C processors and Nvidia H100 accelerators.

The Supercomputer Fugaku and LUMI systems also retained their No. 4 and No. 5 positions, respectively. Based in Kobe, Japan, Fugaku has an HPL score of 442 PFlop/s and continues to be the highest-ranked system outside of the U.S. The LUMI system at EuroHPC/CSC in Finland – the largest system in Europe – also stayed put in its No. 5 spot with an HPL score of 380 PFlop/s.

One newcomer to the list, the Alps machine from the Swiss National Supercomputing Centre (CSCS) in Switzerland, achieved an HPL score of 270 PFlop/s, landing at No. 6 in this edition’s ranking. Sierra, the system installed at the Lawrence Livermore National Laboratory in Calif., dropped out of the top 10 this time; it was ranked No. 10 in November 2023.

Here is a breakdown of specific details for the 10 overall fastest supercomputer systems on the TOP500 list for May 2024:

#1: Frontier

This HPE Cray EX system is the first U.S. system with a performance exceeding one Exaflop/s. It is installed at the ORNL in Tenn., where it is operated for the Department of Energy (DOE). 

  • Cores: 8,699,904
  • Rmax (PFLOPS): 1,206.00
  • Rpeak (PFLOPS): 1,714.81
  • Power (kW): 22,786

#2: Aurora

The Aurora system is installed at the Argonne Leadership Computing Facility, Illinois, USA, where it is also operated for the DOE and holds a preliminary HPL score of 1.012 Exaflop/s. 

  • Cores: 9,264,128
  • Rmax (PFLOPS): 1,012.00
  • Rpeak (PFLOPS): 1,980.01
  • Power (kW): 38,698

#3: Eagle

The No. 3 system is installed by Microsoft in its Azure cloud. This Microsoft NDv5 system is based on Xeon Platinum 8480C processors and Nvidia H100 accelerators and achieved an HPL score of 561 Pflop/s.

  • Cores: 2,073,600
  • Rmax (PFLOPS): 561.20
  • Rpeak (PFLOPS): 846.84

#4: Supercomputer Fugaku

This system is installed at the RIKEN Center for Computational Science (R-CCS) in Kobe, Japan. It has 7,630,848 cores which allowed it to achieve an HPL benchmark score of 442 Pflop/s.

  • Cores: 7,630,848
  • Rmax (PFLOPS): 442.01
  • Rpeak (PFLOPS): 537.21
  • Power (kW): 29,899

#5: LUMI

The LUMI system is located in CSC’s data center in Kajaani, Finland. Through it, the European High-Performance Computing Joint Undertaking (EuroHPC JU) is pooling European resources to develop top-of-the-range exascale supercomputers for processing big data.

  • Cores: 2,752,704
  • Rmax (PFLOPS): 379.70
  • Rpeak (PFLOPS): 531.51
  • Power (kW): 7,107

#6: Alps (new to the list)

This system is an HPE Cray EX254n system with Nvidia Grace 72C and Nvidia GH200 Superchip and a Slingshot-11 interconnect, achieving 270 PFlop/s.

  • Cores: 1,305,600
  • Rmax (PFLOPS): 270.00
  • Rpeak (PFLOPS): 353.75
  • Power (kW): 5,194

#7: Leonardo (previously #6)

The Leonardo system is installed at another EuroHPC site in CINECA, Italy. It is an Atos BullSequana XH2000 system with Xeon Platinum 8358 32C 2.6GHz as main processors, NVIDIA A100 SXM4 40 GB as accelerators, and Quad-rail NVIDIA HDR100 Infiniband as interconnect.

  • Cores: 1,824,768
  • Rmax (PFLOPS): 241.20
  • Rpeak (PFLOPS): 306.31
  • Power (kW): 7,494

#8: MareNostrum 5 ACC

The MareNostrum 5 ACC system was remeasured and jumped in the ranking over the Summit system. It is now at No. 8 and installed at the EuroHPC/Barcelona Supercomputing Center in Spain.

  • Cores: 663,040
  • Rmax (PFLOPS): 175.30
  • Rpeak (PFLOPS): 249.44
  • Power (kW): 4,159

#9: Summit (previously #7)

Housed at the ORNL in Tenn., the IBM-built Summit system has 4,356 nodes, each one housing two POWER9 CPUs with 22 cores each and six NVIDIA Tesla V100 GPUs each with 80 streaming multiprocessors (SM). 

  • Cores: 2,414,592
  • Rmax (PFLOPS): 148.60
  • Rpeak (PFLOPS): 200.79
  • Power (kW): 10,096

#10: Eos NVIDIA DGX SuperPOD (previously #9)

This system is based on the NVIDIA DGX H100 with Xeon Platinum 8480C processors, NVIDIA H100 accelerators, and Infiniband NDR400 and it achieves 121.4 PFlop/s.

  • Cores: 485,888
  • Rmax (PFLOPS): 121.40
  • Rpeak (PFLOPS): 188.65
CPUs and Processors, Data Center, Supercomputers
https://www.networkworld.com/article/2100462/frontier-retains-top-spot-among-worlds-fastest-supercomputers.html
Nvidia teases quantum accelerated supercomputers Mon, 13 May 2024 06:30:00 +0000

At ISC High Performance 2024 in Hamburg, Germany, Nvidia today announced that nine new supercomputers worldwide are using its Grace Hopper Superchips to deliver a combined 200 exaflops (200 quintillion calculations per second) of computing power with, it said, twice the energy efficiency of an x86 system plus GPU.

Grace Hopper accounts for 80% of Hopper sales, said Dion Harris, Nvidia’s director, accelerated data center GTM, during a media briefing. “The reason why that’s exciting is that it leverages this novel sort of architecture of this tightly coupled CPU and GPU architecture to deliver great performance for HPC and AI.”

The first European Grace Hopper supercomputer to come online is Alps at the Swiss National Supercomputing Centre, which was built by Hewlett Packard Enterprise (HPE) and offers 20 exaflops of AI computing driven by 10,000 Grace Hopper superchips. Its role is to advance weather and climate modeling, and material science.

Nvidia also announced that national supercomputing centers worldwide will soon receive a performance boost via the open-source Nvidia CUDA-Q platform. The company revealed that sites in Germany, Japan, and Poland will use the platform to power quantum processing units (QPU) in their high performance computing systems.

“Quantum accelerated supercomputing, in which quantum processors are integrated into accelerated supercomputers, represents a tremendous opportunity to solve scientific challenges that may otherwise be out of reach,” said Tim Costa, director, Quantum and HPC at Nvidia. “But there are a number of challenges between us, today, and useful quantum accelerated supercomputing. Today’s qubits are noisy and error prone. Integration with HPC systems remains unaddressed. Error correction algorithms and infrastructure need to be developed. And algorithms with exponential speed up actually need to be invented, among many other challenges.”

To address these issues, he said, more than 25 national quantum initiatives have been launched. There are more than 350 quantum startups, over 70% of the Fortune 500 have some sort of quantum program, and more than 48,000 quantum research papers have been published.

“But another open frontier in quantum remains,” Costa said. “And that’s the deployment of quantum accelerated supercomputers – accelerated supercomputers that integrate a quantum processor to perform certain tasks that are best suited to quantum in collaboration with and supported by AI supercomputing. We’re really excited to announce today the world’s first quantum accelerated supercomputers.”

These machines will be at AIST in Japan, Jülich in Germany, and PSNC in Poland (which has installed two QPUs).

“The integration of not one but four quantum processing units with three supercomputers opens the door to the next wave of quantum innovation,” said Heather West, research manager, quantum computing, infrastructure systems, platforms, and technology group, at IDC. “Researchers have always expected that quantum computing would accelerate scientific advantage. However, the symbiotic relationship between quantum-classical compute technologies will also help to accelerate the development of quantum systems themselves, paving the way for useful, error-corrected, quantum-centric supercomputers and the era of quantum utility, a long awaited destination for both quantum researchers and quantum end users.”

However, said Harris, this will need the application of AI models to succeed. “We  don’t think that there will be a successfully deployed fault-tolerant system that doesn’t use AI models to do large scale, real-time error correction to calibrate these devices. Right now, it’s an incredibly human-time-intensive task for physicists to calibrate and keep up a quantum device. And it’s only going to get harder and harder as the number of qubits goes up. And so we have to automate that and apply the best technology and AI in order to do those tasks.”

CPUs and Processors, Data Center, Supercomputers
https://www.networkworld.com/article/2102436/nvidia-teases-quantum-accelerated-supercomputers.html
Cisco adds AI features to AppDynamics On-Premises Fri, 10 May 2024 18:28:37 +0000

Cisco has added AI features to its AppDynamics observability platform that promise to help customers more quickly detect anomalies, identify performance problems, and resolve issues across the enterprise. The new features will be implemented in a virtual appliance, available this month, for Cisco AppDynamics On-Premises, which gives customers the ability to see and manage their entire application stack, including application code, runtime, infrastructure (servers, databases, networks, VMs, containers), and user experience.

AppDynamics On-Premises includes integration with Cisco’s Secure Application package, which can monitor for application vulnerabilities and threats across services, workloads, pods, containers, and business transactions. This allows for real-time identification and blocking of attacks, according to Cisco.

The new virtual appliance includes an AI-based detection and remediation capability that learns and detects anomalies and can determine root causes in application performance issues, wrote Aaron Schifman, senior technical product marketing manager at Cisco AppDynamics, in a blog about the news. The package combines threat detection, threat intelligence and business impact to create a composite risk score that identifies which threats must be addressed first based on likely business impact, Schifman stated.

AppDynamics On-Premises also works with the recently released Smart Agent for Cisco AppDynamics, which can help customers spot and update out-of-date software agents as well as on-board and manage new agents through a centralized user interface, Cisco stated.

Agents are key to tracking application status, security and performance monitoring, but as applications become widely distributed via multiclouds, branch offices and private locations, the task of handling agents can become complex and tedious, Cisco stated.

“Customers can now use this virtual appliance together with our Smart Agent capability to deploy new innovations faster and simplify lifecycle operations,” said Ronak Desai, senior vice president and general manager of Cisco AppDynamics and Full-Stack Observability, in a statement.

The new on-premises deployment is packaged with all necessary services for deployment in a single VMware vSphere Open Virtual Appliance (OVA), and support for other virtualization platforms, such as AMI and VHD, is coming soon, Schifman stated.

In other developments, Cisco said AppDynamics On-Premises can now be hosted on Amazon Web Services (AWS) and Microsoft Azure.

“In addition to on-premises deployments, customers can manage their own observability deployments in AWS or Microsoft Azure by using the Amazon Machine Instance (AMI) or Virtual Hard Disk (VHD) images of the virtual appliance,” Schifman stated. “This is valuable when a SaaS instance is not available in the country where a sensitive workload needs to be monitored, or when a customer wants to retain full control of the observability solution.”

In addition to the new virtual offering, AppDynamics added full-stack observability for on-premises SAP and non-SAP environments, which promises to let customers address performance issues within SAP deployments before they impact the business.

“Cisco brings resiliency into the SAP landscape with application performance, augmented by AI-powered intelligence for the Java stack, enabling SAP developers and BASIS admins to ensure service availability, align performance with SAP business outcomes, and discover SAP related security vulnerabilities to mitigate risk,” Schifman stated.

The AppDynamics platform can now correlate metrics across SAP and non-SAP environments as well as monitor SAP systems and processes with over 30 pre-built dashboards. Customers can also build their own customized dashboards with a dashboard generator, Cisco stated. 

“Correlate real-time visibility of ABAP, SAP’s proprietary language, down to the code level, with the broader landscape stack to understand how performance impacts the business and revenue streams,” Cisco stated.

Cisco and SAP have had a long-standing strategic partnership offering all manner of collaboration to support and manage hybrid cloud environments.

Network Management Software, Network Monitoring
https://www.networkworld.com/article/2099747/cisco-adds-ai-features-to-appdynamics-on-premises.html
CHIPS Act to fund $285 million for semiconductor digital twins Fri, 10 May 2024 14:53:43 +0000

The Biden administration has proposed allocating another $285 million in CHIPS and Science Act funding for semiconductor development in the U.S., with the creation of a chip manufacturing institute and support for digital twins.

The CHIPS for America Program is proposing a first-of-its-kind institute focused on the development, validation, and use of digital twins for semiconductor manufacturing, advanced packaging, assembly, and test processes.

The CHIPS Manufacturing USA institute aims to establish regional networks to share resources with companies developing and manufacturing both physical semiconductors and digital twins.

Digital twins are virtual representations of physical chips that mimic how the real versions will function. They offer a faster way to develop, test, and revise chips without having to make physical versions of them. It’s much easier to simulate a chip than to spin out silicon, and simulation helps researchers test new processors before putting them into production.

“Digital twin technology can help to spark innovation in research, development, and manufacturing of semiconductors across the country — but only if we invest in America’s understanding and ability of this new technology,” Commerce Secretary Gina Raimondo said in a statement. “This new Manufacturing USA institute will not only help to make America a leader in developing this new technology for the semiconductor industry, it will also help train the next generation of American workers and researchers to use digital twins for future advances in R&D and production of chips.”

Congress passed the CHIPS Act in 2022, and President Biden signed it into law in an effort to boost semiconductor manufacturing in the United States, which has a very meager share of global semiconductor manufacturing; much of it takes place in Taiwan or South Korea.

The Commerce Department has provided almost $33 billion in preliminary grants to chipmakers, in many cases to giant companies like Intel and Micron. Intel announced plans to build massive fabrication plants in Ohio, but they have since been delayed for a year due to economic conditions.

Biden administration officials have scheduled briefings on May 16 where interested parties can speak with the government officials about the funding opportunities. The government will fund the operational activities of the institute, research around digital twins, physical and digital facilities, and workforce training.

The CHIPS Manufacturing USA institute is expected to use integrated physical and digital assets to tackle important semiconductor-industry manufacturing challenges. The institute hopes to foster a collaborative environment that significantly expands innovation and brings benefits to both large and small to mid-sized manufacturers.

CPUs and Processors, Data Center
https://www.networkworld.com/article/2100411/chips-act-to-fund-285-million-for-semiconductor-digital-twins.html
Microsoft’s AI ambitions fuel $3.3 billion bet on Wisconsin data center Fri, 10 May 2024 09:35:55 +0000

Microsoft is betting big on AI, investing $3.3 billion in a new AI data center in Wisconsin as part of a growing wave of investment in the technology.

US President Joe Biden visited the site in Mount Pleasant, Racine County, on Wednesday to announce the news. The data center is set to come online by 2026. As part of the project, Microsoft said it is co-funding a new solar energy project that will generate 250MW of power.

“The announcement reflects an investment in AI’s broader potential to transform businesses and manufacturing,” said University of Pennsylvania engineering professor Benjamin C. Lee. “Such transformation needs more than just data centers. It needs people who are skilled in operating those data centers and people who are skilled in connecting AI capabilities to unique challenges and opportunities in existing businesses and communities. Much of this investment focuses on the people.”

Expanding cloud and AI infrastructure

Microsoft broke ground on the facility in September 2022 and said at the time that the project would cost $1 billion. However, the company now plans to invest $3.3 billion in the site to “expand its national cloud and AI infrastructure capacity.”

It has not released details of the hardware it is installing at the data center but said it will “help enable companies in Wisconsin and across the country to develop, deploy and use the world’s most advanced cloud services and AI applications to grow, modernize and improve their products and enterprises.”

A Microsoft spokesperson said the company could not comment further on the announcement.

Microsoft said in a news release that the project will create 2,300 construction jobs. The company is also partnering with Wisconsin’s Gateway Technical College to build a data center academy, which it said will “train and certify more than 1,000 students in five years to work in the new data center and IT sector jobs created in the area.”

To offset the site’s power consumption, Microsoft said it is working with National Grid to co-fund a 250MW solar energy project in Wisconsin. This part of the project is expected to be up and running by 2027.

“Any user or developer looking to build large-scale data centers needed to accommodate AI is scouring the country for extensive power infrastructure that can accommodate up to — and sometimes more than — a gigawatt of power,” said Andy Cvengros, managing director and US data center markets co-lead at JLL. Wisconsin has substantial power infrastructure due to previous investments made within the state.

The Mount Pleasant site was initially earmarked for a manufacturing plant operated by electronics giant Foxconn. However, a planned $10 billion investment, announced in 2017 and championed by former President Donald Trump, never fully materialized. Foxconn did open a data center on the site in 2021, but plans to manufacture LCD screens there were shelved, and now Microsoft is building on the land instead.

Unlocking advanced AI applications

Microsoft’s new investments will greatly improve AI applications, moving from systems that simply find and show existing information to ones that can create new content, said Andrés Diana, chief innovation officer for Accrete AI. He said that the additional capacity will enable more sophisticated cloud services, machine learning models, and real-time AI analytics.

“Specific technologies that could be developed include more advanced generative AI applications that could transform content creation, programming, design, and other creative fields,” he added. “Perhaps the most exciting area additional capacity will unlock is the unfettered use and insights that can stem from AI agents running 24/7 conducting research, synthesizing data, and producing predictive insights, recommendations, and work-product at a rate and quality that we cannot fathom today.”

Hyperscalers are excited about emerging capabilities, but there is significant uncertainty about their computational demands, Lee said.

“Data center capacity is growing to support training for larger models and datasets and to support serving these models as businesses and users discover new applications for these models,” he added. “Investments in capacity reflect the belief that more and more people and organizations will use larger and larger models.”

Skill gaps and access to economical labor are vital to scaling AI capabilities, said Jason Carolan, chief innovation officer at data center provider Flexential.

“With their proximity to Chicago, hubs like the greater Milwaukee area and Madison make this a logical choice,” he added. “This also leverages the history of Wisconsin and Illinois in innovation — such as Cray computing in Northern Wisconsin and Minnesota, and Mosaic at Urbana-Champaign.”

The state-of-the-art data center campus Microsoft is building will provide the computing power and high-speed connectivity required to develop and train cutting-edge AI systems, said Bill Long, chief product officer of Zayo, a global telecom infrastructure company.

“Microsoft’s new Co-Innovation Lab partnering with businesses will also enable companies to directly tap its AI expertise to design custom AI solutions to enhance their products and operations,” he added. “Other tech companies need to follow in Microsoft’s footsteps — these huge giants are already planning for the future, and other organizations need to as well to ride the AI demand wave.”

Adnan Masood, UST’s chief AI architect, said hyperscalers are moving fast because analysts project the AI infrastructure market will take off in the coming years. “Companies that can provide the best AI services will dominate the cloud market and shape the course of technological progress,” he added. “It goes beyond raw capacity. Cloud providers are also racing to make their operations sustainable. Microsoft’s partnership with National Grid on solar energy and its focus on water conservation reflect this. Going green is a necessity in the age of climate change.”

Data Center
https://www.networkworld.com/article/2099921/microsofts-ai-ambitions-fuel-3-3-billion-bet-on-wisconsin-data-center.html
Red Hat unveils image mode for its Linux distro Thu, 09 May 2024 13:34:30 +0000

At the Red Hat Summit this week, the company unveiled a new container image deployment method for Red Hat Enterprise Linux. The new option is designed to streamline operations, enhance consistency across hybrid cloud environments, and accelerate the adoption of cutting-edge technologies like AI and machine learning.

Typically, containers trim down operating systems as much as possible because they run within a host OS, says Bradley Shimmin, chief analyst for AI platforms, analytics, and data management at Omdia. Alternatively, Linux is run within virtual machines, which themselves run on top of an underlying operating system. Either way, this creates management complexity.

“Red Hat is using the Open Container image standard to create bootable images, which look and work just like it were the actual OS running on bare metal,” says Shimmin.

Now enterprises can use all the tools that they already have in place for managing containers. “Containerization is the current paradigm that everyone accepts and values, frankly, for deploying software for any kind and any sort,” Shimmin says.

(Related news from Red Hat Summit: Red Hat extends Lightspeed genAI tool to OpenShift and introduces ‘policy as code’ for Ansible)

The idea of putting Linux in a container isn’t new. There are community projects that deliver bootable containers, including Bluefin and Fedora. Even Red Hat’s Linux has been available as an image previously, such as the Red Hat Universal Base Image. In addition, Red Hat also has a container-based operating system for Red Hat OpenShift in Red Hat Enterprise Linux CoreOS.

“But image mode for RHEL is one of, if not the, first enterprise Linux platform to offer it,” says Ben Breard, senior principal product manager at the Red Hat Enterprise Linux business unit.

“The entire operating system will be delivered as a bootable container,” Breard says. “Universal Base Image still needed to be run on a host operating system, for example – it was a container on a host.”

The advantage of doing this is that it can help enterprises streamline operations and management and maintain a consistent, reliable infrastructure, whether on bare metal, on virtual machines, or in public clouds.

Companies rarely use an out-of-the-box operating system, Breard says. Instead, they build a standard operating environment by layering in hundreds, or thousands, of additional packages to meet their specific needs. “Problems arise when patches, updates and upgrades have to be pushed out,” he says. “Making changes to the underlying image can be incredibly tedious, time consuming and complex.”

Old-school monolithic applications had the same problem, he says. “But along came containers, which enabled discrete pieces of the app to be packaged and updated individually. So, applying this type of methodology to gold images would be a huge timesaver and innovation driver for enterprise IT. This is the initial problem that image mode solves.”

Being able to make these changes quickly is even more crucial for artificial intelligence, he says. “Patches and upgrades need to be pushed – and work – immediately,” he says. “If you can’t go fast, you can’t reap the benefits of AI workloads.”

One new capability that image mode makes possible, which will accelerate things even further, is that operations teams will now be able to use the same container tooling and workflows as developers, he says.

“This means that changes can be pushed to standard operating environments much faster, and it enables technology organizations to standardize on tooling across both the operations and developer teams,” he says. “And it makes it even easier to push updates out to customers with vast IT estates – a single patch or driver update can be pushed from a single console via a container, and everything is installed and updated via container magic.”
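
As a rough sketch of what that workflow looks like in practice, the example below builds a customized bootable image from a Containerfile using ordinary container tooling. The base image path, registry name, and package choice here are illustrative assumptions rather than Red Hat’s documented defaults:

$ cat Containerfile
# Illustrative only: base image path and added package are assumptions
FROM registry.redhat.io/rhel9/rhel-bootc:latest
RUN dnf -y install httpd && dnf clean all
$ podman build -t registry.example.com/custom-rhel:latest .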

Users will also be able to view and update image mode deployments directly from Red Hat Insights. That will give companies a smarter approach to risk management, says Gunnar Hellekson, vice president and general manager of the Red Hat Enterprise Linux business unit.

“This is the proactive advice feature of the RHEL subscription,” Hellekson says. “We’re able to tell people ahead of time what their CVE exposure might be when they put together a certain kind of image. This is part of an overall move that we have to improve the amount of intelligent coaching and help that we can offer customers during the build and construction of RHEL, as opposed to after RHEL is deployed.”

Additional security benefits come from the fact that security teams will now be able to apply container security tools such as scanning, validation, cryptography and attestation to the base elements of the operating system.

Linux, Networking
https://www.networkworld.com/article/2099690/red-hat-unveils-image-mode-for-its-linux-distro.html
Insecure protocols leave networks vulnerable: report Thu, 09 May 2024 13:09:13 +0000

Enterprise IT managers prove to be too trusting of internal network protocols, as many organizations do not encrypt their WAN traffic, according to a new security threat report.

Secure Access Service Edge (SASE) provider Cato Networks this week released the results of its Cato CTRL SASE Threat Report for Q1 2024 at the RSA Conference in San Francisco. The report summarizes findings gathered from Cato traffic flows across more than 2,200 customers during the first quarter, adding up to 1.26 trillion network flows analyzed.

According to the report, many enterprises continue to run unsecured protocols across their WANs, which means when a bad actor penetrates the networks, they have fewer obstacles preventing them from seeing and compromising critical data in transit across the network.

“As threat actors constantly introduce new tools, techniques, and procedures targeting organizations across all industries, cyber threat intelligence remains fragmented and isolated to point solutions,” said Etay Maor, chief security strategist at Cato Networks and founding member of Cato CTRL, in a statement. “Cato CTRL is filling the gap to provide a holistic view of enterprise threats. As the global network, Cato has granular data on every traffic flow from every endpoint communication across the Cato SASE Cloud Platform.”

Hackers exploit internal network protocols

Unencrypted data traversing internal networks using certain network protocols isn’t necessarily secure just because it resides within the network perimeter. Bad actors can leverage less secure protocols to scan environments and identify vulnerabilities to exploit.

For instance, Cato’s analysis found that 62% of environments run HTTP, a non-encrypted protocol. In addition, the report shows that while the Secure Shell (SSH) protocol is the most secure option for accessing remote services, 54% of organizations run Telnet internally. Telnet connections are not encrypted and leave data unprotected.

Nearly half (46%) use Server Message Block (SMB) v1 or v2. The SMB protocol used for file sharing and other purposes has been updated in SMB v3 to protect against vulnerabilities. Still, Cato found that many organizations continue to rely on SMB v1 and SMB v2 despite known vulnerabilities such as EternalBlue and denial of service (DoS) attacks. SMB v3 also enforces the robust AES-128-GCM encryption standard, according to the report.
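
As a practical starting point, administrators can spot-check their own network segments for the legacy services the report calls out. The sketch below uses nmap; the port list and address range are illustrative:

$ nmap -p 23,80,445 --open 10.0.0.0/24    # Telnet, HTTP and SMB ports; adjust the range to your own network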

“The HTTP traffic analysis clearly shows that many organizations do not encrypt their WAN traffic,” the report states. “This means that if an adversary is already inside the organization’s network, they can eavesdrop on unencrypted communications that may include personally identifiable information (PII) or sensitive information such as credentials.” Access to such data could help bad actors with lateral movement, which involves methods to explore and find vulnerabilities within already penetrated networks. The lateral movement across network devices and applications can go undetected until hackers reach their ultimate target.

“To stop cyberattacks, enterprises should be using in-house machine learning models based on company data and threat intelligence feeds. They also need to be careful of compromised systems within their organizations. Threat actors are leveraging them to scan (mainly SMB scanning) the network for vulnerabilities,” the report states.

Separately, Cato’s traffic analysis report uncovered the most frequently spoofed shopping sites, which are often used in phishing and spoofing attempts so hackers can get access to personal information.

These cybersquatting efforts, also known as domain squatting, use a domain name to capitalize on the reputation and recognition of a brand that belongs to someone else. By incorporating common typos or slight word differences into domain names, bad actors can pose as legitimate sites and gain access to users who mistakenly entered the typo.

According to the report, Booking, Amazon, and eBay are the top three well-known brands involved in spoofing attempts. Other commonly spoofed brands include Pinterest, Google, Apple, Netflix, Microsoft, Instagram, and YouTube.

Network Security, Networking, SASE
https://www.networkworld.com/article/2098984/insecure-protocols-leave-networks-vulnerable-report.html
What is a digital twin and why is it important to IoT? Thu, 09 May 2024 10:00:00 +0000

The use of digital twins – digital representations that mimic the structure and behavior of physical objects or systems – is on the rise. Digital twin technology has moved beyond manufacturing, where it got its start, and into many other industries, driven by advances in sensor technologies, artificial intelligence and data analytics.

In the world of IT, enterprises can use digital twins to replicate their IT environments, including infrastructure, network equipment, and Internet of Things (IoT) devices, and then run simulations or what-if scenarios to test the impact of changes and to optimize performance. They can be used to validate the current state of a network, for example, and test configuration changes, firmware updates, or adjustments to security policies.

What is a digital twin?

A digital twin is a digital representation of a physical object or system. In essence, a digital twin is a computer program that takes real-world data about a physical object or system as inputs and produces as outputs predictions or simulations of how that physical object or system will be affected by those inputs.

The digital twin concept first arose at NASA: full-scale mockups of early space capsules, used on the ground to mirror and diagnose problems in orbit, eventually gave way to fully digital simulations.

The technology behind digital twins has expanded to include buildings, factories and even cities, and some have argued that even people and processes can have digital twins, expanding the concept even further.

The term really took off after Gartner named digital twins as one of its top 10 strategic technology trends for 2017, saying that within three to five years, “billions of things will be represented by digital twins, a dynamic software model of a physical thing or system.” 

Today, digital twin technologies continue to gain traction because of their potential to bridge the gap between physical and virtual worlds, according to Grand View Research, which says the global digital-twin market is forecast to expand at a compound annual growth rate (CAGR) of 38% from 2023 to 2030. Incorporating technologies such as artificial intelligence (AI), cloud computing and IoT into digital twin systems is expected to boost market growth in the forecast period, Grand View says.

How does a digital twin work?

A digital twin begins life as a model built by specialists, often experts in data science or applied mathematics. These developers research the physics that underlies the physical object or system being mimicked and use that data to develop a mathematical model that simulates the real-world original in digital space.

The twin is constructed so that it can receive input from sensors gathering data from a real-world counterpart. This allows the twin to simulate the physical object in real time, in the process offering insights into performance and potential problems. The twin could also be designed based on a prototype of its physical counterpart, in which case the twin can provide feedback as the product is refined; a twin could even serve as a prototype itself before any physical version is built.

Digital twin vs. simulation

The terms simulation and digital twin are often used interchangeably, but they are different things. A simulation is designed with a CAD system or similar platform, and can be put through its simulated paces, but may not have a one-to-one analog with a real physical object. A digital twin, by contrast, is built out of input from IoT sensors on real equipment, which means it replicates a real-world system and changes with that system over time. Simulations tend to be used during the design phase of a product’s lifecycle, trying to forecast how a future product will work, whereas a digital twin provides all parts of the business insight into how some product or system they’re already using is working now.

Digital twin use cases

Potential use cases for digital twins are expansive. Objects such as aircraft engines, trains, offshore oil platforms, and turbines can be designed and tested digitally before being physically produced. These digital twins could also be used to help with maintenance operations. For example, technicians could use a digital twin to test that a proposed fix for a piece of equipment works before applying the fix.

Manufacturing is the area where rollouts of digital twins are probably the furthest along, with factories already using digital twins to simulate their processes. Automotive digital twins are made possible because cars are already fitted with telemetry sensors, but refining the technology will become more important as more autonomous vehicles hit the road. Healthcare is the sector that could produce digital twins of people; tiny sensors could send health information back to a digital twin used to monitor and predict a patient’s well-being.

What kind of value can digital twins bring to an organization?

Just as digital twins serve different purposes in different industries, the value of digital twins differs depending on the application.

In the world of manufacturing, for example, a digital twin can enable product designers to try out prototypes before settling on a final design. It’s a way to use digital resources to develop and refine products instead of tapping physical engineering resources. With a digital replica of a product that simulates the real thing in a virtual space, designers can rapidly generate new iterations, optimize their product designs, and improve product quality along the way.

In the semiconductor industry, digital twins can exist in the cloud and replace physical research models. In May 2024, the Biden Administration announced plans to fund up to $285 million to create a CHIPS Manufacturing USA institute focused on digital twins for the semiconductor industry: Digital twin-based research can leverage AI “to help accelerate the design of new U.S. chip development and manufacturing concepts and significantly reduce costs by improving capacity planning, production optimization, facility upgrades, and real-time process adjustments,” the US Department of Commerce said in the announcement.

Digital twins have had appeal in certain industries – manufacturing, oil and gas, utilities, mining – “basically physical, high-capital, asset-intensive verticals,” said Jonathan Lang, research director, worldwide IT/OT convergence strategies, at research firm IDC, in an interview with Network World.

In these settings, the rationale for digital twins has been clear, thanks to potential benefits that include better visibility into the health of assets, improved reliability, cost savings, and the ability to ensure stable operations, Lang says. “IT environments such as infrastructure, network equipment, connected devices, etc., have the same value drivers,” he says.

What kinds of digital twins are there?

IBM offers a categorization scheme based not on specific industries but on the complexity of what’s being twinned. This provides a useful way to think about the needs in specific use cases and gives a look at the broad spectrum of what digital twins can do:

  • Component or part twins simulate the smallest example of a functioning component.
  • Asset twins simulate two or more components working together and let you study the interactions between them.
  • System or unit twins let you see how multiple assets work together as a system, simulating an entire production line, for instance.
  • Process twins take the absolute top-level view of systems working together, letting you figure out how an entire factory might operate.

It’s worth noting that adding more components to the mix adds complexity. In particular, mixing and matching components from different manufacturers can be difficult because you’d need everyone’s intellectual property to play nice together within the world of your digital twin.

Advantages and benefits of digital twins

In the world of IT, digital twins that simulate IT infrastructure can:

  • Strengthen security: “Network digital twins offer noteworthy security benefits, including critical vulnerability identification and prioritized remediation plans specific to individual device configurations and features in use,” said Chiara Regale, senior vice president, product and user experience at Forward Networks, in an interview with Network World about reasons to consider a network digital twin.
  • Improve documentation: “Every enterprise is terrible at documentation, due to priorities around delivery, lack of standards on how to record infrastructure changes, and sprawl,” Michael Wynston, director of network architecture and automation at financial services firm Fiserv, told Network World. Digital twin technology can provide insights into the infrastructure beyond just configurations, including what the environment is doing at any given time. This is essential for successful documentation.
  • Boost efficiency: Digital twins enable simulation of data across multiple business systems. IDC research has shown that IT organizations are losing lots of time searching for necessary information to perform a job function. “By unifying the data in a single interface, as well as performing analysis across multiple data sets, digital twins improve worker efficiency and the quality and accuracy of analytical outputs,” said IDC analyst Jonathan Lang.
  • Create a better digital experience: A company can create a digital twin to help enhance digital experience, which is the sum of a user’s digital-based interactions with a product, service, device, etc. “Digital experience twins are a new concept that virtualizes an end user, application, or IoT device to validate the network experience and predict problems before they impact user experience,” says Bob Friday, chief AI officer at Juniper Networks.

How can a digital twin affect an organization’s environmental sustainability?

The proportion of companies implementing a data center infrastructure sustainability program will rise from about 5% in 2022 all the way to 75% by 2027, as sustainability becomes an increasingly central consideration for cost optimization and risk management, according to data from Gartner Research. Industry watchers have made the case that a digital twin can aid in companies’ sustainability efforts. Some potential tie-ins between digital twins and sustainability in the IT arena include:

  • Improved systems management, which can lead to lower downtime and more advanced power-management capabilities.
  • Better inventory and asset-management practices, which allow IT to maximize enterprise server and storage deployments and identify idle capacity.
  • Easier identification of older, energy-hogging gear, which can be replaced with newer generation hardware with better energy performance.

Dr. Mano Rao, IT director for global manufacturing at General Motors, wrote this in a blog about GM’s work with GE Digital to pursue model-based system engineering: “At GM, we developed a Virtual Factory Testbed to provide the tools and environment needed to test all manufacturing process variations that are necessary to support build-to-order manufacturing, as well as all permutations of outcomes that can result from each operation. We employ a process digital twin to mimic plant-floor behavior and test the integration of OT and IT systems—without requiring the physical lines to be deployed, and without requiring physical products flowing down the line. Not only does this help GM’s competitive advantage, but it brings us closer to our sustainability commitments.”

Applications of digital twin in various industries

Digital twins can be deployed in many industries. Here are some examples:

  • Enterprise IT: A digital twin can replicate an IT environment, including infrastructure, network equipment, and IoT devices. IT teams can use the digital twin to test configuration changes or adjust security policies, for example (see the sketch after this list).
  • Semiconductor industry: Digital twins can enable more collaborative design among engineers and researchers, speeding the exchange of ideas and reducing the cost of research and development.
  • Manufacturing operations: In the manufacturing industry, a digital twin can streamline design processes, improve collaboration among designers, and help to reduce the material used in a product’s design.
  • Healthcare services: In the world of healthcare, a digital twin can improve health monitoring and diagnostic capabilities.
  • Automotive industry: Automotive designers and manufacturers use digital twins to shorten time to market, improve safety procedures, monitor product performance, and identify potential maintenance issues.
  • Power-generation and utilities: Energy and utility companies can create a digital twin, or a virtual model, of a power plant or distribution network and use it to streamline operations and identify opportunities to improve performance.
  • Urban planning: For urban planning and infrastructure projects, a digital twin can allow city planners to run simulations of new designs – trying out scenarios that could impact traffic congestion or air pollution, for example.
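
To make the enterprise IT use case above concrete, here is a minimal, illustrative sketch in Python: a toy in-memory model of a small network serves as the “twin,” and a proposed change is rehearsed against it before it touches production. The topology, device names, and the change itself are hypothetical assumptions for demonstration, not any vendor’s API.

# Illustrative sketch only: a toy in-memory model of a small network acts as the
# "twin," and a proposed change is rehearsed against it before touching production.
# The topology, device names, and the change itself are hypothetical.

from collections import deque

topology = {
    "core-sw1": {"dist-sw1", "dist-sw2"},
    "dist-sw1": {"core-sw1", "access-sw1"},
    "dist-sw2": {"core-sw1", "access-sw2"},
    "access-sw1": {"dist-sw1"},
    "access-sw2": {"dist-sw2"},
}

def reachable(graph, src, dst):
    """Breadth-first search: can traffic from src still reach dst?"""
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False

def apply_change(graph, removed_link):
    """Return a copy of the model with one link taken out of service."""
    a, b = removed_link
    twin = {node: set(neighbors) for node, neighbors in graph.items()}
    twin[a].discard(b)
    twin[b].discard(a)
    return twin

# Rehearse a planned maintenance change on the twin, not on the live network.
proposed = apply_change(topology, ("core-sw1", "dist-sw2"))
print("access-sw1 -> access-sw2 today:       ", reachable(topology, "access-sw1", "access-sw2"))  # True
print("access-sw1 -> access-sw2 after change:", reachable(proposed, "access-sw1", "access-sw2"))  # False

Running the check flags that the proposed change would strand access-sw2, and it does so before anyone touches the live network.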

Digital twins and IoT

The explosion of IoT sensors is part of what makes digital twins possible. And as IoT devices are refined, digital-twin scenarios can include smaller and less complex objects, giving additional benefits to companies.

Digital twins can be used to predict different outcomes based on variable data. With additional software and data analytics, digital twins can often optimize an IoT deployment for maximum efficiency, as well as help designers figure out where things should go or how they operate before they are physically deployed.
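
As a rough illustration of that prediction idea, the sketch below uses a toy digital twin of a single IoT sensor to estimate battery life under different reporting intervals. The capacity and power figures are made-up assumptions chosen only for demonstration, not real device data.

# Illustrative sketch only: a toy digital twin of an IoT sensor predicts battery
# life under different reporting intervals. All figures are assumed values.

import random

BATTERY_MAH = 2400          # assumed battery capacity
IDLE_MA = 0.02              # assumed sleep current (mA)
TX_MAH_PER_REPORT = 0.05    # assumed energy cost of one transmission (mAh)

def simulate_days(report_interval_minutes, trials=1000):
    """Monte Carlo estimate of battery life (days) for a reporting interval."""
    results = []
    for _ in range(trials):
        reports_per_day = 24 * 60 / report_interval_minutes
        # Variable data: each report's cost fluctuates with radio conditions.
        daily_drain = IDLE_MA * 24 + sum(
            TX_MAH_PER_REPORT * random.uniform(0.8, 1.6)
            for _ in range(int(reports_per_day))
        )
        results.append(BATTERY_MAH / daily_drain)
    return sum(results) / len(results)

# Compare candidate configurations on the twin before rolling them out.
for interval in (5, 15, 60):
    print(f"report every {interval:>2} min -> ~{simulate_days(interval):.0f} days of battery")

Comparing the candidate intervals on the twin suggests which configuration meets a battery-life target before any change is pushed to real devices.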

Digital twin vendors

Building a digital twin is complex, and there is as yet no standardized platform for doing so.

One group that’s working to increase awareness, adoption, interoperability, and development of digital twin technology is the Digital Twin Consortium. It’s a global ecosystem of users – including industry, academia, and government members – who are driving best practices for digital twin usage and defining requirements for new digital twin standards.

In contrast with many emerging technologies that are driven by startups, commercial digital-twin offerings are coming from some of the largest companies in the field. For instance, GE, which developed digital-twin technology internally as part of its jet-engine manufacturing process, is now offering its expertise to customers, as is Siemens, another industrial giant heavily involved in manufacturing. Not to be outdone by these factory-floor suppliers, IBM is marketing digital twins as part of its IoT push, and Microsoft is offering its own digital-twin platform under the Azure umbrella.

Digital twin news in the enterprise IT world

Although digital twins have been around for some time, the technology is still in the early-adopter stage. But the number of vendors offering digital twin solutions is growing, and recent upgrades to digital twin offerings in the enterprise IT industry include:

  • Forward Networks launched AI Assist, a generative AI feature built into its Forward Enterprise digital twin platform. The addition is designed to give network and security operations professionals comprehensive insights into network performance via natural language prompts. With AI Assist, network engineers of varying skill levels can conduct sophisticated network queries, so they can quickly assess network behavior and identify potential issues.
  • Juniper Networks introduced Marvis Minis, an AI-native networking digital experience twin that uses the company’s Mist AI technology to proactively simulate user connections. That way it can instantly validate network configurations and detect problems without users being present. The Minis product simulates end-user, client, device and application traffic to learn the network configuration through unsupervised ML, and to proactively highlight network issues. Data from Minis is continuously fed back into Mist AI, providing an additional source of insight for the best responses.
  • Nokia extended the capabilities of its existing Nokia Network Digital Twin to include all Android devices, the company announced late last year. Coverage and performance data for Wi-Fi, private and public cellular networks can be automatically collected in real time and processed on Nokia’s edge platform to give enterprises a view of how changes in their operations impact network performance.

Challenges of digital twins

Digital twins offer a real-time look at what’s happening with physical assets, which can radically alleviate maintenance burdens. But keep in mind that Gartner warns that digital twins aren’t always called for and can unnecessarily increase complexity. “[Digital twins] could be technology overkill for a particular business problem. There are also concerns about cost, security, privacy, and integration.”

Potential challenges in developing and deploying a digital twin can include:

  • Data management: Data ownership can become a problematic point if not addressed, particularly if an organization is partnering with other entities to run its digital twins, said Kayne McGladrey, a senior member of the Institute of Electrical and Electronics Engineers (IEEE), a nonprofit professional association, and field CISO at Hyperproof, in an interview with CSO.
  • Security: If proper cybersecurity controls aren’t put in place, digital twins can expand a company’s attack surface, give threat actors access to previously inaccessible control systems, and expose preexisting vulnerabilities, CSO notes in a recent feature.
  • Supplier collaboration: A digital twin might have multiple engineers and designers collaborating on a model, so it’s important to create and adhere to well-documented practices for constructing and modifying the models.
  • Complexity: A digital twin may only model a single physical asset, but in more complex environments with multiple stakeholders and complicated processes, companies may need to invest in more advanced digital twins, which in turn require advanced skillsets.
  • Privacy: Legal and regulatory issues come into play with digital twins, too, McGladrey told CSO. The primary concerns are around whether the operators of digital twins can ensure that the data being used in their digital twins is handled in ways that meet regulatory requirements around privacy, confidentiality, and even the geography where data can be housed.

Digital twin skills

Interested in becoming a digital twin pro? The skill sets are demanding and require specialized expertise in machine learning, artificial intelligence, predictive analytics, and other data-science capabilities. That’s part of the reason large vendors are offering digital twin services: smaller organizations may find it more practical to hire a consulting team than to upskill their in-house staff.

2024 global network outage report and internet health check Thu, 09 May 2024 02:15:23 +0000

The reliability of services delivered by ISPs, cloud providers and conferencing services is critical for enterprise organizations. ThousandEyes, a Cisco company, monitors how providers are handling any performance challenges and provides Network World with a weekly roundup of events that impact service delivery. Read on to see the latest analysis, and stop back next week for another update.

(Note: We have archived prior-year updates, including the 2023 outage report and our coverage during the Covid-19 years, when we began tracking the performance of cloud providers and ISPs.)

Internet report for April 29-May 5, 2024

ThousandEyes reported 151 global network outage events across ISPs, cloud service provider networks, collaboration app networks and edge networks (including DNS, content delivery networks, and security as a service) during the week of April 29-May 5. That’s down slightly (3%) from 156 outages the week prior. Specific to the U.S., there were 50 outages, which is down 7% from 54 outages the week prior. Here’s a breakdown by category, followed by a quick sketch of the percent-change arithmetic behind these figures:

ISP outages: Globally, the number of ISP outages increased from 104 to 113 outages, a 9% increase compared to the week prior. In the U.S., the number of ISP outages increased 5% from 37 to 39 outages.

Public cloud network outages: Globally, cloud provider network outages decreased from 22 to 15 outages. In the U.S., cloud provider network outages decreased from six to two.

Collaboration app network outages: Globally, collaboration app network outages increased from eight to nine outages. In the U.S., collaboration app network outages fell from four to three outages.
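
For reference, the week-over-week percentages quoted throughout these reports are simple percent changes; the short sketch below reproduces the figures above.

# Percent-change arithmetic behind the week-over-week figures quoted above.
def pct_change(current, prior):
    """Signed percent change from the prior week to the current week."""
    return (current - prior) / prior * 100

print(f"Global outages: {pct_change(151, 156):+.1f}%")  # -3.2%, reported as "down slightly (3%)"
print(f"U.S. outages:   {pct_change(50, 54):+.1f}%")    # -7.4%, reported as "down 7%"
print(f"Global ISP:     {pct_change(113, 104):+.1f}%")  # +8.7%, reported as "a 9% increase"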

Two notable outages

On April 29, NTT America, a global Tier 1 ISP and subsidiary of NTT Global, experienced an outage that impacted some of its customers and downstream partners in multiple regions, including the U.S., Japan, South Korea, China, Taiwan, Singapore, the Netherlands, Hungary, Turkey, Brazil, India, Argentina, Australia, the U.K., Thailand, Malaysia, Mexico, and Canada. The outage, lasting 24 minutes, was first observed around 2:40 PM EDT and appeared to initially center on NTT nodes located in San Jose, CA. Around five minutes into the outage, the nodes located in San Jose, CA, appeared to clear, and were replaced by nodes located in Tokyo, Japan, in exhibiting outage conditions. Ten minutes after first being observed, the nodes located in Tokyo, Japan, were joined by nodes located in Osaka, Japan, Singapore, Dallas, TX, and Los Angeles, CA, in exhibiting outage conditions. The outage was cleared around 3:05 PM EDT. Click here for an interactive view.

On April 29, Cogent Communications, a multinational transit provider based in the US, experienced an outage that impacted multiple downstream providers and customers across various regions, including the U.S., Brazil, the U.K., Canada, Chile, Mexico, Japan, Germany, Spain, and France. The outage, lasting for a total of one hour and 12 minutes, was divided into two occurrences over a period of 35 minutes. The first occurrence was observed around 2:45 AM EDT and initially seemed to be centered on Cogent nodes located in Ashburn, VA, and Washington, D.C. Five minutes into the outage, the nodes located in Ashburn, VA, appeared to clear and were replaced by nodes located in Baltimore, MD, New York, NY, and Phoenix, AZ, along with nodes located in Washington, D.C., in exhibiting outage conditions. This increase in nodes exhibiting outage conditions also appeared to coincide with an increase in the number of downstream customers, partners, and regions impacted. Twenty minutes after appearing to clear, nodes located in New York, NY, and Washington, D.C., were joined by nodes located in Houston, TX, in exhibiting outage conditions. The outage was cleared around 3:20 AM EDT. Click here for an interactive view.

Internet report for April 22-28, 2024

ThousandEyes reported 156 global network outage events across ISPs, cloud service provider networks, collaboration app networks and edge networks (including DNS, content delivery networks, and security as a service) during the week of April 22-28. That’s down 8% from 170 outages the week prior. Specific to the U.S., there were 54 outages, which is down 36% from 85 outages the week prior. Here’s a breakdown by category:

ISP outages: Globally, the number of ISP outages increased from 99 to 104 outages, a 5% increase compared to the week prior. In the U.S., the number of ISP outages decreased 31% from 54 to 37 outages.

Public cloud network outages: Globally, cloud provider network outages remained the same as the week prior, recording 22 outages. In the U.S., cloud provider network outages decreased from 10 to six.

Collaboration app network outages: Globally, collaboration app network outages decreased from nine to eight outages. In the U.S., collaboration app network outages stayed at the same level as the week before: four outages.

Two notable outages

On April 26, Time Warner Cable, a U.S. based ISP, experienced a disruption that impacted a number of customers and partners across the U.S. The outage, distributed across two occurrences over a twenty-five-minute period, was first observed at around 7:45 PM EDT and appeared to center on Time Warner Cable nodes located in New York, NY. Ten minutes after first being observed, the number of nodes located in New York, NY, exhibiting outage conditions increased. The outage lasted a total of 17 minutes and was cleared at around 8:10 PM EDT. Click here for an interactive view.

On April 24, NTT America, a global Tier 1 ISP and subsidiary of NTT Global, experienced an outage that impacted some of its customers and downstream partners in multiple regions, including the U.S., Germany, India, China, Hong Kong, Canada, and Japan. The outage, lasting 9 minutes, was first observed around 7:15 AM EDT and appeared to initially center on NTT nodes located in San Jose, CA. Around five minutes into the outage, the nodes located in San Jose, CA, were joined by nodes located in Dallas, TX, in exhibiting outage conditions. The outage was cleared around 7:25 AM EDT. Click here for an interactive view.

Internet report for April 15-21, 2024

ThousandEyes reported 170 global network outage events across ISPs, cloud service provider networks, collaboration app networks and edge networks (including DNS, content delivery networks, and security as a service) during the week of April 15-21. That’s up 6% from 161 outages the week prior. Specific to the U.S., there were 85 outages, which is up 18% from 72 outages the week prior. Here’s a breakdown by category:

ISP outages: Globally, the number of ISP outages decreased from 107 to 99 outages, a 7% decline compared to the week prior. In the U.S., the number of ISP outages climbed 6% from 51 to 54 outages.

Public cloud network outages: Both globally (22) and in the U.S. (10), cloud provider network outages remained the same as the week prior.

Collaboration app network outages: Globally, collaboration app network outages increased from five to nine outages. In the U.S., collaboration app network outages increased from two to four outages.

Two notable outages

On April 20, Cogent Communications, a multinational transit provider based in the US, experienced an outage that impacted multiple downstream providers and its own customers across various regions, including the US, Canada, Portugal, Germany, the Netherlands, Luxembourg, South Africa, Hong Kong, Singapore, the U.K., Italy, France, and Spain. The outage, lasting a total of one hour and 32 minutes, was divided into a series of occurrences over a period of two hours and 28 minutes. The first occurrence was observed around 10:55 PM EDT and initially seemed to be centered on Cogent nodes located in Seattle, WA, Portland, OR, and Hong Kong. Five minutes into the outage, the nodes located in Hong Kong appeared to clear and were replaced by nodes located in Minneapolis, MN, and Cleveland, OH, in exhibiting outage conditions. Thirty-five minutes into the first occurrence, the number of nodes exhibiting outage conditions increased to include nodes located in Seattle, WA, Washington, D.C., Minneapolis, MN, Cleveland, OH, Boston, MA, and Bilbao, Spain. This increase in nodes exhibiting outage conditions also appeared to coincide with an increase in the number of downstream customers, partners, and regions impacted. A second occurrence was observed around five minutes after the issue initially appeared to have cleared. This second occurrence lasted approximately fourteen minutes and seemed to initially be centered around nodes located in Cleveland, OH. Around five minutes into the second occurrence, nodes located in Cleveland, OH, appeared to be temporarily replaced by nodes located in Seattle, WA, and Chicago, IL, before they themselves were replaced once again by nodes located in Cleveland, OH. Around 15 minutes after appearing to clear, a third occurrence was observed, this time appearing to be centered around nodes located in Bilbao, Spain, and Cleveland, OH. The outage was cleared around 1:25 AM EDT. Click here for an interactive view.

On April 17, NTT America, a global Tier 1 ISP and subsidiary of NTT Global, experienced an outage that impacted some of its customers and downstream partners across multiple regions including the U.S., the U.K., the Netherlands, and Germany. The outage, lasting 17 minutes, was first observed around 2:55 AM EDT and appeared to initially center on NTT nodes located in Seattle, WA. Five minutes into the outage, nodes located in Seattle, WA, were joined by nodes located in Dallas, TX, in exhibiting outage conditions. The outage was cleared around 3:15 AM EDT. Click here for an interactive view.

Internet report for April 8-14, 2024

ThousandEyes reported 161 global network outage events across ISPs, cloud service provider networks, collaboration app networks and edge networks (including DNS, content delivery networks, and security as a service) during the week of April 8-14. That’s up 11% from 145 outages the week prior. Specific to the U.S., there were 72 outages, which is up 4% from 69 outages the week prior. Here’s a breakdown by category:

ISP outages: Globally, the number of ISP outages increased from 97 to 107 outages, a 10% increase compared to the week prior. In the U.S., the number of ISP outages climbed 13% from 45 to 51 outages.

Public cloud network outages: Globally, cloud provider network outages increased from 16 to 22 outages. In the U.S., cloud provider network outages decreased from 14 to 10 outages.

Collaboration app network outages: Globally, collaboration app network outages remained at the same level as the week prior, recording 5 outages. In the U.S., collaboration app network outages decreased from three to two outages.

Two notable outages

On April 8, Rackspace Technology, a U.S. managed cloud computing provider headquartered in San Antonio, Texas, experienced an outage that impacted multiple downstream providers, as well as Rackspace customers within multiple regions including the U.S., Japan, Vietnam, Spain, Canada, Germany, Singapore, France, the Netherlands, the U.K., Brazil, and South Africa. The outage, lasting a total of 14 minutes, was first observed around 9:00 AM EDT and appeared to center on Rackspace nodes located in Chicago, IL. Around five minutes into the outage, the number of nodes located in Chicago, IL, exhibiting outage conditions appeared to decrease. This decrease in impacted nodes appeared to coincide with a reduction of impacted regions. The outage was cleared around 9:15 AM EDT. Click here for an interactive view.

On April 10, GTT Communications, a Tier 1 ISP headquartered in Tysons, VA, experienced an outage that impacted some of its partners and customers across multiple regions, including the U.S., the U.K., Brazil, and Canada. The outage, lasting 9 minutes, was first observed around 8:10 AM EDT and appeared to initially be centered on GTT nodes located in Los Angeles, CA. Around five minutes into the outage, the nodes located in Los Angeles appeared to clear and were replaced by nodes located in New York, NY, in exhibiting outage conditions. This change in location of nodes exhibiting outage conditions appeared to coincide with an increase in the number of impacted regions, downstream partners and customers. The outage was cleared around 8:20 AM EDT. Click here for an interactive view.

Internet report for April 1-7, 2024

ThousandEyes reported 145 global network outage events across ISPs, cloud service provider networks, collaboration app networks and edge networks (including DNS, content delivery networks, and security as a service) during the week of April 1-7. That’s up 23% from 118 outages the week prior. Specific to the U.S., there were 69 outages, which is up 21% from 57 outages the week prior. Here’s a breakdown by category:

ISP outages: Both globally and in the U.S., ISP outages increased by 45% compared to the week prior. Globally, the number of ISP outages climbed from 67 to 97. In the U.S., the number of ISP outages jumped from 31 to 45 outages.

Public cloud network outages: Globally, cloud provider network outages declined slightly from 17 to 16 outages. In the U.S., they remained at the same level (14) as the previous week.

Collaboration app network outages: Globally, collaboration app network outages fell from 13 to five outages. In the U.S., collaboration app network outages dropped from eight to three outages.

Two notable outages

On April 2, Hurricane Electric, a network transit provider headquartered in Fremont, CA, experienced an outage that impacted customers and downstream partners across multiple regions, including the U.S., Taiwan, Australia, Germany, Japan, the U.K., Ireland, India, and China. The outage, lasting 12 minutes, was first observed around 12:40 PM EDT and initially appeared to center on Hurricane Electric nodes located in New York, NY, Los Angeles, CA, and San Jose, CA. Five minutes into the outage, the nodes exhibiting outage conditions expanded to include nodes located in Chicago, IL, and Ashburn, VA. This coincided with an increase in the number of downstream partners and countries impacted. The outage was cleared around 12:55 PM EDT. Click here for an interactive view.

On April 2, BT, a multinational Tier 1 ISP headquartered in London, U.K., experienced an outage on their European backbone that impacted customers and downstream partners across multiple regions, including the U.S., the U.K., Switzerland, Spain, and Germany. The disruption, lasting 24 minutes, was first observed around 7:20 PM EDT and appeared to center on nodes located in London, England. Click here for an interactive view.

Internet report for March 25-31, 2024

ThousandEyes reported 118 global network outage events across ISPs, cloud service provider networks, collaboration app networks and edge networks (including DNS, content delivery networks, and security as a service) during the week of March 25-31. That’s down 28% from 164 outages the week prior. Specific to the U.S., there were 57 outages, which is down slightly from 58 outages the week prior. Here’s a breakdown by category:

ISP outages: Globally, the number of ISP outages was nearly cut in half, falling from 128 to 67 outages, a 48% decrease compared to the week prior. In the U.S., the number of ISP outages fell from 43 to 31 outages, a decline of 28%.

Public cloud network outages: Globally, total cloud provider network outages nearly tripled, jumping from six to 17 outages. In the U.S., cloud provider network outages jumped from three to 14 outages.

Collaboration app network outages: Globally, collaboration app network outages more than doubled, increasing from six to 13. Similarly, in the U.S., collaboration app network outages doubled from four to eight.

Two notable outages

On March 29, Arelion (formerly known as Telia Carrier), a global Tier 1 ISP headquartered in Stockholm, Sweden, experienced an outage that impacted customers and downstream partners across multiple regions, including the U.S., the Netherlands, and Japan. The disruption, lasting a total of 8 minutes, was first observed around 5:26 AM EDT and appeared to center on nodes located in Phoenix, AZ. The outage was cleared around 5:35 AM EDT. Click here for an interactive view.

On March 29, Cogent Communications, a U.S. based multinational transit provider, experienced an outage that impacted multiple downstream providers as well as Cogent customers across multiple regions, including the U.S., Canada, Germany, and Japan. The outage, lasting 9 minutes, was first observed around 1:45 AM EDT and appeared to initially center on Cogent nodes located in San Francisco, CA, Salt Lake City, UT, and Seattle, WA. Five minutes after first being observed, the nodes located in San Francisco, CA, Salt Lake City, UT, and Seattle, WA, appeared to recover and were replaced by nodes located in Kansas City, MO, in exhibiting outage conditions. As a result, the number of customers and providers impacted was reduced. The outage was cleared around 1:55 AM EDT. Click here for an interactive view.

Internet report for March 18-24, 2024

After a spike the week before, global outages decreased last week. ThousandEyes reported 164 global network outage events across ISPs, cloud service provider networks, collaboration app networks and edge networks (including DNS, content delivery networks, and security as a service) during the week of March 18-24. That’s down 20% from 206 outages the week prior. Specific to the U.S., there were 58 outages, which is down 33% from 87 outages the week prior. Here’s a breakdown by category:

ISP outages: Globally, the number of ISP outages fell slightly from 131 to 128 outages, a 2% decrease compared to the week prior. In the U.S., the number of ISP outages fell slightly from 46 to 43 outages.

Public cloud network outages: Globally, cloud provider network outages decreased from 10 to six outages. In the U.S., they decreased from six to three outages.

Collaboration app network outages: Globally, collaboration app network outages fell dramatically from 34 to six outages. In the U.S., collaboration app network outages dropped from 28 to four outages.

Two notable outages

On March 20, Cogent Communications, a multinational transit provider based in the US, experienced an outage that impacted multiple downstream providers and its own customers across various regions, including the US, Italy, Saudi Arabia, France, Germany, Canada, Hong Kong, Luxembourg, Chile, Brazil, Kenya, Singapore, Mexico, Switzerland, Spain, Australia, Finland, Japan, Ireland, and Norway. The outage occurred for a total of 24 minutes, divided into three occurrences over a period of one hour and fifteen minutes. The first occurrence of the outage was observed around 12:50 AM EDT and initially seemed to be centered on Cogent nodes located in Frankfurt, Munich, and Hamburg in Germany, in Paris, France, and Kyiv, Ukraine. A second occurrence was observed around fifteen minutes after the issue initially appeared to have cleared. This second occurrence lasted approximately eighteen minutes and seemed to be centered around nodes located in Frankfurt, Munich, and Hamburg, Germany. Around ten minutes into the second occurrence, nodes located in Frankfurt, Munich and Hamburg, Germany, were joined by nodes located in Nuremberg, Germany, San Francisco, CA, San Jose, CA, Zurich, Switzerland, Amsterdam, the Netherlands, and Paris, France in exhibiting outage conditions. Around 30 minutes after appearing to clear, a third occurrence was observed, this time appearing to be centered around nodes located in Toronto, Canada. The outage was cleared around 2:05 AM EDT. Click here for an interactive view.

On March 24, Hurricane Electric, a network transit provider headquartered in Fremont, CA, experienced an outage that impacted customers and downstream partners across multiple regions, including the U.S., China, Australia, Germany, the U.K., and Japan. The outage, first observed around 1:10 PM EDT, lasted 7 minutes in total and appeared to center on Hurricane Electric nodes located in New York, NY, and San Jose, CA. The outage was cleared at around 1:20 PM EDT. Click here for an interactive view.

Internet report for March 11-17, 2024

After weeks of decreasing, global outages increased significantly last week. ThousandEyes reported 206 global network outage events across ISPs, cloud service provider networks, collaboration app networks and edge networks (including DNS, content delivery networks, and security as a service) during the week of March 11-17. That’s up 45% from 142 outages the week prior. Specific to the U.S., there were 87 outages, which is up 38% from 63 outages the week prior. Here’s a breakdown by category:

ISP outages: Globally, the number of ISP outages increased from 91 to 131 outages, a 44% increase compared to the week prior. In the U.S., the number of ISP outages climbed slightly from 44 to 46 outages.

Public cloud network outages: Globally, cloud provider network outages increased from six to 10 outages. In the U.S., they increased from four to six outages.

Collaboration app network outages: Globally, collaboration app network outages spiked from six to 34 outages. In the U.S., collaboration app network outages jumped from 3 to 28 outages.

Two notable outages

On March 16, Cogent Communications, a U.S. based multinational transit provider, experienced an outage that impacted multiple downstream providers as well as Cogent customers across multiple regions, including the U.S., Ireland, the U.K., Sweden, Austria, Germany, and Italy. The outage, lasting a total of 12 minutes, was divided into two occurrences over a one-hour and ten-minute period. The first occurrence was observed at around 6:30 PM EDT and appeared to initially be centered on Cogent nodes located in Baltimore, MD and New York, NY. Five minutes into the first occurrence, the nodes located in New York, NY, were replaced by nodes located in Philadelphia, PA, in exhibiting outage conditions. One hour after the issue initially appeared to have cleared, a second occurrence was observed. This second occurrence lasted approximately four minutes and appeared to be centered around nodes located in Baltimore, MD, Philadelphia, PA, New York, NY, and Newark, NJ. The outage was cleared around 7:45 PM EDT. Click here for an interactive view.

On March 12, Hurricane Electric, a network transit provider headquartered in Fremont, CA, experienced an outage that impacted customers and downstream partners across the U.S. and Canada. The outage, first observed around 2:00 AM EDT, lasted 7 minutes in total and was divided into two occurrences over a thirty-minute period. The first occurrence appeared to initially center on Hurricane Electric nodes located in Chicago, IL. Twenty minutes after appearing to clear, the nodes located in Chicago, IL, were joined by nodes located in Seattle, WA in exhibiting outage conditions. This increase in impacted nodes appeared to coincide with an increase in the number of impacted downstream customers and partners. The outage was cleared at around 2:30 AM EDT. Click here for an interactive view.

Additional details from ThousandEyes are available here.

Internet report for March 4-10, 2024

ThousandEyes reported 142 global network outage events across ISPs, cloud service provider networks, collaboration app networks and edge networks (including DNS, content delivery networks, and security as a service) during the week of March 4-10. That’s down 8% from 155 outages the week prior. Specific to the U.S., there were 63 outages, which is down 10% from 70 outages the week prior. Here’s a breakdown by category:

ISP outages: Globally, the number of ISP outages decreased from 95 to 91 outages, a 4% decrease compared to the week prior. In the U.S., the number of ISP outages stayed the same at 44 outages.

Public cloud network outages: Globally, cloud provider network outages fell from 13 to six outages. In the U.S., they decreased from seven to four outages.

Collaboration app network outages: Globally, collaboration app network outages decreased from eight outages to six. In the U.S., collaboration app network outages stayed at the same level as the week before: three outages.

Three notable outages

On March 5, several Meta services, including Facebook and Instagram, experienced a disruption that impacted users attempting to log in, preventing them from accessing those applications. The disruption was first observed around 10:00 AM EST. During the disruption, Meta’s web servers remained reachable, with network paths to Meta services showing no significant error conditions, suggesting that a backend service, such as authentication, was the cause of the issue. The service was fully restored around 11:40 AM EST. More detailed analysis here.
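
The reasoning above (servers reachable and network paths clean, so the fault is likely in a backend service) can be illustrated with a rough sketch like the one below. The target host is a placeholder, and this is only a simplified stand-in for the kind of layered checks a monitoring platform performs, not ThousandEyes’ actual methodology.

# Illustrative sketch: distinguishing a network-path problem from an
# application/backend problem. The host below is a placeholder.

import socket
import http.client

HOST = "www.example.com"   # placeholder target, not a real monitoring endpoint

def network_reachable(host, port=443, timeout=5):
    """Layer-3/4 check: can we complete a TCP handshake at all?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def application_healthy(host, path="/", timeout=5):
    """Layer-7 check: does the service answer with a non-error status?"""
    try:
        conn = http.client.HTTPSConnection(host, timeout=timeout)
        conn.request("GET", path)
        status = conn.getresponse().status
        conn.close()
        return status < 500
    except (OSError, http.client.HTTPException):
        return False

net_ok, app_ok = network_reachable(HOST), application_healthy(HOST)
if net_ok and not app_ok:
    print("Servers reachable but returning errors: likely a backend issue.")
elif not net_ok:
    print("No TCP connectivity: likely a network-path issue.")
else:
    print("Both network path and application respond normally.")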

On March 5, Comcast Communications experienced an outage that impacted a number of downstream partners and customers as well as the reachability of many applications and services, including Webex, Salesforce, and AWS. The outage, lasting 1 hour and 48 minutes, was first observed around 2:45 PM EST and appeared to impact traffic as it traversed Comcast’s network backbone in Texas, with Comcast nodes located in Dallas, TX, and Houston, TX, exhibiting outage conditions. The outage was completely cleared around 4:40 PM EST. More detailed analysis here.

On March 6, LinkedIn experienced a service disruption that impacted its mobile and desktop global user base. The disruption was first observed around 3:45 PM EST, with users experiencing service unavailable error messages. The major portion of the disruption lasted around one hour, during which time no network issues were observed connecting to LinkedIn web servers, further indicating the issue was application related. At around 4:38 PM EST, the service started to recover and was totally clear for all users around 4:50 PM EST. More detailed analysis here.

Additional details from ThousandEyes are available here.

Internet report for February 26-March 3, 2024

ThousandEyes reported 155 global network outage events across ISPs, cloud service provider networks, collaboration app networks and edge networks (including DNS, content delivery networks, and security as a service) during the week of February 26-March 3. That’s down 6% from 165 outages the week prior. Specific to the U.S., there were 70 outages, which is up 19% from 59 outages the week prior. Here’s a breakdown by category:

ISP outages: Globally, the number of ISP outages decreased from 111 to 95 outages, a 14% decrease compared to the week prior. In the U.S., ISP outages increased 10%, climbing from 40 to 44 outages.

Public cloud network outages: After weeks of decreasing, cloud provider network outages began increasing again last week. Globally, cloud provider network outages climbed from eight to 13 outages. In the U.S., they increased from four to seven outages.

Collaboration app network outages: Globally, collaboration app network outages increased from five outages to eight. In the U.S., collaboration app network outages rose from two to three outages.

Two notable outages

On February 27, Level 3 Communications, a U.S. based Tier 1 carrier acquired by Lumen, experienced an outage that impacted multiple downstream partners and customers across the U.S. The outage, lasting a total of 18 minutes over a twenty-five-minute period, was first observed around 2:25 AM EST and appeared to be centered on Level 3 nodes located in Cleveland, OH. The outage was cleared around 2:50 AM EST. Click here for an interactive view.

On February 28, Time Warner Cable, a U.S. based ISP, experienced a disruption that impacted a number of customers and partners across the U.S. The outage was first observed at around 2:00 PM EST and appeared to center on Time Warner Cable nodes located in New York, NY.  Five minutes into the outage, the number of nodes located in New York, NY, exhibiting outage conditions increased. The outage lasted 14 minutes and was cleared at around 2:15 PM EST. Click here for an interactive view.

Additional details from ThousandEyes are available here.

Internet report for February 19-25, 2024

ThousandEyes reported 165 global network outage events across ISPs, cloud service provider networks, collaboration app networks and edge networks (including DNS, content delivery networks, and security as a service) during the week of February 19-25. That’s down significantly from 243 outages in the week prior – a decrease of 32%. Specific to the U.S., there were 59 outages, which is down 34% from 90 outages the week prior. Here’s a breakdown by category:

ISP outages: Globally, the number of ISP outages decreased from 121 to 111 outages, an 8% decrease compared to the week prior. In the U.S., ISP outages decreased from 48 to 40 outages, a 17% decrease compared to the previous week.

Public cloud network outages: Globally, cloud provider network outages decreased significantly from 42 to eight outages, an 81% decrease compared to the week prior. In the U.S., they fell from eight to four outages.

Collaboration app network outages: Globally, collaboration app network outages decreased from seven outages to five. In the U.S., collaboration app network outages remained at the same level as the week prior: two outages.

Two notable outages

On February 22, Hurricane Electric, a network transit provider headquartered in Fremont, CA, experienced an outage that impacted customers and downstream partners across multiple regions, including the U.S., Australia, China, the U.K., Japan, Singapore, India, France, and Canada. The outage, first observed around 9:10 AM EST, lasted 32 minutes in total and was divided into two occurrences over a forty-five-minute period. The first occurrence appeared to initially center on Hurricane Electric nodes located in New York, NY, Phoenix, AZ and Indianapolis, IN. Ten minutes after appearing to clear, the nodes located in New York, NY, were joined by nodes located in San Jose, CA in exhibiting outage conditions. Five minutes into the second occurrence, the disruption appeared to radiate out, and the nodes located in New York, NY, Phoenix, AZ and Indianapolis, IN, were joined by nodes located in Seattle, WA, Denver, CO, Ashburn, VA, Kansas City, MO and Omaha, NE in exhibiting outage conditions. This increase in impacted nodes appeared to coincide with an increase in the number of impacted downstream customers and partners. The outage was cleared at around 9:55 AM EST. Click here for an interactive view.

On February 21, Time Warner Cable, a U.S. based ISP, experienced a disruption that impacted a number of customers and partners across the U.S. The outage was first observed at around 2:45 PM EST and appeared to center on Time Warner Cable nodes located in New York, NY.  Fifteen minutes into the outage, the number of nodes located in New York, NY, exhibiting outage conditions increased. The outage lasted 23 minutes and was cleared at around 3:10 PM EST. Click here for an interactive view.

Additional details from ThousandEyes are available here.

Internet report for February 12-18, 2024

ThousandEyes reported 243 global network outage events across ISPs, cloud service provider networks, collaboration app networks and edge networks (including DNS, content delivery networks, and security as a service) during the week of February 12-18. That’s down from 319 outages in the week prior – a decrease of 24%. Specific to the U.S., there were 90 outages, which is down slightly from 91 the week prior. Here’s a breakdown by category:

ISP outages: Globally, the number of ISP outages decreased from 134 to 121 outages, a 10% decrease compared to the week prior. In the U.S., ISP outages decreased from 60 to 48 outages, a 20% decrease compared to the previous week.

Public cloud network outages: Globally, cloud provider network outages decreased significantly from 107 to 42 outages, a 61% decrease compared to the week prior. In the U.S., they doubled from four to eight outages.

Collaboration app network outages: Globally, collaboration app network outages decreased from 11 outages to seven. In the U.S., collaboration app network outages decreased from 5 to 2 outages.

Two notable outages

On February 16, Hurricane Electric, a network transit provider headquartered in Fremont, CA, experienced an outage that impacted customers and downstream partners across multiple regions, including the U.S., Egypt, Sweden, the U.K., Japan, Mexico, Australia, Argentina, the Netherlands, Belgium, and Canada. The outage, first observed around 8:25 AM EST, lasted 23 minutes in total and was divided into two occurrences over a thirty-minute period. The first occurrence appeared to initially center on Hurricane Electric nodes located in New York, NY. Fifteen minutes into the first occurrence, the nodes located in New York, NY, were joined by nodes located in Paris, France, and Amsterdam, the Netherlands, in exhibiting outage conditions. Five minutes after appearing to clear, nodes located in New York, NY, once again began exhibiting outage conditions. The outage was cleared at around 8:55 AM EST. Click here for an interactive view.

On February 17, AT&T experienced an outage on their network that impacted AT&T customers and partners across the U.S. The outage, lasting around 14 minutes, was first observed around 3:40 PM EST, appearing to center on AT&T nodes located in Little Rock, AR. Five minutes after first being observed, the number of nodes exhibiting outage conditions located in Little Rock, AR, appeared to rise. This increase in nodes exhibiting outage conditions appeared to coincide with a rise in the number of impacted partners and customers. The outage was cleared at around 3:55 PM EST. Click here for an interactive view.

Additional details from ThousandEyes are available here.

Internet report for February 5-11, 2024

ThousandEyes reported 319 global network outage events across ISPs, cloud service provider networks, collaboration app networks and edge networks (including DNS, content delivery networks, and security as a service) during the week of February 5-11. That’s up from 265 outages in the week prior – an increase of 20%. Specific to the U.S., there were 91 outages. That’s up from 45 outages the week prior, an increase of 102%. Here’s a breakdown by category:

ISP outages: Globally, the number of ISP outages increased from 106 to 134 outages, a 26% increase compared to the week prior. In the U.S., ISP outages more than doubled from 28 to 60 outages, a 114% increase compared to the previous week.

Public cloud network outages: Globally, cloud provider network outages decreased slightly from 117 to 107, a 9% decrease compared to the week prior. In the U.S., they decreased from five to four outages.

Collaboration app network outages: Globally, collaboration app network outages climbed from three outages to 11. In the U.S., there were five collaboration app network outages, up from zero the week prior.

Two notable outages

On February 7, Time Warner Cable, a U.S. based ISP, experienced a disruption that impacted a number of customers and partners across multiple regions, including the U.S., Ireland, the U.K., Canada, India, Australia, Singapore, Japan, the Netherlands, France, Germany, Indonesia, Hong Kong, South Korea, China, and Brazil. The outage was observed across a series of occurrences over the course of forty-five minutes. First observed at around 4:50 PM EST, the outage, consisting of five equally spaced four-minute periods, appeared to initially center on Time Warner Cable nodes in New York, NY. Five minutes after appearing to clear, nodes located in New York, NY, were again observed exhibiting outage conditions, joined by nodes located in San Jose, CA. By the third period, the nodes located in San Jose, CA, had appeared to clear and were instead replaced by nodes located in Los Angeles, CA, in exhibiting outage conditions, in addition to nodes located in New York, NY. The outage lasted a total of 20 minutes and was cleared at around 5:35 PM EST. Click here for an interactive view.

On February 6, NTT America, a global Tier 1 ISP and subsidiary of NTT Global, experienced an outage that impacted some of its customers and downstream partners in multiple regions, including the U.S., Germany, the U.K., the Netherlands, and Hong Kong. The outage, lasting 24 minutes, was first observed around 8:10 PM EST and appeared to initially center on NTT nodes located in Chicago, IL, and Dallas, TX. Around five minutes into the outage, the nodes located in Chicago, IL, and Dallas, TX, were joined by nodes located in Newark, NJ, in exhibiting outage conditions. The apparent increase of nodes exhibiting outage conditions appeared to coincide with an increase in the number of impacted downstream customers and partners. The outage was cleared around 8:35 PM EST. Click here for an interactive view.

Additional details from ThousandEyes are available here.

Internet report for January 29-February 4, 2024

ThousandEyes reported 265 global network outage events across ISPs, cloud service provider networks, collaboration app networks and edge networks (including DNS, content delivery networks, and security as a service) during the week of January 29-February 4. That’s more than double the number of outages in the week prior (126). Specific to the U.S., there were 45 outages. That’s down from 55 outages the week prior, a decrease of 18%. Here’s a breakdown by category:

ISP outages: Globally, the number of ISP outages was 106, an increase of 15% compared to 92 outages the previous week. In the U.S., ISP outages decreased by 28%, dropping from 39 to 28 outages.

Public cloud network outages: Globally, cloud provider network outages skyrocketed from five to 117 last week (the increase appeared to be a result of an increase in outages in the APJC region). In the U.S., they increased from two to five outages.

Collaboration app network outages: Globally, collaboration app network outages decreased from five outages to three. In the U.S., collaboration app network outages decreased from one outage to zero.

Two notable outages

On January 31, Comcast Communications experienced an outage that impacted a number of downstream partners and customers across multiple regions including the U.S., Malaysia, Singapore, Hong Kong, Canada, Germany, South Korea, Japan, and Australia. The outage, lasting 18 minutes, was first observed around 8:00 PM EST and appeared to be centered on Comcast nodes located in Ashburn, VA. Ten minutes into the outage, the nodes exhibiting outage conditions, located in Ashburn, VA, appeared to increase. The apparent increase of nodes exhibiting outage conditions appeared to coincide with an increase in the number of impacted downstream customers and partners. The outage was cleared around 8:20 PM EST. Click here for an interactive view.

On February 2, NTT America, a global Tier 1 ISP and subsidiary of NTT Global, experienced an outage that impacted some of its customers and downstream partners in multiple regions, including the U.S., Germany, the Netherlands, and the U.K. The outage, lasting 23 minutes, was first observed around 1:25 PM EST and appeared to center on NTT nodes located in Dallas, TX and Chicago, IL. The outage was cleared around 1:50 PM EST. Click here for an interactive view.

Additional details from ThousandEyes are available here.

Internet report for January 22-28, 2024

ThousandEyes reported 126 global network outage events across ISPs, cloud service provider networks, collaboration app networks and edge networks (including DNS, content delivery networks, and security as a service) during the week of January 22-28. That’s down from 156 the week prior, a decrease of 19%. Specific to the U.S., there were 55 outages. That’s down from 91 outages the week prior, a decrease of 40%. Here’s a breakdown by category:

ISP outages: Globally, the number of ISP outages was 92, a decrease of 14% compared to 107 outages the previous week. In the U.S., ISP outages decreased by 35%, dropping from 60 to 39 outages.

Public cloud network outages: Globally, cloud provider network outages dropped from 14 to five last week. In the U.S., they decreased from seven to two outages.

Collaboration app network outages: Globally, collaboration app network outages remained the same as the week prior: five outages. In the U.S., collaboration app network outages decreased from four outages to one.

Three notable outages

On January 26, Microsoft experienced an issue that affected its customers in various regions around the globe. The outage was first observed around 11:00 AM EST and seemed to cause service failures in Microsoft Teams, which affected the usability of the application for users across the globe. While there was no packet loss when connecting to the Microsoft Teams edge servers, the failures were consistent with reported issues within Microsoft’s network that may have prevented the edge servers from reaching the application components on the backend. The incident was resolved for many users by 6:10 PM EST. Click here for an interactive view.

On January 24, Akamai experienced an outage on its network that impacted content delivery connectivity for customers and partners using Akamai Edge delivery services in the Washington D.C. area. The outage was first observed around 12:10 PM EST and appeared to center on Akamai nodes located in Washington D.C. The outage lasted a total of 24 minutes. Akamai announced that normal operations had resumed at 1:00 PM EST. Click here for an interactive view.

On January 23, Internap, a U.S. based cloud service provider, experienced an outage that impacted many of its downstream partners and customers in multiple regions, including the U.S. and Singapore. The outage, which was first observed around 2:30 AM EST, lasted 18 minutes in total and appeared to be centered on Internap nodes located in Boston, MA. The outage was at its peak around fifteen minutes after being observed, with the highest number of impacted regions, partners, and customers. The outage was cleared around 2:55 AM EST. Click here for an interactive view.

Additional details from ThousandEyes are available here.

Internet report for January 15-21, 2024

ThousandEyes reported 156 global network outage events across ISPs, cloud service provider networks, collaboration app networks and edge networks (including DNS, content delivery networks, and security as a service) during the week of January 15-21. That’s up from 151 the week prior, an increase of 3%. Specific to the U.S., there were 91 outages. That’s up significantly from 63 outages the week prior, an increase of 44%. Here’s a breakdown by category:

ISP outages: Globally, the number of ISP outages was 107, an increase of 29% compared to 83 outages the previous week, and in the U.S. ISP outages increased by 58%, climbing from 38 to 60 outages.

Public cloud network outages: Globally, cloud provider network outages dropped from 30 to 14 last week. In the U.S., they increased from six to seven outages.

Collaboration app network outages: Globally, collaboration app network outages decreased from seven to five outages. In the U.S., collaboration app network outages stayed at the same level: four outages.

Two notable outages

On January 16, Oracle experienced an outage on its network that impacted Oracle customers and downstream partners interacting with Oracle Cloud services in multiple regions, including the U.S., Canada, China, Panama, Norway, the Netherlands, India, Germany, Malaysia, Sweden, and the Czech Republic. The outage was first observed around 8:45 AM EST and appeared to center on Oracle nodes located in various regions worldwide, including Ashburn, VA, Tokyo, Japan, San Jose, CA, Melbourne, Australia, Cardiff, Wales, London, England, Amsterdam, the Netherlands, Frankfurt, Germany, Slough, England, Phoenix, AZ, San Francisco, CA, Atlanta, GA, Washington D.C., Richmond, VA, Sydney, Australia, New York, NY, Osaka, Japan, and Chicago, IL. Thirty-five minutes after first being observed, all the nodes exhibiting outage conditions appeared to clear. A further ten minutes later, nodes located in Toronto, Canada, Phoenix, AZ, Frankfurt, Germany, Cleveland, OH, Slough, England, Ashburn, VA, Washington, D.C., Cardiff, Wales, Amsterdam, the Netherlands, Montreal, Canada, London, England, Sydney, Australia, and Melbourne, Australia, began exhibiting outage conditions again. The outage lasted 40 minutes in total and was cleared at around 9:50 AM EST. Click here for an interactive view.

On January 20, Hurricane Electric, a network transit provider headquartered in Fremont, CA, experienced an outage that impacted customers and downstream partners across multiple regions, including the U.S., Thailand, Hong Kong, India, Japan, and Australia. The outage, first observed around 7:15 PM EST, lasted 11 minutes in total and was divided into two occurrences over a one-hour five-minute period. The first occurrence appeared to center on Hurricane Electric nodes located in Los Angeles, CA. Fifty minutes after the first occurrence appeared to clear, the second occurrence was observed. Lasting 8 minutes, the outage initially appeared to center on nodes located in Los Angeles, CA. Around five minutes into the second occurrence, the nodes in Los Angeles, CA were joined by nodes located in San Jose, CA, in exhibiting outage conditions. The outage was cleared at around 8:20 PM EST. Click here for an interactive view.

Additional details from ThousandEyes are available here.

Internet report for January 8-14, 2024

ThousandEyes reported 151 global network outage events across ISPs, cloud service provider networks, collaboration app networks and edge networks (including DNS, content delivery networks, and security as a service) during the week of January 8-14. That’s up from 122 the week prior, an increase of 24%. Specific to the U.S., there were 63 outages. That’s up from 58 outages the week prior, an increase of 9%. Here’s a breakdown by category:

ISP outages: Globally, the number of ISP outages was 83, an increase of 8% compared to the previous week, and in the U.S. they increased by 6%, climbing from 36 to 38 outages.

Public cloud network outages: Globally, cloud provider network outages jumped from 19 to 30 last week. In the U.S., they decreased from 10 to six outages.

Collaboration app network outages: Globally, collaboration app network outages increased from five to seven outages. In the U.S., collaboration app network outages increased from one to four outages.

Two notable outages

On January 14, Zayo Group, a U.S. based Tier 1 carrier headquartered in Boulder, Colorado, experienced an outage that impacted some of its partners and customers across multiple regions including the U.S., Canada, Sweden, and Germany. The outage lasted around 14 minutes, was first observed around 7:10 PM EST, and appeared to initially center on Zayo Group nodes located in Houston, TX. Ten minutes after first being observed, nodes located in Houston, TX, were joined by nodes located in Amsterdam, the Netherlands, in exhibiting outage conditions. This rise in the number of nodes exhibiting outage conditions appeared to coincide with an increase in the number of impacted downstream partners and customers. The outage was cleared around 7:25 PM EST. Click here for an interactive view.

On January 13, Time Warner Cable, a U.S. based ISP, experienced a disruption that impacted a number of customers and partners across the U.S. The outage was first observed at around 12:45 PM EST and appeared to center on Time Warner Cable nodes located in New York, NY.  Fifteen minutes into the outage, the number of nodes located in New York, NY, exhibiting outage conditions increased. The outage lasted 19 minutes and was cleared at around 1:05 PM EST. Click here for an interactive view.

Additional details from ThousandEyes are available here.

Internet report for January 1-7, 2024

ThousandEyes reported 122 global network outage events across ISPs, cloud service provider networks, collaboration app networks and edge networks (including DNS, content delivery networks, and security as a service) during the week of January 1-7. Over the prior three weeks, outages in every category fell for two consecutive weeks before rising again in the final week. Specific to the U.S., there were 58 outages. Here’s a breakdown by category:

ISP outages: Globally, the number of ISP outages was 77, an increase of 43% compared to the previous week, and in the U.S. they nearly doubled from 20 to 36.

Public cloud network outages: Globally, cloud provider network outages increased from 13 to 19 last week. In the U.S., they increased from 6 to 10.

Collaboration app network outages: Globally, collaboration app network outages increased from one to five outages. In the U.S., collaboration app network outages increased from zero to one. 

Two notable outages

On January 4, Time Warner Cable, a U.S. based ISP, experienced a disruption that impacted a number of customers and partners across the U.S. The outage was first observed at around 10:45 AM EST and appeared to center on Time Warner Cable nodes located in New York, NY.  Five minutes into the outage, the number of nodes located in New York, NY, exhibiting outage conditions increased. The outage lasted 13 minutes and was cleared at around 11:00 AM EST. Click here for an interactive view.

On January 4, Telecom Italia Sparkle, a Tier 1 provider headquartered in Rome, Italy, and part of the Italian-owned Telecom Italia, experienced an outage that impacted many of its downstream partners and customers in multiple regions, including the U.S., Argentina, Brazil, and Chile. The outage lasted 28 minutes in total and was divided into two episodes over a 35-minute period. It was first observed around 4:00 AM EST. The first period of the outage, lasting around 24 minutes, appeared to be centered on Telecom Italia Sparkle nodes located in Miami, FL. Five minutes after appearing to clear, nodes located in Miami, FL, again exhibited outage conditions. The outage was cleared around 4:35 AM EST. Click here for an interactive view.

Additional details from ThousandEyes are available here.

Cloud Computing, Internet Service Providers, Network Management Software, Networking
https://www.networkworld.com/article/2071380/2024-global-network-outage-report-and-internet-health-check.html 2071380
Google Cloud issue blamed for UniSuper week-long service disruption Wed, 08 May 2024 16:39:30 +0000

An Australian pension fund provider has attributed a week-long service outage to “an unprecedented occurrence” related to provisioning by Google Cloud, its cloud service provider (CSP).

A disruption in UniSuper services that caused members to lose access to online services and mobile apps happened because of “a combination of rare issues at Google Cloud that resulted in an inadvertent misconfiguration during the provisioning of UniSuper’s Private Cloud,” according to an email from Google published in a blog post online by UniSuper.

UniSuper is a financial planning and retirement account provider serving Australia; the outage affected the Australian pension fund for the education and research sectors.

The issues triggered a previously unknown software bug that impacted UniSuper’s systems, causing a service outage that began about a week ago and will not begin to be remediated until Thursday, according to the post.

Account access to be restored

By mid-afternoon Australia time Thursday, members will be able to log in to their accounts; however, account balances will not yet be updated. Investment and trading activity has continued as normal throughout the outage, and members’ funds were not affected.

UniSuper’s CEO Peter Chun also sent an email to clients Wednesday that was posted online to assure them of the safety of their accounts and the continuity of investment activity during the outage. “This usual investment activity will be reflected in your balance once our systems are fully restored,” according to his email. “As investments have been unaffected by the outage, we have up-to-date investment options performance information on our website for members.”

Calling the problem “an isolated incident,” Google also assured UniSuper members that the outage was not due to a cyber-attack and thus their sensitive data was not exposed to unauthorized entities.

What happened?

The provisioning issue caused a deletion of UniSuper’s Private Cloud subscription, which deleted the cloud in two geographies, one of which was aimed at providing protection against outages and loss, according to Google.

“Restoring UniSuper’s Private Cloud instance has called for an incredible amount of focus, effort, and partnership between our teams to enable an extensive recovery of all the core systems,” according to the email.

UniSuper also had backups in place with an additional service provider, which minimized the loss and is helping the companies during the restoration process.

“Google Cloud sincerely apologizes for the inconvenience this has caused, and we continue to work around the clock with UniSuper to fully remediate the situation, with the goal of progressively restoring services as soon as possible,” the email said.

Outages can cause reputational damage

Cloud and other network outages happen, with the major service providers – including Amazon Web Services, Microsoft Azure and others – all having experienced them at one time or another. For instance, in June 2023, AWS experienced a more than two-hour incident that impacted a number of services on the US East Coast. Microsoft Azure also had a data center outage in Australia in September of last year that prevented users from accessing Azure, Microsoft 365, and Power Platform services for more than 24 hours.

Typically these issues are resolved somewhat quickly, with the UniSuper outage standing out as an exception in terms of its duration, noted Pareekh Jain, CEO of EEIRTrend and Pareekh Consulting. This could harm Google from a reputational standpoint and cause customers to have a lack of trust in the company as a CSP. “The current UniSuper cloud outage on Google Cloud in Australia is taking an unusually long time to resolve, which negatively impacts Google Cloud’s reputation in the region,” he noted.

Such outages also can lead to business disruptions and data loss for clients, which is why many favor a multi-cloud strategy for risk management, Jain added. UniSuper used to split its workloads between Azure and two data centers of its own, but moved a large amount of its workloads to Google Cloud Platform last year.

Cloud Computing, Data Center
https://www.networkworld.com/article/2099457/google-cloud-issue-blamed-for-unisuper-week-long-service-disruption.html 2099457
IBM Power server targets AI workloads at the edge Wed, 08 May 2024 16:26:59 +0000

IBM has filled out the low end of its Power server portfolio with a 2U rack-mounted server designed for running AI inferencing workloads in remote office or edge locations outside of corporate data centers.

The 1-socket, half-wide, Power10 processor-based system promises a threefold performance increase per core compared to the Power S812 it basically replaces, IBM stated. Running IBM AIX, IBM i, Linux, or VIOS operating systems, the S1012 supports in-core AI inferencing and machine learning with a Matrix Math Accelerator (MMA) feature.

MMA is a feature of Power10-based servers that handles matrix multiplication operations in hardware, rather than relying solely on software routines; it offers four times better performance per core for matrix multiplication kernels at the same frequency, according to IBM. Each Power S1012 includes four MMAs per core to support AI inferencing.
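To make that concrete, the kernels MMA accelerates are ordinary dense matrix multiplications, the operation that dominates neural-network inference. A minimal NumPy sketch of such a kernel (illustrative only; it runs on any CPU and does not itself invoke MMA):

import numpy as np

# A single dense layer in an inference pass is, at its core, a matrix multiply:
# activations (batch x features) times weights (features x outputs).
batch, features, outputs = 32, 512, 256
activations = np.random.rand(batch, features).astype(np.float32)
weights = np.random.rand(features, outputs).astype(np.float32)

# This matmul is the kind of operation a hardware matrix engine offloads
# from general-purpose software routines.
result = activations @ weights
print(result.shape)  # (32, 256)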

Also featured in the new servers is 256 GB of system memory distributed across four DDR4 Industry Standard Dual In-Line Memory Modules (ISDIMM) slots.

“By deploying Power S1012 at the edge, clients can run AI inferencing at the point of data, thus eliminating data transfers,” wrote Steve Sibley, vice president of IBM’s Power product management group, in a blog about the new servers.

Sibley cited analysis from Gartner related to AI workloads at the edge: As more organizations embrace AI to further drive business value, Gartner finds that clients in industries such as retail, manufacturing, healthcare, and transportation are deploying workloads at the edge to capitalize on data where it originates.

“By placing data, data management capabilities and analytic workloads at optimal points, ranging all the way to endpoint devices, enterprises can enable more real-time use cases. In addition, the flexibility to move data management workloads up and down the continuum from centralized data centers or from the cloud-to-edge devices will enable greater optimization of resources,” Gartner notes in its March 2024 Market Guide for Edge Computing.

Securing those workloads is also a key feature of the S1012. “To ensure insights remain a competitive advantage and don’t fall into the wrong hands, transparent memory encryption with Power10 secures data in and out of AI models running locally addressing data leaks,” Sibley stated.

The servers feature advanced remote management capabilities to let organizations efficiently manage and monitor their IT environments remotely, to enhance responsiveness and minimize downtime, Sibley stated. “High-availability features such as redundant hardware and failover mechanisms can help ensure continuous operation, all within a compact physical footprint,” Sibley stated.

The S1012 is aimed at small-to-medium users and offers a lower entry point for customers to get into the IBM Power lineup, which includes the high-end 240-core, 64TB Power E1080 and the E1050, which is aimed at memory-intensive workloads and supports up to 48 cores and 16TB of memory.

The IBM Power S1012 will be generally available in a 2U and a tower model from IBM and certified business partners on June 14, 2024.

Data Center, Edge Computing, Servers
https://www.networkworld.com/article/2099390/ibm-power-server-targets-ai-workloads-at-the-edge.html 2099390
HPE Aruba looks to fight AI threats with AI weapons Tue, 07 May 2024 19:45:42 +0000

HPE Aruba continues to infuse its management software with AI features, this time adding network security controls to help IT teams protect AI assets such as large language models from unmanaged device access.

Specifically, HPE will build new AI-powered security observability and monitoring features into its core HPE Aruba Networking Central management platform to help customers protect both AI-based and traditional resources from IoT security risks. The goal is to enhance visibility and identification of devices connected to the network and provide continuous monitoring for unusual or rogue behavior, the vendor stated. In addition, HPE is adding firewall-as-a-service (FWaaS) support to its Aruba security service edge (SSE) package.

Customers will be able to fight AI and other security threats with AI tools and security controls and protect the AI-based resources many enterprises are accumulating, said Jeff Olson, director of product and technical marketing for HPE Aruba. 

“If customers have a number of data scientists building out AI models, and they come to the network with all of this data, and they need to move it or store it in the cloud, and they need to bring some devices with them to do that – they are focused on the problems they are trying to solve with AI, not necessarily the security of the data or the network,” Olson said. 

“We are providing AIOps tools that let the security and networking teams detect anomalies and control security around these AI resources,” Olson said.

On top of that, much AI training data comes from unmanaged IoT devices, which are prone to web-based threats when they communicate with cloud services for updates, telemetry, or other purposes, wrote Jon Green, HPE Aruba’s chief security officer, in a blog about the HPE tools. “In addition, BYOD and line-of-business devices often appear on the network outside the purview of the IT organization and can become compromised without any alert or signal, which can result in entry points for attack and AI poisoning from corrupted or manipulated data,” Green wrote.

New AI support is built into HPE Aruba Networking Central, which uses machine learning models to analyze dynamic device attributes, including traffic patterns and behavioral characteristics such as connection state and network residency, to accurately categorize and identify IoT and traditional devices, Green stated.  

“HPE Aruba Networking Central AIOps has a long history of building automated network activity baselines for troubleshooting and remediation, and now we are using AI to extend that capability to individual devices,” Green stated. “This enables not only more precise, automated fingerprinting to support Zero Trust Security, but also the ability to use behavior baselines to spot anomalies that can indicate compromise and attack.”
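As a generic illustration of the behavior-baseline idea Green describes (a sketch only, not HPE Aruba’s implementation), a device’s normal traffic profile can be summarized statistically and new observations flagged when they deviate sharply from that baseline:

import statistics

def build_baseline(samples):
    # Summarize a device's historical metric (e.g., bytes per minute) as mean and stdev.
    return statistics.mean(samples), statistics.stdev(samples)

def is_anomalous(value, baseline, threshold=3.0):
    # Flag observations more than `threshold` standard deviations from the baseline.
    mean, stdev = baseline
    return abs(value - mean) > threshold * stdev

# Hypothetical per-minute traffic history for an IoT camera.
history = [120, 130, 110, 125, 118, 122, 127, 115]
baseline = build_baseline(history)

print(is_anomalous(124, baseline))   # False: within the device's normal range
print(is_anomalous(5000, baseline))  # True: possible compromise or exfiltration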

In addition to the AI-powered profiling, HPE Aruba is adding other AI-driven capabilities to improve security. For example, HPE Aruba Networking uses AIOps and machine learning models to intelligently hibernate APs during periods of low activity, eliminating potential entry points for malicious activity and reducing attack surface, Green noted.

On the SSE front, Aruba is integrating technology from its 2023 purchase of Axis Security into its SSE, SD-WAN and SASE offerings. According to Gartner, SSE combines several key security functions – including a cloud-access security broker (CASB), secure web gateway, zero-trust network access (ZTNA), and a next-generation firewall – into a cloud-based service to streamline management.

The new Firewall-as-a-Service (FWaaS) fills out HPE Aruba’s SSE package, which already includes ZTNA, CASB and other key SSE components. The FWaaS is tied to a variety of components within the HPE Networking SSE service so security teams can secure and manage networked resources from a single UI and set global policies centrally, Green stated.

In addition to the FWaaS, Aruba added dashboards within HPE Aruba Networking SSE to enhance visibility into an organization’s security status. Dashboards include views into applications in use, user activity, security events, and ZTNA adoption. Security personnel can use this information to identify shadow IT applications and reduce the associated risk of unauthorized access.

New FWaaS capabilities within HPE Aruba Networking SSE extend protection to wherever data and devices are, without the complexity of an appliance. Joining on-premises security controls delivered by built-in firewalls in HPE Aruba Networking switches, wireless access points, gateways, and WAN appliances, FWaaS completes edge-to-cloud firewall protection by providing policy enforcement in the cloud. And since FWaaS capabilities are integrated with ZTNA, CASB, SWG, and DEM in the HPE Aruba Networking SSE service, security teams can manage all SSE services using a single UI and global policy.

IoT Security, Network Security
https://www.networkworld.com/article/2098979/hpe-aruba-looks-to-fight-ai-threats-with-ai-weapons.html 2098979
AI features boost Cisco’s Panoptica application security software Tue, 07 May 2024 16:17:28 +0000

Cisco has added a variety of new AI-based security features to its cloud-native security platform that promise to help customers more quickly spot and remediate threats. The features extend the vendor’s Panoptica platform, which is designed to secure cloud applications from development to deployment with a focus on protecting containerized, microservice applications running on platforms such as Kubernetes. 

Panoptica lets customers define and enforce security policies through tools like Terraform, and it monitors application behavior to detect and prevent threats in real time. This includes features found in intrusion detection and prevention systems, but designed specifically for cloud-native environments, Cisco says.

A recently added AI Assistant understands plain, everyday language and offers custom assistance in prioritizing, investigating, and remediating a customer’s specific security issues. For example, administrators can ask questions such as “What are my most important vulnerabilities?” and “Help me understand this attack path and how to fix it.” The assistant has awareness and intelligence about an enterprise’s live environment, including all the data Panoptica tracks about its security posture, vulnerabilities, and attack paths, according to Vijoy Pandey, senior vice president of Cisco’s Outshift advanced development group. 

Adding to Panoptica’s current level of AI support, Cisco integrated OpenAI’s large language model GPT-4 in a feature called GenAI Dynamic Remediation. With this support, Panoptica can derive targeted remediations based on the security risk context presented by the system’s Attack Path Analysis engine. It “provides step-by-step instructions on how to apply the controls using CLI, code snippets, and Terraform tailored to the unique characteristics of each attack path,” Pandey wrote in a blog about the new features.

“Panoptica integrated GPT-4 with our graph engine, enabling it to present users with in-depth, tailored remediations for each detected attack path, including remediation guidance tailored to each of the critical points of infiltration: network exposure, workload at risk, and identity exposure,” Pandey wrote. “This rapidly decreases response time by giving teams sample code that gets right to the source of the issue. No more wasted time figuring out how to solve the problem; a simple code sample shows you exactly how you can fix it right now.”

Another new AI-based feature, Smart Cloud Detection & Response (CDR), offers security teams a head start in detecting attacks, continuously monitoring security events as they occur, and correlating them with insights and information so that they can respond, Pandey stated. Based on Cisco internal research, Smart CDR provides forensic information about the attack. “Every bad actor has an intent, and our job is to help describe what’s going on by painting a picture of the attack story,” Pandey wrote.

Smart CDR detects threats in real time and promptly notifies security teams, Pandey stated. “Most competitors stop at threat detection, but we go further, stitching these threats together to describe the attacker’s intent,” Pandey wrote. “Our approach involves generating synthetic attack simulations to train our ML models to detect attacks like ransomware, data exfiltration, crypto-jacking, container escape, and data destruction.”

Lastly, Cisco added the ability to more easily create, manage, and enforce security policies across a multicloud environment via a new feature called Security Graph Query. The feature integrates with the system’s policies engine to let customers enforce security policies directly from the Security Graph Query Builder and Query Library, Pandey stated.

The Security Graph Query Builder lets users build customized queries that combine data and insights from Panoptica’s different security modules, such as cloud security posture visibility, runtime workload protection, and Attack Path Analysis for analyzing potential attack vectors, according to Cisco. The idea is to offer a unified view of an organization’s cloud assets, security posture, vulnerabilities, and threats across its entire cloud-native application stack. This lets security teams identify risks, investigate issues, and take appropriate actions, according to Cisco.

“The feature is a comprehensive search and visualization tool that aggregates data across multiple cloud providers, code repositories, APIs, SaaS applications, and Kubernetes clusters,” Pandey stated.

“It utilizes queries crafted for assets and their relationships and security insights such as attack paths, risk findings, and vulnerabilities,” he wrote. “The goal is to streamline policy creation, improve security compliance, and make policy management more efficient and data-driven.”
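As a rough illustration of what querying a security graph involves (a hypothetical sketch, not Panoptica’s actual data model or query language), assets and their relationships can be modeled as a directed graph and searched for paths from internet-exposed entry points to assets with critical findings:

# Hypothetical asset graph: each asset maps to the assets it can reach.
graph = {
    "internet": ["load_balancer"],
    "load_balancer": ["web_pod"],
    "web_pod": ["api_pod", "cache"],
    "api_pod": ["db"],
    "cache": [],
    "db": [],
}

# Hypothetical findings: assets carrying critical vulnerabilities.
critical_vulns = {"api_pod", "db"}

def attack_paths(graph, source, targets):
    # Depth-first search for paths from an exposed entry point to risky assets.
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node in targets:
            yield path
        for nxt in graph.get(node, []):
            if nxt not in path:  # avoid revisiting nodes on the same path
                stack.append((nxt, path + [nxt]))

for path in attack_paths(graph, "internet", critical_vulns):
    print(" -> ".join(path))
# internet -> load_balancer -> web_pod -> api_pod
# internet -> load_balancer -> web_pod -> api_pod -> db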

Pandey listed a few use cases, including:

  • Proactive threat hunting: search for signs of compromise and emerging threats by constructing custom queries that indicate potential security risks.
  • Contextual analysis: understanding the context of an event or entity within the graph allows security teams to make more informed decisions.
  • Resource optimization: security teams can use the insights from the graph to optimize resource allocation, focusing efforts on areas of the network that are most vulnerable or frequently targeted.

The Panoptica announcement was timed with the ongoing RSA Conference 2024, where Cisco also announced plans to integrate Splunk’s enterprise security technology (gained in its recent $28 billion Splunk acquisition) with Cisco’s extended detection and response (XDR) service.

Cloud Computing, Network Security
]]>
https://www.networkworld.com/article/2098930/ai-features-boost-ciscos-panoptica-application-security-software.html 2098930
Red Hat extends Lightspeed generative AI tool to OpenShift and Enterprise Linux Tue, 07 May 2024 14:04:56 +0000

Red Hat’s generative AI-powered Lightspeed tool was first announced last year for the Red Hat Ansible automation platform. This morning, as the Red Hat Summit kicks off in Denver, the company announced that it will be extended to Red Hat Enterprise Linux and Red Hat OpenShift.

OpenShift, Red Hat’s Kubernetes-powered hybrid cloud application platform, will be getting it late this year. Red Hat Enterprise Linux Lightspeed is now in its planning phase, with more information coming soon. (At the Summit, Red Hat also announced a new ‘policy as code’ capability for Ansible.)

“This will bring similar genAI capabilities to both of those platforms across the hybrid cloud,” says Chuck Dubuque, senior director for product marketing at Red Hat OpenShift. Users will be able to ask questions in simple English and get usable code as a result, or suggestions for specific actions, he says, and the tool is designed to address skills gaps and the increasing complexities in enterprise IT.

“More seasoned IT pros can use Red Hat Lightspeed to extend their skills by using Red Hat Lightspeed as a force multiplier,” Dubuque says. “It can help quickly generate potential answers to niche questions or handle otherwise tedious tasks at scale. It helps IT organizations innovate and build a stronger skilled core while helping further drive innovation.”

The vision is that Red Hat Lightspeed will help companies address this skills gap and put more power in the hands of organizations that want to use Linux, automation, and hybrid clouds but don’t have the skills in house, he says, “or endless funds to enlist said skills.”

Other generative AI platforms can also answer questions and write code, but those are general-purpose LLMs, he says. “We built a purpose-driven model to solve unique challenges for IT,” he says. “The skill sets required for programming and development haven’t always been widely accessible to the entire talent pool or businesses with limited resources.”

Red Hat didn’t create the core foundation model, however.

Take, for example, Ansible Lightspeed, which became generally available last November. Ansible Lightspeed is based on IBM’s WatsonX Code Assistant, which, in turn, is powered by the IBM Granite foundation models, according to Sathish Balakrishnan, vice president and general manager at the Red Hat Ansible Business Unit.

It is then trained on data from Ansible Galaxy, an open-source repository of Ansible content covering a variety of use cases, he says, and further fine-tuned with additional expertise from Red Hat and IBM.

For example, to create and edit an Ansible Playbook and rules, users can type in a question and get an output that’s translated into YAML content. That streamlines role and playbook creation, Balakrishnan says. This helps companies translate subject matter expertise into best practices that can scale across teams, standardize and improve quality, and adhere to industry standards.
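For readers unfamiliar with the output format, the YAML content in question is an ordinary Ansible playbook. The sketch below (using PyYAML purely for illustration) shows the kind of structure a request such as “install and start nginx on the web servers” might correspond to; it is hypothetical and not Lightspeed’s actual output:

import yaml  # PyYAML

# A hypothetical, minimal playbook for "install and start nginx on the web servers".
playbook = [{
    "name": "Install and start nginx",
    "hosts": "webservers",
    "become": True,
    "tasks": [
        {"name": "Install nginx",
         "ansible.builtin.package": {"name": "nginx", "state": "present"}},
        {"name": "Start and enable nginx",
         "ansible.builtin.service": {"name": "nginx", "state": "started", "enabled": True}},
    ],
}]

print(yaml.safe_dump(playbook, sort_keys=False))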

“The service also helps safeguard private data through data isolation, so sensitive customer information remains untouched and possible data leaks are minimized,” he says.

Hundreds of customers are already using Ansible Lightspeed to generate tasks, says Dubuque. “And we’re expanding it to build full playbooks,” he says. “But Red Hat Lightspeed is bigger than just Ansible. We’re infusing generative AI into all our platforms.”

So, for example, with OpenShift Lightspeed, users will have an assistant integrated right into the OpenShift console so they can ask questions in plain English about the product or get help with troubleshooting. “Our goal is to increase productivity and efficiency,” he says.

However, we’re still in the early days of generative AI and AI assistants, says IDC analyst Stephen Elliot, so companies do need to be careful about how they use the technology. “But it’s a safe assumption that most of these models are going to get better and smarter,” Elliot says.

Linux, Network Management Software, Networking, Servers
]]>
https://www.networkworld.com/article/2098808/red-hat-extends-lightspeed-generative-ai-tool-to-openshift-and-enterprise-linux.html 2098808
Red Hat introduces ‘policy as code’ for Ansible Tue, 07 May 2024 14:03:06 +0000

Red Hat Ansible’s new “policy as code” capabilities will help users of the infrastructure automation platform to increase efficiency, reduce human error and improve the ability to meet governance, compliance, security and cost objectives, the company announced this morning at the Red Hat Summit in Denver. And it will help position Red Hat for an increasingly AI-dominated future.

The new capability, a tech preview of which is slated for availability “in the coming months,” will help enforce policies and compliance across hybrid cloud estates that increasingly include a varied and growing number of AI applications, the company announced. It enables high-level strategies for automation maturity “to better prepare organizations for sprawling infrastructure in support of scaling AI workloads,” Red Hat says.

The problem, says Sathish Balakrishnan, vice president and general manager for Red Hat’s Ansible business unit, is that as AI scales the capabilities of individual systems beyond what humans can manage, the challenge of maintaining IT infrastructure grows exponentially.

And even before infrastructure is focused on AI workloads and services, mission critical systems are still impacted by compliance mandates for security, performance, and auditability, he says. Even today, implementing these policies requires time, attention, and cross-functional team collaboration and documentation. “Mistakes can be costly,” he adds.

According to Balakrishnan, the enterprises who will benefit the most from the new capability are those looking to take advantage of AI. “In many ways, AI is the final stage of the automation adoption journey,” he says. “In the context of enterprise IT ops, AI means machines automating processes, machines connecting infrastructure and tools to make them more efficient, and machines making decisions to improve resiliency and reduce costs.”

That’s a shift in how IT ops is conducted, he says, and nearly every tool or platform in the environment is already introducing new AI capabilities. “This will provide you with vast new amounts of data, insights and intelligence,” he says. “But to make it all actionable, you need to be able to orchestrate it all – and you need to be able to harness the new intelligence to optimize your stack.”

So, for any organization looking to leverage AI, automation is mission-critical, he says.

Say, for example, a company is using the AI-powered Ansible Lightspeed service to accelerate automation development. If “policy as code” is infused from the start, content creators can write code that automatically maintains mandated compliance requirements, he says, “greatly reducing the impact of skills gaps and human error in IT operations.”
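As a generic illustration of the policy-as-code idea (a minimal sketch under assumed rules, not Red Hat’s implementation), compliance requirements become executable checks that run against configuration or automation content before anything is deployed:

# Hypothetical compliance policies expressed as code rather than as written mandates.
POLICIES = [
    ("no_plaintext_secrets", lambda cfg: not cfg.get("password")),
    ("encryption_required",  lambda cfg: cfg.get("encrypt_at_rest") is True),
    ("approved_region_only", lambda cfg: cfg.get("region") in {"us-east-1", "eu-west-1"}),
]

def evaluate(config):
    # Return the names of the policies this configuration violates.
    return [name for name, check in POLICIES if not check(config)]

# Hypothetical resource configuration submitted by an automation job.
config = {"region": "ap-south-2", "encrypt_at_rest": False, "password": "hunter2"}

violations = evaluate(config)
if violations:
    print("Blocked by policy:", ", ".join(violations))
else:
    print("Compliant: deployment may proceed")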

At British banking and insurance company NatWest Group, for example, adding policy capability to automation will allow for increased compliance and adherence to regulations, says Baljinder Kang, the company’s director of enterprise engineering, in a statement. “We think this is necessary as we look to add AI capabilities to continue to enhance our tooling to drive increased value in our solutions to meet our customer needs,” he says.

The big public clouds – hyperscalers like AWS, Microsoft Azure, and Google Cloud Platform – are also moving in the direction of policy as code, says IDC analyst Stephen Elliot.

“They have lots of point tools,” Elliot says. “Typically, customers tell us that they have CloudWatch, they have CloudTrail, they have AWS Systems Manager. The tools are either free or almost free. They’re good enough, cheap enough. They need them for visibility into the AWS environment.”

But once enterprises move outside of a particular cloud, they need to use other tools, such as Ansible or Terraform. “Red Hat is all about supporting all the clouds,” Elliot says. “Ansible is all about automating things across all the clouds — part of the value is multi-cloud support. Most of the public cloud providers only care about their own cloud, and for very good reasons.”

However, enterprises are still far from being able to rely fully on automation for policy enforcement, he says. With AI-powered automation, in particular, “the hype is way ahead of what we’re capable of,” Elliot says. “There’s a lot of experimentation this year, and there’s a lot of money put into these AI experiments and AI use cases, but this is going to take a little time.”

Still, companies should be looking at this now, he says. “You’ve got to start somewhere,” he says. “You have to think about how you would define a use case for policy as code.”

Because Red Hat Ansible has so many users and companies on board, many organizations will be able to figure out how to drive efficiencies and cost improvements.

“Speed is the ultimate competitive weapon,” Elliot says. “If you’re not talking about it, you’re already losing. And if you are talking about it, make some decisions about your investments.”

Linux, Network Management Software
]]>
https://www.networkworld.com/article/2098823/red-hat-introduces-policy-as-code-for-ansible.html 2098823
Riverbed launches AI-powered observability platform Tue, 07 May 2024 13:52:20 +0000

Riverbed today announced its Riverbed Platform that includes observability capabilities to provide enterprise network managers with visibility into blind spots around public cloud, zero trust, and SD-WAN architectures as well as remote work environments.

The Riverbed Platform enables IT organizations to collect, analyze, automate, and report on data aggregated across complex environments to optimize digital experiences for end users and customers. Using AIOps (artificial intelligence for IT operations) to conduct correlation and analysis, the Riverbed Platform can conduct root-cause identification and kick off automated remediations.

Riverbed launched the platform with about 35 pre-built application and software integrations, which the company says reflects the open platform and will more quickly enable IT teams to gain value in their existing environments.

“Customers are challenged with improving the digital experience, simplifying their management environment and implementing AI that works and scales,” said Dave Donatelli, CEO at Riverbed, in a statement. “To address this, Riverbed has invested in our core competencies of data collection and AI, and today we’re launching the most advanced AI-powered observability platform to optimize digital experiences, along with solutions that provide new levels of visibility into network blind spots and enterprise-owned mobile devices.”

Among the updates, Riverbed announced:

  • Riverbed Aternity Mobile: A mobile monitoring tool that increases employee productivity by proactively identifying performance issues on enterprise-provided mobile devices and taking remediation actions.
  • Riverbed NPM+: The first in a series of SaaS-delivered network performance management (NPM) services that overcome traditional network blind spots by extending packet visibility to network locations where monitoring was previously not possible.
  • Riverbed NetProfiler: A tool that provides real-time visibility into network traffic and application performance that now is also able to monitor SD-WAN health and performance.

Riverbed also updated its SaaS-based AIOps service, Riverbed IQ 2.0, which uses AI-powered automation to contextualize and correlate real data across IT to prevent, identify, and resolve performance and other issues. Riverbed IQ 2.0 filters out noise and reduces alerts to only those most relevant to IT by using Riverbed Data Store, which connects data sources into a data repository, and Topology Viewer, which generates a dynamic map of connected devices and dependencies in an environment.
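As a generic illustration of that noise-reduction idea (a sketch only, not Riverbed’s algorithm), alerts can be deduplicated and rolled up to the upstream component they depend on, so operators see one correlated incident instead of a flood of symptoms:

from collections import defaultdict

# Hypothetical dependency map: each component points to the upstream component it relies on.
depends_on = {
    "app-server-1": "core-switch-3",
    "app-server-2": "core-switch-3",
    "voip-gateway": "core-switch-3",
    "core-switch-3": None,
}

# Hypothetical raw alert stream, including a duplicate.
alerts = [
    {"source": "app-server-1", "msg": "latency high"},
    {"source": "app-server-2", "msg": "latency high"},
    {"source": "app-server-2", "msg": "latency high"},  # duplicate
    {"source": "voip-gateway", "msg": "packet loss"},
    {"source": "core-switch-3", "msg": "interface down"},
]

def correlate(alerts, depends_on):
    # Group deduplicated alerts under the root upstream component they trace back to.
    grouped = defaultdict(set)
    for alert in alerts:
        root = alert["source"]
        while depends_on.get(root):  # walk up the dependency chain
            root = depends_on[root]
        grouped[root].add((alert["source"], alert["msg"]))
    return grouped

for root, symptoms in correlate(alerts, depends_on).items():
    print(f"Incident at {root}: {len(symptoms)} correlated alerts")
# One incident rooted at core-switch-3 instead of five raw alerts.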

With the observability enhancements, Riverbed unveiled its Riverbed Unified Agent, which promises to streamline the deployment and management of observability products with a single software agent. The unified agent will enable enterprise network managers to adopt Riverbed modules without deploying more software or having to maintain an additional component to collect and report on data across the environment.

“The universal agent reduces the amount of administrative overhead customers would have if they were using both NPM+ and Aternity,” says Shamus McGillicuddy, research director for the network management practice at Enterprise Management Associates (EMA).

“Right now, the main alternative to NPM+ is a synthetic network monitoring solution, such as those offered by Cisco ThousandEyes, Broadcom AppNeta, and Catchpoint. Let’s say Riverbed Aternity customers want to adopt one of these competitors for a synthetic solution. To get the client-side perspective from those alternative solutions, customers would have to deploy a second agent on devices,” McGillicuddy says.

“At some point, you want to limit the number of agents you’re running on a device to mitigate management complexity, but also to avoid overtaxing resources of client devices. The Universal Agent streamlines agent management and agent overhead.”

With a unified agent, Riverbed is reducing the overhead needed to gain visibility across myriad devices while also reporting on data needed to improve digital experiences. The Riverbed Unified Agent installs on managed devices, becomes an enabler for module features, and can be automatically updated without human intervention.

By providing visibility into an extended environment, Riverbed is giving network managers a means to better manage cloud applications and services as well as remote workers.

“Riverbed is making improvements to existing products that customers are already using (like the NetProfiler NPM solution and the AIOps solution (IQ). It’s also introducing a new SaaS-based product (NPM+) that addresses a critical need for organizations that support cloud apps and remote workers. It also helps with ZTNA observability,” EMA’s McGillicuddy explains. “NPM+ provides a compelling agent-based digital experience management capability.”

These capabilities align with what enterprise IT teams say they need and plan to invest in, according to McGillicuddy. “Last year my research found that 87% of NetOps teams have allocated budget to improve how they manage experience for remote workers. Also, the updates to Riverbed IQ improve how customers can create and run automated workflows in response to AI-derived insights. Last year, 57% of NetOps teams told me they want no-code interfaces such as this for creating runbook automation on their AIOps platforms.”

Riverbed’s news is promising for customers, as it is providing real data that can provide insights into the performance of network components, applications, and services. Still, considering the complex environments network teams are tasked to monitor and optimize, Riverbed and others could be doing more to support customers. For instance, there’s opportunity for Riverbed and its competitors to add integrations and support for the technologies organizations are investing in now, such as secure access service edge (SASE):

“If it works as advertised, NPM+ should produce quality telemetry. It’s passively monitoring real network connections, which means IT will get data about network and application sessions,” McGillicuddy says. “Riverbed should expand the number of SD-WAN solutions supported by NetProfiler. Also, they should position NPM+ next to NetProfiler as a solution that provides SD-WAN and SASE observability. NetOps teams struggle with visibility into SASE points of presence, and NPM+ has the potential to solve that issue. A combined NetProfiler/NPM+ solution could be a powerful toolset for SD-WAN/SASE operations.”

The Riverbed Platform, NPM+, NetProfiler, and all the capabilities announced with this launch are generally available now.

Network Management Software, Network Monitoring
https://www.networkworld.com/article/2098781/riverbed-launches-ai-powered-observability-platform.html 2098781