The speed of Meta's AI Research SuperCluster would dwarf that of the world's current fastest supercomputer. Credit: Gerd Altmann

Facebook’s parent company Meta said it is building the world’s largest AI supercomputer to power the machine learning and natural language processing behind its metaverse project.

The new machine, called the AI Research SuperCluster (RSC), will contain 16,000 Nvidia A100 GPUs and 4,000 AMD Epyc Rome 7742 processors. When finished, it will have 2,000 Nvidia DGX-A100 nodes, each with eight GPUs and two Epyc processors. Meta expects to complete construction this year.

RSC is already partially built, with 760 of the DGX-A100 systems deployed. Meta researchers have started using RSC to train large models in natural language processing (NLP) and computer vision, with the goal of eventually training models with trillions of parameters, according to Meta.

“Meta has developed what we believe is the world’s fastest supercomputer. We’re calling it RSC for AI Research SuperCluster, and it’ll be complete later this year. The experiences we’re building for the metaverse require enormous compute power (quintillions of operations/second!) and RSC will enable new AI models that can learn from trillions of examples, understand hundreds of languages, and more,” said CEO Mark Zuckerberg in an emailed statement.

RSC is expected to hit a peak performance of 5 exaFLOPS in mixed-precision processing, using both FP16 and FP32. That would rocket it to the top of the Top500 supercomputer list, whose current leader reaches 442 petaFLOPS. It is being built in partnership with Penguin Computing, a specialist in HPC systems. Meta is not disclosing where the system is located.
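The hardware and performance figures above can be cross-checked with some simple arithmetic. The short Python sketch below uses only numbers quoted in this article; the function and variable names are illustrative, not anything Meta has published. The speed ratio should be read with caution: the Top500 ranking is based on the double-precision Linpack benchmark, while the 5 exaFLOPS figure is a mixed-precision peak, so the comparison is not like-for-like.

```python
# Figures quoted in the article (names here are illustrative assumptions).
NODES_PLANNED = 2_000        # Nvidia DGX-A100 nodes when complete
NODES_DEPLOYED = 760         # nodes already installed
GPUS_PER_NODE = 8
CPUS_PER_NODE = 2
RSC_PEAK_PFLOPS = 5_000      # 5 exaFLOPS, mixed precision (FP16/FP32)
TOP500_LEADER_PFLOPS = 442   # current Top500 leader, per the article


def rsc_summary():
    """Derive totals and ratios from the article's quoted figures."""
    total_gpus = NODES_PLANNED * GPUS_PER_NODE
    return {
        "total_gpus": total_gpus,
        "total_cpus": NODES_PLANNED * CPUS_PER_NODE,
        "deployed_gpus": NODES_DEPLOYED * GPUS_PER_NODE,
        "deployed_fraction": NODES_DEPLOYED / NODES_PLANNED,
        # Implied peak per GPU, in teraFLOPS (1 petaFLOPS = 1,000 teraFLOPS).
        "per_gpu_tflops": RSC_PEAK_PFLOPS * 1_000 / total_gpus,
        # Not like-for-like: mixed-precision peak vs. a Linpack result.
        "peak_ratio": RSC_PEAK_PFLOPS / TOP500_LEADER_PFLOPS,
    }


if __name__ == "__main__":
    for name, value in rsc_summary().items():
        print(f"{name}: {value}")
```

The totals match the article's 16,000 GPUs and 4,000 CPUs, and the 760 deployed nodes come to 38% of the planned system. The implied per-GPU peak of about 312 teraFLOPS is roughly in line with Nvidia's published FP16 tensor peak for the A100, which suggests the 5 exaFLOPS figure is a theoretical sum across all GPUs, not a measured benchmark result.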
“RSC will help Meta’s AI researchers build new and better AI models that can learn from trillions of examples; work across hundreds of different languages; seamlessly analyze text, images, and video together; develop new augmented reality tools; and much more,” Kevin Lee, a technical program manager, and Shubho Sengupta, a software engineer, both at Meta, wrote in a blog post.

“We hope RSC will help us build entirely new AI systems that can, for example, power real-time voice translations to large groups of people, each speaking a different language, so they can seamlessly collaborate on a research project or play an AR game together,” they wrote.

In addition to all of the processing power, RSC also has 175 petabytes of Pure Storage FlashArray, 46 petabytes of cache storage, and 10 petabytes of Pure’s object storage equipment.

RSC is estimated to be nine times faster than Meta’s previous research cluster, made up of 22,000 of Nvidia’s older-generation V100 GPUs, and 20 times faster than its current AI systems. Meta does not plan to retire the old system.

The company is concentrating on building learning models for automated tasks that revolve around content. It wanted this infrastructure in order to train models with more than a trillion parameters on data sets as large as an exabyte, with the goal of getting its arms around all the content generated on its platform.

“By doing this, we can help advance research to perform downstream tasks such as identifying harmful content on our platforms as well as research into embodied AI and multimodal AI to help improve user experiences on our family of apps. We believe this is the first time performance, reliability, security, and privacy have been tackled at such a scale,” Lee and Sengupta wrote.