Cisco says manufacturing errors are to blame for flaws in its 16GB, 32GB, and 64GB dual in-line memory modules (DIMM).
Credit: Larry White / Pixabay
Cisco is urging customers to replace flawed memory sticks in some of its Unified Computing System (UCS) servers before they fail. The problem is caused by a manufacturing error in 24 dual in-line memory modules (DIMM). The affected parts exhibit persistent correctable memory errors and, if left in place, could knock the servers offline.
The problem is found in 16GB, 32GB, and 64GB DIMMs. Cisco describes the flaws as manufacturing deviations that affect the memory modules used to make up the DIMMs. All of the problem parts were manufactured from mid to late 2020, according to a Cisco alert.
The persistent correctable errors are only the first symptom. “If left untreated, the DIMMs might eventually encounter an uncorrectable memory event. If encountered during runtime, uncorrectable errors will cause a sudden unexpected server reset. If encountered during Power-On Self-Test (POST), the DIMM will be mapped out and the total available memory reduced. In some cases a boot error might be seen,” the alert states.
The company noted that operating system features and memory Reliability, Availability and Serviceability (RAS) features might mask the extent of the correctable errors, so customers are advised not to judge their exposure based on a lack of error reports. Instead, they should check whether the serial number of the suspect part has been flagged. The process is described in the Cisco alert, which lists the potentially faulty products. Replacement parts are available from Cisco.
Cisco did not identify the maker of the defective memory modules in its alert, and it declined to answer my query on the matter. The only thing it would say is that the memory was manufactured in mid to late 2020.
However, SK Hynix, the South Korean memory maker that does manufacture memory modules used in Cisco UCS servers, admitted to manufacturing problems during its most recent earnings call. During that call, an unidentified company representative stated that it changed its manufacturing process beginning in mid-2020, with some unintended side effects.
“Some of the products that were produced at this particular time had been reportedly suffering some quality degradation since about one year ago. So we have been receiving reports of them sometime in the middle of last year,” the unidentified representative said.