A minor bug can cause a system crash after 1,044 days of uninterrupted uptime. Be sure to reboot before then.

Semiconductors, especially CPUs, are immensely complex creations, all built at the microscopic level. That there aren’t more bugs, for lack of a better word, is a testament to the effort chipmakers put into delivering solid products. But occasionally, something slips by.

AMD has issued an alert that an older processor line has a minor error. The problem exists in its Epyc 7002 line, code-named Rome, which was released three years ago. According to the bug report, first noted on a Reddit thread, servers running Rome-era chips will hang after 1,044 days of uptime, or nearly three years. There is no way to recover the server other than to reboot. AMD says it will not fix the issue.

“AMD has successfully provided a remedy for an isolated challenge regarding 2nd Gen AMD EPYC processors where for some customers, a core within the processor could hang if running consistently for an extended period of time,” a company spokesperson said via email.

The bug is in what’s known as the C6 sleep state. To save energy when the CPU is idle, it can go into a low-power mode. CPUs have several power modes, which are collectively called “C-states” or “C-modes.” Intel first introduced them with the 486 processor, so the idea is hardly new.

C-states start at C0, which is the normal CPU operating mode. The higher the C number, the deeper into sleep mode the CPU goes and the more signals are turned off. The deeper the sleep state, the more time the CPU needs to fully wake up.

With this bug, once a CPU goes into C6 past the 1,044-day mark, it gets stuck and a reboot is required. The workaround is either to reboot the server before the three-year mark or to disable the sleep state that causes the bug. That this bug even surfaced is a testament to the CPU’s performance; three years of uninterrupted uptime is remarkable.
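For administrators who want to track how close a server is to the 1,044-day mark, the check can be sketched in a few lines of Python. This is a minimal illustration, not an AMD-provided tool: it assumes a Linux host, where the first field of `/proc/uptime` is the number of seconds since boot, and the function names and the 30-day safety margin are hypothetical choices made for this example.

```python
from pathlib import Path

HANG_THRESHOLD_DAYS = 1044  # uptime figure reported for the Rome bug
SECONDS_PER_DAY = 86_400


def parse_uptime_seconds(proc_uptime_text: str) -> float:
    """Return uptime in seconds from the contents of /proc/uptime."""
    # The file holds two numbers; the first is seconds since boot.
    return float(proc_uptime_text.split()[0])


def days_remaining(uptime_seconds: float,
                   threshold_days: int = HANG_THRESHOLD_DAYS) -> float:
    """Days left before the threshold; negative once it has passed."""
    return threshold_days - uptime_seconds / SECONDS_PER_DAY


def needs_reboot(uptime_seconds: float, margin_days: float = 30.0) -> bool:
    """True when uptime is within margin_days of the threshold.

    The 30-day default is an arbitrary margin chosen for this sketch.
    """
    return days_remaining(uptime_seconds) <= margin_days


def check_host() -> str:
    """Read /proc/uptime on a Linux host and summarize the result."""
    uptime = parse_uptime_seconds(Path("/proc/uptime").read_text())
    status = "schedule a reboot" if needs_reboot(uptime) else "ok"
    return f"{days_remaining(uptime):.1f} days remaining: {status}"
```

Calling `check_host()` from a cron job or monitoring agent would give early warning well before the three-year mark. Whether to reboot or to disable the C6 state instead remains a site-specific decision.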
You might think server updates would have dictated a reboot along the way, but then again, the Linux kernel can be patched without a reboot. Significant CPU bugs do happen but not very often, and this certainly isn’t one of them.