Node encounters uncorrectable machine check error despite replacing the Flash Cache card
Applies to
- ONTAP 9
- FAS 8300
Issue
- Node panics due to an uncorrectable machine check error as shown below:
  Uncorrectable Machine Check Error at CPU11. SKL_IIO Error: STATUS<0xfb80000000000e0b>(VALID,OVERFLOW,UC,EN,MISCV,PCC,S,AR,CORR_ERR_STATUS(0),CORR_ERR_CNT(0),MSCOD(0),MCACOD(0xe0b))MISC<0x0000000080000000>(UCR_BUS_LOG(128),UCR_DEVICE_LOG(0),UCR_FUNCTION_LOG(0),UCR_SEGMENT_LOG(0))IIO Machine Check from device(s):RPT(128,0,0):ErrSrcID(CorrSrc(0x8100),UCorrSrc(0x8250)), PLX PCIE 8733 switch on Controller, Br[8733](130,10,0): Link down. ,. in process idle: cpu11 on release 9.11.1P8 (C)
- Analysis following the KB How to troubleshoot PCI/NMI, UMCE, and nested machine check exception panics indicates that the NVMe Flash Cache module is faulty.
- The node panics again with the same panic string even after the affected Flash Cache card is replaced.
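The STATUS value in the panic string is a raw IA32_MCi_STATUS register value, whose architectural bit layout is documented in the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3B. The following Python sketch is a hypothetical helper (not part of ONTAP or any NetApp tool) that decodes only those architectural fields, so the flags printed in the panic can be cross-checked against the raw value. It assumes the SDM bit positions and does not interpret model-specific fields such as the UCR_*_LOG values in MISC; it is not a substitute for the KB troubleshooting procedure.

```python
# Hypothetical helper: decode architectural fields of an IA32_MCi_STATUS value.
# Bit positions assumed from the Intel SDM Vol. 3B machine-check chapter.
STATUS_FLAGS = {
    63: "VALID",     # register contents are valid
    62: "OVERFLOW",  # an earlier error was lost
    61: "UC",        # uncorrected error
    60: "EN",        # error reporting enabled
    59: "MISCV",     # MISC register holds extra information
    58: "ADDRV",     # ADDR register holds an address
    57: "PCC",       # processor context corrupt
    56: "S",         # signaled via machine-check exception
    55: "AR",        # action required
}


def decode_mci_status(status: int) -> dict:
    """Return the set flags and error-code fields of a 64-bit MCi_STATUS value."""
    # Walk bits from 63 down so the flag order matches the panic string.
    flags = [name for bit, name in sorted(STATUS_FLAGS.items(), reverse=True)
             if (status >> bit) & 1]
    return {
        "flags": flags,
        "mcacod": status & 0xFFFF,                   # bits 15:0, architectural error code
        "mscod": (status >> 16) & 0xFFFF,            # bits 31:16, model-specific error code
        "corr_err_cnt": (status >> 38) & 0x7FFF,     # bits 52:38, corrected error count
        "corr_err_status": (status >> 53) & 0x3,     # bits 54:53, threshold-based status
    }


if __name__ == "__main__":
    decoded = decode_mci_status(0xFB80000000000E0B)
    print(decoded["flags"])         # ['VALID', 'OVERFLOW', 'UC', 'EN', 'MISCV', 'PCC', 'S', 'AR']
    print(hex(decoded["mcacod"]))   # 0xe0b
```

For the STATUS value above, the decoded flags and codes match those ONTAP printed (ADDRV is not set, MSCOD and the corrected-error fields are 0), which confirms that the panic string itself is internally consistent before moving on to hardware isolation.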
