Node down with "process on cpu31 hung (CCMA-Worker)" panic message and DIMM CECC errors
Applies to
AFF-A700
Issue
- The node panics with the following error:
PANIC : process on cpu31 hung (CCMA-Worker) for 5007 milliseconds!
- After the panic reboot, the node cannot boot up and halts with the following error:
Initializing System Memory ...
MEMORY WARNING: Major Code 0x30 Minor Code 0x13 Dimm 1
MEMORY WARNING: Major Code 0x30 Minor Code 0x13 Dimm 1
MEMORY WARNING: Major Code 0x30 Minor Code 0x15 Dimm 1
MEMORY WARNING: Major Code 0x30 Minor Code 0x15 Dimm 1
MEMORY WARNING: Major Code 0x30 Minor Code 0x15 Dimm 2
MEMORY WARNING: Major Code 0x30 Minor Code 0x15 Dimm 2
DIMM:1 mapped out. BIOS MRC mapped out DIMM. Major / Minor Error Code: 0x31 / 0x25
Complete channel mapped out.
DIMM in slot 1 is disabled
DIMM in slot 2 is disabled
System DIMM configuration is not supported by AFF-A700
Halting...
- After the mapped-out DIMM-1 is replaced, the node boots up. Five days later, the node detects correctable ECC (CECC) errors on DIMM-1 and DIMM-2 and panics again:
HA Group Notification from node-1 (Health Monitor process nphm: CriticalCECCCountMemErrAlert[DIMM-1]) ALERT
HA Group Notification from node-1 (Health Monitor process nphm: CriticalCECCCountMemErrAlert[DIMM-2]) ALERT
PANIC: process on cpu5 hung (CCMA-Worker) for 5002 milliseconds!
- After giveback and replacement of both DIMM-1 and DIMM-2, the node is back online.
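
The console and alert messages above can be tallied per DIMM to show at a glance which modules are implicated across the two events. The following Python sketch is illustrative only and is not a NetApp tool: the function name `summarize`, the `SAMPLE_LOG` variable, and the regular expressions are assumptions modeled solely on the lines quoted in this article, and other BIOS or ONTAP releases may format these messages differently.

```python
import re
from collections import defaultdict

# Patterns modeled only on the lines quoted in this article (assumption).
WARNING_RE = re.compile(
    r"MEMORY WARNING: Major Code (0x[0-9A-Fa-f]+) Minor Code (0x[0-9A-Fa-f]+) Dimm (\d+)"
)
MAPPED_OUT_RE = re.compile(r"DIMM:(\d+) mapped out\.")
DISABLED_RE = re.compile(r"DIMM in slot (\d+) is disabled")
CECC_RE = re.compile(r"CriticalCECCCountMemErrAlert\[DIMM-(\d+)\]")
PANIC_RE = re.compile(
    r"PANIC\s*:\s*process on cpu(\d+) hung \(([^)]+)\) for (\d+) milliseconds"
)

# Excerpt of the messages shown in this article (duplicates trimmed).
SAMPLE_LOG = """\
PANIC : process on cpu31 hung (CCMA-Worker) for 5007 milliseconds!
MEMORY WARNING: Major Code 0x30 Minor Code 0x13 Dimm 1
MEMORY WARNING: Major Code 0x30 Minor Code 0x15 Dimm 1
MEMORY WARNING: Major Code 0x30 Minor Code 0x15 Dimm 2
DIMM:1 mapped out. BIOS MRC mapped out DIMM. Major / Minor Error Code: 0x31 / 0x25
DIMM in slot 1 is disabled
DIMM in slot 2 is disabled
HA Group Notification from node-1 (Health Monitor process nphm: CriticalCECCCountMemErrAlert[DIMM-1]) ALERT
HA Group Notification from node-1 (Health Monitor process nphm: CriticalCECCCountMemErrAlert[DIMM-2]) ALERT
PANIC: process on cpu5 hung (CCMA-Worker) for 5002 milliseconds!
"""


def summarize(lines):
    """Collect DIMM-related events and hung-process panics from log lines."""
    summary = {
        "warnings": defaultdict(set),  # DIMM number -> {(major, minor), ...}
        "mapped_out": set(),           # DIMMs mapped out by the BIOS MRC
        "disabled": set(),             # DIMM slots reported as disabled
        "cecc_alerts": set(),          # DIMMs named in CECC count alerts
        "panics": [],                  # (cpu, process name, milliseconds)
    }
    for line in lines:
        if m := WARNING_RE.search(line):
            summary["warnings"][int(m.group(3))].add((m.group(1), m.group(2)))
        if m := MAPPED_OUT_RE.search(line):
            summary["mapped_out"].add(int(m.group(1)))
        if m := DISABLED_RE.search(line):
            summary["disabled"].add(int(m.group(1)))
        if m := CECC_RE.search(line):
            summary["cecc_alerts"].add(int(m.group(1)))
        if m := PANIC_RE.search(line):
            summary["panics"].append((int(m.group(1)), m.group(2), int(m.group(3))))
    return summary


if __name__ == "__main__":
    result = summarize(SAMPLE_LOG.splitlines())
    for dimm in sorted(result["warnings"]):
        print(f"DIMM {dimm}: warning codes {sorted(result['warnings'][dimm])}")
    print("Mapped out:", sorted(result["mapped_out"]))
    print("Disabled slots:", sorted(result["disabled"]))
    print("CECC alerts:", sorted(result["cecc_alerts"]))
    print("Panics:", result["panics"])
```

Run against the excerpt, the summary names DIMM-1 and DIMM-2 in both the boot-time warnings and the later CECC alerts, which is consistent with both DIMMs being replaced in the final step.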
