AFF A800 Multi-Disk Panic after node disables PCIe links to disks
Applies to
Issue
- Controller is taken over unexpectedly with the following reason:
Fri Oct 07 11:52:42 +0000 [node-01: cf_main: cf.fsm.takeover.mdp:alert]: Failover monitor: takeover attempted after multi-disk failure on partner
- The following errors are seen on the affected node:
Fri Oct 07 11:51:14 +0000 [node-01: config_failed_disk: callhome.disks.missing:error]: Call home for MULTIPLE DISKS MISSING
Fri Oct 07 11:50:12 +0000 [node-02: kernel: nvme.link.disabled.error:error]: PCIe link disabled for NVMe SSD in slot 42 due to excessive errors.
Fri Oct 07 11:51:29 +0000 [node-02: SKL cerror: pcie.stealth.errors:debug]: params: {'pcie_errors': 'IIO7: RPT(215,0,0): SecStatus(RcvMstAbt), DevStatus(Corr), CorrErr(RNRov,RpTim); PLX PCIE 9765 switch on Controller, Br[9765](216,0,0): DevStatus(Corr), CorrErr(Rcvr,BTLP,BDLLP,RNRov,RpTim), BadTLP(8766), BadDLLP(28234), RcvErr(P0(255)); '}
Fri Oct 07 11:51:41 +0000 [node-02: kernel: nvme.link.disabled.error:error]: PCIe link disabled for NVMe SSD in slot 43 due to excessive errors.
- The affected disks are not marked as failed on the partner node that took over the affected node.
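
When checking whether a saved EMS log shows this signature, it can help to list which NVMe slots had their PCIe link disabled on each node. The following is a minimal sketch, not a NetApp tool: the function name `disabled_slots` and the regular expressions are assumptions based only on the line layout of the excerpts above, and other ONTAP releases may format these messages differently.

```python
import re
from collections import defaultdict

# Assumed layout, inferred from the excerpts in this article:
# "<timestamp> [<node>: <process>: <event>:<severity>]: <message>"
EMS_LINE = re.compile(
    r"^(?P<ts>.+?) \[(?P<node>[^:\]]+): (?P<proc>[^:]+): "
    r"(?P<event>[\w.]+):(?P<sev>\w+)\]: (?P<msg>.*)$"
)
SLOT = re.compile(r"NVMe SSD in slot (\d+)")


def disabled_slots(lines):
    """Map each node name to the NVMe slots whose PCIe link was disabled."""
    result = defaultdict(list)
    for line in lines:
        m = EMS_LINE.match(line.strip())
        # Skip lines that do not parse or that are other EMS events.
        if not m or m.group("event") != "nvme.link.disabled.error":
            continue
        slot = SLOT.search(m.group("msg"))
        if slot:
            result[m.group("node")].append(int(slot.group(1)))
    return dict(result)


# Lines copied from the Issue section above.
SAMPLE = [
    "Fri Oct 07 11:52:42 +0000 [node-01: cf_main: cf.fsm.takeover.mdp:alert]: "
    "Failover monitor: takeover attempted after multi-disk failure on partner",
    "Fri Oct 07 11:50:12 +0000 [node-02: kernel: nvme.link.disabled.error:error]: "
    "PCIe link disabled for NVMe SSD in slot 42 due to excessive errors.",
    "Fri Oct 07 11:51:41 +0000 [node-02: kernel: nvme.link.disabled.error:error]: "
    "PCIe link disabled for NVMe SSD in slot 43 due to excessive errors.",
]

print(disabled_slots(SAMPLE))  # {'node-02': [42, 43]}
```

More than one slot reported for the same node within a short window, followed by a `cf.fsm.takeover.mdp` event on the partner, matches the pattern described in this article.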
