Multi-disk panic caused by errors on a single path, with no failover to the secondary path
Applies to
- FAS/AFF
- ONTAP 9
Issue
pmcsas_admin_0: sas.adapter.debug:info]: params: {'debug_string': 'Bad device state completion: status 0x49, dev 0a.00.18, cdb 0x28:761afb00:0008', 'adapterName': '0a'}
pmcsas_admin_0: sas.adapter.debug:info]: params: {'debug_string': 'Bad device state completion: status 0x49, dev 0a.00.22, cdb 0x28:761afc38:0008', 'adapterName': '0a'}
pmcsas_admin_0: sas.port.down:debug]: SAS port "0a" went down.
...
config_thread: sk.panic:alert]: Panic String: aggr aggr0_mtpclus01_15: raid volfsm, fatal multi-disk error.. Raid type - raid_dp Group name plex0/rg0 state NORMAL. 5 disks failed in the group. Disk 0a.00.12P3 Shelf 0 Bay 12 [NETAPP X358_TPM5V3T8ATE NA53] S/N [Y0S0A0FATS1FNP003] UID [68CE38EE:213FAF40:500A0981:00000003:00000000:00000000:00000000:00000000:00000000:00000000] error: disk operation timed out. Disk 0a.00.14P3 Shelf 0 Bay 14 [NETAPP X358_TPM5V3T8ATE NA53] S/N [Y0U0A01NTS1FNP003] UID [68CE38EE:214071F0:500A0981:00000003:00000000:00000000:00000000:00000000:00000000:00000000] error: disk operation timed out. Disk 0a.00.16P3 Shelf 0 Bay 16 [NETAPP X358_TPM5V3T8ATE NA53] S/N [Y0S0A0ALTS1FNP003] UID [68CE38EE:213F75EC:500A0981:00000003:00000000:00000000:00000000:00000000:00000000:00000000] error: disk operation timed out. Disk /aggr0_mtpclus01_15/plex0/rg0/0a.00.18P3 Shelf 0 Bay 18 [NETAPP X358_TPM5V3T8ATE NA53] S/ in SK process config_thread on release 9.7P12 (C)
pmcsas_admin_0: sas.adapter.debug:info]: params: {'debug_string': 'Bad device state completion: status 0x49, dev 0d.00.5, cdb 0x88:0000000120e6b030:00000008', 'adapterName': '0a'}
pmcsas_admin_0: sas.adapter.debug:info]: params: {'debug_string': 'Bad device state completion: status 0x49, dev 0d.00.3, cdb 0x28:eae1c7f8:0008', 'adapterName': '0a'}
pmcsas_admin_0: sas.adapter.debug:info]: params: {'debug_string': 'No device or failure to OPEN (status 0xf) -- delaying: dev 0d.00.13, cdb 0x88:00000001b6c83460:00000008', 'adapterName': '0a'}
scsi_cmdblk_strthr_admin: scsi.cmd.adapterHardwareErrorEMSOnly:error]: Disk device 0d.00.13: Adapter detected hardware error: HA status 0xb: cdb 0x88:00000001b6c83460:00000008.
pmcsas_admin_0: sas.port.down:debug]: SAS port "0d" went down.
...
config_thread: cf.multidisk.fatalProblem:info]: Node encountered a multidisk error or other fatal error while waiting to be taken over. aggr aggr0_mtpclus01_16: raid volfsm, fatal multi-disk error.. Raid type - raid_dp Group name plex0/rg0 state NORMAL. 5 disks failed in the group. Disk 0d.00.1P3 Shelf 0 Bay 1 [NETAPP X358_TPM5V3T8ATE NA53] S/N [Y0R0A0LGTS1FNP003] UID [68CE38EE:213F5020:500A0981:00000003:00000000:00000000:00000000:00000000:00000000:00000000] error: disk operation timed out. Disk 0d.00.3P3 Shelf 0 Bay 3 [NETAPP X358_TPM5V3T8ATE NA53] S/N [Y0S0A0BFTS1FNP003] UID [68CE38EE:213F7660:500A0981:00000003:00000000:00000000:00000000:00000000:00000000:00000000] error: disk operation timed out. Disk 0d.00.5P3 Shelf 0 Bay 5 [NETAPP X358_TPM5V3T8ATE NA53] S/N [Y0U0A01DTS1FNP003] UID [68CE38EE:214071CC:500A0981:00000003:00000000:00000000:00000000:00000000:00000000:00000000] error: disk operation timed out. Disk /aggr0_mtpclus01_16/plex0/rg0/0d.00.7P3 Shelf 0 Bay 7 [NETAPP X358_TPM5V3T8ATE NA53] S/N [Y0S0A.
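When triaging logs like the ones above, it can help to check whether the adapter errors cluster on a single SAS port before that port is reported down. The following is a minimal sketch, not a NetApp tool: the function name `summarize` and both regular expressions are assumptions derived only from the log excerpts shown in this article, and other ONTAP releases may format these events differently.

```python
import re
from collections import Counter

# Assumed pattern: device name (e.g. 0a.00.18) inside a sas.adapter.debug event.
DEV_RE = re.compile(r"sas\.adapter\.debug.*?\bdev (\w+\.\d+\.\d+)")
# Assumed pattern: port name inside a sas.port.down event.
PORT_DOWN_RE = re.compile(r'sas\.port\.down.*?SAS port "(\w+)" went down')


def summarize(lines):
    """Count sas.adapter.debug events per SAS port and list ports reported down.

    The port is taken as the first dot-separated field of the device name,
    so device 0a.00.18 is counted against port 0a.
    """
    errors_by_port = Counter()
    ports_down = []
    for line in lines:
        match = DEV_RE.search(line)
        if match:
            port = match.group(1).split(".")[0]
            errors_by_port[port] += 1
            continue
        match = PORT_DOWN_RE.search(line)
        if match:
            ports_down.append(match.group(1))
    return errors_by_port, ports_down


if __name__ == "__main__":
    # Sample lines copied from the excerpts in this article.
    sample = [
        "pmcsas_admin_0: sas.adapter.debug:info]: params: {'debug_string': "
        "'Bad device state completion: status 0x49, dev 0a.00.18, "
        "cdb 0x28:761afb00:0008', 'adapterName': '0a'}",
        "pmcsas_admin_0: sas.adapter.debug:info]: params: {'debug_string': "
        "'Bad device state completion: status 0x49, dev 0a.00.22, "
        "cdb 0x28:761afc38:0008', 'adapterName': '0a'}",
        'pmcsas_admin_0: sas.port.down:debug]: SAS port "0a" went down.',
    ]
    errors, down = summarize(sample)
    print(dict(errors), down)
```

If every counted error falls on a port that also appears in the list of ports that went down, that is consistent with the single-path failure pattern described in the title. The script only summarizes the log text and does not confirm a root cause.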
