NetApp Stage KB

Multi-disk panic due to errors on a single path when ONTAP does not switch to the secondary path

Category: aff-series
Specialty: HW

Applies to

  • FAS/AFF
  • ONTAP 9

Issue

During normal operation, SAS errors on one path to the disks caused ONTAP to disable that path. ONTAP then did not fail over to the alternate path, which resulted in a multi-disk panic.
 
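Example EMS messages from both nodes. The first excerpt is from the node owning aggr0_mtpclus01_15, where SAS port 0a went down. The second is from the node owning aggr0_mtpclus01_16, where SAS port 0d went down.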

pmcsas_admin_0: sas.adapter.debug:info]: params: {'debug_string': 'Bad device state completion: status 0x49, dev 0a.00.18, cdb 0x28:761afb00:0008', 'adapterName': '0a'}
pmcsas_admin_0: sas.adapter.debug:info]: params: {'debug_string': 'Bad device state completion: status 0x49, dev 0a.00.22, cdb 0x28:761afc38:0008', 'adapterName': '0a'}
pmcsas_admin_0: sas.port.down:debug]: SAS port "0a" went down.
...
config_thread: sk.panic:alert]: Panic String: aggr aggr0_mtpclus01_15: raid volfsm, fatal multi-disk error..  Raid type - raid_dp Group name plex0/rg0 state NORMAL. 5 disks failed in the group. Disk 0a.00.12P3 Shelf 0 Bay 12 [NETAPP   X358_TPM5V3T8ATE NA53] S/N [Y0S0A0FATS1FNP003] UID [68CE38EE:213FAF40:500A0981:00000003:00000000:00000000:00000000:00000000:00000000:00000000] error: disk operation timed out. Disk 0a.00.14P3 Shelf 0 Bay 14 [NETAPP   X358_TPM5V3T8ATE NA53] S/N [Y0U0A01NTS1FNP003] UID [68CE38EE:214071F0:500A0981:00000003:00000000:00000000:00000000:00000000:00000000:00000000] error: disk operation timed out. Disk 0a.00.16P3 Shelf 0 Bay 16 [NETAPP   X358_TPM5V3T8ATE NA53] S/N [Y0S0A0ALTS1FNP003] UID [68CE38EE:213F75EC:500A0981:00000003:00000000:00000000:00000000:00000000:00000000:00000000] error: disk operation timed out. Disk /aggr0_mtpclus01_15/plex0/rg0/0a.00.18P3 Shelf 0 Bay 18 [NETAPP   X358_TPM5V3T8ATE NA53] S/ in SK process config_thread on release 9.7P12 (C)

 

pmcsas_admin_0: sas.adapter.debug:info]: params: {'debug_string': 'Bad device state completion: status 0x49, dev 0d.00.5, cdb 0x88:0000000120e6b030:00000008', 'adapterName': '0a'}
pmcsas_admin_0: sas.adapter.debug:info]: params: {'debug_string': 'Bad device state completion: status 0x49, dev 0d.00.3, cdb 0x28:eae1c7f8:0008', 'adapterName': '0a'}
pmcsas_admin_0: sas.adapter.debug:info]: params: {'debug_string': 'No device or failure to OPEN (status 0xf) -- delaying: dev 0d.00.13, cdb 0x88:00000001b6c83460:00000008', 'adapterName': '0a'}
scsi_cmdblk_strthr_admin: scsi.cmd.adapterHardwareErrorEMSOnly:error]: Disk device 0d.00.13: Adapter detected hardware error: HA status 0xb: cdb 0x88:00000001b6c83460:00000008. 
pmcsas_admin_0: sas.port.down:debug]: SAS port "0d" went down.
...
config_thread: cf.multidisk.fatalProblem:info]: Node encountered a multidisk error or other fatal error while waiting to be taken over. aggr aggr0_mtpclus01_16: raid volfsm, fatal multi-disk error..  Raid type - raid_dp Group name plex0/rg0 state NORMAL. 5 disks failed in the group. Disk 0d.00.1P3 Shelf 0 Bay 1 [NETAPP   X358_TPM5V3T8ATE NA53] S/N [Y0R0A0LGTS1FNP003] UID [68CE38EE:213F5020:500A0981:00000003:00000000:00000000:00000000:00000000:00000000:00000000] error: disk operation timed out. Disk 0d.00.3P3 Shelf 0 Bay 3 [NETAPP   X358_TPM5V3T8ATE NA53] S/N [Y0S0A0BFTS1FNP003] UID [68CE38EE:213F7660:500A0981:00000003:00000000:00000000:00000000:00000000:00000000:00000000] error: disk operation timed out. Disk 0d.00.5P3 Shelf 0 Bay 5 [NETAPP   X358_TPM5V3T8ATE NA53] S/N [Y0U0A01DTS1FNP003] UID [68CE38EE:214071CC:500A0981:00000003:00000000:00000000:00000000:00000000:00000000:00000000] error: disk operation timed out. Disk /aggr0_mtpclus01_16/plex0/rg0/0d.00.7P3 Shelf 0 Bay 7 [NETAPP   X358_TPM5V3T8ATE NA53] S/N [Y0S0A.

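You can check the EMS excerpts above programmatically to confirm that every disk in the panic failed behind the same SAS port, rather than on independent paths. The following Python sketch is illustrative only:

* The regular expressions are assumptions derived solely from the sample messages in this article, not from a documented EMS format.
* `summarize_ems` is a hypothetical helper, not a NetApp tool.

The helper collects three things: the ports reported down, the devices with "Bad device state completion" errors, and the disks named in the panic string. It then groups the failed disks by their adapter prefix.

```python
import re
from collections import Counter

# Hypothetical patterns, based only on the sample EMS lines in this article.
PORT_DOWN_RE = re.compile(r'SAS port "(?P<port>[^"]+)" went down')
BAD_DEV_RE = re.compile(
    r"Bad device state completion: status (?P<status>0x[0-9a-fA-F]+), dev (?P<dev>[^,]+),"
)
# Matches "Disk 0a.00.12P3 Shelf" and "Disk /aggr.../rg0/0a.00.18P3 Shelf".
PANIC_DISK_RE = re.compile(r"Disk (?:\S*/)?(?P<disk>\d+[a-z]\.\d+\.\d+)P\d+ Shelf")


def summarize_ems(lines):
    """Hypothetical helper: summarize port-down events and failed disks."""
    ports_down, bad_devs, failed_disks = [], [], []
    for line in lines:
        port = PORT_DOWN_RE.search(line)
        if port:
            ports_down.append(port.group("port"))
        bad = BAD_DEV_RE.search(line)
        if bad:
            bad_devs.append(bad.group("dev"))
        # One panic string can name several disks.
        failed_disks.extend(pm.group("disk") for pm in PANIC_DISK_RE.finditer(line))
    # The adapter port is the first dotted field of the disk name (e.g. "0a").
    by_adapter = Counter(d.split(".")[0] for d in failed_disks)
    return {
        "ports_down": ports_down,
        "bad_devices": bad_devs,
        "failed_disks": failed_disks,
        "failed_by_adapter": dict(by_adapter),
    }


# Shortened copies of the first node's EMS excerpt.
sample_node1 = [
    "pmcsas_admin_0: sas.adapter.debug:info]: params: {'debug_string': 'Bad device state completion: status 0x49, dev 0a.00.18, cdb 0x28:761afb00:0008', 'adapterName': '0a'}",
    "pmcsas_admin_0: sas.adapter.debug:info]: params: {'debug_string': 'Bad device state completion: status 0x49, dev 0a.00.22, cdb 0x28:761afc38:0008', 'adapterName': '0a'}",
    'pmcsas_admin_0: sas.port.down:debug]: SAS port "0a" went down.',
    "config_thread: sk.panic:alert]: Panic String: ... 5 disks failed in the group. "
    "Disk 0a.00.12P3 Shelf 0 Bay 12 error: disk operation timed out. "
    "Disk 0a.00.14P3 Shelf 0 Bay 14 error: disk operation timed out. "
    "Disk 0a.00.16P3 Shelf 0 Bay 16 error: disk operation timed out. "
    "Disk /aggr0_mtpclus01_15/plex0/rg0/0a.00.18P3 Shelf 0 Bay 18",
]

result = summarize_ems(sample_node1)
adapters = result["failed_by_adapter"]
if len(adapters) == 1 and set(adapters) <= set(result["ports_down"]):
    print("All failed disks are behind the SAS port that went down:", next(iter(adapters)))
```

The panic strings above are truncated, so the sketch counts only the disks that appear in the text. For aggr0_mtpclus01_15, that is four of the five reported. A result in which every failed disk maps to the single port that went down is consistent with the single-path failure described in this article.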
 


NetApp provides no representations or warranties regarding the accuracy or reliability or serviceability of any information or recommendations provided in this publication or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.