Shelf failure caused node panic
Applies to
- ONTAP 9
- AFF/FAS/ASA
- DS460/DS224
Issue
- The node panicked with a fatal multi-disk error:
PANIC : aggr aggr1_SATA: raid volfsm, fatal multi-disk error..Disk 0b.10.2 Shelf 10 Drawer 1 Slot 2 Bay 2 [NETAPP X375_WVELE04TA07 NA01] S/N [XXXXXXX] UID [5000CCA0:C415E3F0:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000] error: disk does not exist.Disk 0b.10.4 Shelf 10 Drawer 1 Slot 4 Bay 4 [NETAPP X375_XXXXXXX NA01] S/N [XXXXXXX] UID [5000CCA0:C415BA70:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000] error: disk does not exist.Disk 0b.10.6 Shelf 10 Drawer 1 Slot 6 Bay 6 [NETAPP X375_XXXXXXX NA01] S/N [XXXXXXX] UID [5000CCA0:C4159D20:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000] error: disk does not exist.Disk 0b.10.8 Shelf 10 Drawer 1 Slot 8 Bay 8 [NETAPP X375_XXXXXXX NA01] S/N [XXXXXXX] UI in SK process config_thread on release 9.13.1P3 (C) on Thu Jun 13 08:41:24 +03 2024
- All the drives of the root aggregate were located in the shelf that went down.
- The EMS log records the panic and the failover monitor disabling takeover on reboot:
Thu Jun 13 08:41:24 +0300 [Node1: config_thread: sk.panic:alert]: Panic String: aggr aggr1_SATA: raid volfsm, fatal multi-disk error.. Raid type - raid_tec Group name plex0/rg0 state NORMAL. 28 disks failed in the group. Disk 0b.10.2 Shelf 10 Drawer 1 Slot 2 Bay 2 [NETAPP X375_XXXXXXX NA01] S/N [XXXXXX] UID [5000CCA0:C415E3F0:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000] error: disk does not exist. Disk 0b.10.4 Shelf 10 Drawer 1 Slot 4 Bay 4 [NETAPP X375_XXXXXXXX NA01] S/N [XXXXXX] UID [5000CCA0:C415BA70:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000] error: disk does not exist. Disk 0b.10.6 Shelf 10 Drawer 1 Slot 6 Bay 6 [NETAPP X375_XXXXXXXX NA01] S/N [XXXXXX] UID [5000CCA0:C4159D20:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000] error: disk does not exist. Disk 0b.10.8 Shelf 10 Drawer 1 Slot 8 Bay 8 [NETAPP X375_XXXXXXXX NA01] S/N [XXXXXXX] UI in SK process config_thread on release 9.13.1P3 (C)
Thu Jun 13 08:41:24 +0300 [Node1: config_thread: cf.fm.panicToInProgress:alert]: Failover monitor: Panic during takeover; takeover will be disabled on reboot.
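The panic string names the RAID type, the raid group, the number of failed disks, and the shelf location of each missing disk. As a rough triage aid, the minimal Python sketch below pulls those fields out and checks whether the failures exceed what the RAID type can survive per raid group (RAID4 tolerates one failed disk, RAID-DP two, RAID-TEC three). The helper `summarize_panic`, its regular expressions, and the sample string are illustrative assumptions based only on the log format shown above; this is not a NetApp tool. Because the logged panic string is truncated, only the first few disks appear in it, so the failed-disk count is read from the header text ("28 disks failed in the group") rather than from the disk list.

```python
import re
from collections import Counter

# Failed disks each ONTAP RAID type survives per raid group:
# RAID4 = single parity, RAID-DP = double parity, RAID-TEC = triple parity.
PARITY_TOLERANCE = {"raid4": 1, "raid_dp": 2, "raid_tec": 3}

# Patterns assume the panic-string layout shown in the log above (assumption).
DISK_RE = re.compile(r"Disk (\S+) Shelf (\d+) Drawer (\d+) Slot (\d+) Bay (\d+)")
RAID_RE = re.compile(r"Raid type - (\S+) Group name (\S+) state \S+\. (\d+) disks failed")


def summarize_panic(panic: str) -> dict:
    """Hypothetical helper: summarize a 'fatal multi-disk error' panic string."""
    disks = DISK_RE.findall(panic)
    # Count listed disks per shelf ID (second capture group of DISK_RE).
    shelves = Counter(disk[1] for disk in disks)
    summary = {"disks_listed": [disk[0] for disk in disks], "shelves": dict(shelves)}
    raid = RAID_RE.search(panic)
    if raid:
        raid_type, group, failed = raid.group(1), raid.group(2), int(raid.group(3))
        tolerance = PARITY_TOLERANCE.get(raid_type)
        summary.update(
            raid_type=raid_type,
            group=group,
            failed=failed,
            # True when more disks failed than the RAID type's parity can tolerate.
            exceeds_parity=tolerance is not None and failed > tolerance,
        )
    return summary


# Abbreviated sample built from the redacted panic string above.
sample = (
    "aggr aggr1_SATA: raid volfsm, fatal multi-disk error.. Raid type - raid_tec "
    "Group name plex0/rg0 state NORMAL. 28 disks failed in the group. "
    "Disk 0b.10.2 Shelf 10 Drawer 1 Slot 2 Bay 2 [NETAPP X375_XXXXXXX NA01] error: disk does not exist. "
    "Disk 0b.10.4 Shelf 10 Drawer 1 Slot 4 Bay 4 [NETAPP X375_XXXXXXX NA01] error: disk does not exist."
)

if __name__ == "__main__":
    print(summarize_panic(sample))
```

For this sample, every listed disk maps to shelf 10, and 28 failed disks far exceed RAID-TEC's three-disk tolerance, which is consistent with an entire shelf going offline rather than individual drive failures.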