Disk inventory mismatch with AWS HA after adding spare disks
Applies to
Cloud Volumes ONTAP (AWS)
Issue
- Disk failures occurred because of a back-end Amazon Elastic Block Store (EBS) event. The aggregate then showed two 'failed' disks:
Aggregate: aggr1 (failed, raid0, partial) (advanced zoned checksums)
  Plex: /aggr1/plex0 (offline, failed, inactive)
    RAID Group /aggr1/plex0/rg0 (partial, advanced_zoned checksums)
                                                                Usable Physical
      Position Disk                        Pool Type     RPM      Size     Size Status
      -------- --------------------------- ---- ----- ------ -------- -------- ----------
      data     Net-2.3                      0   VMDISK     -   1007GB   1023GB (normal)
      data     FAILED                       0   -          -   1007GB       0B (failed)
      data     FAILED                       0   -          -   1007GB       0B (failed)
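For reference, the failed-disk state can be confirmed from the ONTAP clustershell with standard commands; the aggregate name aggr1 below is taken from the output above:

    ::> storage aggregate show-status -aggregate aggr1
    ::> storage disk show -broken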
- The administrator then added two spare disks through the AWS portal while the EBS disks still showed as failed on the Cloud Volumes ONTAP (CVO) side (both in the CLI and in System Manager).
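For reference only, a roughly equivalent sequence from the AWS CLI is sketched below; the size, Availability Zone, volume ID, instance ID, and device name are illustrative placeholders, not values from this case:

    # Create a new EBS volume (size and AZ are assumptions for illustration)
    aws ec2 create-volume --availability-zone us-east-1a --size 1024 --volume-type gp2

    # Attach it to the CVO node's EC2 instance (IDs and device name are hypothetical)
    aws ec2 attach-volume --volume-id vol-0123456789abcdef0 \
        --instance-id i-0123456789abcdef0 --device /dev/sdf

Adding disks directly in AWS while the CVO side still reported the original disks as failed is what set up the inventory mismatch described next.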
- When the original 'failed' disks came back online, a 'Takeover is not possible' status was seen due to the inventory mismatch:
awsfiler1::> cf status
                              Takeover
Node           Partner        Possible State Description
-------------- -------------- -------- -------------------------------------
awsfiler1-01   awsfiler1-02   true     Connected to awsfiler1-02
awsfiler1-02   awsfiler1-01   false    Connected to awsfiler1-01, Takeover
                                       is not possible: Local node missing
                                       partner disks
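The same takeover state can also be viewed with the storage failover commands, for example:

    ::> storage failover show
    ::> storage failover show -node awsfiler1-02 -instance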
- Additionally, the Event Management System (EMS) log shows messages about the missing disks:
awsfiler1-02 ERROR cf.disk.inventory.mismatch: Status of the disk 0f.15 (34363435:36316534:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000) has recently changed or the node (awsfiler1-02) is missing the disk.
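These events can be pulled from the EMS log by message name, for example:

    ::> event log show -message-name cf.disk.inventory.mismatch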