Multi-disk/plex failure in a Fabric MetroCluster
Applies to
- Fabric MetroCluster
- ONTAP 9
Issue
Both sites in a MetroCluster experience multiple remote disk failures, eventually followed by plex failures.
Example:
Mon May 17 13:40:32 UTC [SiteA-02: cfdisk_config: cf.disk.skipped:notice]: The disk FC_switch_B_1:9.126L514 was skipped because it reported the status adapter error prevents command from being sent to device.
Mon May 17 14:00:00 UTC [SiteA-02: config_thread: raid.config.check.failedPlex:error]: Plex /SiteA-aggr-02/plex1 has failed.
Mon May 17 14:03:32 UTC [SiteA-02: mgwd: callhome.hm.alert.major:alert]: Call home for Health Monitor process schm: RaidDegradedMirrorAggrAlert[00000000-0000-0000-0000-000000000000].
FC-VI resets are also observed.
Example:
Wed May 19 13:11:48 UTC [SiteA-02: ispfcvi2500_port2: fcvi.qlgc.rmt.link.down:notice]: FC-VI adapter: Link to partner node over port 0f is down. Partner port id = 0x60400, partner node's system id = 000000000.
Wed May 19 13:11:49 UTC [SiteA-02: ispfcvi2500_main2: pfo.failover.start.error:debug]: params: {'partner': 'DR Primary Partner', 'reason': 'Failover not enabled', 'client': 'WAFL', 'port': '1', 'trigger': 'IPFO_TRIG_BAD_COMPL'}
Wed May 19 13:11:49 UTC [SiteA-02: wafl_exempt03: fcvi.qlgc.ioErr:error]: FC-VI adapter: FCVI driver on port 0f received IO error. Status = Invalid VI state(status code = 0x10c), FCVI opcode = Write Request(0x1), QP name = WAFL, QP index = 3, Remote node's system id = 000000000.
Wed May 19 13:11:49 UTC [SiteA-02: wafl_exempt03: mirror.stream.qp.error:debug]: params: {'error': 'NVMM_ERR_POLL', 'qp_name': 'WAFL', 'mirror': 'DR PARTNER'}
Wed May 19 13:11:49 UTC [SiteA-02: wafl_exempt03: nvmm.mirror.aborting:debug]: mirror of sysid 2, partner_type DR PARTNER and mirror state NVMM_MIRROR_ONLINE is aborted because of reason NVMM_ERR_POLL.
Wed May 19 13:11:49 UTC [SiteA-02: fcvi_cm: ems.engine.suppressed:debug]: Event 'ic.rdma.qpDisconnected' suppressed 5 times in last 171775 seconds.
Wed May 19 13:11:49 UTC [SiteA-02: fcvi_cm: ic.rdma.qpDisconnected:debug]: WAFL is disconnected.
Wed May 19 13:11:49 UTC [SiteA-02: fcvi_cm: ems.engine.suppressed:debug]: Event 'ic.rdma.qpConnected' suppressed 11 times in last 171181 seconds.
Wed May 19 13:11:49 UTC [SiteA-02: fcvi_cm: ic.rdma.qpConnected:debug]: WAFL is connected.
Wed May 19 13:11:49 UTC [SiteA-02: fcvi_cm: rdma.rlib.connected:debug]: WAFL QP is now connected.
Wed May 19 13:11:49 UTC [SiteA-02: nvmm_helper: nvpm.state.changed:debug]: Node 2's NVPM state changed from "2" to "2".
Wed May 19 13:11:49 UTC [SiteA-02: wafl_exempt02: mirror.stream.qp.error:debug]: params: {'error': 'NVMM_ERR_POLL', 'qp_name': 'WAFL', 'mirror': 'HA Partner'}
Wed May 19 13:11:49 UTC [SiteA-02: wafl_exempt02: nvmm.mirror.aborting:debug]: mirror of sysid 1, partner_type HA Partner and mirror state NVMM_MIRROR_ONLINE is aborted because of reason NVMM_ERR_POLL.
Wed May 19 13:11:49 UTC [SiteA-02: mcc_cfd_rnic: nvmm.mirror.aborting:debug]: mirror of sysid 3, partner_type AUXDR PARTNER and mirror state NVMM_MIRROR_LAYOUT_SYNCED is aborted because of reason NVMM_ERR_STREAM.
Wed May 19 13:11:49 UTC [SiteA-02: mcc_cfd_rnic: mirror.stream.qp.error:debug]: params: {'error': 'NVMM_ERR_STREAM', 'qp_name': 'RAID', 'mirror': 'HA Partner'}
Wed May 19 13:11:49 UTC [SiteA-02: mcc_cfd_rnic: mirror.stream.qp.error:debug]: params: {'error': 'NVMM_ERR_STREAM', 'qp_name': 'MISC', 'mirror': 'HA Partner'}
Wed May 19 13:11:49 UTC [SiteA-02: MCC_DR_Callout: fcvi.qlgc.sent.disconnect:notice]: FC-VI adapter: Disconnect request sent on port 0f. QP name = RAID, QP index = 4, Remote node's system id = 000000000.
Wed May 19 13:11:49 UTC [SiteA-02: MCC_DR_Callout: fcvi.qlgc.sent.disconnect:notice]: FC-VI adapter: Disconnect request sent on port 0f. QP name = MISC, QP index = 10, Remote node's system id = 000000000.
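When reviewing a large EMS log for this signature, it can help to tally the relevant events per node. The following is a minimal sketch, assuming the log lines follow the same layout as the excerpts above (timestamp, then `[node: process: event:severity]:`, then the message). The names `parse_ems_line`, `count_symptoms`, and `SYMPTOM_EVENTS` are hypothetical helpers written for this illustration and are not part of ONTAP. The event list is taken only from the examples in this article and is not an exhaustive or official list.

```python
import re
from collections import Counter

# Layout assumed from the excerpts above:
#   <timestamp> [<node>: <process>: <event>:<severity>]: <message>
EMS_LINE = re.compile(
    r"^(?P<timestamp>\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \w+) "
    r"\[(?P<node>[^:\]]+): (?P<process>[^:\]]+): "
    r"(?P<event>[\w.]+):(?P<severity>\w+)\]: (?P<message>.*)$"
)

# Events picked from the examples in this article (an assumption, not an official list).
SYMPTOM_EVENTS = {
    "cf.disk.skipped",
    "raid.config.check.failedPlex",
    "fcvi.qlgc.rmt.link.down",
    "fcvi.qlgc.ioErr",
}


def parse_ems_line(line):
    """Return the fields of one EMS line as a dict, or None if it does not match."""
    match = EMS_LINE.match(line.strip())
    return match.groupdict() if match else None


def count_symptoms(lines):
    """Count occurrences of the symptom events, keyed by (node, event)."""
    counts = Counter()
    for line in lines:
        record = parse_ems_line(line)
        if record and record["event"] in SYMPTOM_EVENTS:
            counts[(record["node"], record["event"])] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Mon May 17 14:00:00 UTC [SiteA-02: config_thread: "
        "raid.config.check.failedPlex:error]: Plex /SiteA-aggr-02/plex1 has failed.",
        "Wed May 19 13:11:49 UTC [SiteA-02: fcvi_cm: "
        "ic.rdma.qpConnected:debug]: WAFL is connected.",
    ]
    for (node, event), n in sorted(count_symptoms(sample).items()):
        print(node, event, n)
```

Lines that do not match the assumed layout are skipped rather than raising an error, so the sketch can be run over mixed log output.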