Consistent unsynchronized log errors on one node on MetroCluster IP
Applies to
- ONTAP 9
- MetroCluster IP
- NVIDIA SN2100 Switch
Issue
- Consistent unsynchronized log errors seen on one cluster in the MetroCluster IP configuration:
Thu Dec 11 17:52:44 -0300 [Cluster_1_Node_1: cf_main: cf.fsm.takeoverByPartnerDisabled:error]: Failover monitor: takeover of Cluster_1_Node_1 by Cluster_1_Node_2 disabled (unsynchronized log).
Thu Dec 11 17:52:48 -0300 [Cluster_1_Node_1: cf_main: cf.fsm.takeoverByPartnerEnabled:notice]: Failover monitor: takeover of Cluster_1_Node_1 by Cluster_1_Node_2 enabled
Thu Dec 11 17:56:47 -0300 [Cluster_1_Node_1: cf_main: cf.fsm.takeoverByPartnerDisabled:error]: Failover monitor: takeover of Cluster_1_Node_1 by Cluster_1_Node_2 disabled (unsynchronized log).
Thu Dec 11 17:56:55 -0300 [Cluster_1_Node_1: cf_main: cf.fsm.takeoverByPartnerEnabled:notice]: Failover monitor: takeover of Cluster_1_Node_1 by Cluster_1_Node_2 enabled
- Some messages may be seen on the DR cluster, but they are not reported consistently and they call out both nodes:
Mon Jan 05 10:35:26 -0300 [Cluster_2_Node_2: cf_main: cf.fsm.takeoverOfPartnerDisabled:error]: Failover monitor: takeover of Cluster_2_Node_1 disabled (unsynchronized log).
Mon Jan 05 14:02:09 -0300 [Cluster_2_Node_2: cf_main: cf.fsm.takeoverByPartnerDisabled:error]: Failover monitor: takeover of Cluster_2_Node_2 by Cluster_2_Node_1 disabled (unsynchronized log).
- Periodic link up and link down events are seen on one port of the node reporting the unsynchronized log:
Thu Dec 18 13:09:09 -0300 [Cluster_1_Node_1: kernel: netif.linkDown:info]: Ethernet e0d: Link down, check cable.
Thu Dec 18 13:09:09 -0300 [Cluster_1_Node_1: intr: netif.linkDown:info]: Ethernet e0d-20: Link down, check cable.
Thu Dec 18 13:09:09 -0300 [Cluster_1_Node_1: vifmgr: vifmgr.portdown:notice]: A link down event was received on node Cluster_1_Node_1, port e0d.
Thu Dec 18 13:09:13 -0300 [Cluster_1_Node_1: kernel: netif.linkUp:info]: Ethernet e0d: Link up.
- One switch is receiving CRC errors from the node port that is having the link up and link down issues:
Kernel Interface table
Iface         MTU     RX_OK   RX_ERR   RX_DRP   RX_OVR     TX_OK   TX_ERR   TX_DRP   TX_OVR Flg
----------- ----- --------- -------- -------- -------- --------- -------- -------- -------- -----
<snip>
swp7s0       9216 280331621  2633089        0        0 269876557        0        0        0 BMRU
- Both nodes Cluster_2_Node_1 and Cluster_2_Node_2 receive CRC errors across the network, as seen in the system node run -node * -command ifstat output:
- Node Cluster_2_Node_1:
-- interface e0d (22 hours, 4 minutes, 22 seconds) --
RECEIVE
 Total frames:      428m | Frames/second:    5395 | Total bytes:     1331g
 Bytes/second:    16752k | Total errors:    2164k | Errors/minute:    1634
 Total discards:      39 | Discards/minute:     0 | Multi/broadcast:  127k
 Non-primary u/c:      0 | Errored frames:      0 | Unsupported Op:      0
 CRC errors:       1082k | Runt frames:         0 | Fragment:            0
- Node Cluster_2_Node_2:
-- interface e0d (22 hours, 31 minutes, 43 seconds) --
RECEIVE
 Total frames:      310m | Frames/second:    3824 | Total bytes:      640g
 Bytes/second:     7893k | Total errors:     996k | Errors/minute:     737
 Total discards:       0 | Discards/minute:     0 | Multi/broadcast:  129k
 Non-primary u/c:      0 | Errored frames:      0 | Unsupported Op:      0
 CRC errors:        498k | Runt frames:         0 | Fragment:            0
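
To compare how badly each port is affected, the CRC error count can be read against the total received frames in the same ifstat RECEIVE block. The following Python sketch is illustrative only and is not a NetApp tool: the function names (parse_counter, receive_section, crc_ratio) are hypothetical, and it assumes the k, m, and g suffixes in the ifstat counters are decimal multipliers (thousand, million, billion). Because ifstat rounds these counters, the result is an approximation.

```python
import re

# Assumption: ifstat abbreviates large counters with decimal suffixes.
SUFFIX = {"": 1, "k": 10**3, "m": 10**6, "g": 10**9}


def parse_counter(text):
    """Convert a counter such as '1082k' or '39' to an approximate integer."""
    match = re.fullmatch(r"(\d+)([kmg]?)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized counter: {text!r}")
    return int(match.group(1)) * SUFFIX[match.group(2)]


def receive_section(ifstat_text):
    """Return only the RECEIVE part, so TRANSMIT counters are not picked up."""
    start = ifstat_text.find("RECEIVE")
    if start == -1:
        raise ValueError("no RECEIVE section found")
    end = ifstat_text.find("TRANSMIT", start)
    return ifstat_text[start:] if end == -1 else ifstat_text[start:end]


def crc_ratio(ifstat_text):
    """Approximate fraction of received frames that had CRC errors."""
    section = receive_section(ifstat_text)
    frames = re.search(r"Total frames:\s*(\d+[kmg]?)", section)
    crc = re.search(r"CRC errors:\s*(\d+[kmg]?)", section)
    if frames is None or crc is None:
        raise ValueError("required counters not found")
    total = parse_counter(frames.group(1))
    if total == 0:
        raise ValueError("no frames received")
    return parse_counter(crc.group(1)) / total


# RECEIVE counters for interface e0d on Cluster_2_Node_1, copied from above.
NODE_1_RECEIVE = """RECEIVE
 Total frames: 428m | Frames/second: 5395 | Total bytes: 1331g
 Bytes/second: 16752k | Total errors: 2164k | Errors/minute: 1634
 Total discards: 39 | Discards/minute: 0 | Multi/broadcast: 127k
 Non-primary u/c: 0 | Errored frames: 0 | Unsupported Op: 0
 CRC errors: 1082k | Runt frames: 0 | Fragment: 0
"""

print(f"e0d CRC ratio: {crc_ratio(NODE_1_RECEIVE):.2%}")  # prints "e0d CRC ratio: 0.25%"
```

Under the same assumptions, the Cluster_2_Node_2 counters (498k CRC errors in 310m frames) work out to roughly 0.16%.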
