NVIDIA cluster switches show "Is Monitored: false" due to invalid SNMP settings
Applies to
- ONTAP 9
- Cluster Network Switches
- Cluster Switch Health Monitoring (CSHM)
- MSN2100-CB2FC
Issue
- The cluster switch reports Is Monitored as false, with the reason Invalid SNMP Settings:
Cluster::*> system switch ethernet show
Switch                      Type               Address          Model
--------------------------- ------------------ ---------------- ---------------
Cluster-sw1 (94:xx:xx:xx:xx:98)
                            cluster-network    10.XXX.XX.109    MSN2100-CB2FC
     Serial Number: MT2302T000TP
      Is Monitored: false
            Reason: Invalid SNMP Settings
  Software Version: Cumulus Linux version 5.4.0 running on Mellanox
                    Technologies Ltd. MSN2100
    Version Source: LLDP
- The switches are reachable from the cluster (ping and traceroute succeed)
- A SwitchSNMPCommunication_Alert appears in the system health alert show output:
Cluster::*> system health alert show
Node: Node-08
Alert ID: SwitchSNMPCommunication_Alert
Resource: Cluster-sw1 (94:xx:xx:xx:xx:98)
Severity: Major
Indication Time: Thu Jul 17 04:02:08 2025
Suppress: false
Acknowledge: false
Probable Cause: SNMP communication from the node to the ethernet
switch has failed repeatedly. Invalid SNMP settings
are configured with ONTAP Switch Health Monitoring or
on the Ethernet switch.
Possible Effect: Ethernet switch communication problems and
accessibility issues.
Corrective Actions: 1) Check the SNMPv2c community or SNMPv3 username on the Ethernet switch to verify
that the expected community string or username is configured.
To view the expected community string or username, run the "system switch ethernet show -snmp-config" command.
2) (SNMPv3) Verify that the SNMPv3 credentials are present within ONTAP.
To view the established SNMP logins, run the "security login show -application snmp" command.
If a custom engine-id was provided for the SNMPv3 user, ensure it is same as that of the remote switch.
- The Switch-Health subsystem is reported as degraded:
Cluster::*> system health subsystem show
Subsystem Health
----------------- ------------------
SAS-connect ok
Environment ok
Memory ok
Service-Processor ok
Switch-Health degraded
CIFS-NDO ok
Motherboard ok
IO ok
MetroCluster ok
MetroCluster_Node ok
FHM-Switch ok
FHM-Bridge ok
SAS-connect_Cluster ok
13 entries were displayed.
- Attempts to delete the switches from monitoring and add them back fail with the following error:
::> system cluster-switch create -device Cluster-sw1 -address 10.XXX.XX.109 -snmp-version SNMPv2c -community cshm1! -model OTHER -type cluster-network -is-monitoring-enabled-admin true
Error: SNMP validation request timed out. Verify that the "-community-or-username" value is valid.
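When many switches are monitored, the first symptom above can be checked in bulk by scanning saved "system switch ethernet show" output for switches reporting Is Monitored: false. The following is a minimal sketch, not a NetApp-provided tool. It assumes the output has been captured as text and that each switch entry starts with a line of the form "name (MAC)". The function name unmonitored_switches and the sample text are illustrative only.

```python
import re
from typing import Dict, List

# Assumption: each switch entry begins with "<name> (<MAC>)", as in the output above.
SWITCH_LINE = re.compile(r"^(\S+)\s+\(([^)]+)\)")
# Only the two fields relevant to this issue are collected.
FIELD_LINE = re.compile(r"^(Is Monitored|Reason):\s*(.*)$")

# Sample text copied from the symptom shown in this article.
SAMPLE_OUTPUT = """\
Cluster-sw1 (94:xx:xx:xx:xx:98) cluster-network 10.XXX.XX.109 MSN2100-CB2FC
Serial Number: MT2302T000TP
Is Monitored: false
Reason: Invalid SNMP Settings
Version Source: LLDP
"""


def unmonitored_switches(show_output: str) -> List[Dict[str, str]]:
    """Return one dict per switch whose 'Is Monitored' field is false."""
    switches: List[Dict[str, str]] = []
    for raw in show_output.splitlines():
        line = raw.strip()
        start = SWITCH_LINE.match(line)
        if start:
            # A new switch entry begins; later field lines attach to it.
            switches.append({"name": start.group(1), "mac": start.group(2)})
            continue
        field = FIELD_LINE.match(line)
        if field and switches:
            switches[-1][field.group(1)] = field.group(2).strip()
    return [s for s in switches if s.get("Is Monitored", "").lower() == "false"]


if __name__ == "__main__":
    for switch in unmonitored_switches(SAMPLE_OUTPUT):
        print(switch["name"], "-", switch.get("Reason", "no reason reported"))
```

The sketch only reports what ONTAP already displays. It does not contact the switch or validate the SNMP community string.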