ONTAPClusterCreateTimedOut: ONTAP cluster create timed out
Applies to
- NetApp ONTAP Select (OTS) HA
- NetApp ONTAP Select Deploy
- Cisco
- VMware
- Remote Branch Office (ROBO)
- NAT configured between Deploy and OTS
Issue
- ROBO deployments of OTS HA to multiple GEOs fail with the same error:
evtype='ClusterDeployFailed', category='cluster', level='Error', detail='ONTAPClusterCreateTimedOut: ONTAP cluster create timed out. Not all nodes successfully joined cluster "cluster01"
- In /opt/netapp/log/sdotadmin_server.log, calls to the cluster-add-node-status-get-iter API time out. The cluster is created successfully, but the node join operation appears to fail:
2021-09-21 18:09:55,560|ERROR| 1474|34586|logger.py|132|ZAPI.py|89:retry| invoke: failed. Reason: timeout('timed out')
2021-09-21 18:09:55,560|DEBUG| 1474|34586|logger.py|132|ZAPI.py|241:invoke|
API: "cluster-add-node-status-get-iter"
User: "admin"
Request:
<suppressed/>
Response:
<suppressed/>
2021-09-21 18:09:55,561|ERROR| 1474|34586|logger.py|132|ClusterZapis.py|286:z_get_cluster_add_node_status| Cluster [cluster01-01]: cluster-add-node-status zapi exception: [timeout('timed out')]
- Pinging the Deploy IP from the OTS cluster-mgmt LIF works, but pinging the cluster-mgmt LIF from Deploy does not.
- When the cluster-mgmt LIF is moved to another port (e0b), its IP can be pinged; when it is moved back to the default port (e0a), the ping fails again.
- Both node-mgmt LIFs can be pinged from the Deploy CLI.
- When the node-mgmt LIF of the first node (the home node of the cluster-mgmt LIF) is shut down, the cluster-mgmt LIF becomes pingable.
- When the node-mgmt LIF is brought back online, the reachability of the cluster-mgmt LIF does not change, but the node-mgmt LIF is no longer pingable.
- When the cluster-mgmt LIF is shut down, the node-mgmt LIF is reachable again.
- When the cluster-mgmt LIF is brought back online, it again cannot be pinged or accessed.
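To confirm how often the timeouts quoted above from sdotadmin_server.log occur, the ERROR entries can be filtered out of the log. The following is a minimal sketch, not a NetApp tool: it assumes every log entry follows the pipe-delimited layout of the sample lines in this article (timestamp, level, four fields that are ignored here, source file, line:function, message). The names TimeoutEntry and find_timeouts are hypothetical and chosen only for this illustration.

```python
from datetime import datetime
from typing import Iterable, Iterator, NamedTuple


class TimeoutEntry(NamedTuple):
    """One ERROR log entry whose message mentions a timeout."""
    timestamp: datetime
    source: str    # e.g. "ZAPI.py"
    location: str  # e.g. "89:retry"
    message: str


def find_timeouts(lines: Iterable[str]) -> Iterator[TimeoutEntry]:
    """Yield ERROR entries containing 'timed out'.

    Assumption: entries have nine pipe-delimited fields, as in the
    sample lines quoted in this article. Lines that do not match
    (for example the multi-line API/Request/Response dump) are skipped.
    """
    for line in lines:
        parts = line.rstrip("\n").split("|", 8)
        if len(parts) != 9:
            continue  # continuation line or unknown layout
        ts, level, _, _, _, _, source, location, message = parts
        if level.strip() != "ERROR" or "timed out" not in message:
            continue
        try:
            stamp = datetime.strptime(ts.strip(), "%Y-%m-%d %H:%M:%S,%f")
        except ValueError:
            continue  # first field is not a timestamp in the assumed format
        yield TimeoutEntry(stamp, source.strip(), location.strip(), message.strip())
```

On the Deploy VM, the function could be fed an open file handle for /opt/netapp/log/sdotadmin_server.log, and the yielded entries printed or counted. A run of entries from ZAPI.py and ClusterZapis.py that all report timeouts is consistent with the symptom described here, in which Deploy cannot reach the cluster-mgmt LIF.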