Rebooting SP due to loss of ACP comms
Applies to
- ONTAP 9
- Service Processor (SP)
Issue
- The SP reboots after an ACP alert has cleared. EMS log example:
[node_name-01: dsa_worker1: ses.status.ACPError:alert]: DS2246 (S/N SHFHU0123456789) shelf 0 on channel 0a ACP Processor error for SAS shelf ACP processor 2: critical status ; Alternate Control Path hardware failed This module is on the rear of the shelf at the top right, on shelf module B.
[node_name-01: statd: monitor.shelf.fault:alert]: Critical fault reported on disk storage shelf attached to channel 0a. Check fans, power supplies, disks, and temperature sensors.
[node_name-01: monitor: monitor.globalStatus.critical:EMERGENCY]: Disk shelf fault.
[node_name-01: dsa_worker2: ses.status.ACPError:alert]: DS2246 (S/N SHFHU0123456789) shelf 0 on channel 0a ACP Processor error for SAS shelf ACP processor 1: critical status ; Alternate Control Path hardware failed This module is on the rear of the shelf at the top left, on shelf module A.
[node_name-01: dsa_worker2: ses.status.ACPInfo:info]: DS2246 (S/N SHFHU0123456789) shelf 0 on channel 0a ACP Processor information for SAS shelf ACP processor 2: normal status.
[node_name-01: splog_main: splog.running.normally:info]: Process splogd is operating normally.
[node_name-01: dsa_worker1: ses.status.ACPInfo:info]: DS2246 (S/N SHFHU0123456789) shelf 0 on channel 0a ACP Processor information for SAS shelf ACP processor 1: normal status.
[node_name-01: statd: monitor.shelf.fault.ok:notice]: Fault previously reported on disk storage shelf attached to channel 0a has been corrected.
[node_name-01: monitor: monitor.globalStatus.ok:notice]: The system's global status is normal.
[node_name-02: statd: monitor.shelf.fault:alert]: Critical fault reported on disk storage shelf attached to channel 0a. Check fans, power supplies, disks, and temperature sensors.
[node_name-02: monitor: monitor.globalStatus.critical:EMERGENCY]: Disk shelf fault.
[node_name-02: dsa_worker1: ses.status.ACPError:alert]: DS2246 (S/N SHFHU0123456789) shelf 0 on channel 0a ACP Processor error for SAS shelf ACP processor 1: critical status ; Alternate Control Path hardware failed This module is on the rear of the shelf at the top left, on shelf module A.
[node_name-02: splog_main: splog.running.normally:info]: Process splogd is operating normally.
[node_name-02: dsa_worker3: ses.status.ACPInfo:info]: DS2246 (S/N SHFHU0123456789) shelf 0 on channel 0a ACP Processor information for SAS shelf ACP processor 2: normal status.
[node_name-02: dsa_worker2: ses.status.ACPInfo:info]: DS2246 (S/N SHFHU0123456789) shelf 0 on channel 0a ACP Processor information for SAS shelf ACP processor 1: normal status.
[node_name-02: statd: monitor.shelf.fault.ok:notice]: Fault previously reported on disk storage shelf attached to channel 0a has been corrected.
[node_name-02: monitor: monitor.globalStatus.ok:notice]: The system's global status is normal.
- The SP reboots automatically. Example SP event message:
Record 833: Tue Oct 13 18:20:19 2020 [SP.critical]: Rebooting SP due to loss of ACP comms
- ACP status is OK and ACP is functioning normally.
- A high number of transmitted frames and bytes per second on the e0M management port:
-- interface e0M (30 days, 20 hours, 46 minutes, 42 seconds) --
RECEIVE
…
TRANSMIT
>>> Total frames: 2992m | Frames/second: 1122 | Total bytes: 4523g
Bytes/second: 1696k | Total errors: 0 | Errors/minute: 0
Total discards: 0 | Queue overflow: 0 | Multi/broadcast: 90594
…
-- interface e0M (30 days, 20 hours, 44 minutes, 31 seconds) --
RECEIVE
…
TRANSMIT
>>> Total frames: 216m | Frames/second: 81 | Total bytes: 322g
Bytes/second: 120k | Total errors: 0 | Errors/minute: 0
Total discards: 0 | Queue overflow: 0 | Multi/broadcast: 90526
…
- The node management LIF and the intercluster LIF share the same broadcast domain.
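The e0M transmit counters shown above can be sanity-checked against the interface uptime. The following is a minimal Python sketch, not a NetApp tool: the helper names (uptime_seconds, counter, average_rate) are hypothetical, and it assumes that the k/m/g suffixes in the output are decimal multipliers.

```python
import re

# Assumption: abbreviated counters use decimal suffixes (k = 10^3, m = 10^6, g = 10^9).
SUFFIX = {"": 1, "k": 10**3, "m": 10**6, "g": 10**9}

# Seconds per unit named in the interface header line.
UNIT_SECONDS = {"day": 86400, "hour": 3600, "minute": 60, "second": 1}


def uptime_seconds(header: str) -> int:
    """Convert '(30 days, 20 hours, 46 minutes, 42 seconds)' to seconds."""
    total = 0
    for value, unit in re.findall(r"(\d+) (day|hour|minute|second)s?", header):
        total += int(value) * UNIT_SECONDS[unit]
    return total


def counter(text: str) -> int:
    """Expand an abbreviated counter such as '2992m' or '4523g'."""
    match = re.fullmatch(r"(\d+)([kmg]?)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized counter: {text!r}")
    return int(match.group(1)) * SUFFIX[match.group(2)]


def average_rate(total: str, header: str) -> float:
    """Average per-second rate implied by a total counter and the uptime."""
    return counter(total) / uptime_seconds(header)
```

For the first sample above, average_rate("2992m", "(30 days, 20 hours, 46 minutes, 42 seconds)") comes out at roughly 1122 frames per second, which is consistent with the reported Frames/second value. The second sample averages about 81 frames per second, so the first interface transmitted roughly 14 times more frames over the same period.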
