StorageGRID node decommission gets stuck with "Selecting destination for EC group failed" due to an old EC profile
Applies to
- NetApp StorageGRID 11.8 and earlier
- Erasure coding (EC)
- EC profile has been changed and the old EC profile is not deactivated when the decommission starts
Issue
- Node decommission is paused at erasure coded data
- bycast.log indicates "Selecting destination for EC group failed after 5 retries."
Example 1:
Dec 9 19:29:01 <NODE> ADE: |21426716 1820442787 ECJM CSRT 2022-12-09T19:29:01.253077| NOTICE 0376 ECJM: EcgDecomJob: '11696086893380218698' ECG: 'DB1B050F-1755-4F86-995C-81085336DC19' VCS: 'DB349EB5-32DE-40C6-BB52-DA99AEF0A607': Selecting possible destination for affectedBytes: 0
Dec 9 19:29:01 <NODE> ADE: |21426716 1820442787 ECJM EPRP 2022-12-09T19:29:01.253925| ERROR 1054 PROC: Exception: /build/src/modules/ErasureCoding/EC_JobManager_Module/EcgDecommissionJob.cc(368): Throw in function void erasurecoding::EcgDecommissionJob::selectDestinationNode()#012Dynamic exception type: boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<std::runtime_error> >#012std::exception::what: ENFORCE failed: !"Selecting destination for EC group failed after 5 retries."#012
Dec 9 19:29:06 <NODE> ADE: |21426716 1820442641 ECJM CSRT 2022-12-09T19:29:06.397947| ERROR 0112 ECJM: Exception caught during decommissioning ENFORCE failed: 'SUCS' == *jobResult.
Dec 9 19:29:06 <NODE> ADE: |21426716 1820442641 ECJM CSRT 2022-12-09T19:29:06.398057| ERROR 1054 PROC: Exception: /build/src/modules/ErasureCoding/EC_JobManager_Module/NodeDecommissionJob.cc(447): Throw in function CXD_AtomContainer erasurecoding::NodeDecommissionJob::waitForJobCompletions()#012Dynamic exception type: boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<std::runtime_error> >#012std::exception::what: ENFORCE failed: 'SUCS' == *jobResult#012
Example 2:
Oct 22 09:29:15 <NODE> ADE: |21000409 0463547062 ECPL SQRT 2025-10-22T09:29:15.726582| ERROR 0100 ECPL: Placement request failed due to: ENFORCE failed: fragments <= totalNodes
Oct 22 09:29:15 <NODE> ADE: |21000409 0466773853 ECJM EPRP 2025-10-22T09:29:15.726608| WARNING 0373 59508ef0d7310f8e ECJM: EcgDecomJob: '16645277957062581647' ECG: '0195F59E-BC20-4F28-B0D1-03FB1273ED75' VCS: '8314AD81-5171-4E9D-A251-6DED6C733617': Unable to select destination nodes: 'FAIL'. Retrying
Oct 22 09:29:15 <NODE> ADE: |21000409 0466773853 ECJM EPRP 2025-10-22T09:29:15.726654| WARNING 0062 59508ef0d7310f8e ECJM: Caught exception 'ENFORCE failed: !"Selecting destination for EC group failed after 5 retries."' when running job 16645277957062581647: EC group Decommission - EC group 0195F59E-BC20-4F28-B0D1-03FB1273ED75, VCS: 8314AD81-5171-4E9D-A251-6DED6C733617.
Oct 22 09:29:15 <NODE> ADE: |21000409 0466773853 ECJM EPRP 2025-10-22T09:29:15.726694| ERROR 1081 59508ef0d7310f8e PROC: Exception: /build/src/modules/ErasureCoding/EC_JobManager_Module/EcgDecommissionJob.cc(348): Throw in function selectDestinationNode#012Dynamic exception type: boost::wrapexcept<std::runtime_error>#012std::exception::what: ENFORCE failed: !"Selecting destination for EC group failed after 5 retries."#012
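When triaging a copy of bycast.log, the signatures in the examples above can be searched for mechanically. The following is a minimal Python sketch, not a NetApp tool. The function name and the returned keys are hypothetical, and the match strings are taken only from the two examples above, so other StorageGRID versions may log these events differently.

```python
import re
from typing import Dict, Iterable, List, Union

# Message text copied from the log examples in this article.
FAILURE_TEXT = "Selecting destination for EC group failed after 5 retries"

# Matches the ECG identifier as it appears in EcgDecomJob lines: ECG: '<UUID>'
ECG_RE = re.compile(
    r"ECG: '([0-9A-Fa-f]{8}(?:-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12})'"
)


def summarize_ec_decom_failures(
    lines: Iterable[str],
) -> Dict[str, Union[int, List[str]]]:
    """Summarize EC decommission failure signatures in bycast.log lines.

    Returns a dict with two hypothetical keys:
      failure_lines: number of log lines containing the retry-failure text
                     (one failure may be logged on more than one line)
      ec_groups:     sorted, distinct ECG IDs seen on EcgDecomJob lines
    """
    failure_lines = 0
    ec_groups = set()
    for line in lines:
        if FAILURE_TEXT in line:
            failure_lines += 1
        if "EcgDecomJob" in line:
            # Only the ECG field is captured; the VCS UUID is ignored.
            ec_groups.update(match.upper() for match in ECG_RE.findall(line))
    return {"failure_lines": failure_lines, "ec_groups": sorted(ec_groups)}
```

To use it, open a copy of the log in text mode (for example with `errors="replace"` to tolerate stray bytes) and pass the file object to `summarize_ec_decom_failures`. The ECG IDs it returns are those mentioned on any EcgDecomJob line, so treat them as candidates to investigate rather than a confirmed list of failing groups.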
