THIS FIELD NOTICE IS PROVIDED ON AN "AS IS" BASIS AND DOES NOT IMPLY ANY KIND OF GUARANTEE OR WARRANTY, INCLUDING THE WARRANTY OF MERCHANTABILITY. YOUR USE OF THE INFORMATION ON THE FIELD NOTICE OR MATERIALS LINKED FROM THE FIELD NOTICE IS AT YOUR OWN RISK. CISCO RESERVES THE RIGHT TO CHANGE OR UPDATE THIS FIELD NOTICE AT ANY TIME.
Affected Software Product | Affected Release | Affected Release Number | Comments |
---|---|---|---|
Unified Computing System (UCS) Infrastructure Software Bundle | 4 | 4.3(5a) | |
Defect ID | Headline |
---|---|
CSCwn44955 | 4GFI: svc_sam_dme failed on primary FI after one of the FI is replaced or erase config |
The svc_sam_dme Data Management Engine (DME) process will fail on the primary Fabric Interconnect (FI) after one of the FIs in the cluster is replaced or has its configuration erased (erase-config).
This failure will cause the UCS cluster to become unmanageable until the DME process is recovered using the steps described in the Workaround/Solution section of this field notice.
This issue will only occur when the Cisco UCS Manager (UCSM) port auto-discovery policy is enabled and Fibre-Channel (FC) ports are configured on the FI. If this policy is not enabled or if FC ports are not configured, then FI replacement or erase-config will succeed.
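The conditions in this notice can be combined into a single exposure check. The following is a minimal sketch only: the `FabricInterconnect` record and its field names are hypothetical and would need to be filled in from your own inventory data. They are not part of any Cisco API.

```python
from dataclasses import dataclass

# The only affected release listed in this field notice.
AFFECTED_RELEASE = "4.3(5a)"


@dataclass
class FabricInterconnect:
    """Hypothetical inventory record; field names are assumptions."""
    release: str                       # UCS Infrastructure Software Bundle release
    managed_mode: str                  # "UMM" (UCSM managed) or "IMM" (Intersight managed)
    port_auto_discovery_enabled: bool  # port auto-discovery policy state
    fc_ports_configured: bool          # True if any FC ports are configured on the FI


def is_exposed(fi: FabricInterconnect) -> bool:
    """True only when every condition described in this notice holds."""
    return (
        fi.release == AFFECTED_RELEASE
        and fi.managed_mode == "UMM"
        and fi.port_auto_discovery_enabled
        and fi.fc_ports_configured
    )


if __name__ == "__main__":
    fi = FabricInterconnect("4.3(5a)", "UMM", True, True)
    print("Exposed to CSCwn44955:", is_exposed(fi))
```

A result of `False` means only that the conditions described here are not all met; it does not replace checking the defect record itself.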
The DME process is one of several management processes that run on the FI and is responsible for Cisco UCSM access and configuration. When the DME process has failed, no management tasks can be completed. Management tasks include server discovery, network configuration, and server profile deployment.
The port auto-discovery policy is a Cisco UCSM policy that automatically determines the type of server connected to a switch port and configures the switch port accordingly. This policy is not enabled by default.
The DME process will fail, and the UCS cluster will become unmanageable. The UCS GUI will not be reachable, and most commands entered on the CLI will fail.
While in this failed state, network and storage traffic can still pass through the peer FI. However, the replaced FI will not pass traffic because there is no configuration on network and storage ports.
FI-A(local-mgmt)# show cluster extended-state
Cluster Id: 0xXXXXXXXX
Start time: Mon Oct 28 13:02:42 2024
Last election time: Thu Dec 5 20:34:33 2024
A: UP, INAPPLICABLE, (Management services: DOWN)
B: UP, INAPPLICABLE, (Management services: DOWN)
A: memb state UP, lead state INAPPLICABLE, mgmt services state: DOWN
B: memb state UP, lead state INAPPLICABLE, mgmt services state: DOWN
heartbeat state PRIMARY_OK
INTERNAL NETWORK INTERFACES:
eth3, UP
eth4, UP
HA NOT READY
Management services are unresponsive on local Fabric Interconnect
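If the `show cluster extended-state` output has been captured as text, the symptom can be checked programmatically. The sketch below assumes the line layout shown above and parses only the detailed per-fabric lines and the `HA NOT READY` marker. The function name and return structure are illustrative, not part of any Cisco tooling.

```python
import re
from typing import Dict

# Matches the detailed per-fabric lines, for example:
#   A: memb state UP, lead state INAPPLICABLE, mgmt services state: DOWN
MEMBER = re.compile(
    r"^(?P<fabric>[AB]): memb state (?P<memb>\w+), "
    r"lead state (?P<lead>\w+), mgmt services state: (?P<mgmt>\w+)"
)


def summarize_cluster_state(output: str) -> Dict[str, object]:
    """Collect per-fabric states and note whether HA is reported as not ready."""
    members = {}
    ha_not_ready = False
    for raw in output.splitlines():
        line = raw.strip()
        match = MEMBER.match(line)
        if match:
            members[match["fabric"]] = {
                "memb": match["memb"],
                "lead": match["lead"],
                "mgmt": match["mgmt"],
            }
        elif line == "HA NOT READY":
            ha_not_ready = True
    # Fabrics whose management services are reported as DOWN.
    mgmt_down = sorted(f for f, s in members.items() if s["mgmt"] == "DOWN")
    return {"members": members, "mgmt_down": mgmt_down, "ha_not_ready": ha_not_ready}


if __name__ == "__main__":
    sample = """\
A: UP, INAPPLICABLE, (Management services: DOWN)
B: UP, INAPPLICABLE, (Management services: DOWN)
A: memb state UP, lead state INAPPLICABLE, mgmt services state: DOWN
B: memb state UP, lead state INAPPLICABLE, mgmt services state: DOWN
heartbeat state PRIMARY_OK
HA NOT READY
"""
    state = summarize_cluster_state(sample)
    print("Mgmt services down on:", state["mgmt_down"])
    print("HA not ready:", state["ha_not_ready"])
```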
FI-A(local-mgmt)# show pmon state
SERVICE NAME STATE RETRY(MAX) EXITCODE SIGNAL CORE
------------ ----- ---------- -------- ------ ----
svc_sam_controller running 0(4) 0 0 no
svc_sam_dme failed 6(4) 0 11 yes <--- Note the svc_sam_dme process is in a "failed" state
svc_sam_dcosAG running 0(4) 0 0 no
svc_sam_bladeAG running 0(4) 0 0 no
svc_sam_portAG running 0(4) 0 0 no
svc_sam_statsAG running 0(4) 0 0 no
svc_sam_hostagentAG running 0(4) 0 0 no
svc_sam_nicAG running 0(4) 0 0 no
svc_sam_licenseAG running 0(4) 0 0 no
svc_sam_extvmmAG running 0(4) 0 0 no
httpd.sh running 0(4) 0 0 no
httpd_cimc.sh running 0(4) 0 0 no
svc_sam_sessionmgrAG running 0(4) 0 0 no
svc_sam_pamProxy running 0(4) 0 0 no
dhcpd running 0(4) 0 0 no
sam_core_mon running 0(4) 0 0 no
svc_sam_netSnmpAG running 0(4) 0 0 no
svc_sam_rsdAG running 0(4) 0 0 no
svc_sam_svcmonAG running 0(4) 0 0 no
ucsm_tftpdv6 running 0(4) 0 0 no
svc_sam_samcproxy running 0(4) 0 0 no
svc_sam_samcstatsproxy running 0(4) 0 0 no
mtuTune running 0(10) 0 0 no
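To pick the failed process out of a long `show pmon state` listing, the captured output can be filtered by its STATE column. This is a rough sketch that assumes the whitespace-separated column layout shown above. The function name is illustrative only.

```python
import re
from typing import Dict, List

# Matches one service row of `show pmon state` output, for example:
#   svc_sam_dme failed 6(4) 0 11 yes
ROW = re.compile(
    r"^(?P<name>\S+)\s+(?P<state>\S+)\s+(?P<retry>\d+)\((?P<max>\d+)\)\s+"
    r"(?P<exitcode>\d+)\s+(?P<signal>\d+)\s+(?P<core>yes|no)\b"
)


def failed_services(pmon_output: str) -> List[Dict[str, str]]:
    """Return one dict per service row whose STATE is not 'running'."""
    failed = []
    for line in pmon_output.splitlines():
        match = ROW.match(line.strip())
        # Header and separator lines do not match ROW and are skipped.
        if match and match["state"] != "running":
            failed.append(match.groupdict())
    return failed


if __name__ == "__main__":
    sample = """\
SERVICE NAME STATE RETRY(MAX) EXITCODE SIGNAL CORE
------------ ----- ---------- -------- ------ ----
svc_sam_controller running 0(4) 0 0 no
svc_sam_dme failed 6(4) 0 11 yes
svc_sam_dcosAG running 0(4) 0 0 no
"""
    for svc in failed_services(sample):
        print(f"{svc['name']}: {svc['state']} (signal {svc['signal']}, core {svc['core']})")
```

For this issue, the only row expected in the result is `svc_sam_dme`.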
This issue is applicable to all FI models running in UCSM Managed Mode (UMM). FIs running in Intersight Managed Mode (IMM) are not impacted.
The most consistent method for recovering a cluster that is in this failed state is to remove the unconfigured FI from the cluster, restart the management processes, and then disable the port auto-discovery policy. After disabling the port auto-discovery policy, the unconfigured FI can be added back to the cluster and cluster creation will complete successfully.
To remove an unconfigured FI from a cluster, power off the unconfigured FI and leave it powered off. Be careful to correctly identify the recently replaced or unconfigured FI, to ensure that the FI that is actively passing traffic is not powered off.
To perform a management process restart, log in to the CLI and enter the following commands:
FI-A# connect local
FI-A(local-mgmt)# pmon stop
FI-A(local-mgmt)# pmon start
To disable the port auto-discovery policy, scope to the port discovery policy setting and disable the feature:
FI-A# scope org
FI-A /org/ # scope port-disc-policy
FI-A /org/port-disc-policy # set server-auto-disc disabled
FI-A /org/port-disc-policy* # commit-buffer
After this feature is disabled, the unconfigured FI can be added to the cluster, and the cluster creation will complete. At this point, the port auto-discovery policy can be enabled again, if desired.
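Because the order of the recovery steps matters, it may help to keep them as a single ordered checklist. The sketch below only restates the steps and commands from this notice as data and prints them in sequence. It does not connect to or run anything on an FI, and the `manual`/`cli` labels are an illustrative convention.

```python
from typing import List, Tuple

# Ordered recovery steps, taken from the workaround in this notice.
# "manual" steps are physical or operator actions; "cli" steps are typed on the FI.
RECOVERY_STEPS: List[Tuple[str, str]] = [
    ("manual", "Power off the unconfigured FI and leave it powered off"),
    ("cli", "connect local"),
    ("cli", "pmon stop"),
    ("cli", "pmon start"),
    ("cli", "scope org"),
    ("cli", "scope port-disc-policy"),
    ("cli", "set server-auto-disc disabled"),
    ("cli", "commit-buffer"),
    ("manual", "Add the unconfigured FI back to the cluster"),
]


def render_runbook(steps: List[Tuple[str, str]] = RECOVERY_STEPS) -> str:
    """Format the steps as a numbered checklist, preserving their order."""
    lines = []
    for number, (kind, text) in enumerate(steps, start=1):
        label = "RUN" if kind == "cli" else "DO "
        lines.append(f"{number:>2}. {label} {text}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_runbook())
```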
All UCS Managed FIs running 4.3(5a) are impacted.
Version | Description | Section | Date |
---|---|---|---|
1.0 | Initial Release | — | 2025-MAR-26 |
For further assistance or for more information about this field notice, contact the Cisco Technical Assistance Center (TAC).
To receive email updates about Field Notices (reliability and safety issues), Security Advisories (network security issues), and end-of-life announcements for specific Cisco products, set up a profile in My Notifications.