Field Notice: FN70234 - Drive Failure Might Cause the HyperFlex Cluster to Go Down - Software Upgrade Recommended

Updated: February 23, 2024

Document ID: FN70234

Bias-Free Language

The documentation set for this product strives to use bias-free language. For the purposes of this documentation set, bias-free is defined as language that does not imply discrimination based on age, disability, gender, racial identity, ethnic identity, sexual orientation, socioeconomic status, and intersectionality. Exceptions may be present in the documentation due to language that is hardcoded in the user interfaces of the product software, language used based on RFP documentation, or language that is used by a referenced third-party product. Learn more about how Cisco is using Inclusive Language.

Notice

THIS FIELD NOTICE IS PROVIDED ON AN "AS IS" BASIS AND DOES NOT IMPLY ANY KIND OF GUARANTEE OR WARRANTY, INCLUDING THE WARRANTY OF MERCHANTABILITY. YOUR USE OF THE INFORMATION ON THE FIELD NOTICE OR MATERIALS LINKED FROM THE FIELD NOTICE IS AT YOUR OWN RISK. CISCO RESERVES THE RIGHT TO CHANGE OR UPDATE THIS FIELD NOTICE AT ANY TIME.

Products Affected

Affected Product Name	Description	Comments
HX-HD12TB10K12G	1.2 TB 12G SAS 10K RPM SFF HDD
HX-HD12TB10K12G=	1.2 TB 12G SAS 10K RPM SFF HDD
HX-M2-240GB	240GB SATA M.2
HX-M2-240GB=	240GB SATA M.2
HX-SD240GBE1NK9	^240GB Enterprise Value SSD (SATA) (1X FWPD, SED)
HX-SD240GBE1NK9=	^240GB Enterprise Value SSD (SATA) (1X FWPD, SED)
HX-SD240GBM1K9	^240GB Enterprise Value SSD (SATA) (1X FWPD, SED)
HX-SD240GBM1K9=	^240GB Enterprise Value SSD (SATA) (1X FWPD, SED)
HX-SD38TBE1NK9	^3.8TB Enterprise Value SSD (SATA) (1X FWPD, SED)
HX-SD38TBE1NK9=	^3.8TB Enterprise Value SSD (SATA) (1X FWPD, SED)
HX-SD38TBM1K9	^3.8TB Enterprise Value SSD (SATA) (1X FWPD, SED)
HX-SD38TBM1K9=	^3.8TB Enterprise Value SSD (SATA) (1X FWPD, SED)
HX-SD960GBE1NK9	^960GB Enterprise value SATA SSD (1X FWPD, SED)
HX-SD960GBE1NK9=	^960GB Enterprise value SATA SSD (1X FWPD, SED)
HX-SD960GBM1K9	960GB Enterprise Value SSD (SATA) (1X FWPD, SED)
HX-SD960GBM1K9=	960GB Enterprise Value SSD (SATA) (1X FWPD, SED)
UCS-HD12TB10K12G	1.2 TB 12G SAS 10K RPM SFF HDD
UCS-HD12TB10K12G=	1.2 TB 12G SAS 10K RPM SFF HDD
UCS-M2-240GB	240GB M.2 SATA Micron G1 SSD	Part Alternate
UCS-M2-240GB=	240GB M.2 SATA Micron G1 SSD	Part Alternate

Defect Information

Defect ID	Headline
CSCwh94241	HX cluster down after HX-HD12TB10K12G drives of different sector size installed
CSCvm66552	Multiple simultaneous 3.8TB SED SSD drive failures may cause the HX cluster to go offline
CSCvk17250	Cluster instability when disks of different sector size placed in HX node
CSCvj66157	SED drive failure may cause the UCS/HX cluster to go down

Problem Description

A drive firmware issue on select Self-Encrypting Drives (SEDs) might cause an operational issue for some HyperFlex clusters.

This can increase the rate of blocked drives, which requires frequent drive replacements and, in some instances, can lead to a HyperFlex cluster outage.

A second drive issue can occur during drive replacement or drive addition (SED or non-SED). If the sector size of the replacement or added drive differs from that of the existing drives, a HyperFlex cluster outage can occur.
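The condition behind this second issue can be expressed as a simple check. The following Python sketch is hypothetical and is not a Cisco tool: the `Drive` record, the `sector_size_mismatch` helper, and the 512-byte and 4096-byte sector sizes are assumptions chosen for illustration, because this notice does not state which sector sizes are involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Drive:
    # Hypothetical record; not an HXDP or UCS Manager data format.
    pid: str
    sector_size_bytes: int  # logical sector size reported by the drive


def sector_size_mismatch(existing: list[Drive], candidate: Drive) -> bool:
    """Return True if the candidate's sector size differs from that of any existing drive."""
    existing_sizes = {d.sector_size_bytes for d in existing}
    # An empty node has nothing to mismatch against; otherwise every existing
    # drive must share exactly the candidate's sector size.
    return bool(existing_sizes) and existing_sizes != {candidate.sector_size_bytes}


# Illustrative values only: 512 and 4096 are assumed sector sizes.
node = [Drive("HX-HD12TB10K12G", 512), Drive("HX-HD12TB10K12G", 512)]
print(sector_size_mismatch(node, Drive("HX-HD12TB10K12G", 4096)))  # True
print(sector_size_mismatch(node, Drive("HX-HD12TB10K12G", 512)))   # False
```

A True result corresponds to the risky case described above; the remediation in this notice is still the software upgrade, not a local check.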

All existing clusters that run an HXDP version earlier than 3.5(2b) (including older HX 2.x and 3.0 releases) should be upgraded to the latest 3.5 release available on cisco.com before any disks are added or replaced and before the cluster is expanded.

Background

An operational bug in the drive firmware might be triggered when the drive is subjected to a specific workload, which could result in uncorrectable drive-level errors. Software upgrades are recommended to mitigate the risks associated with uncorrectable errors. Two newer issues have also been addressed: one in which reading data in one location can affect data stored in an adjacent location, and one in which a drive sector size mismatch during drive replacement or addition can lead to a cluster outage.

HyperFlex blocks the drive when these errors are encountered. A drive is in the blocked state when the cluster does not use it because of either a software error or an I/O error. If the disk is still available, this can be a transitional state while the cluster attempts to repair the disk, before the state changes to "repairing". After repeated I/O errors, the drive might be permanently blocked, which can lead to frequent drive replacements. Although the HyperFlex HX Data Platform (HXDP) software protects against drive failures, the cluster might fail after multiple simultaneous drive failures.

Problem Symptom

To handle the errors, the HXDP software puts the drive in a blocked state. When a drive encounters repeated errors, it is permanently blocked. See the How to Identify Affected Products section for how blocked drives appear.

Note: All clusters that have the affected parts need to be upgraded as soon as possible; the upgrade recommendation is not limited to clusters that show this symptom.

Workaround/Solution

The action required for HyperFlex nodes is listed in this table.

Configuration	Action Required
Systems with only HX-M2-240GB boot drive (no SED in system)	Perform a combined upgrade: HXDP to Version 3.5(2b) or later and Unified Computing System Manager (UCS Manager) to Version 4.0(1c) or later.* Note: Do not upgrade UCS Manager alone.
Clusters not yet created	Create the cluster with UCS Manager Version 4.0(1c) or later and HXDP Version 3.5(2b) or later.
Clusters already created (with SEDs or disks that might have different sector sizes)	Upgrade HXDP to Version 3.5(2b) or later and UCS Manager to Version 4.0(1c) or later.

* This is the minimum upgrade version. The recommended version is listed in Recommended Cisco HyperFlex HX Data Platform Software Releases - for Cisco HyperFlex HX-Series Systems.
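The minimum versions in the table above can be compared mechanically. The following Python sketch is illustrative only: `parse_version` and `meets_minimums` are hypothetical helpers, and the assumption that every release string has the form major.minor(patch[letters]), as in 3.5(2b) and 4.0(1c), is ours rather than Cisco's. The sketch checks only the minimum versions; it does not replace the recommended-release guidance or the instruction not to upgrade UCS Manager alone.

```python
import re

# Assumed release-string format: "3.5(2b)" -> major=3, minor=5, patch=2, letters="b".
_VERSION = re.compile(r"^(\d+)\.(\d+)\((\d+)([a-z]*)\)$")


def parse_version(text: str) -> tuple[int, int, int, str]:
    """Parse a release string such as '3.5(2b)' into a tuple that compares in release order."""
    match = _VERSION.match(text.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    major, minor, patch, letters = match.groups()
    # Letters compare as plain strings, which assumes single-letter suffixes.
    return int(major), int(minor), int(patch), letters


# Minimum versions taken from the table in this notice.
MIN_HXDP = parse_version("3.5(2b)")
MIN_UCSM = parse_version("4.0(1c)")


def meets_minimums(hxdp: str, ucsm: str) -> bool:
    """True when both HXDP and UCS Manager are at or above the minimum versions."""
    return parse_version(hxdp) >= MIN_HXDP and parse_version(ucsm) >= MIN_UCSM


print(meets_minimums("3.0(1i)", "4.0(1c)"))  # False: HXDP is below 3.5(2b)
print(meets_minimums("3.5(2i)", "4.0(1c)"))  # True
```

Tuples compare element by element, so the integer fields decide first and the letter suffix breaks ties only within the same major, minor, and patch numbers.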

See Cisco HyperFlex Systems Upgrade Guides for instructions on how to upgrade your system.

UCS Manager software images are available at UCS Infrastructure and UCS Manager Software Release 4.0(1c).

HyperFlex software images are available at HyperFlex HX Data Platform Release 3.5(2i).

Cisco recommends that you enable autosupport in order to enhance the supportability of HyperFlex clusters.

All posted HyperFlex releases are supported; however, the recommended release is marked with a "*" next to the release name on the Software Download page.

How to Identify Affected Products

HyperFlex Systems

The Products Affected section lists the systems with the affected drive Product IDs.
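Matching an inventory against the Products Affected table is a set intersection. The following Python sketch is hypothetical: `AFFECTED_PIDS` and `affected_pids` are illustrative names, and how the inventory PIDs are collected is left to the caller because this notice does not describe an export format. The PIDs themselves are copied from the Products Affected table.

```python
from collections.abc import Iterable

# Base PIDs from the Products Affected table. Each one is also listed with a
# trailing "=" as a separate entry, so both forms are included below.
_BASE_PIDS = (
    "HX-HD12TB10K12G",
    "HX-M2-240GB",
    "HX-SD240GBE1NK9",
    "HX-SD240GBM1K9",
    "HX-SD38TBE1NK9",
    "HX-SD38TBM1K9",
    "HX-SD960GBE1NK9",
    "HX-SD960GBM1K9",
    "UCS-HD12TB10K12G",
    "UCS-M2-240GB",
)
AFFECTED_PIDS = frozenset(_BASE_PIDS) | frozenset(pid + "=" for pid in _BASE_PIDS)


def affected_pids(inventory_pids: Iterable[str]) -> list[str]:
    """Return the sorted, de-duplicated inventory PIDs that this notice lists."""
    # Normalize whitespace and case before comparing against the table.
    normalized = {pid.strip().upper() for pid in inventory_pids}
    return sorted(normalized & AFFECTED_PIDS)


print(affected_pids(["HX-SD38TBM1K9", "HX-EXAMPLE-PID"]))  # ['HX-SD38TBM1K9']
```

"HX-EXAMPLE-PID" is a placeholder, not a real part. Any match means the system contains an affected part, and the upgrade applies whether or not drives currently show as blocked.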

Note: The upgrade and the post-upgrade remediation process are required regardless of whether the system currently shows blocked drives.

Blocked drives can be seen in the drive inventory on the HyperFlex Connect user interface as follows:

Click System Information in the left column.
Click the Disks tab.
Check whether any disks show "Blacklisted" in the Status column.

Blocked drives also appear in the System Overview tab. Click any slot that is marked with a red circle.

Revision History

Version	Description	Section	Date
2.0	Added PIDs and generalized the notice to cover all drives with sector size differences	Products Affected, Description, Workaround/Solution	2024-FEB-23
1.9	Updated Terminology	—	2020-JUL-20
1.8	Updated the Defect Information, Problem Description, Background, Problem Symptom, and Workaround/Solution Sections	—	2020-APR-02
1.7	Updated the Workaround/Solution Section	—	2019-MAR-25
1.6	Updated the Workaround/Solution and How to Identify Affected Products Sections	—	2019-FEB-11
1.5	Updated the Workaround/Solution Section	—	2019-JAN-04
1.4	Updated the Background, Workaround/Solution, and How to Identify Affected Products Sections	—	2018-DEC-04
1.3	Updated the Defect Information, Background, and Workaround/Solution Sections for HX Release 3.0(1i)	—	2018-NOV-28
1.2	Updated the Product Hierarchy Metatags	—	2018-OCT-18
1.1	Updated for HyperFlex Content	—	2018-AUG-16
1.0	Initial Release	—	2018-JUN-28

For More Information

For further assistance or for more information about this field notice, contact the Cisco Technical Assistance Center (TAC) using one of the following methods:

Receive Email Notification About New Field Notices

To receive email updates about Field Notices (reliability and safety issues), Security Advisories (network security issues), and end-of-life announcements for specific Cisco products, set up a profile in My Notifications.

Contact Cisco

Open a Support Case
(Requires a Cisco Service Contract)