A storage cluster healing timeout is the length of time HX Connect or the HX Data Platform Plug-in waits before automatically healing the storage cluster. If a disk fails, the healing timeout is 1 minute. If a node fails, the healing timeout is 2 hours. The node failure timeout takes priority if a disk and a node fail at the same time, or if a disk fails after a node failure but before the healing is finished.
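The timeout rule above can be sketched in a few lines of Python. This is only an illustration of the stated rule, not HX Data Platform code: the function and constant names are hypothetical, and the sketch assumes the rule applies exactly as worded, without distinguishing which node the failed disk sits on.

```python
from datetime import timedelta

# Timeout values taken from the text above; the names are hypothetical.
DISK_HEALING_TIMEOUT = timedelta(minutes=1)
NODE_HEALING_TIMEOUT = timedelta(hours=2)


def healing_timeout(disk_failed: bool, node_failed: bool):
    """Return the wait before automatic healing, or None if nothing failed."""
    if node_failed:
        # The node timeout takes priority over a simultaneous or later disk failure.
        return NODE_HEALING_TIMEOUT
    if disk_failed:
        return DISK_HEALING_TIMEOUT
    return None
```

For example, `healing_timeout(disk_failed=True, node_failed=True)` returns the two-hour node timeout, because the node failure takes priority.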
When the cluster resiliency status is Warning, the HX Data Platform system supports the following storage cluster failures and responses.
Optionally, click the associated Cluster Status/Operational Status or Resiliency Status/Resiliency Health in HX Connect or the HX Data Platform Plug-in to display reason messages that explain what is contributing to the current state.

| Cluster Size | Number of Simultaneous Failures | Entity Failed | Maintenance Action to Take |
|---|---|---|---|
| 3 nodes | 1 | One node. | The storage cluster does not automatically heal. Replace the failed node to restore storage cluster health. |
| 3 nodes | 2 | Two or more disks on two nodes are blacklisted or failed. | If one SSD fails, the storage cluster does not automatically heal. Replace the faulty SSD and restore the system by rebalancing the cluster. If one HDD fails or is removed, the disk is blacklisted immediately. The storage cluster automatically begins healing within a minute. If more than one HDD fails, the system might not automatically restore storage cluster health. If the system is not restored, replace the faulty disks and restore the system by rebalancing the cluster. |
| 4 nodes | 1 | One node. | If the node does not recover in two hours, the storage cluster starts healing by rebalancing data on the remaining nodes. To recover the failed node immediately and fully restore the storage cluster: check that the node is powered on and restart it if possible (you might need to replace the node), and rebalance the cluster. |
| 4 nodes | 2 | Two or more disks on two nodes. | If two SSDs fail, the storage cluster does not automatically heal. If the disk does not recover in one minute, the storage cluster starts healing by rebalancing data on the remaining nodes. |
| 5+ nodes | 2 | Up to two nodes. | If the node does not recover in two hours, the storage cluster starts healing by rebalancing data on the remaining nodes. To recover the failed node immediately and fully restore the storage cluster: check that the node is powered on and restart it if possible (you might need to replace the node), and rebalance the cluster. If the storage cluster shuts down, see Troubleshooting, Two Nodes Fail Simultaneously Causes the Storage Cluster to Shutdown section. |
| 5+ nodes | 2 | Two nodes with two or more disk failures on each node. | The system automatically triggers a rebalance after a minute to restore storage cluster health. |
| 5+ nodes | 2 | One node and one or more disks on a different node. | If the disk does not recover in one minute, the storage cluster starts healing by rebalancing data on the remaining nodes. If the node does not recover in two hours, the storage cluster starts healing by rebalancing data on the remaining nodes. If a node in the storage cluster fails and a disk on a different node also fails, the storage cluster starts healing the failed disk (without touching the data on the failed node) in one minute. If the failed node does not come back up after two hours, the storage cluster starts healing the failed node as well. To recover the failed node immediately and fully restore the storage cluster: check that the node is powered on and restart it if possible (you might need to replace the node), and rebalance the cluster. |
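
The last case in the table (a cluster of five or more nodes where one node fails and a disk on a different node also fails) can be read as two independent timers. The sketch below only works through that arithmetic. The function name, dictionary keys, and timestamps are hypothetical, and it assumes neither the disk nor the node recovers before its timeout expires.

```python
from datetime import datetime, timedelta

# Timeout values taken from the text; the names are hypothetical.
DISK_HEALING_TIMEOUT = timedelta(minutes=1)
NODE_HEALING_TIMEOUT = timedelta(hours=2)


def healing_schedule(node_failed_at: datetime, disk_failed_at: datetime) -> dict:
    """Compute when healing would start for a failed node and a failed disk
    on a different node, assuming neither recovers in the meantime."""
    return {
        # The disk is healed first, without touching data on the failed node.
        "disk_healing_starts": disk_failed_at + DISK_HEALING_TIMEOUT,
        # The node is healed only if it is still down after two hours.
        "node_healing_starts": node_failed_at + NODE_HEALING_TIMEOUT,
    }


# Hypothetical timestamps: the node fails at 10:00, the disk at 10:30.
schedule = healing_schedule(
    node_failed_at=datetime(2024, 1, 1, 10, 0),
    disk_failed_at=datetime(2024, 1, 1, 10, 30),
)
```

With these sample times, disk healing would begin at 10:31 and node healing at 12:00 if the node is still down.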