Introduction:
This article outlines the steps to troubleshoot HyperFlex datastore mount issues.
Overview:
By default, HyperFlex datastores are mounted using NFS v3.
NFS (Network File System) is a file-sharing protocol used by the hypervisor to communicate with a NAS (Network Attached Storage) server over a standard TCP/IP network.
Here is a description of the NFS components used in a vSphere environment:
- NFS server – a storage device or server that uses the NFS protocol to make files available over the network. In the HyperFlex world, each controller VM runs an NFS server instance. The NFS server IP for the datastores is the IP of the eth1:0 interface.
- NFS datastore – a shared partition on the NFS server that can be used to hold virtual machine files.
- NFS client – ESXi includes a built-in NFS client used to access NFS devices.
In addition to the regular NFS components above, there is a VIB installed on the ESXi host called the IOVisor. This VIB provides a network file system (NFS) mount point so that the ESXi hypervisor can access the virtual disk drives attached to individual virtual machines. From the hypervisor's perspective, it is simply attached to a network file system.
Problem:
The symptoms of mount issues may show up on the ESXi host as described below.
Problem Description 1: Datastores showing Inaccessible in vCenter:

Note: When your datastores show up as inaccessible in vCenter, they are seen as "mounted unavailable" in the ESX CLI. This means the datastores were previously mounted and working on this host.
To check the datastores via the CLI, SSH to the ESXi host and execute the command below:
[root@node1:~] esxcfg-nas -l
test1 is 10.197.252.106:test1 from 3203172317343203629-5043383143428344954 mounted unavailable
test2 is 10.197.252.106:test2 from 3203172317343203629-5043383143428344954 mounted unavailable
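As an illustrative sketch (not part of the HX toolset), the `esxcfg-nas -l` output above can be parsed to flag datastores stuck in the "mounted unavailable" state; the sample text mirrors the format shown above, with illustrative names and IPs:

```python
# Sketch: flag datastores reported as "mounted unavailable" by esxcfg-nas -l.
# The sample output below mirrors the format shown above (values illustrative).
sample = """\
test1 is 10.197.252.106:test1 from 3203172317343203629-5043383143428344954 mounted unavailable
test2 is 10.197.252.106:test2 from 3203172317343203629-5043383143428344954 mounted unavailable
"""

def unavailable_datastores(esxcfg_nas_output: str) -> list[str]:
    """Return the names of datastores whose line ends in 'mounted unavailable'."""
    names = []
    for line in esxcfg_nas_output.splitlines():
        if line.strip().endswith("mounted unavailable"):
            # Format: "<name> is <server>:<share> from <uuid> mounted <state>"
            names.append(line.split(" is ", 1)[0].strip())
    return names

print(unavailable_datastores(sample))  # -> ['test1', 'test2']
```

On a healthy host the same lines end in "mounted available", so nothing is flagged.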
Problem Description 2: Datastores not showing up at all in vCenter/CLI:

Note: When your datastores are not showing in vCenter or the CLI, the datastore was never successfully mounted on the host.
To check the datastores via the CLI, SSH to the ESXi host and execute the command below:
[root@node1:~] esxcfg-nas -l
[root@node1:~]
Solution:
The reasons for the mount issue can differ; below is a list of checks to validate the configuration and correct it if needed.
Network Reachability Check:
The first thing to check in case of any datastore issue is whether the host is able to reach the NFS server IP.
In the case of HyperFlex, the NFS server IP is the IP assigned to the virtual interface eth1:0, which will be present on one of the SCVMs.
If the ESXi hosts are unable to ping the NFS server IP, the datastores become inaccessible.
Find the eth1:0 IP by running the command below on all SCVMs.
Note: eth1:0 is a virtual/floating interface and will be present on only one of the SCVMs.
root@SpringpathControllerGDAKPUCJLE:~# ifconfig eth1:0
eth1:0 Link encap:Ethernet HWaddr 00:50:56:8b:62:d5
inet addr:10.197.252.106 Bcast:10.197.252.127 Mask:255.255.255.224
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
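If you are collecting `ifconfig eth1:0` output from several SCVMs, a small sketch like the following (illustrative, not an HX tool) can extract the NFS server IP from whichever SCVM actually carries the floating interface:

```python
import re

# Sketch: extract the eth1:0 IP from 'ifconfig eth1:0' output gathered on an SCVM.
# The sample mirrors the output format shown above (MAC/IP values illustrative).
sample = """\
eth1:0 Link encap:Ethernet HWaddr 00:50:56:8b:62:d5
inet addr:10.197.252.106 Bcast:10.197.252.127 Mask:255.255.255.224
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
"""

def eth1_0_ip(ifconfig_output: str):
    """Return the 'inet addr' from ifconfig output, or None if the interface is absent."""
    match = re.search(r"inet addr:(\S+)", ifconfig_output)
    return match.group(1) if match else None

print(eth1_0_ip(sample))  # -> 10.197.252.106
```

On the SCVMs that do not hold the floating interface, `ifconfig eth1:0` shows no inet addr line and the sketch returns None.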
Then go to the ESXi host having issues with datastore mounting and check whether it can reach the NFS server IP.
[root@node1:~] ping 10.197.252.106
PING 10.197.252.106 (10.197.252.106): 56 data bytes
64 bytes from 10.197.252.106: icmp_seq=0 ttl=64 time=0.312 ms
64 bytes from 10.197.252.106: icmp_seq=1 ttl=64 time=0.166 ms
If you are able to ping, proceed with the troubleshooting steps in the next section.
If you are not able to ping, you will have to check your environment to fix the reachability. Below are a few pointers that can be checked:
- hx-storage-data vSwitch Settings:
Note: By default, all of the below configuration is done by the installer during cluster deployment. If it has been changed manually since then, verify the settings as per the steps below.
- MTU Settings - If you enabled jumbo MTU during cluster deployment, the MTU on the vSwitch should also be 9000. If you are not using jumbo MTU, it should be 1500.

- Teaming and Failover - By default, the storage data traffic is switched locally by the FI. Hence, the active and standby adapters should be the same across all hosts.

- Port Group VLAN Settings - The storage-data VLAN should be specified on both the "Storage Controller Data Network" and "Storage Hypervisor Data Network" port groups.


- No overrides at the Port Group level - The "Teaming & Failover" settings made at the vSwitch level are applied to the port groups by default, so it is recommended not to override these settings at the port group level.

- UCS vNIC Settings:
Note: By default, all of the below configuration is done by the installer during cluster deployment. If it has been changed manually since then, verify the settings as per the steps below.
- MTU Settings - Make sure the MTU size and QoS policy are configured correctly in the storage-data vNIC template. The storage-data vNICs use the Platinum QoS policy, and the MTU should be configured as per your environment.

- VLAN Settings - The hx-storage-data VLAN created during cluster deployment should be allowed in the vNIC template. Make sure it is not marked as native.

IOVisor / SCVMclient / NFS Proxy Status Check:
The SCVMclient VIB on ESXi acts as the NFS proxy. It intercepts virtual machine IO, sends it to the respective SCVM, and returns the response to the VM.
First ensure that the VIB is installed on your hosts. SSH to one of the ESXi hosts and execute the commands below:
[root@node1:~] esxcli software vib list | grep -i spring
scvmclient 3.5.2b-31674 Springpath VMwareAccepted 2019-04-17 <<<<<<<<<<<
stHypervisorSvc 3.5.2b-31674 Springpath VMwareAccepted 2019-05-20
vmware-esx-STFSNasPlugin 1.0.1-21 Springpath VMwareAccepted 2018-11-23
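As a sketch only (assuming the `esxcli software vib list` output format shown above), the presence of the scvmclient VIB can be verified programmatically, for example when checking many hosts:

```python
# Sketch: confirm the scvmclient VIB appears in `esxcli software vib list` output.
# The sample mirrors the format shown above (versions/dates illustrative).
sample = """\
scvmclient 3.5.2b-31674 Springpath VMwareAccepted 2019-04-17
stHypervisorSvc 3.5.2b-31674 Springpath VMwareAccepted 2019-05-20
vmware-esx-STFSNasPlugin 1.0.1-21 Springpath VMwareAccepted 2018-11-23
"""

def scvmclient_installed(vib_list_output: str) -> bool:
    """True if any line names the scvmclient VIB (first column of the listing)."""
    return any(
        line.split()[0] == "scvmclient"
        for line in vib_list_output.splitlines()
        if line.strip()
    )

print(scvmclient_installed(sample))  # -> True
```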
Now check the status of the scvmclient on the ESXi host and make sure it is running. If it is stopped, start it with the command "/etc/init.d/scvmclient start".
[root@node1:~] /etc/init.d/scvmclient status
+ LOGFILE=/var/run/springpath/scvmclient_status
+ mkdir -p /var/run/springpath
+ trap mv /var/run/springpath/scvmclient_status /var/run/springpath/scvmclient_status.old && cat /var/run/springpath/scvmclient_status.old |logger -s EXIT
+ exec
+ exec
Scvmclient is running <<<<<<<<<<<<
Cluster UUID resolvable to the ESXi loopback IP
HyperFlex maps the UUID of the cluster to the loopback interface of ESXi, so that ESXi passes the NFS requests to its own scvmclient. If this mapping is missing, you may face issues mounting datastores on the host. To verify this, SSH to a host that has the datastores mounted and to the host with issues, and cat the /etc/hosts file on both.
If the non-working host does not have the entry in /etc/hosts, you can copy it from a working host into the /etc/hosts of the non-working host.
Non-Working Host:
[root@node1:~] cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
::1 localhost.localdomain localhost
10.197.252.75 node1
Working Host:
[root@node2:~] cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
::1 localhost.localdomain localhost
10.197.252.76 node2
127.0.0.1 3203172317343203629-5043383143428344954.springpath 3203172317343203629-5043383143428344954 <<<<<<<
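As an illustrative sketch, the cluster-UUID loopback line can be isolated from the working host's /etc/hosts so it can be appended to the non-working host (the UUID below mirrors the sample output above and is illustrative):

```python
# Sketch: pull the cluster-UUID loopback line from a working host's /etc/hosts.
# Content mirrors the working-host output above (IP/UUID values illustrative).
working_hosts = """\
127.0.0.1 localhost.localdomain localhost
::1 localhost.localdomain localhost
10.197.252.76 node2
127.0.0.1 3203172317343203629-5043383143428344954.springpath 3203172317343203629-5043383143428344954
"""

def springpath_entry(hosts_content: str):
    """Return the /etc/hosts line mapping the cluster UUID to 127.0.0.1, if any."""
    for line in hosts_content.splitlines():
        if line.startswith("127.0.0.1") and ".springpath" in line:
            return line
    return None

print(springpath_entry(working_hosts))
```

If the sketch returns None for the problem host but a line for the working host, that line is the one to copy over.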
Stale Datastore entries in /etc/vmware/esx.conf
If the HX cluster has been recreated without reinstalling ESXi, you might have old datastore entries in the esx.conf file.
This prevents you from mounting new datastores with the same name. You can check all the HX datastore entries in esx.conf with the command below.
[root@node1:~] cat /etc/vmware/esx.conf | grep -i nas
/nas/RepSec/share = "10.197.252.106:RepSec"
/nas/RepSec/enabled = "true"
/nas/RepSec/host = "5983172317343203629-5043383143428344954"
/nas/RepSec/readOnly = "false"
/nas/DS/share = "10.197.252.106:DS"
/nas/DS/enabled = "true"
/nas/DS/host = "3203172317343203629-5043383143428344954"
/nas/DS/readOnly = "false"
In the output above, you can see that the old datastore (RepSec) is mapped using the old cluster UUID, so ESXi won't allow you to mount a datastore of the same name with the new UUID.
To resolve this, remove the old datastore entry using the command "esxcfg-nas -d RepSec".
Once removed, retry mounting the datastore from HX Connect and it should work.
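The stale-entry check above can be sketched as a small parser over the `grep nas` output, flagging any datastore whose /nas/<name>/host UUID differs from the current cluster UUID (all values below are illustrative, mirroring the sample output):

```python
import re

# Sketch: from `grep nas /etc/vmware/esx.conf` output, list datastores whose
# host UUID does not match the current cluster UUID (values illustrative).
sample = """\
/nas/RepSec/share = "10.197.252.106:RepSec"
/nas/RepSec/enabled = "true"
/nas/RepSec/host = "5983172317343203629-5043383143428344954"
/nas/RepSec/readOnly = "false"
/nas/DS/share = "10.197.252.106:DS"
/nas/DS/enabled = "true"
/nas/DS/host = "3203172317343203629-5043383143428344954"
/nas/DS/readOnly = "false"
"""

def stale_nas_entries(esx_conf_nas: str, current_uuid: str) -> list[str]:
    """Return datastore names whose /nas/<name>/host UUID differs from current_uuid."""
    stale = []
    for line in esx_conf_nas.splitlines():
        m = re.match(r'/nas/([^/]+)/host = "([^"]+)"', line)
        if m and m.group(2) != current_uuid:
            stale.append(m.group(1))
    return stale

print(stale_nas_entries(sample, "3203172317343203629-5043383143428344954"))  # -> ['RepSec']
```

Each name returned would then be removed with "esxcfg-nas -d <name>" as described above.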
Check firewall rules on the ESXi
Check 1:
[root@node1:~] esxcli network firewall get
Default Action: DROP
Enabled: false <<<<<<<<<<<<<<<<< If this is False, it will cause problems. Enable it using the below commands.
Loaded: true
[root@node1:~] esxcli network firewall set -e true
[root@node1:~] esxcli network firewall get
Default Action: DROP
Enabled: true
Loaded: true
Check 2:
[root@node1:~] esxcli network firewall ruleset list | grep -i scvm
ScvmClientConnectionRule false <<<<<<<<<<<<<<<<<<<<< If this is False, it will cause problems. Enable it using the below commands.
[root@node1:~] esxcli network firewall ruleset set -e true -r ScvmClientConnectionRule
[root@node1:~] esxcli network firewall ruleset list | grep -i scvm
ScvmClientConnectionRule true
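When checking many hosts, the ruleset state can be parsed from the `esxcli network firewall ruleset list` output shown above; this is only a sketch assuming the two-column name/enabled layout above:

```python
# Sketch: check whether a ruleset line from
# `esxcli network firewall ruleset list` reports enabled=true.
def ruleset_enabled(ruleset_line: str) -> bool:
    """True if the second column of the ruleset line is 'true'."""
    parts = ruleset_line.split()
    return len(parts) >= 2 and parts[1].lower() == "true"

print(ruleset_enabled("ScvmClientConnectionRule false"))  # -> False
print(ruleset_enabled("ScvmClientConnectionRule true"))   # -> True
```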
Check iptables rules on the SCVM
Check and match the number of rules on all the SCVMs. If they do not match, open a TAC case to have it corrected.
root@SpringpathControllerI51U7U6QZX:~# iptables -L | wc -l
48
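The comparison above can be sketched as follows, collecting the `iptables -L | wc -l` count from each SCVM and flagging any outlier; the third hostname and the counts are illustrative, not from a real cluster:

```python
from collections import Counter

# Sketch: compare the iptables rule counts collected from each SCVM
# (`iptables -L | wc -l`). A mismatch suggests opening a TAC case.
# The third hostname and all counts are illustrative.
counts = {
    "SpringpathControllerI51U7U6QZX": 48,
    "SpringpathControllerGDAKPUCJLE": 48,
    "SpringpathControllerABCDEFGH12": 44,
}

def mismatched_scvms(rule_counts: dict) -> list:
    """Return SCVMs whose rule count differs from the most common count."""
    if not rule_counts:
        return []
    expected, _ = Counter(rule_counts.values()).most_common(1)[0]
    return sorted(h for h, c in rule_counts.items() if c != expected)

print(mismatched_scvms(counts))  # -> ['SpringpathControllerABCDEFGH12']
```

Any SCVM flagged by the sketch is the one whose iptables rules need TAC review.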