Host cannot communicate with all other nodes in vSAN enabled cluster
Affected Host: Host4 (= 10.X.X.21).
Regarding the reported error message “Host cannot communicate with all other nodes in vSAN enabled cluster”:
in the Web Client we saw the corresponding message “Hosts with connectivity issues.”
It had been present since 03/15 4:48 AM local time (when we looked at it, it was already 05/15 7:14 PM local time).
The vSAN Healthcheck confirmed that communication with this host was still failing.
It showed errors/messages such as:
“Runtime fault” / “All hosts have a vSAN vmknic configured” / “Hosts with connectivity issues” / “All hosts have matching subnets”
This issue appears on all servers in a vSAN cluster when one or more nodes are down for maintenance and therefore cannot communicate with each other, or when there is a genuine problem with vSAN cluster communication. Under normal conditions, the message clears once the hosts have reestablished communication. If the message appears on the Summary tab while all other indicators report the vSAN system as “OK”, it may be purely cosmetic.
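To tell a cosmetic message from a real communication problem, the vSAN network state can be checked from the host's shell. The following is only a sketch: the `run` dry-run wrapper and the `check_vsan_network` function are hypothetical helpers, and `vmk1` is an assumed name for the vSAN vmknic. By default it only prints the commands; set `DRY_RUN=0` on an ESXi host to execute them.

```shell
#!/bin/sh
# Dry-run wrapper (hypothetical helper): prints the command unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# check_vsan_network VMK PEER_IP
# VMK and PEER_IP are placeholders; substitute the values for your cluster.
check_vsan_network() {
    vmk="$1"
    peer="$2"
    run esxcli vsan cluster get        # cluster membership as seen by this host
    run esxcli vsan network list       # which vmknic carries vSAN traffic
    run vmkping -I "$vmk" "$peer"      # reachability of a peer over that vmknic
}

# vmk1 is an assumption; 10.X.X.21 is the (redacted) address of Host4.
check_vsan_network vmk1 10.X.X.21
```

If the member count from `esxcli vsan cluster get` matches the number of hosts and `vmkping` succeeds, the remaining message is more likely to be stale than a real partition.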
Restart the vpxa service
Restarting the vCenter Management Agents on the ESXi server briefly disrupts the manageability of the server. This is safe and resolves itself. In rare, extreme cases the host may momentarily enter a “not responding” state in vCenter Server.
[root@vcsa]# /etc/init.d/vpxa restart
–> vpxa, hostd, vsanvpd, vsanmgmt, and clomd were all running on this host. Restarting all of these services, in that order, resolved the problem and cleared all related warnings.
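The restart sequence can be sketched as a small loop. The service names are taken from the note above; that each one has an init script of the same name under `/etc/init.d` is an assumption (check with `ls /etc/init.d` on your build), and `restart_services` is a hypothetical helper. It prints the commands by default and only executes them with `DRY_RUN=0`.

```shell
#!/bin/sh
# Services in the order they were restarted in this case.
# Init-script names are assumed to match; verify them on the host first.
SERVICES="vpxa hostd vsanvpd vsanmgmt clomd"

restart_services() {
    for svc in $SERVICES; do
        script="/etc/init.d/$svc"
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "+ $script restart"            # dry run: show the command only
        elif [ -x "$script" ]; then
            "$script" restart                   # restart the service
        else
            echo "skipping $svc: $script not found" >&2
        fi
    done
}

restart_services
```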
This is typical of hosts with long uptimes: one or more services can stop working correctly.
In this case:
Except for one host, all other Hosts have an uptime of 511/512 days.
Affected Host4: 514 days – Last Reboot: Fri May 16 13:52:04 UTC 2020.
Even though ESXi hosts are not Windows servers, they too can become slow or unstable, or accumulate “zombie processes,” etc.,
which affects overall performance and stability. We therefore recommend rebooting vSAN hosts approximately every 90 days.
A reboot also fully reinitializes the disks, which can resolve minor communication issues between
controller and disks.
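The 90-day guideline is easy to check mechanically. The sketch below assumes the host's uptime is already available in seconds (how you collect it is up to you); `uptime_days`, `needs_reboot`, and `MAX_UPTIME_DAYS` are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Reboot threshold from the recommendation above.
MAX_UPTIME_DAYS=90

# uptime_days SECONDS: convert an uptime in seconds to whole days.
uptime_days() {
    echo $(( $1 / 86400 ))
}

# needs_reboot SECONDS: succeeds when the uptime exceeds the threshold.
needs_reboot() {
    [ "$(uptime_days "$1")" -gt "$MAX_UPTIME_DAYS" ]
}

# Example with the affected host's uptime of 514 days.
secs=$(( 514 * 86400 ))
if needs_reboot "$secs"; then
    echo "uptime $(uptime_days "$secs") days: schedule a reboot"
fi
```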
–> In conclusion, we recommend rebooting your hosts one at a time, putting each into Maintenance Mode with “Ensure Accessibility” first.
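Maintenance Mode is normally entered from vCenter, but for reference the equivalent host-side commands look roughly like this. This is a sketch only: `run`, `enter_maintenance`, and `exit_maintenance` are hypothetical helpers, and the commands are printed rather than executed unless `DRY_RUN=0` is set on the host.

```shell
#!/bin/sh
# Dry-run wrapper (hypothetical helper): prints the command unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Enter Maintenance Mode with the "Ensure Accessibility" vSAN option.
enter_maintenance() {
    run esxcli system maintenanceMode set --enable true --vsanmode ensureObjectAccessibility
}

# Leave Maintenance Mode once the host is back up and healthy after the reboot.
exit_maintenance() {
    run esxcli system maintenanceMode set --enable false
}

# One host at a time: enter, reboot the host, then exit before moving on.
enter_maintenance
exit_maintenance
```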
–> The vSAN Healthcheck may still show the following remaining warnings:
– Failed – vSAN Critical Alert – Patch available
When you are ready to upgrade, remember to upgrade vCenter first, and make sure the controller driver and firmware are appropriate for the target host build.
Ensure that your hardware is certified for the target host build. If you are not sure, please contact your HW vendor.
The same applies to any questions about best practices for updating controller driver and firmware.
– Warning – Disks usage on the storage controller
Provided KB: https://kb.vmware.com/s/article/2129050
Attaching non-vSAN disks to the vSAN HBA is not recommended, since this can cause issues.
– Warning – Controller utility is installed on host – perccli
Provided KB: https://kb.vmware.com/s/article/2148867
We recommend contacting your HW vendor if you run into any problems with perccli.
– Warning – Controller firmware is VMware certified – Current Firmware: N/A
Because perccli is not installed, this message shows up.
– Warning – vSAN Build Recommendation Engine Health
The system reported that Internet access for vCenter is unavailable.
– Warning – vSAN HCL DB up-to-date
Since the Internet access for vCenter is unavailable, vCenter cannot update the vSAN HCL DB on its own.
As the message details show: The local vSAN HCL DB is outdated.
Provided KB for manually updating it: https://kb.vmware.com/s/article/2145116