RESYNC OF VSAN COMPONENTS IS TAKING A LONG TIME TO COMPLETE
During the rolling reboot of the VMware vSAN cluster, we observed that the amount of data left to resync occasionally increased instead of shrinking. We checked all the components, and they were in a healthy state.
We also checked for congestion, and there didn't appear to be any. Checking the size of your vSAN datastore and disks showed around 86% usage. When a vSAN datastore goes above 80% usage, a disk rebalance is triggered. This is what was causing the slow resync speed.
To prevent the disk rebalance from being triggered, we had to temporarily increase the threshold from the default of 80% to 95% by running the following command on all of your hosts:
esxcfg-advcfg -s 95 /VSAN/ClomRebalanceThreshold
After that, the rolling reboot completed and all the resyncs finished. Please run the following command on each host to set the threshold back to 80%:
esxcfg-advcfg -s 80 /VSAN/ClomRebalanceThreshold
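Because the same command has to be run on every host, a small loop can help avoid missing one. The following is only a minimal sketch: the host names, the `HOSTS` and `DRY_RUN` variables, and the function name are hypothetical, and it assumes SSH is enabled on each ESXi host and that root login is permitted. By default it only prints what it would run.

```shell
#!/bin/sh
# Hypothetical host names; replace with the ESXi hosts in your cluster.
HOSTS="${HOSTS:-esx01 esx02 esx03}"
# DRY_RUN=1 (the default) only prints the commands.
# Set DRY_RUN=0 to actually run them over SSH.
DRY_RUN="${DRY_RUN:-1}"

# set_rebalance_threshold PERCENT
# Applies the given rebalance threshold to every host listed in HOSTS.
set_rebalance_threshold() {
    threshold="$1"
    for host in $HOSTS; do
        cmd="esxcfg-advcfg -s $threshold /VSAN/ClomRebalanceThreshold"
        if [ "$DRY_RUN" = "1" ]; then
            echo "[$host] $cmd"
        else
            # Assumption: SSH is enabled on the host and root login works.
            ssh "root@$host" "$cmd"
        fi
    done
}
```

Calling `set_rebalance_threshold 95` raises the threshold before the maintenance work, and `set_rebalance_threshold 80` puts the default back afterwards. To confirm the value currently set on a host, `esxcfg-advcfg -g /VSAN/ClomRebalanceThreshold` reads it back.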
After making these changes, we saw a positive impact on the resync, but we were still seeing jumps in the resync size. After reviewing the health status, we could see that the component count had increased. This is because your running VMs were creating new components, so it is expected behavior.
There is nothing further we can do at this point besides wait for the resync to finish and then complete the rolling reboot. Once this has been completed, revert the disk rebalance threshold back to 80%. Going forward, I would advise looking into increasing the capacity of this cluster, as a similar issue could occur should a host go down again.
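To get a rough feel for why capacity matters here, the sketch below estimates how full the cluster would be if the same data had to fit on one fewer host. It is a back-of-the-envelope calculation only: the function name is hypothetical, the host count is not taken from this case, and it assumes equally sized hosts with evenly distributed data.

```shell
#!/bin/sh
# usage_after_host_loss USED_PERCENT HOST_COUNT
# Prints the approximate usage percentage if the current data had to fit
# on HOST_COUNT - 1 hosts. Assumes equally sized hosts and an even data
# spread, which a real cluster may not have.
usage_after_host_loss() {
    awk -v used="$1" -v hosts="$2" 'BEGIN {
        if (hosts < 2) { print "n/a"; exit }
        printf "%.1f\n", used * hosts / (hosts - 1)
    }'
}
```

For example, `usage_after_host_loss 86 8` prints `98.3`: under these assumptions, an eight-host cluster at 86% usage would be left almost full after losing a single host, well above the 80% rebalance threshold.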
VMware recommends rebooting vSAN hosts approximately every 90 days.
The reboot also performs a complete refresh of the disk initialization, which can resolve minor communication issues between the controller and the disks.
Rebooting nodes every 90 days can also resolve performance-related issues.
Always keep your VMware infrastructure up to date with the latest software versions and hardware firmware.
Hope this is helpful.