NCC Health Check: disk_usage_check (alert)

Issue:

The NCC health check disk_usage_check verifies whether the usage of any individual disk or CVM system partition is above 80%. In the case described here, the high usage is caused by the systemd journal.

Description:

This happens because the journal keeps old archived files and can write and retain up to 1 GB of logs.

Run the command below to remove the oldest archived journal files on every CVM until the archived files use no more than 512 MB:

nutanix@cvm1:~$ allssh 'sudo journalctl --vacuum-size=512M'
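To check the journal's footprint on a single CVM before and after vacuuming, a minimal sketch follows. It assumes a POSIX shell with sudo access on the CVM; `valid_vacuum_size` and `trim_journal` are hypothetical helper names, not Nutanix tools. `journalctl --disk-usage` and `--vacuum-size` are standard systemd options, and vacuuming removes only archived journal files, not the active one.

```shell
# Hypothetical helper: succeed only for sizes such as 512M or 1G
# (digits followed by one of the K, M, G or T suffixes journalctl accepts).
valid_vacuum_size() {
    local num unit
    num=${1%?}        # everything except the last character
    unit=${1#"$num"}  # the last character
    case "$num" in
        '' | *[!0-9]*) return 1 ;;
    esac
    case "$unit" in
        K | M | G | T) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: show journal usage, remove the oldest archived
# journal files until they use less than the given size (default 512M),
# then show journal usage again.
trim_journal() {
    local size
    size=${1:-512M}
    if ! valid_vacuum_size "$size"; then
        echo "invalid size: $size" >&2
        return 1
    fi
    sudo journalctl --disk-usage
    sudo journalctl --vacuum-size="$size"
    sudo journalctl --disk-usage
}
```

For example, `trim_journal 512M` on one CVM performs the same vacuum as the allssh command, with before-and-after usage shown for comparison.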

  • If CVM system root partition usage is more than 80%, a WARNING is returned.
  • If CVM system root partition usage is more than 90%, a FAIL is returned.
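The thresholds above can be mirrored in a small shell function for a quick manual check. This is a sketch, not how NCC implements the check; `classify_usage` and `root_usage_pct` are hypothetical names, and `root_usage_pct` assumes GNU coreutils `df` (for `--output=pcent`).

```shell
# Hypothetical helper mirroring the thresholds above:
# more than 90% -> FAIL, more than 80% -> WARNING, otherwise PASS.
classify_usage() {
    local pct
    pct=${1%\%}   # accept either "85" or "85%"
    if [ "$pct" -gt 90 ]; then
        echo FAIL
    elif [ "$pct" -gt 80 ]; then
        echo WARNING
    else
        echo PASS
    fi
}

# Hypothetical helper: print the root partition usage as a bare number.
# GNU df --output=pcent prints a "Use%" header, then a value such as " 42%".
root_usage_pct() {
    df --output=pcent / | tail -n 1 | tr -d ' %'
}
```

For example, `classify_usage "$(root_usage_pct)"` prints PASS, WARNING or FAIL for the current CVM's / partition. Exactly 80% counts as PASS because the check triggers only above the threshold.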

After running the command, re-run this NCC health check to confirm whether further investigation is required. If it still reports high usage, follow the steps below.

Solution:

1. Triggered by a disk (e.g. /dev/sdX):

In this case, data cleanup within the container might be required.

  • Delete unused data and stale VMs to reduce space usage.
  • Check whether any unnecessary space is reserved at the container level.
  • Check whether unused snapshots are present on the cluster.
  • Adding nodes to the cluster also increases the space available to provision VMs.
nutanix@cvm1$ ncli health-check ls id=1003
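To see which mounted filesystems on a CVM are above the threshold, the POSIX `df -P` output can be filtered with awk. A minimal sketch; `over_threshold` is a hypothetical helper, and it assumes mount points without spaces (column 5 is the capacity, column 6 the mount point).

```shell
# Hypothetical helper: read `df -P` output on stdin and print
# "<mount point> <use%>" for every filesystem above the threshold (default 80).
over_threshold() {
    awk -v limit="${1:-80}" 'NR > 1 {
        pct = $5
        sub(/%$/, "", pct)           # "85%" -> "85"
        if (pct + 0 > limit + 0)     # numeric comparison
            print $6, $5
    }'
}
```

For example, `df -P | over_threshold 80` lists every filesystem on the CVM that is above 80% usage.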

2. Triggered by a CVM non-system partition (e.g. /home):

Look at the NCC output to determine which CVMs and which partitions have high usage, then do the following:

  1. Log in to the affected CVM over SSH (for example, with PuTTY).
  2. Change to the /home directory with the cd command.
  3. List the contents of the directory sorted by size, largest first:
     nutanix@cvm1$ ls -al | sort -k5,5nr
  4. Delete logs or ISO files that are no longer required.
  5. Run the du command below to list the usage of each file and subdirectory, largest first:
     nutanix@cvm1$ sudo du -skx * | sort -rn
  6. The following subdirectories under /home commonly contain large unused files that can be deleted:
    • /home/nutanix/software_downloads/
    • /home/nutanix/software_uncompressed/
    • /home/nutanix/data/cores/
    • /home/nutanix/data/log_collector/ (delete old logs)
    • /home/nutanix/foundation/isos/ (delete old hypervisor or Phoenix ISOs)
    • /home/nutanix/foundation/tmp/ (temporary files that can be deleted)
    • /home/nutanix/data/logbay/bundles/ (delete log bundles that are no longer needed)
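To decide where to start, the directories listed above can be ranked by size and scanned for old files. A sketch assuming a POSIX shell; `report_dirs` and `list_old_files` are hypothetical helpers, and the 30-day default is an arbitrary example, not a Nutanix recommendation.

```shell
# Hypothetical helper: print "<size in KB><TAB><directory>" for each existing
# directory given as an argument, largest first; missing paths are skipped.
report_dirs() {
    for d in "$@"; do
        if [ -d "$d" ]; then
            du -sk "$d"
        fi
    done | sort -rn
}

# Hypothetical helper: list regular files under a directory that were last
# modified more than N days ago (default 30), as candidates for review.
list_old_files() {
    find "$1" -type f -mtime +"${2:-30}" -print
}
```

For example, `report_dirs /home/nutanix/software_downloads /home/nutanix/data/cores /home/nutanix/data/logbay/bundles` ranks three of those directories, and `list_old_files /home/nutanix/data/log_collector 30` shows log files older than 30 days. Review the output before deleting anything.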

3. Triggered by the CVM root system partition (i.e. /):

If root partition usage is above 80% and the cluster is running AOS later than 5.11, contact Nutanix Support.
If you are receiving usage alerts for the / partition but the usage is below 80% according to the output of:

nutanix@cvm1$ df -h /

Gather the output of the following command, along with the df output above, and attach both to the support case:

nutanix@cvm1$ ncc health_checks run_all
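NCC output can be long, so it helps to save it to a file before attaching it to the case. A minimal sketch assuming a POSIX shell on the CVM; `capture_output` and its `<name>_<timestamp>.txt` naming scheme are hypothetical.

```shell
# Hypothetical helper: run a command, show its output on screen, and keep a
# copy (stdout and stderr) in ./<name>_<timestamp>.txt for the support case.
capture_output() {
    local name out
    name=$1
    shift
    out="./${name}_$(date +%Y%m%d_%H%M%S).txt"
    "$@" 2>&1 | tee "$out"
    echo "saved to $out" >&2
}
```

For example, `capture_output ncc_run_all ncc health_checks run_all` and `capture_output df_root df -h /` each produce one file to upload. Run them from a directory on a partition that still has free space.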

If the issue is not resolved, Nutanix Support will run a script to resolve it.
