How to Verify Nutanix Cluster Health Status

Issue:

Before any upgrade, expansion, or other maintenance activity, we should confirm that the cluster is working correctly and that all services are running.

Explanation:

In some cases, we also need to check the cluster status to make sure it can tolerate the failure of a single host or Controller VM (CVM).

Solution:

NCC reports each check with one of the following states:

Sr. No  Check  Description
1       INFO   The check returned an expected value that cannot be evaluated as PASS or FAIL.
2       FAIL   This aspect of the cluster is not healthy and must be addressed.
3       PASS   This aspect of the cluster is healthy and no further action is required.
4       WARN   The check returned an unexpected value and must be investigated.

  1. Check the status of services and nodes with the command below (cs is a shorter form of the same command).
nutanix@NTNX-CVM:192.168.2.1:~$ cluster status

Run the command below to show only the entries that are not UP, across all nodes.

nutanix@cvm1$ cluster status | grep -v UP
Any nodes or services that are unexpectedly down must be fixed before proceeding.
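The filtering above can also be done programmatically when the check is scripted. The following is a minimal sketch, not a Nutanix tool: it assumes the `cluster status` text has been captured to a string, that each node starts with a line of the form `CVM: <ip> <state>`, and that each service line has the form `<service> UP|DOWN [pids]`. The function name `services_not_up` and the line patterns are assumptions for illustration.

```python
import re

# Assumed line shapes in captured `cluster status` output (not taken from this article).
CVM_LINE = re.compile(r"^\s*CVM:\s+(\S+)\s+(\w+)")
SERVICE_LINE = re.compile(r"^\s*(.+?)\s+(UP|DOWN)\b")


def services_not_up(status_output):
    """Return (cvm_ip, name, state) tuples for every CVM or service that is not up."""
    problems = []
    cvm = None
    for line in status_output.splitlines():
        cvm_match = CVM_LINE.match(line)
        if cvm_match:
            cvm = cvm_match.group(1)
            if cvm_match.group(2) != "Up":
                problems.append((cvm, "CVM", cvm_match.group(2)))
            continue
        service_match = SERVICE_LINE.match(line)
        if service_match and service_match.group(2) != "UP":
            problems.append((cvm, service_match.group(1), service_match.group(2)))
    return problems
```

An empty list means nothing was reported as down in the captured text; anything else should be fixed before continuing.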
  2. Verify that all CVM IPs are listed in the Cassandra ring. If a node is not listed, it has been removed from the Cassandra ring.
nutanix@NTNX-CVM:192.168.2.1:~$ svmips
X.X.X.14 X.X.X.15 X.X.X.16 X.X.X.17
nutanix@NTNX-CVM:192.168.2.1:~$ nodetool -h 0 ring

Address         Status State      Load            Owns    Token

X.X.X.14     Up     Normal     27.54 GB        25.00% 
X.X.X.15     Up     Normal     87.11 GB        25.00%   
X.X.X.16     Up     Normal     63.17 GB        25.00%  
X.X.X.17     Up     Normal     98.34 GB        25.00%  

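To confirm that every CVM is in the ring, compare the CVM count from svmips with the number of ring members that are Up and Normal. The two commands below should return the same number.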

nutanix@NTNX-CVM:192.168.2.1:~$ svmips | wc -w 
nutanix@NTNX-CVM:192.168.2.1:~$ nodetool -h 0 ring | grep Normal | grep -c Up
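The same comparison can be sketched in Python for use in a pre-maintenance script. This is only an illustration: it assumes the output of `svmips` and `nodetool -h 0 ring` has already been captured as text in the layout shown above, and the function names are hypothetical.

```python
import re


def count_svmips(svmips_output):
    """Count the CVM addresses printed by `svmips` (space- or comma-separated)."""
    return len([ip for ip in re.split(r"[,\s]+", svmips_output.strip()) if ip])


def count_ring_up_normal(ring_output):
    """Count ring rows whose Status column is Up and State column is Normal."""
    count = 0
    for line in ring_output.splitlines():
        fields = line.split()
        # Data rows start with: <address> <status> <state> ...
        if len(fields) >= 3 and fields[1] == "Up" and fields[2] == "Normal":
            count += 1
    return count


def ring_matches_cluster(svmips_output, ring_output):
    """True when every CVM reported by svmips is Up and Normal in the ring."""
    return count_svmips(svmips_output) == count_ring_up_normal(ring_output)
```

If `ring_matches_cluster` returns False, at least one CVM is missing from the ring or is not in the Up/Normal state.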

The health check can also be run from Prism.

Check the Nutanix cluster by running all NCC health checks from a CVM:

nutanix@NTNX-CVM:192.168.2.1:~$ ncc health_checks run_all
NCC Health Checks Summary
#
SUMMARY
#
+-----------------------+
| State | Count |
+-----------------------+
| Pass | 221 |
| Info | 2 |
| Warning | 3 |
| Total Plugins | 226 |
+-----------------------+
#
DETAILED INFORMATION
#
Detailed information for recovery_point_availability_check:
Node 192.168.6.4:
WARN: Protection domain NTNX-PROD-DC does not have a local backup to recover from
Refer to KB 1879 (http://portal.nutanix.com/kb/1879) for details on recovery_point_availability_check
Detailed information for ngt_installer_version_check:
Node 192.168.6.4:
WARN: Following VMs do not have latest NGT version installed:
VM: VM1 NGT installed version:1.6 NGT latest version:1.7.3
VM: VM2 NGT installed version:1.6 NGT latest version:1.7.3
Refer to KB 5487 (http://portal.nutanix.com/kb/5487) for details on ngt_installer_version_check
Detailed information for backup_schedule_check:
Node 192.168.6.4:
WARN: Backup schedule does not exist for protection domain ProtectionDomain_05 protecting some entities. If this protection domain is created for use by backup software, this warning can be ignored.
WARN: Backup schedule does not exist for protection domain ProtectionDomain_06 protecting some entities. If this protection domain is created for use by backup software, this warning can be ignored.
WARN: Backup schedule does not exist for protection domain ProtectionDomain_07 protecting some entities. If this protection domain is created for use by backup software, this warning can be ignored.
WARN: Backup schedule does not exist for protection domain ProtectionDomain_03 protecting some entities. If this protection domain is created for use by backup software, this warning can be ignored.
WARN: Backup schedule does not exist for protection domain ProtectionDomain_01 protecting some entities. If this protection domain is created for use by backup software, this warning can be ignored.
WARN: Backup schedule does not exist for protection domain NTNX-PROD-DC protecting some entities. If this protection domain is created for use by backup software, this warning can be ignored.
WARN: Backup schedule does not exist for protection domain ProtectionDomain_04 protecting some entities. If this protection domain is created for use by backup software, this warning can be ignored.
WARN: Backup schedule does not exist for protection domain ProtectionDomain_02 protecting some entities. If this protection domain is created for use by backup software, this warning can be ignored.
Refer to KB 1910 (http://portal.nutanix.com/kb/1910) for details on backup_schedule_check
Detailed information for default_password_check:
Node 192.168.6.4:
INFO: One or more CVMs are using the default password
INFO: One or more hosts are using the default password
INFO: One or more IPMI devices are still using the default password
Refer to KB 6153 (http://portal.nutanix.com/kb/6153) for details on default_password_check
Detailed information for orphan_vm_snapshot_check:
Node 192.168.6.4:
INFO: Found 4 orphan VM snapshot(s): ['94b6d90b-05d5-486b-b31a-4f2a2cc04a09', 'e87572a1-a1a7-42a3-b0f8-e32f29f544b3', '648108f7-a9d8-4806-a1bd-62d8ff811113', 'e6a2b9e4-3a76-452f-a6ca-4ccb8147a39a'].
Refer to KB 3752 (http://portal.nutanix.com/kb/3752) for details on orphan_vm_snapshot_check
#
PLUGIN RESULTS
#
/health_checks/data_protection_checks/protection_domain_checks/backup_schedule_check [ WARN ]
/health_checks/data_protection_checks/protection_domain_checks/recovery_point_availability_check [ WARN ]
/health_checks/ngt_checks/ngt_installer_version_check [ WARN ]
/health_checks/hypervisor_checks/orphan_vm_snapshot_check [ INFO ]
/health_checks/system_checks/default_password_check [ INFO ]
/health_checks/cassandra_checks/cassandra_invalid_token_check [ PASS ]
/health_checks/cassandra_checks/cassandra_keyspace_cf_check [ PASS ]
/health_checks/cassandra_checks/cassandra_log_crash_check [ PASS ]
/health_checks/cassandra_checks/cassandra_log_memory_check [ PASS ]
/health_checks/cassandra_checks/cassandra_similar_token_check [ PASS ]
/health_checks/cassandra_checks/cassandra_sstable_health_warning_check [ PASS ]
/health_checks/cassandra_checks/cassandra_status_check [ PASS ]
/health_checks/cassandra_checks/nodetool_consistency_check [ PASS ]
/health_checks/cassandra_checks/ring_balance_check [ PASS ]
/health_checks/data_protection_checks/cloud_checks/aws_instance_check [ PASS ]
/health_checks/data_protection_checks/cloud_checks/aws_instance_type_check [ PASS ]
/health_checks/data_protection_checks/cloud_checks/check_cloud_gflags [ PASS ]
/health_checks/data_protection_checks/cloud_checks/cloud_remote_check [ PASS ]
/health_checks/data_protection_checks/cloud_checks/cloud_remote_version_check [ PASS ]
/health_checks/data_protection_checks/high_density_node_checks/dr_configuration_check [ PASS ]
/health_checks/data_protection_checks/high_density_node_checks/protection_domain_snapshots_check [ PASS ]
/health_checks/data_protection_checks/protection_domain_checks/aged_entity_centric_third_party_backup_snapshot_check[ PASS ]
/health_checks/data_protection_checks/protection_domain_checks/aged_third_party_backup_snapshot_check [ PASS ]
/health_checks/data_protection_checks/protection_domain_checks/cross_hypervisor_ngt_installed_check [ PASS ]
/health_checks/data_protection_checks/protection_domain_checks/duplicate_vm_names_check [ PASS ]
/health_checks/data_protection_checks/protection_domain_checks/internal_consistency_group_check [ PASS ]
/health_checks/data_protection_checks/protection_domain_checks/invalid_vm_name_check [ PASS ]
/health_checks/data_protection_checks/protection_domain_checks/linked_clones_in_nearsync_pds_check [ PASS ]
/health_checks/data_protection_checks/protection_domain_checks/maximum_entities_in_consistency_group_check [ PASS ]
/health_checks/data_protection_checks/protection_domain_checks/metro_vstore_symlinks_check [ PASS ]
/health_checks/data_protection_checks/protection_domain_checks/pd_clones_check [ PASS ]
/health_checks/data_protection_checks/protection_domain_checks/pds_share_vms_check [ PASS ]
/health_checks/data_protection_checks/protection_domain_checks/protected_vg_whitelist_check [ PASS ]
/health_checks/data_protection_checks/protection_domain_checks/protected_vms_cbr_incapable_check [ PASS ]
/health_checks/data_protection_checks/protection_domain_checks/protection_domain_file_conflict_check [ PASS ]
/health_checks/data_protection_checks/protection_domain_checks/remote_stargate_version_check [ PASS ]
/health_checks/data_protection_checks/protection_domain_checks/snapshot_file_location_check [ PASS ]
/health_checks/data_protection_checks/protection_domain_checks/snapshot_missing_entities_check [ PASS ]
/health_checks/data_protection_checks/protection_domain_checks/ssd_snapshot_reserve_space_check [ PASS ]
/health_checks/data_protection_checks/protection_domain_checks/storage_container_mount_check [ PASS ]
/health_checks/data_protection_checks/protection_domain_checks/unsupported_vm_config_vstore_check [ PASS ]
/health_checks/data_protection_checks/protection_domain_checks/vstore_cg_file_count_check [ PASS ]
/health_checks/data_protection_checks/protection_domain_checks/vstore_pd_file_count_check [ PASS ]
/health_checks/data_protection_checks/remote_site_checks/dr_service_reachability_check [ PASS ]
/health_checks/data_protection_checks/remote_site_checks/remote_site_config_check [ PASS ]
/health_checks/data_protection_checks/remote_site_checks/remote_site_connectivity_check [ PASS ]
/health_checks/data_protection_checks/remote_site_checks/remote_site_has_correct_cluster_info_check [ PASS ]
/health_checks/data_protection_checks/remote_site_checks/remote_site_has_virtual_ip_check [ PASS ]
/health_checks/data_protection_checks/remote_site_checks/remote_site_mtu_check [ PASS ]
/health_checks/data_protection_checks/remote_site_checks/remote_site_time_sync_check [ PASS ]
/health_checks/draas_checks/vm_recovery_container_mount_check [ PASS ]
/health_checks/fileserver_checks/fileserver_cvm_checks/file_server_container_check [ PASS ]
/health_checks/fileserver_checks/fileserver_cvm_checks/file_server_licensing_check [ PASS ]
/health_checks/fileserver_checks/fileserver_cvm_checks/file_server_protect_status_check [ PASS ]
/health_checks/fileserver_checks/fileserver_cvm_checks/file_server_remote_site_status_check [ PASS ]
/health_checks/fileserver_checks/fileserver_cvm_checks/file_server_status_check [ PASS ]
/health_checks/fileserver_checks/fileserver_cvm_checks/file_server_upgrade_task_stuck_check [ PASS ]
/health_checks/fileserver_checks/fileserver_cvm_checks/file_server_version_check [ PASS ]
/health_checks/fileserver_checks/fileserver_cvm_checks/multiple_file_server_versions_check [ PASS ]
/health_checks/fileserver_checks/fileserver_cvm_checks/multiple_fsvm_on_single_node_check [ PASS ]
/health_checks/fileserver_checks/fileserver_cvm_checks/nvm_cpu_checks [ PASS ]
/health_checks/hardware_checks/dimm_checks/dimm_size_reduction_check [ PASS ]
/health_checks/hardware_checks/disk_checks/boot_raid_check [ PASS ]
/health_checks/hardware_checks/disk_checks/deleted_file_fd_check [ PASS ]
/health_checks/hardware_checks/disk_checks/disk_firmware_check [ PASS ]
/health_checks/hardware_checks/disk_checks/disk_id_duplicate_check [ PASS ]
/health_checks/hardware_checks/disk_checks/disk_online_check [ PASS ]
/health_checks/hardware_checks/disk_checks/disk_status_check [ PASS ]
/health_checks/hardware_checks/disk_checks/disk_storage_pool_check [ PASS ]
/health_checks/hardware_checks/disk_checks/disk_usage_check [ PASS ]
/health_checks/hardware_checks/disk_checks/host_disk_usage_check [ PASS ]
/health_checks/hardware_checks/disk_checks/hw_raid_check [ PASS ]
/health_checks/hardware_checks/disk_checks/incomplete_disk_removal_check [ PASS ]
/health_checks/hardware_checks/disk_checks/inode_usage_check [ PASS ]
/health_checks/hardware_checks/disk_checks/lsi_firmware_rev_check [ PASS ]
/health_checks/hardware_checks/disk_checks/m2_boot_disk_check [ PASS ]
/health_checks/hardware_checks/disk_checks/metadata_mounted_check [ PASS ]
/health_checks/hardware_checks/disk_checks/nvme_status_check [ PASS ]
/health_checks/hardware_checks/disk_checks/raid_health_check [ PASS ]
/health_checks/hardware_checks/disk_checks/sas_connectivity_status_check [ PASS ]
/health_checks/hardware_checks/disk_checks/sata_controller_check [ PASS ]
/health_checks/hardware_checks/disk_checks/sata_dom_status_check [ PASS ]
/health_checks/hardware_checks/disk_checks/sata_dom_wearout_check [ PASS ]
/health_checks/hardware_checks/disk_checks/satadom_free_block_check [ PASS ]
/health_checks/hardware_checks/disk_checks/ssd_configuration_check [ PASS ]
/health_checks/hardware_checks/disk_checks/storage_container_space_usage_check [ PASS ]
/health_checks/hardware_checks/disk_checks/storage_pool_space_usage_check [ PASS ]
/health_checks/hardware_checks/disk_checks/unreserved_available_space_check [ PASS ]
/health_checks/hardware_checks/disk_checks/vg_space_usage_check [ PASS ]
/health_checks/hardware_checks/ipmi_checks/dimm_hppr_check [ PASS ]
/health_checks/hardware_checks/ipmi_checks/ipmi_sel_cecc_check [ PASS ]
/health_checks/hardware_checks/ipmi_checks/ipmi_sel_check [ PASS ]
/health_checks/hardware_checks/ipmi_checks/ipmi_sel_uecc_check [ PASS ]
/health_checks/hardware_checks/ipmi_checks/ipmi_sensor_threshold_check [ PASS ]
/health_checks/hardware_checks/ipmi_checks/mixed_psu_check [ PASS ]
/health_checks/hardware_checks/ipmi_checks/power_supply_check [ PASS ]
/health_checks/hypervisor_checks/ahv_crash_file_check [ PASS ]
/health_checks/hypervisor_checks/ahv_cvm_startup_dependency_check [ PASS ]
/health_checks/hypervisor_checks/ahv_read_only_fs_check [ PASS ]
/health_checks/hypervisor_checks/ahv_time_zone_check [ PASS ]
/health_checks/hypervisor_checks/ahv_version_check [ PASS ]
/health_checks/hypervisor_checks/cvm_memory_reservation_check [ PASS ]
/health_checks/hypervisor_checks/cvm_virtual_hardware_version_check [ PASS ]
/health_checks/hypervisor_checks/gpu_driver_installed_check [ PASS ]
/health_checks/hypervisor_checks/host_cpu_contention [ PASS ]
/health_checks/hypervisor_checks/host_rx_packets_drop [ PASS ]
/health_checks/hypervisor_checks/hwclock_check [ PASS ]
/health_checks/hypervisor_checks/vm_checks [ PASS ]
/health_checks/hypervisor_checks/vm_swap_rate [ PASS ]
/health_checks/key_manager_checks/active_kmip_servers_check [ PASS ]
/health_checks/key_manager_checks/ca_certificate_expiry_check [ PASS ]
/health_checks/key_manager_checks/node_certificate_expiry_check [ PASS ]
/health_checks/key_manager_checks/sed_key_availability_check [ PASS ]
/health_checks/key_manager_checks/sw_encryption_key_availability_check [ PASS ]
/health_checks/metro_availability_checks/backup_snapshots_on_metro_secondary_check [ PASS ]
/health_checks/metro_availability_checks/data_locality_check [ PASS ]
/health_checks/metro_availability_checks/metro_aggressive_break_replication_timeout_check [ PASS ]
/health_checks/metro_availability_checks/metro_automatic_checkpoint_snapshot_check [ PASS ]
/health_checks/metro_availability_checks/metro_invalid_break_replication_timeout_check [ PASS ]
/health_checks/metro_availability_checks/secondary_metro_pd_in_sync_check [ PASS ]
/health_checks/metro_availability_checks/unsupported_vm_config_check [ PASS ]
/health_checks/network_checks/10gbe_check [ PASS ]
/health_checks/network_checks/check_network_segmentation_enabled [ PASS ]
/health_checks/network_checks/check_network_switch [ PASS ]
/health_checks/network_checks/check_ntp [ PASS ]
/health_checks/network_checks/check_unsupported_sfp [ PASS ]
/health_checks/network_checks/co_min_hc_bandwidth_check [ PASS ]
/health_checks/network_checks/conntrack_check [ PASS ]
/health_checks/network_checks/conntrack_mode_check [ PASS ]
/health_checks/network_checks/cvm_dvportgroup_binding_check [ PASS ]
/health_checks/network_checks/cvm_mtu_check [ PASS ]
/health_checks/network_checks/cvm_mtu_uniformity_check [ PASS ]
/health_checks/network_checks/cvm_time_drift_check [ PASS ]
/health_checks/network_checks/duplicate_cvm_ip_check [ PASS ]
/health_checks/network_checks/duplicate_hypervisor_ip_check [ PASS ]
/health_checks/network_checks/host_cvm_subnets_check [ PASS ]
/health_checks/network_checks/host_nic_error_check [ PASS ]
/health_checks/network_checks/host_pingable_check [ PASS ]
/health_checks/network_checks/inter_cvm_connections_check [ PASS ]
/health_checks/network_checks/inter_cvm_ping_latency_check [ PASS ]
/health_checks/network_checks/mellanox_nic_driver_version_check [ PASS ]
/health_checks/network_checks/mellanox_nic_mixed_family_check [ PASS ]
/health_checks/network_checks/mellanox_nic_status_check [ PASS ]
/health_checks/network_checks/ndp_check [ PASS ]
/health_checks/network_checks/nic_link_down_check [ PASS ]
/health_checks/network_checks/ns_config_consistency_check [ PASS ]
/health_checks/network_checks/ofpfmfc_table_full_check [ PASS ]
/health_checks/network_checks/zeus_config_ip_address_check [ PASS ]
/health_checks/pulse_checks/rest_connection_checks [ PASS ]
/health_checks/sar_checks/hdd_latency_threshold_check [ PASS ]
/health_checks/sar_checks/sar_stats_threshold_check [ PASS ]
/health_checks/sar_checks/ssd_latency_threshold_check [ PASS ]
/health_checks/stargate_checks/compression_disabled_check [ PASS ]
/health_checks/stargate_checks/dedup_auto_disabled_check [ PASS ]
/health_checks/stargate_checks/garbage_egroups_size_check [ PASS ]
/health_checks/stargate_checks/ondisk_dedup_enabled_check [ PASS ]
/health_checks/stargate_checks/oplog_episode_count_check [ PASS ]
/health_checks/stargate_checks/unresponsive_stargate_check [ PASS ]
/health_checks/system_checks/alert_manager_service_check [ PASS ]
/health_checks/system_checks/all_flash_nodes_intermixed_check [ PASS ]
/health_checks/system_checks/auto_support_check [ PASS ]
/health_checks/system_checks/bmc_bios_version_check [ PASS ]
/health_checks/system_checks/chassis_cpus_type_check [ PASS ]
/health_checks/system_checks/check_erasure_code_config [ PASS ]
/health_checks/system_checks/check_network_configuration_files [ PASS ]
/health_checks/system_checks/check_virtual_ip_is_in_cluster_external_subnet [ PASS ]
/health_checks/system_checks/check_vm_pinning_config [ PASS ]
/health_checks/system_checks/cluster_active_upgrade_check [ PASS ]
/health_checks/system_checks/cluster_disabled_upgrade_check [ PASS ]
/health_checks/system_checks/cluster_services_down_check [ PASS ]
/health_checks/system_checks/cluster_services_status [ PASS ]
/health_checks/system_checks/cluster_version_check [ PASS ]
/health_checks/system_checks/co_sizing_cvm_size_check [ PASS ]
/health_checks/system_checks/co_sizing_hc_co_ratio_check [ PASS ]
/health_checks/system_checks/co_sizing_hc_count_check [ PASS ]
/health_checks/system_checks/content_cache_dedup_ref_check [ PASS ]
/health_checks/system_checks/coreoff_check [ PASS ]
/health_checks/system_checks/cpu_avx_check [ PASS ]
/health_checks/system_checks/cpu_unblock_check [ PASS ]
/health_checks/system_checks/cvm_connectivity [ PASS ]
/health_checks/system_checks/cvm_memory_check [ PASS ]
/health_checks/system_checks/cvm_memory_usage_check [ PASS ]
/health_checks/system_checks/cvm_name_check [ PASS ]
/health_checks/system_checks/cvm_reboot_check [ PASS ]
/health_checks/system_checks/cvm_same_ncc_version_check [ PASS ]
/health_checks/system_checks/cvm_services_status [ PASS ]
/health_checks/system_checks/degraded_node_check [ PASS ]
/health_checks/system_checks/dense_node_configuration_checks [ PASS ]
/health_checks/system_checks/dimm_interop_check [ PASS ]
/health_checks/system_checks/dns_server_check [ PASS ]
/health_checks/system_checks/email_alerts_check [ PASS ]
/health_checks/system_checks/factory_config_validation_check [ PASS ]
/health_checks/system_checks/field_advisory_61_check [ PASS ]
/health_checks/system_checks/file_permission_check [ PASS ]
/health_checks/system_checks/fru_fields_correctness_check [ PASS ]
/health_checks/system_checks/fs_inconsistency_check [ PASS ]
/health_checks/system_checks/gflags_diff_check [ PASS ]
/health_checks/system_checks/gpu_mixed_check [ PASS ]
/health_checks/system_checks/high_frequency_snapshotting_ssd_config_compatibility_check [ PASS ]
/health_checks/system_checks/host_connectivity [ PASS ]
/health_checks/system_checks/host_cpu_frequency_check [ PASS ]
/health_checks/system_checks/hostname_resolution_check [ PASS ]
/health_checks/system_checks/http_proxy_check [ PASS ]
/health_checks/system_checks/idf_db_to_db_sync_heartbeat_status_check [ PASS ]
/health_checks/system_checks/kernel_memory_usage_check [ PASS ]
/health_checks/system_checks/ldap_config_check [ PASS ]
/health_checks/system_checks/m10_gpu_check [ PASS ]
/health_checks/system_checks/m60_gpu_check [ PASS ]
/health_checks/system_checks/ngt_ca_setup_check [ PASS ]
/health_checks/system_checks/notifications_dropped_check [ PASS ]
/health_checks/system_checks/p40_gpu_check [ PASS ]
/health_checks/system_checks/pc_default_password_check [ PASS ]
/health_checks/system_checks/remote_support_status_check [ PASS ]
/health_checks/system_checks/rsyslog_connectivity_check [ PASS ]
/health_checks/system_checks/same_hypervisor_version_check [ PASS ]
/health_checks/system_checks/same_timezone_check [ PASS ]
/health_checks/system_checks/snapshot_chain_height_check [ PASS ]
/health_checks/system_checks/snapshot_space_check [ PASS ]
/health_checks/system_checks/sp_usage_check [ PASS ]
/health_checks/system_checks/storage_container_replication_factor_check [ PASS ]
/health_checks/system_checks/sufficient_disk_space_check [ PASS ]
/health_checks/system_checks/v100_gpu_check [ PASS ]
/health_checks/system_checks/vdisk_count_check [ PASS ]
/health_checks/system_checks/virtual_ip_check [ PASS ]
/health_checks/system_checks/zkalias_check_plugin [ PASS ]
/health_checks/system_checks/zkinfo_check_plugin [ PASS ]
#
CLUSTER DETAILS
#
NCC Version :3.10.0.1-f46e3c78
Cluster Id :119ew9939r343
Cluster Name :NTNX-PROD-DC_VDI
Cluster Ips :['192.168.6.1', '192.168.6.2', '192.168.6.3', '192.168.6.4', '192.168.6.23', '192.168.6.24']
Timestamp :Sun Aug 2 17:30:35 2020
node with service vm id 7
service vm external ip: 192.168.6.1
hypervisor address list: [u'xxxxx']
hypervisor version: Nutanix 20170830.412
ipmi address list: [x.x.x.x']
software version: euphrates-5.15.1.1-stable
software changeset ID: 9714c2558ddas49dbcbf5e70fd8ds
node serial:
rackable unit: NX-8035-G6
node position: A
block S/N:
node with service vm id 8
service vm external ip: 192.168.6.2
hypervisor address list: x.x.x.x
hypervisor version: Nutanix 20170830.412
ipmi address list: x.x.x.x
software version: euphrates-5.15.1.1-stable
software changeset ID: 9714c2558ddas49dbcbf5e70fd8ds
node serial:
rackable unit: NX-8035-G6
node position: B
block S/N:
node with service vm id 9
service vm external ip: 192.168.6.3
hypervisor address list: x.x.x.x
hypervisor version: Nutanix 20170830.412
ipmi address list: x.x.x.x
software version: euphrates-5.15.1.1-stable
software changeset ID: 9714c2558ddas49dbcbf5e70fd8ds
node serial:
rackable unit: NX-8035-G6
node position: A
block S/N:
node with service vm id 10
service vm external ip: 192.168.6.4
hypervisor address list: x.x.x.x
hypervisor version: Nutanix 20170830.412
ipmi address list: x.x.x.x
software version: euphrates-5.15.1.1-stable
software changeset ID: 9714c2558ddas49dbcbf5e70fd8ds
node serial:
rackable unit: NX-8035-G6
node position: B
block S/N: 166
node with service vm id 622757395
service vm external ip: 192.168.6.23
hypervisor address list: [u'192.168.6.25']
hypervisor version: Nutanix 20170830.412
ipmi address list: [u'192.168.6.21']
software version: euphrates-5.15.1.1-stable
software changeset ID: 9714c2558ddas49dbcbf5e70fd8ds
node serial:
rackable unit: NX-8035-G7
node position: A
block S/N:
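With more than 200 plugin lines, the few checks that are not PASS are easy to miss. The sketch below pulls them out of the PLUGIN RESULTS section, assuming the NCC output has been saved as text and that result lines keep the `/health_checks/... [ STATE ]` shape shown above. The function name is hypothetical.

```python
import re
from collections import defaultdict

# Matches lines such as "/health_checks/ngt_checks/ngt_installer_version_check [ WARN ]".
PLUGIN_LINE = re.compile(r"^(/health_checks/\S+?)\s*\[\s*(\w+)\s*\]\s*$")


def non_passing_checks(ncc_output):
    """Group every plugin that did not return PASS by its reported state."""
    results = defaultdict(list)
    for line in ncc_output.splitlines():
        match = PLUGIN_LINE.match(line.strip())
        if match and match.group(2) != "PASS":
            results[match.group(2)].append(match.group(1))
    return dict(results)
```

For the run above this would return the three WARN checks and the two INFO checks. Any FAIL entry must be addressed and any WARN entry investigated before maintenance starts.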

All nodes in the Cassandra ring must be in the up state.

3. Check whether there are any recent FATAL files in the ~/data/logs directory.

nutanix@NTNX-CVM:192.168.2.1:~$ ls -ltrh ~/data/logs/*FATAL*
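The `ls -ltrh` listing sorts the FATAL files by time but leaves it to the reader to decide what counts as recent. A minimal sketch of the same check with an explicit cutoff is shown below. The 24-hour default and the function name are assumptions chosen for illustration, not a Nutanix recommendation.

```python
import glob
import os
import time


def recent_fatal_files(log_dir="~/data/logs", max_age_hours=24.0, now=None):
    """Return *FATAL* files in log_dir modified within max_age_hours, oldest first."""
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    pattern = os.path.join(os.path.expanduser(log_dir), "*FATAL*")
    recent = [path for path in glob.glob(pattern) if os.path.getmtime(path) >= cutoff]
    return sorted(recent, key=os.path.getmtime)
```

An empty result means no service has written a FATAL file inside the chosen window on that CVM. Like the `ls` command, it only looks at the local CVM.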

4. Check whether any Stargate is down or whether ha.py rerouting is enabled.

nutanix@NTNX-CVM:192.168.2.1:~$ ncc health_checks network_checks ha_py_rerouting_check

If NCC reports any critical issue, resolve it or contact a Nutanix support engineer before continuing.
Another way to check HA depends on the hypervisor.

  • ESXi
nutanix@NTNX-CVM:192.168.2.1:~$ allssh 'ssh root@192.168.5.1 esxcfg-route -l' | grep --color 192.168.5.2
  • AHV
nutanix@NTNX-CVM:192.168.2.1:~$ allssh 'ssh root@192.168.5.1 netstat -nr' | grep --color 192.168.5.2
  • Hyper-V
nutanix@NTNX-CVM:192.168.2.1:~$ allssh 'winsh netstat -nr' | grep -w --color 192.168.5.2

NCC checks or plugins that report a FAIL status can be re-run.

nutanix@NTNX-CVM:192.168.2.1:~$ ncc --rerun_failing_plugins=True
There are no failed plugins in the last NCC run

5. Verify if the cluster can tolerate a single node failure.

nutanix@NTNX-CVM:192.168.2.1:~$ ncli cluster get-domain-fault-tolerance-status type=node
 
    Domain Type               : NODE
    Component Type            : STATIC_CONFIGURATION
    Current Fault Tolerance   : 1
    Fault Tolerance Details   : 
    Last Update Time          : Wed Nov 18 14:22:09 GMT+05:00 2019
 
    Domain Type               : NODE
    Component Type            : ERASURE_CODE_STRIP_SIZE
    Current Fault Tolerance   : 1
    Fault Tolerance Details   : 
    Last Update Time          : Wed Nov 18 13:19:58 GMT+05:00 2019
 
    Domain Type               : NODE
    Component Type            : METADATA
    Current Fault Tolerance   : 1
    Fault Tolerance Details   : 
    Last Update Time          : Mon Sep 28 14:35:25 GMT+05:00 2019
 
    Domain Type               : NODE
    Component Type            : ZOOKEEPER
    Current Fault Tolerance   : 1
    Fault Tolerance Details   : 
    Last Update Time          : Thu Sep 17 11:09:39 GMT+05:00 2019
 
    Domain Type               : NODE
    Component Type            : EXTENT_GROUPS
    Current Fault Tolerance   : 1
    Fault Tolerance Details   : 
    Last Update Time          : Wed Nov 18 13:19:58 GMT+05:00 2019
 
    Domain Type               : NODE
    Component Type            : OPLOG
    Current Fault Tolerance   : 1
    Fault Tolerance Details   : 
    Last Update Time          : Wed Nov 18 13:19:58 GMT+05:00 2019
 
    Domain Type               : NODE
    Component Type            : FREE_SPACE
    Current Fault Tolerance   : 1
    Fault Tolerance Details   : 
    Last Update Time          : Wed Nov 18 14:20:57 GMT+05:00 2019
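In the output above every component reports a Current Fault Tolerance of 1, which means the cluster can tolerate one node failure. With seven components to read, a scripted check is less error-prone. The following sketch assumes the `ncli` output has been captured as text in the `Key : Value` layout shown above. The function names are hypothetical.

```python
def parse_fault_tolerance(ncli_output):
    """Map each Component Type to its Current Fault Tolerance value."""
    tolerance = {}
    component = None
    for line in ncli_output.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Component Type":
            component = value
        elif key == "Current Fault Tolerance" and component is not None:
            tolerance[component] = int(value)
    return tolerance


def components_below(ncli_output, required=1):
    """List components whose fault tolerance is lower than the required value."""
    parsed = parse_fault_tolerance(ncli_output)
    return sorted(name for name, value in parsed.items() if value < required)
```

An empty list from `components_below` means every component can tolerate at least `required` node failures. Any component listed there should be investigated before a node is taken down.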
 

6. Review any unacknowledged alerts and their creation times, and check whether they have been resolved:

nutanix@NTNX-CVM:192.168.2.1:~$ ncli alert ls | grep -E 'Mes|Cre' ; date

See also :- Nutanix HA

Reference

Nutanix KB
