OPERATION FAILED. REASON: LCM OPERATION KLCMUPDATEOPERATION FAILED ON PHOENIX, IP: [CC.CC.CC.128] DUE TO UPGRADE ENCOUNTERED AN ERROR: ERROR OCCURRED: FAILED TO START THE FOUNDATION SERVICE
Always make sure that the cluster can tolerate a node/host failure: the Data Resiliency status on the Prism Element dashboard should be “OK”.
Issue :
————————————–
LCM failed with error: kLcmUpdateOperation failed on phoenix, ip: [cc.cc.cc.128] due to Upgrade encountered an error: Error occurred: Failed to start the foundation service : on [u'10.x.x.x'],ret: False, err:Foundation service could not be started after 3 retries.. Logs have been collected and are available to download on 10.x.x.x at /home/nutanix/data/log_collector/lcm_logs__10.x.x.x__2020-05-16_13-10-.tar.gz
Current Status :
- The upgrade on node x.x.x.128 is still in progress.
Findings / Summary :
————————————–
- Checked via IPMI and confirmed that the host was up.
- The host was online, but its CVM was in maintenance mode.
- Took the CVM out of maintenance mode.
- All nodes run the same Foundation version, and the Foundation service is stopped.
- Checked the logs: the failure was caused by the Foundation service not starting, possibly because of a permission error:
/home/nutanix/foundation/bin/../lib/py/nutanix_foundation.egg/foundation/monkey.py:160: UserWarning: Patching paramiko to use SHA256 for fingerprint
Traceback (most recent call last):
File "/home/nutanix/foundation/bin/foundation", line 368, in <module>
service(options, args)
File "/home/nutanix/foundation/bin/foundation", line 252, in service
main(options, args)
File "/home/nutanix/foundation/bin/foundation", line 171, in main
service_log = folder_central.get_service_log_path()
File "foundation/folder_central.py", line 309, in get_service_log_path
File "foundation/folder_central.py", line 190, in _get_ntnx_log_folder
File "foundation/folder_central.py", line 90, in _get_folder
File "/usr/lib64/python2.7/os.py", line 157, in makedirs
mkdir(name, mode)
OSError: [Errno 13] Permission denied: '/home/nutanix/data/logs/foundation/.'
(end of foundation.out)
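The traceback ends in os.makedirs() failing with errno 13 (EACCES) on the Foundation log directory: the user running the Foundation service cannot create entries there. The sketch below is a read-only pre-flight check for that condition, assuming Python 3. The function name check_log_dir and its default expected owner "nutanix" are hypothetical choices for illustration; this is not part of Foundation.

```python
import errno
import os
import pwd
import stat


def check_log_dir(path, expected_owner="nutanix"):
    """Return a list of problems that would stop a service writing to `path`.

    Hypothetical pre-flight check (not part of Foundation): it only inspects
    the directory and never modifies it.
    """
    try:
        st = os.stat(path)
    except OSError as exc:
        if exc.errno == errno.ENOENT:
            return ["%s does not exist" % path]
        if exc.errno == errno.EACCES:
            return ["cannot stat %s: permission denied" % path]
        raise
    if not stat.S_ISDIR(st.st_mode):
        return ["%s is not a directory" % path]

    problems = []
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:  # uid without a passwd entry
        owner = str(st.st_uid)
    if owner != expected_owner:
        problems.append("%s is owned by %s, expected %s" % (path, owner, expected_owner))
    # Creating entries needs write and search (x) permission on the directory.
    # os.access() checks the real uid; root passes regardless of mode bits.
    if not os.access(path, os.W_OK | os.X_OK):
        problems.append("%s is not writable by the current user" % path)
    return problems


if __name__ == "__main__":
    # Intended to be run as the nutanix user on each CVM.
    for problem in check_log_dir("/home/nutanix/data/logs/foundation"):
        print(problem)
```

Run as root, the writability test always passes, so the ownership check is the more useful signal in that case.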
The foundation log directory could not be opened on CVMs 87 and 85:
================== x.x.x.85 =================
ls: cannot open directory /home/nutanix/data/logs/foundation: Permission denied
================== x.x.x.86 =================
total 160
drwxr-x---. 2 nutanix nutanix 4096 Jun 10 2017 archive
-rw-r-----. 1 nutanix nutanix 0 Jul 19 02:19 foundation_central.log
-rw-r-----. 1 nutanix nutanix 0 Jul 19 02:19 debug.log
-rw-r-----. 1 nutanix nutanix 0 Jul 19 02:19 api.log
-rw-r-----. 1 nutanix nutanix 0 Jul 19 02:19 http.error
-rw-r-----. 1 nutanix nutanix 0 Jul 19 02:19 http.access
-rw-r-----. 1 nutanix nutanix 0 Jul 19 02:19 component_manager.log
-rw-r-----. 1 nutanix nutanix 10735 Jul 19 11:54 phoenix.log
drwxr-x---. 3 nutanix nutanix 4096 Jul 19 11:54 .
drwxr-x---. 6 nutanix nutanix 139264 Jul 19 18:02 ..
================== 10.x.x.x =================
ls: cannot open directory /home/nutanix/data/logs/foundation: Permission denied
nutanix@NTNX-x.x.x-D-CVM:10.x.x.x:~/data/logs$
+ Found that the owner and group of the foundation directory were set to root on nodes 87 and 85:
nutanix@NTNX-x.x.x-D-CVM:10.x.x.x:~/data/logs$ allssh sudo ls -lad /home/nutanix/data/logs/foundation
================== x.x.x.126 =================
drwxr-x---. 4 nutanix nutanix 4096 Jul 19 06:54 /home/nutanix/data/logs/foundation
================== x.x.x.127 =================
drwxr-x---. 3 nutanix nutanix 4096 Jul 19 17:31 /home/nutanix/data/logs/foundation
================== x.x.x.128 =================
drwxr-x---. 4 nutanix nutanix 4096 Jul 19 12:48 /home/nutanix/data/logs/foundation
================== x.x.x.129 =================
drwxr-x---. 3 nutanix nutanix 4096 Jul 19 10:35 /home/nutanix/data/logs/foundation
================== x.x.x.84 =================
drwxr-x---. 4 nutanix nutanix 4096 Jul 19 12:47 /home/nutanix/data/logs/foundation
================== x.x.x.85 =================
drwxr-x---. 2 root root 4096 Jul 19 09:24 /home/nutanix/data/logs/foundation
================== x.x.x.86 =================
drwxr-x---. 3 nutanix nutanix 4096 Jul 19 11:54 /home/nutanix/data/logs/foundation
================== 10.x.x.x =================
drwxr-x---. 2 root root 4096 Jul 19 05:43 /home/nutanix/data/logs/foundation
nutanix@NTNX-x.x.x-D-CVM:10.x.x.x:~/data/logs$
Changed the owner and group of the directory to nutanix on both CVMs to resolve the permission error.
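On a larger cluster, the allssh listing above can be scanned mechanically instead of by eye. The sketch below assumes the output format shown (an allssh "==== <ip> ====" banner followed by one ls -lad line per CVM) and flags every CVM whose directory is not owned by nutanix:nutanix. The function name find_bad_owners and the SAMPLE text are illustrative; the printed chown command mirrors the ownership change described above and should be reviewed before it is run.

```python
import re

# Banner printed by allssh before each CVM's output, e.g.
# "================== x.x.x.85 ================="
BANNER = re.compile(r"^=+\s*(\S+)\s*=+$")

SAMPLE = """\
================== x.x.x.84 =================
drwxr-x---. 4 nutanix nutanix 4096 Jul 19 12:47 /home/nutanix/data/logs/foundation
================== x.x.x.85 =================
drwxr-x---. 2 root root 4096 Jul 19 09:24 /home/nutanix/data/logs/foundation
================== x.x.x.86 =================
drwxr-x---. 3 nutanix nutanix 4096 Jul 19 11:54 /home/nutanix/data/logs/foundation
================== 10.x.x.x =================
drwxr-x---. 2 root root 4096 Jul 19 05:43 /home/nutanix/data/logs/foundation
"""


def find_bad_owners(allssh_output, user="nutanix", group="nutanix"):
    """Map each CVM banner to (owner, group) when its directory is not user:group."""
    bad = {}
    cvm = None
    for line in allssh_output.splitlines():
        line = line.strip()
        banner = BANNER.match(line)
        if banner:
            cvm = banner.group(1)
            continue
        fields = line.split()
        # `ls -lad` columns: mode, links, owner, group, size, month, day, time, path
        if cvm is not None and len(fields) >= 9 and fields[0].startswith("d"):
            if (fields[2], fields[3]) != (user, group):
                bad[cvm] = (fields[2], fields[3])
            cvm = None  # one listing per CVM
    return bad


if __name__ == "__main__":
    for cvm, (owner, grp) in sorted(find_bad_owners(SAMPLE).items()):
        print("%s: owned by %s:%s; on that CVM run: "
              "sudo chown nutanix:nutanix /home/nutanix/data/logs/foundation"
              % (cvm, owner, grp))
```

Lines such as "ls: cannot open directory ..." do not start with a directory mode string, so they are skipped rather than misread as listings.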
nutanix@X.X.x-CVM$ upgrade_status
2020-07-19 17:28:41 INFO zookeeper_session.py:131 upgrade_status is attempting to connect to Zookeeper
2020-07-19 17:28:41 INFO upgrade_status:38 Target release version: el7.3-release-euphrates-5.10.10-stable-125f671ba8982a0199e18b756e8ef33232
2020-07-19 17:28:41 INFO upgrade_status:43 Cluster upgrade method is set to: automatic rolling upgrade
2020-07-19 17:28:41 INFO upgrade_status:96 SVM x.x.x.x is up to date
2020-07-19 17:28:41 INFO upgrade_status:96 SVM x.x.x.x is up to date
2020-07-19 17:28:41 INFO upgrade_status:96 SVM x.x.x.x is up to date
2020-07-19 17:28:41 INFO upgrade_status:96 SVM x.x.x.x is up to date
Noticed that the pre-check/inventory was failing because node x.x.x.128 had not released the shutdown token:
2020-07-19 15:25:16 INFO cluster_manager.py:4651 Not releasing token – HA status not UP for x.x.x.128
2020-07-19 15:26:03 INFO cluster_manager.py:4651 Not releasing token – HA status not UP for x.x.x.128
2020-07-19 15:26:48 INFO cluster_manager.py:4651 Not releasing token – HA status not UP for x.x.x.128
2020-07-19 15:33:08 INFO cluster_manager.py:4651 Not releasing token – HA status not UP for x.x.x.128
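To confirm which node is holding the shutdown token, the log lines quoted above can be tallied per node. This is a minimal sketch assuming Python 3 and lines shaped exactly like the excerpt; because the separator before "HA status" appears here as an en dash and may be a plain hyphen in the raw log, both are accepted. The function name token_holdouts is hypothetical, and reading these lines from genesis.out (listed at the end of this note for inventory operations) is an assumption.

```python
import re
from collections import Counter

# Matches lines like the excerpt above; accepts "-" or an en dash before "HA status".
TOKEN_RE = re.compile(
    r"^(\S+ \S+) .*Not releasing token\s*[-\u2013]\s*HA status not UP for (\S+)"
)

SAMPLE = [
    "2020-07-19 15:25:16 INFO cluster_manager.py:4651 Not releasing token \u2013 HA status not UP for x.x.x.128",
    "2020-07-19 15:26:03 INFO cluster_manager.py:4651 Not releasing token \u2013 HA status not UP for x.x.x.128",
    "2020-07-19 15:26:48 INFO cluster_manager.py:4651 Not releasing token \u2013 HA status not UP for x.x.x.128",
    "2020-07-19 15:33:08 INFO cluster_manager.py:4651 Not releasing token \u2013 HA status not UP for x.x.x.128",
]


def token_holdouts(lines):
    """Count 'Not releasing token' messages per node and keep the last timestamp seen."""
    counts = Counter()
    last_seen = {}
    for line in lines:
        match = TOKEN_RE.search(line)
        if match:
            timestamp, node = match.groups()
            counts[node] += 1
            last_seen[node] = timestamp
    return counts, last_seen


if __name__ == "__main__":
    counts, last_seen = token_holdouts(SAMPLE)
    for node, n in counts.most_common():
        print("%s: %d 'Not releasing token' messages, last at %s" % (node, n, last_seen[node]))
```

On a CVM, an open log file can be passed instead of SAMPLE, since iterating a file yields its lines.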
nutanix@X.X.x-CVM$ host_upgrade_status
2020-07-19 17:28:48 INFO zookeeper_session.py:131 host_upgrade_status is attempting to connect to Zookeeper
Automatic Hypervisor upgrade: Disabled
Target host version: el6.nutanix.20170830.402
2020-07-19: Completed hypervisor upgrade on this node
2020-07-19 Completed hypervisor upgrade on this node
2020-07-19 Completed hypervisor upgrade on this node
2020-07-19 Completed hypervisor upgrade on this node
Solution:-
Restarted genesis on the affected node to resolve this issue.
nutanix@X.X.x-CVM$ genesis restart
2020-07-19 18:02:03.491308: Stopping genesis (pids [5657, 7420, 7743, 7744, 9238, 9230])
2020-07-19 18:02:04.866536: Genesis started on pids [4537]
After the LCM inventory completed successfully, the firmware upgrades were started on the CVM and all hosts were upgraded.
Logs to check for LCM issues :-
Note:- Always involve Nutanix Support before performing any of these actions.
==============================================================================
~/data/logs/foundation/last_session.log - Workflows involving Phoenix
~/data/logs/lcm_wget.out - LCM manifest download from Nutanix
~/data/logs/genesis.out - Inventory and upload operations
~/data/logs/lcm_ops.out - Inventory and upload operations
See also :-