Operation failed. Reason: LCM operation kLcmUpdateOperation failed on phoenix

Operation failed. Reason: LCM operation kLcmUpdateOperation failed on phoenix, IP: [cc.cc.cc.128] due to Upgrade encountered an error: Error occurred: Failed to start the foundation service

Before any LCM upgrade, make sure the cluster can tolerate a node/host failure, i.e. the Data Resiliency status on the Prism Element dashboard shows “OK”.

Issue :

————————————–
LCM failed with the following error: kLcmUpdateOperation failed on phoenix, ip: [cc.cc.cc.128] due to Upgrade encountered an error: Error occurred: Failed to start the foundation service : on [u'10.x.x.x'],ret: False, err:Foundation service could not be started after 3 retries.. Logs have been collected and are available to download on 10.x.x.x at /home/nutanix/data/log_collector/lcm_logs__10.x.x.x__2020-05-16_13-10-.tar.gz

Current Status :

- The upgrade on node x.x.x.128 is in progress.

Findings / Summary :
————————————–

Checked via IPMI and confirmed that the host was up.
The host was online, but its CVM was in maintenance mode.
Removed the CVM from maintenance mode.
All nodes were running the same Foundation version, and the Foundation service was stopped.
The logs showed that the failure was caused by the Foundation service failing to start; the traceback ends in a permission error on its log directory:

/home/nutanix/foundation/bin/../lib/py/nutanix_foundation.egg/foundation/monkey.py:160: UserWarning: Patching paramiko to use SHA256 for fingerprint
Traceback (most recent call last):
  File "/home/nutanix/foundation/bin/foundation", line 368, in <module>
    service(options, args)
  File "/home/nutanix/foundation/bin/foundation", line 252, in service
    main(options, args)
  File "/home/nutanix/foundation/bin/foundation", line 171, in main
    service_log = folder_central.get_service_log_path()
  File "foundation/folder_central.py", line 309, in get_service_log_path
  File "foundation/folder_central.py", line 190, in _get_ntnx_log_folder
  File "foundation/folder_central.py", line 90, in _get_folder
  File "/usr/lib64/python2.7/os.py", line 157, in makedirs
    mkdir(name, mode)
OSError: [Errno 13] Permission denied: '/home/nutanix/data/logs/foundation/.'
foundation.out (END)

The foundation log directory could not be opened on CVMs 87 and 85:

================== x.x.x.85 =================
ls: cannot open directory /home/nutanix/data/logs/foundation: Permission denied
================== x.x.x.86 =================
total 160
drwxr-x---. 2 nutanix nutanix   4096 Jun 10  2017 archive
-rw-r-----. 1 nutanix nutanix      0 Jul 19 02:19 foundation_central.log
-rw-r-----. 1 nutanix nutanix      0 Jul 19 02:19 debug.log
-rw-r-----. 1 nutanix nutanix      0 Jul 19 02:19 api.log
-rw-r-----. 1 nutanix nutanix      0 Jul 19 02:19 http.error
-rw-r-----. 1 nutanix nutanix      0 Jul 19 02:19 http.access
-rw-r-----. 1 nutanix nutanix      0 Jul 19 02:19 component_manager.log
-rw-r-----. 1 nutanix nutanix  10735 Jul 19 11:54 phoenix.log
drwxr-x---. 3 nutanix nutanix   4096 Jul 19 11:54 .
drwxr-x---. 6 nutanix nutanix 139264 Jul 19 18:02 ..
================== 10.x.x.x =================
ls: cannot open directory /home/nutanix/data/logs/foundation: Permission denied
nutanix@NTNX-x.x.x-D-CVM:10.x.x.x:~/data/logs$

+ Found that the owner and group of the foundation log directory were set to root on nodes 87 and 85:

nutanix@NTNX-x.x.x-D-CVM:10.x.x.x:~/data/logs$ allssh sudo  ls -lad /home/nutanix/data/logs/foundation
================== x.x.x.126 =================
drwxr-x---. 4 nutanix nutanix 4096 Jul 19 06:54 /home/nutanix/data/logs/foundation
================== x.x.x.127 =================
drwxr-x---. 3 nutanix nutanix 4096 Jul 19 17:31 /home/nutanix/data/logs/foundation
================== x.x.x.128 =================
drwxr-x---. 4 nutanix nutanix 4096 Jul 19 12:48 /home/nutanix/data/logs/foundation
================== x.x.x.129 =================
drwxr-x---. 3 nutanix nutanix 4096 Jul 19 10:35 /home/nutanix/data/logs/foundation
================== x.x.x.84 =================
drwxr-x---. 4 nutanix nutanix 4096 Jul 19 12:47 /home/nutanix/data/logs/foundation
================== x.x.x.85 =================
drwxr-x---. 2 root root 4096 Jul 19 09:24 /home/nutanix/data/logs/foundation
================== x.x.x.86 =================
drwxr-x---. 3 nutanix nutanix 4096 Jul 19 11:54 /home/nutanix/data/logs/foundation
================== 10.x.x.x =================
drwxr-x---. 2 root root 4096 Jul 19 05:43 /home/nutanix/data/logs/foundation
nutanix@NTNX-x.x.x-D-CVM:10.x.x.x:~/data/logs$
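With many CVMs, it is easy to miss one root-owned entry in this output. A small filter can pick them out. This is a hypothetical helper, not a Nutanix tool, and it assumes the output looks exactly like the listing above: a `==== <ip> ====` banner per CVM followed by one `ls -lad` line.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not Nutanix tooling): reads `allssh ... ls -lad <dir>`
# output on stdin and prints "<cvm-ip> <owner>:<group>" for every CVM whose
# directory is NOT owned by nutanix:nutanix.
# Assumption: each CVM block starts with a banner like "===== x.x.x.85 =====".
find_misowned_dirs() {
  awk '
    /^=+ .* =+$/ { cvm = $2; next }   # banner line: remember the CVM IP
    /^d/ {                            # "ls -lad" line for a directory
      # field 3 = owner, field 4 = group in "ls -l" long format
      if ($3 != "nutanix" || $4 != "nutanix") print cvm, $3 ":" $4
    }
  '
}
```

Usage, assuming the same command as above: `allssh sudo ls -lad /home/nutanix/data/logs/foundation | find_misowned_dirs`. Against the listing above it would flag x.x.x.85 and 10.x.x.x as `root:root`.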

Changed the owner and group of the directory back to nutanix on the affected CVMs to resolve this issue.
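The exact command used is not shown here. A minimal sketch of the check and fix on a single CVM follows, assuming GNU `stat` (as on the CVM) and that the nutanix user can use `sudo` (as in the `allssh sudo` command above). `check_owner` and `fix_foundation_log_dir` are hypothetical names. The post does not say whether the change was recursive, so the sketch only changes the directory itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if DIR is owned by EXPECTED ("user:group").
# GNU stat: -c '%U:%G' prints the owner and group names.
check_owner() {
  local dir="$1" expected="$2" actual
  actual=$(stat -c '%U:%G' -- "$dir") || return 2   # 2 = missing/unreadable
  if [ "$actual" = "$expected" ]; then
    echo "OK: $dir is $actual"
    return 0
  fi
  echo "MISMATCH: $dir is $actual, expected $expected"
  return 1
}

# Hypothetical fix, run as the nutanix user on each affected CVM.
fix_foundation_log_dir() {
  local dir=/home/nutanix/data/logs/foundation
  check_owner "$dir" nutanix:nutanix && return 0
  # Assumption: only the directory is root-owned; add -R if files inside are too.
  sudo chown nutanix:nutanix "$dir" && check_owner "$dir" nutanix:nutanix
}
```

Checking before changing anything keeps the fix idempotent: on a CVM where the directory is already correct, nothing is touched.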

nutanix@X.X.x-CVM$ upgrade_status
2020-07-19 17:28:41 INFO zookeeper_session.py:131 upgrade_status is attempting to connect to Zookeeper
2020-07-19 17:28:41 INFO upgrade_status:38 Target release version: el7.3-release-euphrates-5.10.10-stable-125f671ba8982a0199e18b756e8ef33232
2020-07-19 17:28:41 INFO upgrade_status:43 Cluster upgrade method is set to: automatic rolling upgrade
2020-07-19 17:28:41 INFO upgrade_status:96 SVM x.x.x.x is up to date
2020-07-19 17:28:41 INFO upgrade_status:96 SVM x.x.x.x is up to date
2020-07-19 17:28:41 INFO upgrade_status:96 SVM x.x.x.x is up to date
2020-07-19 17:28:41 INFO upgrade_status:96 SVM x.x.x.x is up to date

Noticed that the pre-check/inventory was failing because node x.x.x.128 did not release the shutdown token:

2020-07-19 15:25:16 INFO cluster_manager.py:4651 Not releasing token – HA status not UP for x.x.x.128
2020-07-19 15:26:03 INFO cluster_manager.py:4651 Not releasing token – HA status not UP for x.x.x.128
2020-07-19 15:26:48 INFO cluster_manager.py:4651 Not releasing token – HA status not UP for x.x.x.128
2020-07-19 15:33:08 INFO cluster_manager.py:4651 Not releasing token – HA status not UP for x.x.x.128
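To see which node is holding the shutdown token, and how often the message repeats, the "Not releasing token" lines can be counted per node. This sketch assumes the lines end in "for <node-ip>", as in the excerpt above, and that they are written to ~/data/logs/genesis.out (an assumption based on cluster_manager.py being logged by genesis). `count_token_waits` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count "Not releasing token" messages per node.
# Reads log text on stdin; the node IP is taken from the last field.
count_token_waits() {
  grep 'Not releasing token' \
    | awk '{ print $NF }' \
    | sort | uniq -c \
    | awk '{ print $2, $1 }'   # reorder to "<node-ip> <count>"
}

# Example on a CVM (assumed log location):
#   count_token_waits < ~/data/logs/genesis.out
```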

nutanix@X.X.x-CVM$ host_upgrade_status
2020-07-19 17:28:48 INFO zookeeper_session.py:131 host_upgrade_status is attempting to connect to Zookeeper
Automatic Hypervisor upgrade: Disabled
Target host version: el6.nutanix.20170830.402
2020-07-19: Completed hypervisor upgrade on this node
2020-07-19 Completed hypervisor upgrade on this node
2020-07-19 Completed hypervisor upgrade on this node
2020-07-19 Completed hypervisor upgrade on this node

Solution:-

Restarted genesis on the affected node to resolve this issue.

nutanix@X.X.x-CVM$ genesis restart
2020-07-19 18:02:03.491308: Stopping genesis (pids [5657, 7420, 7743, 7744, 9238, 9230])
2020-07-19 18:02:04.866536: Genesis started on pids [4537]

After the LCM inventory completed successfully, the firmware upgrades were started from the CVM and all hosts were upgraded.

LCM Issue :-

Note:- Always involve Nutanix Support before performing any of these activities.

==============================================================================

~/data/logs/foundation/last_session.log ---- Workflows involving Phoenix

~/data/logs/lcm_wget.out ---- LCM manifest download from Nutanix

~/data/logs/genesis.out ---- Inventory & upload operations

~/data/logs/lcm_ops.out ---- Inventory & upload operations
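When chasing an LCM failure, it can help to search all four of these logs at once. A minimal sketch, assuming the paths listed above relative to ~/data/logs; `scan_lcm_logs` is a hypothetical helper, not a Nutanix command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report how many lines match PATTERN in each LCM log.
# LOG_DIR defaults to ~/data/logs, the CVM log directory used above.
scan_lcm_logs() {
  local pattern="$1" log_dir="${2:-$HOME/data/logs}" f count
  for f in foundation/last_session.log lcm_wget.out genesis.out lcm_ops.out; do
    if [ -r "$log_dir/$f" ]; then
      # grep -c prints 0 and exits 1 on no match; "|| true" keeps the 0
      count=$(grep -c -- "$pattern" "$log_dir/$f" || true)
      echo "$f: $count"
    else
      echo "$f: not found"
    fi
  done
}

# Example: scan_lcm_logs 'kLcmUpdateOperation'
```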


Discover more from Mastering Nutanix