1 DIMM RAS event found for P1-DIMMD1 on host 10.X.X.X in last 24 hours. Threshold : 1. Installed BIOS version is PU42.300{Error}

Issue :-

A memory failure alert is raised in Prism Element.

Solution :-

If the latest BMC and BIOS (PU42.300) versions are already installed, the steps to resolve this issue are as follows.

1) Put the affected host into maintenance mode.
2) Shut down the CVM and reboot the host so that PPR (Post Package Repair) is performed automatically. This can be done without downtime on the cluster.
3) Confirm in the IPMI SEL log that PPR was successful. Download the log and share it. If PPR was not successful, we will replace the DIMM indicated in the alert.
If a RAS event occurs again within 3 months on the same DIMM after it has already had 1 PPR cycle performed against it, we will replace the DIMM.
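
The replacement policy above can be sketched as a small decision helper. This is only an illustration and not a Nutanix tool: the function names, the returned strings, and the choice to treat "3 months" as 90 days are all assumptions made for this example.

```python
from datetime import date, timedelta

# Assumption: the "3 month timeframe" is approximated as 90 days.
PPR_WINDOW = timedelta(days=90)


def action_for_ras_event(event_date, previous_ppr_dates):
    """Decide what to do when a DIMM RAS event is reported.

    previous_ppr_dates: dates on which a PPR cycle was already run
    against this same DIMM (hypothetical record kept by the admin).
    """
    for ppr_date in previous_ppr_dates:
        age = event_date - ppr_date
        # A PPR on this DIMM inside the window means the repair did not hold.
        if timedelta(0) <= age <= PPR_WINDOW:
            return "replace_dimm"
    return "reboot_for_ppr"


def action_after_reboot(ppr_succeeded):
    """Decide what to do after checking the IPMI SEL log for the PPR result."""
    return "monitor" if ppr_succeeded else "replace_dimm"
```

For example, a first RAS event on a DIMM leads to `reboot_for_ppr`, while a second event two months after a PPR on that DIMM leads to `replace_dimm`.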

Refer to KB-7503 and KB-9137 for more information. In summary:

1. If the BIOS version is P[X]42.300 or newer, reboot the node (host) to automatically trigger DIMM repair.

2. If the BIOS version is P[X]42.300 or newer and CECC errors are still detected afterwards, contact Nutanix Support. If BIOS version is P[X]41.002 & P[X]42.002.
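
A quick way to check whether an installed BIOS string meets the P[X]42.300 threshold is sketched below. It assumes the version has the form "P", one platform letter, then two numeric fields separated by a dot, and that "newer" means a numerically higher pair; both are assumptions based on the version strings in this post, not on Nutanix documentation. The function names are hypothetical.

```python
import re

# Minimum version that performs the automatic DIMM repair on reboot,
# taken from the P[X]42.300 threshold mentioned above.
MIN_REPAIR_BIOS = (42, 300)


def parse_bios_version(version):
    """Split a string such as 'PU42.300' into the tuple (42, 300)."""
    match = re.fullmatch(r"P[A-Z](\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized BIOS version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def supports_auto_repair(version):
    """Return True if the BIOS is P[X]42.300 or newer (assumed ordering)."""
    return parse_bios_version(version) >= MIN_REPAIR_BIOS
```

With this sketch, `supports_auto_repair("PU42.300")` is true, while the older `"PU41.002"` and `"PU42.002"` strings are not.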

Once the PPR (Post Package Repair) is completed, you can see the result in the IPMI log.
